Pydoop 是用python 对Hadoop 的C++API的MapReduce和HDFS的封装。程序很小,只有500多KB。
[zhouhh@Hadoop48 ~]$ tar -zxvf pydoop-0.5.2-rc2.tar.gz
[zhouhh@Hadoop48 pydoop-0.5.2-rc2]$ python setup.py build
build通过后,执行安装
可以安装在系统中或安装在本地
sudo安装时如果不跟参数,会导致环境变量不可用:
[zhouhh@Hadoop48 pydoop-0.5.2-rc2]$ sudo python setup.py install
错误:
RuntimeError: Could not determine JAVA_HOME path
但是该环境变量是存在的:
[zhouhh@Hadoop48 pydoop-0.5.2-rc2]$ echo $JAVA_HOME
/usr/java/jdk1.7.0
打开setup.py
self.java_home = os.getenv("JAVA_HOME", find_first_existing("/opt/sun-jdk", "/usr/lib/jvm/java-6-sun"))
改为:
self.java_home = os.getenv("JAVA_HOME", find_first_existing("/opt/sun-jdk", "/usr/lib/jvm/java-6-sun","/usr/java/jdk1.7.0"))
再执行setup.py install
ValueError: HADOOP_HOME not set
[zhouhh@Hadoop48 pydoop-0.5.2-rc2]$ echo $HADOOP_HOME
/home/zhouhh/hadoop-1.0.3
[zhouhh@Hadoop48 pydoop-0.5.2-rc2]$ echo $HADOOP_HOME
/home/zhouhh/hadoop-1.0.3
找到:
paths = reduce(list.add, map(glob.glob, (“/opt/hadoop”, “/usr/lib/hadoop”, “/usr/local/lib/hadoop*”)))
改为:
paths = reduce(list.add, map(glob.glob, (“/opt/hadoop”, “/usr/lib/hadoop”, “/usr/local/lib/hadoop”,”/home/zhouhh/hadoop-”)))
…
为了避免环境变量问题,
如果安装到系统,应略过创建:
[zhouhh@Hadoop48 pydoop-0.5.2-rc2]$ sudo python setup.py install –skip-build
或者直接装在当前用户下:
[zhouhh@Hadoop48 pydoop-0.5.2-rc2]$ python setup.py install –user
或安装到指定目录:
[zhouhh@Hadoop48 pydoop-0.5.2-rc2]$ python setup.py install –home /home/zhouhh/pydoop
检验是否成功:
[zhouhh@Hadoop48 pydoop-0.5.2-rc2]$ cd test
[zhouhh@Hadoop48 test]$ python all_test.py
ImportError: /usr/lib64/libboost_python.so.2: undefined symbol: PyUnicodeUCS4_FromEncodedObject
单词计数示例
from pydoop.pipes import Mapper, Reducer, Factory, runTask
class WordCountMapper(Mapper):
def map(self, context):
words = context.getInputValue().split()
for w in words:
context.emit(w, "1")
class WordCountReducer(Reducer):
def reduce(self, context):
s = 0
while context.nextValue():
s += int(context.getInputValue())
context.emit(context.getInputKey(), str(s))
runTask(Factory(WordCountMapper, WordCountReducer))
对简单任务,可以使用pydoop_script工具:
def mapper(k, text, writer):
for word in text.split():
writer.emit(word, 1)
def reducer(word, count, writer):
writer.emit(word, sum(map(int, count)))
参考:
下载:https://sourceforge.net/projects/pydoop/files
示例地址:http://pydoop.sourceforge.net/docs/examples/index.html
最新版下载:http://sourceforge.net/projects/pydoop/files/Pydoop-0.5/pydoop-0.5.2-rc2.tar.gz/download
主页:http://sourceforge.net/apps/mediawiki/pydoop/index.php?title=Main_Page
如非注明转载, 均为原创. 本站遵循知识共享CC协议,转载请注明来源