Download Hadoop 3. The current release is 3.0.0-alpha3, published in May 2017.

wget http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.0.0-alpha3/hadoop-3.0.0-alpha3.tar.gz


Hadoop 3 requires Java 8 (JDK 1.8) or later; Java 7 is no longer supported.

[zhouhh@mainServer hadoop-3.0.0-alpha3]$ !cat
cat /etc/redhat-release
CentOS Linux release 7.1.1503 (Core)
[zhouhh@mainServer hadoop-3.0.0-alpha3]$ echo $JAVA_HOME
[zhouhh@mainServer hadoop-3.0.0-alpha3]$ java -version
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-b12)
OpenJDK 64-Bit Server VM (build 25.131-b12, mixed mode)
[zhouhh@mainServer hadoop-3.0.0-alpha3]$ javac -version
javac 1.8.0_131


[zhouhh@mainServer hadoop-3.0.0-alpha3]$ ./bin/hadoop
[zhouhh@mainServer java]$ ln -s hadoop-3.0.0-alpha3 hadoop

[zhouhh@mainServer ~]$ vi .bashrc

export HADOOP_HOME="${HOME}/java/hadoop"
[zhouhh@mainServer ~]$ source .bashrc
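
The later commands call hadoop, hdfs, start-dfs.sh and start-yarn.sh without a path, so the bin and sbin directories presumably have to be on PATH as well. The notes show only the HADOOP_HOME line; the PATH line in this sketch is an assumption about what else .bashrc contained.

```shell
# Assumed additions to ~/.bashrc; only the HADOOP_HOME line appears in the original notes.
export HADOOP_HOME="${HOME}/java/hadoop"
# bin holds hadoop and hdfs; sbin holds start-dfs.sh, stop-dfs.sh, start-yarn.sh and stop-yarn.sh.
export PATH="${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${PATH}"
```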

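The following commands copy the configuration files under etc/hadoop into an input directory, find every match of the given regular expression in them, and write the results to the output directory.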

[zhouhh@mainServer ~]$ cd test
[zhouhh@mainServer test]$ ls
[zhouhh@mainServer test]$ mkdir hadoop
[zhouhh@mainServer test]$ cd hadoop
[zhouhh@mainServer hadoop]$ ls
[zhouhh@mainServer hadoop]$ mkdir input
[zhouhh@mainServer hadoop]$ cp $HADOOP_HOME/etc/hadoop/*.xml input
[zhouhh@mainServer hadoop]$ ls input
capacity-scheduler.xml  core-site.xml  hadoop-policy.xml  hdfs-site.xml  httpfs-site.xml  kms-acls.xml  kms-site.xml  yarn-site.xml
[zhouhh@mainServer hadoop]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha3.jar grep input output 'dfs[a-z.]+'
[zhouhh@mainServer hadoop]$ ls output/
part-r-00000  _SUCCESS
[zhouhh@mainServer hadoop]$ cat output/*
1	dfsadmin
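
To make clear what the example job computes, the same counting can be approximated locally with standard tools. This is only a sketch of the idea, not how Hadoop implements it; count_matches is a hypothetical helper name.

```shell
# count_matches PATTERN FILE...
# Prints "count<TAB>match" for every distinct regex match, most frequent first.
count_matches() {
  pattern=$1
  shift
  # -o prints only the matched text, -h drops file names, -E enables extended regex.
  grep -ohE "$pattern" "$@" | sort | uniq -c | sort -rn | awk '{ print $1 "\t" $2 }'
}
```

Running count_matches 'dfs[a-z.]+' input/*.xml in the test directory should give a result comparable to the contents of output/part-r-00000.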


Hadoop can also run on a single machine in pseudo-distributed mode, with each Hadoop daemon running in its own Java process.

[zhouhh@mainServer hadoop]$ vi core-site.xml
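
The notes do not show what was entered in core-site.xml. The error message further down refers to hdfs://localhost:9000, so fs.defaultFS was presumably set to that address. A minimal sketch under that assumption; CONF_DIR is an illustrative variable name.

```shell
# Directory holding the Hadoop configuration files; CONF_DIR is an illustrative name.
CONF_DIR="${HADOOP_CONF_DIR:-${HADOOP_HOME:-.}/etc/hadoop}"
mkdir -p "$CONF_DIR"

# Overwrites any existing core-site.xml in CONF_DIR.
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Default file system: the HDFS NameNode on this machine -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
```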


[zhouhh@mainServer hadoop]$ vi hdfs-site.xml
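
The contents of hdfs-site.xml are not shown either. The later job output lists dfs.replication, which suggests that this property was added here. The value 1 is an assumption, chosen because a single-machine setup has only one DataNode.

```shell
# Directory holding the Hadoop configuration files; CONF_DIR is an illustrative name.
CONF_DIR="${HADOOP_CONF_DIR:-${HADOOP_HOME:-.}/etc/hadoop}"
mkdir -p "$CONF_DIR"

# Overwrites any existing hdfs-site.xml in CONF_DIR.
cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- One DataNode on this machine, so keep a single copy of each block (assumed value) -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
```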



[zhouhh@mainServer hadoop]$ ssh localhost
Last login: Thu Jun 29 12:15:14 2017 from localhost

If ssh localhost asks for a password, set up key-based login with an empty passphrase:


[zhouhh@mainServer ~]$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
[zhouhh@mainServer ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[zhouhh@mainServer ~]$ chmod 0600 ~/.ssh/authorized_keys


[zhouhh@mainServer ~]$ hdfs namenode -format

Formatting creates the NameNode storage in the following directory: /tmp/hadoop-zhouhh/dfs/name
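
That path follows from the defaults: hadoop.tmp.dir is /tmp/hadoop-${user.name}, and the NameNode keeps its metadata in dfs/name below it. A small sketch that checks whether the format step left a storage directory behind, assuming neither default has been overridden:

```shell
# Default NameNode storage directory when hadoop.tmp.dir and
# dfs.namenode.name.dir are left at their default values.
name_dir="/tmp/hadoop-$(id -un)/dfs/name"

# A formatted storage directory contains current/VERSION.
if [ -f "$name_dir/current/VERSION" ]; then
  echo "NameNode storage is formatted: $name_dir"
else
  echo "No formatted NameNode storage at $name_dir"
fi
```

Because the directory lives under /tmp, it may be removed when the system cleans temporary files, which is acceptable for a test setup only.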

[zhouhh@mainServer ~]$ start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [mainServer]

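ssh: Could not resolve hostname mainserver: Name or service not known

The secondary namenode is started over ssh using the machine's host name, which does not resolve here. Add an entry for a new host name, msvr, to /etc/hosts, then rename the machine to match:

[zhouhh@mainServer ~]$ sudo vi /etc/hosts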
[zhouhh@mainServer ~]$ sudo hostname msvr
[zhouhh@msvr ~]$ sudo vi /etc/hostname
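
Before restarting HDFS, it may help to confirm that the machine's own host name now resolves. A minimal sketch, assuming a Linux system where getent is available; resolves is a hypothetical helper name.

```shell
# resolves NAME: succeeds when NAME can be looked up through /etc/hosts or DNS.
resolves() {
  getent hosts "$1" > /dev/null
}

host_name=$(uname -n)
if resolves "$host_name"; then
  echo "$host_name resolves"
else
  echo "$host_name does not resolve; add it to /etc/hosts"
fi
```

Note that sudo hostname msvr changes the name only until the next reboot, which is why /etc/hostname is edited as well.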

[zhouhh@msvr ~]$ stop-dfs.sh
Stopping namenodes on [localhost]
Stopping datanodes
Stopping secondary namenodes [msvr]
[zhouhh@msvr ~]$ start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [msvr]

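Daemon logs are written to the $HADOOP_LOG_DIR directory (default: $HADOOP_HOME/logs). The NameNode web page can be opened in a browser; from the local machine the address is http://localhost:9870/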


[zhouhh@msvr ~]$ hdfs

[zhouhh@msvr ~]$ hdfs dfs -ls /
Found 1 items
drwxr-xr-x   - zhouhh supergroup          0 2017-06-29 15:17 /user
[zhouhh@msvr ~]$ hdfs dfs -mkdir /user/zhouhh
[zhouhh@msvr ~]$ hdfs dfs -mkdir input
[zhouhh@msvr ~]$ hdfs dfs -ls /user
Found 1 items
drwxr-xr-x   - zhouhh supergroup          0 2017-06-29 15:41 /user/zhouhh
[zhouhh@msvr ~]$ hdfs dfs -ls /user/zhouhh
Found 1 items
drwxr-xr-x   - zhouhh supergroup          0 2017-06-29 15:41 /user/zhouhh/input
[zhouhh@msvr ~]$ hdfs dfs -put $HADOOP_HOME/etc/hadoop/*.xml input

[zhouhh@msvr ~]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha3.jar grep input output 'dfs[a-z.]+'
[zhouhh@msvr ~]$ hdfs dfs -cat /user/zhouhh/output/*
1	dfsadmin
1	dfs.replication

[zhouhh@msvr ~]$ hdfs dfs -cat output/*
1	dfsadmin
1	dfs.replication
[zhouhh@msvr hadoop]$ hdfs dfs -get output output
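
The two -cat commands above print the same files because HDFS resolves a relative path against the user's home directory, which by default is /user/ followed by the user name. That is also why hdfs dfs -mkdir input created /user/zhouhh/input. The rule can be sketched as follows; hdfs_abs is a hypothetical helper for illustration and not part of Hadoop.

```shell
# hdfs_abs PATH: print the absolute HDFS path that PATH resolves to,
# assuming the default home directory prefix /user.
hdfs_abs() {
  case "$1" in
    /*) printf '%s\n' "$1" ;;                       # already absolute
    *)  printf '/user/%s/%s\n' "$(id -un)" "$1" ;;  # relative to the HDFS home directory
  esac
}

hdfs_abs input    # relative: ends up under /user/<current user>
hdfs_abs /user    # absolute: printed unchanged
```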


MapReduce jobs can also run on YARN. This requires setting a few parameters and starting the ResourceManager and NodeManager daemons.

[zhouhh@msvr hadoop]$ cd etc/hadoop/
[zhouhh@msvr hadoop]$ cp mapred-site.xml.template mapred-site.xml
[zhouhh@msvr hadoop]$ vi mapred-site.xml
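
The edited contents of mapred-site.xml are missing from the notes. The setting that tells MapReduce to submit jobs to YARN is mapreduce.framework.name, so the file presumably looked roughly like this sketch. CONF_DIR is an illustrative variable name.

```shell
# Directory holding the Hadoop configuration files; CONF_DIR is an illustrative name.
CONF_DIR="${HADOOP_CONF_DIR:-${HADOOP_HOME:-.}/etc/hadoop}"
mkdir -p "$CONF_DIR"

# Overwrites any existing mapred-site.xml in CONF_DIR.
cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Run MapReduce jobs on YARN instead of the local job runner -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
```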


[zhouhh@msvr hadoop]$ vi yarn-site.xml
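
The yarn-site.xml contents did not survive in the notes beyond its header comment. A minimal sketch, assuming only the shuffle auxiliary service was configured, which MapReduce needs on every NodeManager; other properties may have been set in the original file.

```shell
# Directory holding the Hadoop configuration files; CONF_DIR is an illustrative name.
CONF_DIR="${HADOOP_CONF_DIR:-${HADOOP_HOME:-.}/etc/hadoop}"
mkdir -p "$CONF_DIR"

# Overwrites any existing yarn-site.xml in CONF_DIR.
cat > "$CONF_DIR/yarn-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Auxiliary service that serves map output to reducers -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF
```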

<!-- Site specific YARN configuration properties -->

Start the ResourceManager daemon and the NodeManager daemon:

[zhouhh@msvr hadoop]$ start-yarn.sh
Starting resourcemanager
Starting nodemanagers

Open the ResourceManager web page in a browser; from the local machine the address is http://localhost:8088/


[zhouhh@msvr hadoop]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha3.jar grep input output 'dfs[a-z.]+'

org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9000/user/zhouhh/output already exists

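Removing the output directory created by the earlier run gets rid of the error above. The scheduling status of the job can then be viewed at http://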
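
The notes do not show the command used to remove the directory. One way to do it, sketched with the standard HDFS shell commands -test and -rm; clean_output is a hypothetical helper name.

```shell
# clean_output DIR: remove DIR from HDFS if it exists, so a rerun can recreate it.
clean_output() {
  if hdfs dfs -test -d "$1"; then
    hdfs dfs -rm -r "$1"
  fi
}
```

Calling clean_output output before resubmitting the job removes /user/zhouhh/output in this setup, since the relative path resolves against the HDFS home directory.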


[zhouhh@msvr hadoop]$ stop-yarn.sh



