2017-07-01
Download
Download Hadoop 3. The current release, published in May 2017, is 3.0.0-alpha3.
wget http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.0.0-alpha3/hadoop-3.0.0-alpha3.tar.gz
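The tarball then needs to be unpacked. A minimal sketch, assuming it is extracted under ~/java to match the hadoop symlink created later (the download location is an assumption):
mkdir -p ~/java
tar -xzf hadoop-3.0.0-alpha3.tar.gz -C ~/java
# extracted tree: ~/java/hadoop-3.0.0-alpha3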
Java environment
Hadoop 3 requires Java 8 (JDK 1.8) or later.
[zhouhh@mainServer hadoop-3.0.0-alpha3]$ !cat
cat /etc/redhat-release
CentOS Linux release 7.1.1503 (Core)
[zhouhh@mainServer hadoop-3.0.0-alpha3]$ echo $JAVA_HOME
/etc/alternatives/java_sdk_openjdk
[zhouhh@mainServer hadoop-3.0.0-alpha3]$ java -version
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-b12)
OpenJDK 64-Bit Server VM (build 25.131-b12, mixed mode)
[zhouhh@mainServer hadoop-3.0.0-alpha3]$ javac -version
javac 1.8.0_131
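Besides the shell environment, Hadoop also picks up JAVA_HOME from etc/hadoop/hadoop-env.sh. A minimal sketch, reusing the JAVA_HOME value shown above (adjust to the local JDK path):
# $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/etc/alternatives/java_sdk_openjdk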
Start a single-node Hadoop
The following command prints the help information.
[zhouhh@mainServer hadoop-3.0.0-alpha3]$ ./bin/hadoop
Usage: hadoop [OPTIONS] SUBCOMMAND [SUBCOMMAND OPTIONS]
or hadoop [OPTIONS] CLASSNAME [CLASSNAME OPTIONS]
where CLASSNAME is a user-provided Java class
OPTIONS is none or any of:
--buildpaths attempt to add class files from build tree
--config dir Hadoop config directory
--debug turn on shell script debug mode
--help usage information
--hostnames list[,of,host,names] hosts to use in slave mode
--hosts filename list of hosts to use in slave mode
--loglevel level set the log4j level for this command
--workers turn on worker mode
SUBCOMMAND is one of:
archive create a Hadoop archive
checknative check native Hadoop and compression libraries availability
classpath prints the class path needed to get the Hadoop jar and the required libraries
conftest validate configuration XML files
credential interact with credential providers
daemonlog get/set the log level for each daemon
distch distributed metadata changer
distcp copy file or directories recursively
dtutil operations related to delegation tokens
envvars display computed Hadoop environment variables
fs run a generic filesystem user client
gridmix submit a mix of synthetic job, modeling a profiled from production load
jar <jar> run a jar file. NOTE: please use "yarn jar" to launch YARN applications, not this command.
jnipath prints the java.library.path
kerbname show auth_to_local principal conversion
key manage keys via the KeyProvider
kms run KMS, the Key Management Server
rumenfolder scale a rumen input trace
rumentrace convert logs into a rumen trace
trace view and modify Hadoop tracing settings
version print the version
SUBCOMMAND may print help when invoked w/o parameters or with -h.
[zhouhh@mainServer java]$ ln -s hadoop-3.0.0-alpha3 hadoop
[zhouhh@mainServer ~]$ vi .bashrc
export HADOOP_HOME="${HOME}/java/hadoop"
export PATH="$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$PATH"
[zhouhh@mainServer ~]$ source .bashrc
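A quick check (not part of the original session) that the new PATH is picked up:
which hadoop      # should resolve to ~/java/hadoop/bin/hadoop
hadoop version    # should report 3.0.0-alpha3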
The following commands take the XML configuration files under etc/hadoop as input, grep for content matching a pattern, and write the matches to an output directory.
[zhouhh@mainServer ~]$ cd test
[zhouhh@mainServer test]$ ls
cnn.py
[zhouhh@mainServer test]$ mkdir hadoop
[zhouhh@mainServer test]$ cd hadoop
[zhouhh@mainServer hadoop]$ ls
[zhouhh@mainServer hadoop]$ mkdir input
[zhouhh@mainServer hadoop]$ cp $HADOOP_HOME/etc/hadoop/*.xml input
[zhouhh@mainServer hadoop]$ ls input
capacity-scheduler.xml core-site.xml hadoop-policy.xml hdfs-site.xml httpfs-site.xml kms-acls.xml kms-site.xml yarn-site.xml
[zhouhh@mainServer hadoop]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha3.jar grep input output 'dfs[a-z.]+'
[zhouhh@mainServer hadoop]$ ls output/
part-r-00000 _SUCCESS
[zhouhh@mainServer hadoop]$ cat output/*
1 dfsadmin
Pseudo-distributed configuration
Multiple Hadoop Java processes can be run on a single machine.
[zhouhh@mainServer hadoop]$ vi $HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
[zhouhh@mainServer hadoop]$ vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Make sure ssh to localhost works without a password.
[zhouhh@mainServer hadoop]$ ssh localhost
Last login: Thu Jun 29 12:15:14 2017 from localhost
If it asks for a password, run the following commands:
[zhouhh@mainServer ~]$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
[zhouhh@mainServer ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[zhouhh@mainServer ~]$ chmod 0600 ~/.ssh/authorized_keys
Format the NameNode
[zhouhh@mainServer ~]$ hdfs namenode -format
This formats the NameNode, creating its metadata under /tmp/hadoop-zhouhh/dfs/name.
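A quick way to confirm the format succeeded (not part of the original session) is to look at the newly created metadata directory:
ls /tmp/hadoop-zhouhh/dfs/name/current
# expect VERSION, seen_txid and an initial fsimage file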
[zhouhh@mainServer ~]$ start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [mainServer]
ssh: Could not resolve hostname mainserver: Name or service not known
[zhouhh@mainServer ~]$ sudo vi /etc/hosts
10.6.0.200 msvr
[zhouhh@mainServer ~]$ sudo hostname msvr
[zhouhh@msvr ~]$ sudo vi /etc/hostname
msvr
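A quick check (assumed commands, not from the original session) that the new hostname resolves:
hostname              # should now print msvr
getent hosts msvr     # should return 10.6.0.200 from /etc/hosts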
[zhouhh@msvr ~]$ stop-dfs.sh
Stopping namenodes on [localhost]
Stopping datanodes
Stopping secondary namenodes [msvr]
[zhouhh@msvr ~]$ start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [msvr]
Logs go to $HADOOP_LOG_DIR (default: $HADOOP_HOME/logs). The NameNode web UI is available at http://10.6.0.200:9870/, or locally at http://localhost:9870/.
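jps is a quick way to confirm the daemons are up (PIDs below are placeholders):
jps
# 12001 NameNode
# 12102 DataNode
# 12203 SecondaryNameNode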
Working with HDFS
[zhouhh@msvr ~]$ hdfs
Usage: hdfs [OPTIONS] SUBCOMMAND [SUBCOMMAND OPTIONS]
OPTIONS is none or any of:
--buildpaths attempt to add class files from build tree
--config dir Hadoop config directory
--daemon (start|status|stop) operate on a daemon
--debug turn on shell script debug mode
--help usage information
--hostnames list[,of,host,names] hosts to use in worker mode
--hosts filename list of hosts to use in worker mode
--loglevel level set the log4j level for this command
--workers turn on worker mode
SUBCOMMAND is one of:
balancer run a cluster balancing utility
cacheadmin configure the HDFS cache
classpath prints the class path needed to get the hadoop jar and the required libraries
crypto configure HDFS encryption zones
datanode run a DFS datanode
debug run a Debug Admin to execute HDFS debug commands
dfsadmin run a DFS admin client
dfs run a filesystem command on the file system
diskbalancer Distributes data evenly among disks on a given node
envvars display computed Hadoop environment variables
erasurecode run a HDFS ErasureCoding CLI
fetchdt fetch a delegation token from the NameNode
fsck run a DFS filesystem checking utility
getconf get config values from configuration
groups get the groups which users belong to
haadmin run a DFS HA admin client
jmxget get JMX exported values from NameNode or DataNode.
journalnode run the DFS journalnode
lsSnapshottableDir list all snapshottable dirs owned by the current user
mover run a utility to move block replicas across storage types
namenode run the DFS namenode
nfs3 run an NFS version 3 gateway
oev apply the offline edits viewer to an edits file
oiv apply the offline fsimage viewer to an fsimage
oiv_legacy apply the offline fsimage viewer to a legacy fsimage
portmap run a portmap service
secondarynamenode run the DFS secondary namenode
snapshotDiff diff two snapshots of a directory or diff the current directory contents with a snapshot
storagepolicies list/get/set block storage policies
version print the version
zkfc run the ZK Failover Controller daemon
SUBCOMMAND may print help when invoked w/o parameters or with -h.
[zhouhh@msvr ~]$ hdfs dfs -ls /
Found 1 items
drwxr-xr-x - zhouhh supergroup 0 2017-06-29 15:17 /user
[zhouhh@msvr ~]$ hdfs dfs -mkdir /user/zhouhh
[zhouhh@msvr ~]$ hdfs dfs -mkdir input
[zhouhh@msvr ~]$ hdfs dfs -ls /user
Found 1 items
drwxr-xr-x - zhouhh supergroup 0 2017-06-29 15:41 /user/zhouhh
[zhouhh@msvr ~]$ hdfs dfs -ls /user/zhouhh
Found 1 items
drwxr-xr-x - zhouhh supergroup 0 2017-06-29 15:41 /user/zhouhh/input
[zhouhh@msvr ~]$ hdfs dfs -put $HADOOP_HOME/etc/hadoop/*.xml input
[zhouhh@msvr ~]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha3.jar grep input output 'dfs[a-z.]+'
[zhouhh@msvr ~]$ hdfs dfs -cat /user/zhouhh/output/*
1 dfsadmin
1 dfs.replication
Or:
[zhouhh@msvr ~]$ hdfs dfs -cat output/*
1 dfsadmin
1 dfs.replication
Or pull the results down to the local filesystem:
[zhouhh@msvr hadoop]$ hdfs dfs -get output output
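The fetched copy can then be inspected locally (assuming the -get above succeeded):
ls output
# _SUCCESS  part-r-00000
cat output/part-r-00000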
YARN on a single node
MapReduce jobs can be run on YARN. Set a few parameters and start the ResourceManager and NodeManager daemons.
[zhouhh@msvr hadoop]$ cd etc/hadoop/
[zhouhh@msvr hadoop]$ cp mapred-site.xml.template mapred-site.xml
[zhouhh@msvr hadoop]$ vi mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
[zhouhh@msvr hadoop]$ vi yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
Start the ResourceManager and NodeManager daemons
[zhouhh@msvr hadoop]$ start-yarn.sh
Starting resourcemanager
Starting nodemanagers
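As before, jps can confirm the YARN daemons started (PIDs are placeholders):
jps
# 23001 ResourceManager
# 23102 NodeManager
# ... plus the HDFS daemons listed earlier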
Visit http://10.6.0.200:8088/ (or http://localhost:8088/ locally) to open the ResourceManager web UI.
Run a MapReduce job
[zhouhh@msvr hadoop]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha3.jar grep input output 'dfs[a-z.]+'
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9000/user/zhouhh/output already exists
Removing the previously generated output directory clears this error. The job's scheduling can then be watched at http://10.6.0.200:8088/.
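A minimal way to clear it and rerun (not part of the original session):
hdfs dfs -rm -r output
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha3.jar grep input output 'dfs[a-z.]+'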
Stop YARN
[zhouhh@msvr hadoop]$ stop-yarn.sh
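To shut everything down, the HDFS daemons can be stopped the same way as earlier:
stop-dfs.sh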
HDFS architecture