Spark on YARN (HDFS HA): Detailed Configuration Walkthrough

This article walks through the full configuration of Spark on YARN, covering the server layout and every step of the deployment.

I. Server Layout and Related Notes

1. Server Roles

2. Hadoop (HDFS HA) Overall Architecture


 

II. Basic Environment Setup

1. JDK Installation

http://download.oracle.com/otn-pub/java/jdk/7u45-b18/jdk-7u45-linux-x64.tar.gz

# tar xvzf jdk-7u45-linux-x64.tar.gz -C /usr/local
# cd /usr/local
# ln -s jdk1.7.0_45 jdk

# vim /etc/profile
export JAVA_HOME=/usr/local/jdk
export CLASSPATH=$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$JAVA_HOME/bin

# source /etc/profile

2. Scala Installation

http://www.scala-lang.org/files/archive/scala-2.10.3.tgz

# tar xvzf scala-2.10.3.tgz -C /usr/local
# cd /usr/local
# ln -s scala-2.10.3 scala

# vim /etc/profile
export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$SCALA_HOME/bin

# source /etc/profile
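A quick sanity check (not in the original article) that both installations and the PATH changes took effect:

# java -version       # should report java version "1.7.0_45"
# scala -version      # should report Scala code runner version 2.10.3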

3. Passwordless SSH Login

For reference, see:

http://blog.csdn.net/codepeak/article/details/14447627

......
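The referenced article is not reproduced here. As a rough sketch, one common way to set up passwordless root SSH from namenode1 to every node (the user and key type may differ from the original setup):

# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# for h in namenode1 namenode2 datanode1 datanode2; do ssh-copy-id root@$h; done

Note that dfs.ha.fencing.ssh.private-key-files in hdfs-site.xml below points at /root/.ssh/id_dsa_nn1, so that key must also exist on the NameNodes.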

4. Hostname Setup

# vim /etc/hosts
172.18.35.29     namenode1
172.18.35.30     namenode2
172.18.34.232    datanode1
172.18.24.136    datanode2
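The /etc/hosts entries above only handle name resolution; each machine's own hostname should match its entry as well. A sketch for a RHEL/CentOS 6-style system, shown here for namenode1 (adjust per host and distribution):

# hostname namenode1
# vim /etc/sysconfig/network
HOSTNAME=namenode1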

III. ZooKeeper Cluster Deployment

1. ZooKeeper Installation

http://apache.dataguru.cn/zookeeper/stable/zookeeper-3.4.5.tar.gz

# tar xvzf zookeeper-3.4.5.tar.gz -C /usr/local
# cd /usr/local
# ln -s zookeeper-3.4.5 zookeeper

# vim /etc/profile
export ZOO_HOME=/usr/local/zookeeper
export ZOO_LOG_DIR=/data/hadoop/zookeeper/logs
export PATH=$PATH:$ZOO_HOME/bin

# source /etc/profile

2. ZooKeeper Configuration and Startup

# mkdir -p /data/hadoop/zookeeper/{data,logs}

# vim /usr/local/zookeeper/conf/zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5

dataDir=/data/hadoop/zookeeper/data
clientPort=2181

server.1=172.18.35.29:2888:3888
server.2=172.18.35.30:2888:3888
server.3=172.18.34.232:2888:3888

On 172.18.35.29, run:

echo 1 > /data/hadoop/zookeeper/data/myid

On 172.18.35.30, run:

echo 2 > /data/hadoop/zookeeper/data/myid

On 172.18.34.232, run:

echo 3 > /data/hadoop/zookeeper/data/myid

## Start the ZooKeeper cluster

# cd /usr/local/zookeeper && bin/zkServer.sh start


# ./bin/zkCli.sh -server localhost:2181


Check whether the ZooKeeper ensemble formed correctly; if the status command reports no errors, the cluster is up:

# bin/zkServer.sh status

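One node should report "Mode: leader" and the others "Mode: follower". If nc is installed, ZooKeeper's built-in four-letter commands give another quick health check:

# echo ruok | nc localhost 2181      # a healthy server replies "imok"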

IV. Hadoop (HDFS HA) Cluster Deployment

1. Hadoop Installation

For building Hadoop from source, see:

http://sofar.blog.51cto.com/353572/1352713

# tar xvzf hadoop-2.2.0.tgz -C /usr/local
# cd /usr/local
# ln -s hadoop-2.2.0 hadoop

# vim /etc/profile
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_PID_DIR=/data/hadoop/pids
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"

export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HDFS_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native

export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# mkdir -p /data/hadoop/{pids,storage}
# mkdir -p /data/hadoop/storage/{hdfs,tmp,journal}
# mkdir -p /data/hadoop/storage/tmp/nodemanager/{local,remote,logs}
# mkdir -p /data/hadoop/storage/hdfs/{name,data}

2. core-site.xml Configuration

# vim /usr/local/hadoop/etc/hadoop/core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://appcluster</value>
    </property>

    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>

    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/data/hadoop/storage/tmp</value>
    </property>

    <property>
        <name>ha.zookeeper.quorum</name>
        <value>172.18.35.29:2181,172.18.35.30:2181,172.18.34.232:2181</value>
    </property>

    <property>
        <name>ha.zookeeper.session-timeout.ms</name>
        <value>2000</value>
    </property>

    <property>
        <name>fs.trash.interval</name>
        <value>4320</value>
    </property>

    <property>
         <name>hadoop.http.staticuser.user</name>
         <value>root</value>
    </property>

    <property>
        <name>hadoop.proxyuser.hadoop.hosts</name>
        <value>*</value>
    </property>

    <property>
        <name>hadoop.proxyuser.hadoop.groups</name>
        <value>*</value>
    </property>

    <property>
        <name>hadoop.native.lib</name>
        <value>true</value>
    </property>
</configuration>

3. hdfs-site.xml Configuration

# vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/data/hadoop/storage/hdfs/name</value>
    </property>

    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/data/hadoop/storage/hdfs/data</value>
    </property>

    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>

    <property>
        <name>dfs.blocksize</name>
        <value>67108864</value>
    </property>

    <property>
        <name>dfs.datanode.du.reserved</name>
        <value>53687091200</value>
    </property>

    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>

    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>

    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>

    <property>
        <name>dfs.nameservices</name>
        <value>appcluster</value>
    </property>

    <property>
        <name>dfs.ha.namenodes.appcluster</name>
        <value>nn1,nn2</value>
    </property>

    <property>
        <name>dfs.namenode.rpc-address.appcluster.nn1</name>
        <value>namenode1:8020</value>
    </property>

    <property>
        <name>dfs.namenode.rpc-address.appcluster.nn2</name>
        <value>namenode2:8020</value>
    </property>

    <property>
        <name>dfs.namenode.servicerpc-address.appcluster.nn1</name>
        <value>namenode1:53310</value>
    </property>

    <property>
        <name>dfs.namenode.servicerpc-address.appcluster.nn2</name>
        <value>namenode2:53310</value>
    </property>

    <property>
        <name>dfs.namenode.http-address.appcluster.nn1</name>
        <value>namenode1:8080</value>
    </property>

    <property>
        <name>dfs.namenode.http-address.appcluster.nn2</name>
        <value>namenode2:8080</value>
    </property>

    <property>
        <name>dfs.datanode.http.address</name>
        <value>0.0.0.0:8080</value>
    </property>

    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://namenode1:8485;namenode2:8485;datanode1:8485/appcluster</value>
    </property>

    <property>
        <name>dfs.client.failover.proxy.provider.appcluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence(root:36000)</value>
    </property>

    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_dsa_nn1</value>
    </property>

    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>

    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/data/hadoop/storage/hdfs/journal</value>
    </property>

    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>

    <property>
        <name>ha.failover-controller.cli-check.rpc-timeout.ms</name>
        <value>60000</value>
    </property>

    <property>
        <name>ipc.client.connect.timeout</name>
        <value>60000</value>
    </property>

    <property>
        <name>dfs.image.transfer.bandwidthPerSec</name>
        <value>41943040</value>
    </property>

    <property>
        <name>dfs.namenode.accesstime.precision</name>
        <value>3600000</value>
    </property>

    <property>
        <name>dfs.datanode.max.transfer.threads</name>
        <value>4096</value>
    </property>
</configuration>

4. mapred-site.xml Configuration

# vim /usr/local/hadoop/etc/hadoop/mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>namenode1:10020</value>
    </property>

    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>namenode1:19888</value>
    </property>
</configuration>

5. yarn-site.xml Configuration

# vim /usr/local/hadoop/etc/hadoop/yarn-site.xml

<?xml version="1.0"?>
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>

    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>namenode1:8030</value>
    </property>

    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>namenode1:8031</value>
    </property>

    <property>
        <name>yarn.resourcemanager.address</name>
        <value>namenode1:8032</value>
    </property>

    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>namenode1:8033</value>
    </property>

    <property>
        <name>yarn.nodemanager.address</name>
        <value>namenode1:8034</value>
    </property>

    <property>
        <name>yarn.nodemanager.webapp.address</name>
        <value>namenode1:80</value>
    </property>

    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>namenode1:80</value>
    </property>

    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>${hadoop.tmp.dir}/nodemanager/local</value>
    </property>

    <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>${hadoop.tmp.dir}/nodemanager/remote</value>
    </property>

    <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>${hadoop.tmp.dir}/nodemanager/logs</value>
    </property>

    <property>
        <name>yarn.nodemanager.log.retain-seconds</name>
        <value>604800</value>
    </property>

    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>16</value>
    </property>

    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>50320</value>
    </property>

    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>256</value>
    </property>

    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>40960</value>
    </property>

    <property>
        <name>yarn.scheduler.minimum-allocation-vcores</name>
        <value>1</value>
    </property>

    <property>
        <name>yarn.scheduler.maximum-allocation-vcores</name>
        <value>8</value>
    </property>
</configuration>

Note: the resource-related values above (yarn.nodemanager.resource.cpu-vcores, yarn.nodemanager.resource.memory-mb, and the yarn.scheduler.*-allocation-* settings) need to be adjusted to match your servers' hardware.

6. Configure hadoop-env.sh, mapred-env.sh, and yarn-env.sh (add at the top of each file)

File paths:

/usr/local/hadoop/etc/hadoop/hadoop-env.sh
/usr/local/hadoop/etc/hadoop/mapred-env.sh
/usr/local/hadoop/etc/hadoop/yarn-env.sh

 

Content to add:

 

export JAVA_HOME=/usr/local/jdk
export CLASSPATH=$JAVA_HOME/lib:$JAVA_HOME/jre/lib

export HADOOP_HOME=/usr/local/hadoop
export HADOOP_PID_DIR=/data/hadoop/pids
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"

export HADOOP_PREFIX=$HADOOP_HOME

export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HDFS_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

 

7. DataNode Configuration (slaves file)

 

# vim /usr/local/hadoop/etc/hadoop/slaves
datanode1
datanode2
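The Hadoop configuration above must be identical on every node. One way to push it out from namenode1, assuming rsync is installed and passwordless SSH is already in place (scp works just as well):

# for h in namenode2 datanode1 datanode2; do rsync -az /usr/local/hadoop/etc/hadoop/ $h:/usr/local/hadoop/etc/hadoop/; done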

8. Cluster Startup

(1) On namenode1, create the HA namespace in ZooKeeper:

# hdfs zkfc -formatZK

(2) Start the JournalNode daemon on each node listed in dfs.namenode.shared.edits.dir (namenode1, namenode2, datanode1):

# cd /usr/local/hadoop && ./sbin/hadoop-daemon.sh start journalnode


(3) Format the primary NameNode (namenode1):

# hdfs namenode -format

(4) Start the primary NameNode:

# cd /usr/local/hadoop && sbin/hadoop-daemon.sh start namenode


(5) Bootstrap the standby NameNode (run on namenode2):

# hdfs namenode -bootstrapStandby

 

(6) Start the standby NameNode (namenode2):

# cd /usr/local/hadoop && sbin/hadoop-daemon.sh start namenode

 

----------------------------------------------------------------------------------------------------------------------------------------------

(7) On both NameNode nodes (namenode1 and namenode2), start the ZKFC:

# cd /usr/local/hadoop && sbin/hadoop-daemon.sh start zkfc


 

----------------------------------------------------------------------------------------------------------------------------------------------

(8) Start all DataNodes (datanode1 and datanode2):

# cd /usr/local/hadoop && sbin/hadoop-daemon.sh start datanode


 

----------------------------------------------------------------------------------------------------------------------------------------------

(9) Start YARN (on namenode1):

# cd /usr/local/hadoop && sbin/start-yarn.sh


 

Daemons running on the NameNode and DataNode nodes:

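A quick way to confirm which daemons are running on each host is jps; the exact list depends on which roles the host carries (JournalNode and QuorumPeerMain only appear on the machines where those services were configured):

# jps    # on namenode1: NameNode, DFSZKFailoverController, ResourceManager, JournalNode, QuorumPeerMain
# jps    # on datanode1/datanode2: DataNode, NodeManager (plus JournalNode / QuorumPeerMain where assigned)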

 

----------------------------------------------------------------------------------------------------------------------------------------------

(10) Test that YARN works:

# hdfs dfs -put /usr/local/hadoop/etc/hadoop/yarn-site.xml /tmp

# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /tmp/yarn-site.xml /mytest
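If the job succeeds, the word counts are written to /mytest on HDFS; the reducer output file is normally named part-r-00000:

# hdfs dfs -ls /mytest
# hdfs dfs -cat /mytest/part-r-00000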

 

-----------------------------------------------------------------------------------------------------------------------------------------------

(11) HDFS HA Failover Test

State before the failover:

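Before the failover, one NameNode should be in the active state and the other in standby. The roles can be checked from the command line using the NameNode IDs defined in dfs.ha.namenodes.appcluster; re-running the same two commands after the kill below should show the roles swapped:

# hdfs haadmin -getServiceState nn1
# hdfs haadmin -getServiceState nn2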

 

Kill the active NameNode process (11466 here is its PID, taken from jps on the active node):

# kill -9 11466


 

Restart the killed NameNode (it rejoins as standby):

# cd /usr/local/hadoop && sbin/hadoop-daemon.sh start namenode

 

State after the failover (re-run the two haadmin commands above; the active and standby roles should have swapped):


-----------------------------------------------------------------------------------------------------------------------------------------------

(12) Ongoing Maintenance

Stopping and starting HDFS:

# cd /usr/local/hadoop && sbin/stop-dfs.sh

# cd /usr/local/hadoop && sbin/start-dfs.sh

 

Stopping and starting YARN:

# cd /usr/local/hadoop && sbin/stop-yarn.sh

# cd /usr/local/hadoop && sbin/start-yarn.sh

Note: the stop/start commands above need to be run on a NameNode node.

V. Spark Cluster Deployment

1. Spark Installation and Configuration

For building Spark from source, see:

http://sofar.blog.51cto.com/353572/1358457

 

# tar xvzf spark-0.9.0-incubating.tgz -C /usr/local

# cd /usr/local

# ln -s spark-0.9.0-incubating spark

 

# vim /etc/profile

export SPARK_HOME=/usr/local/spark

export PATH=$PATH:$SPARK_HOME/bin

 

# source /etc/profile

 

# cd /usr/local/spark/conf

# mkdir -p /data/spark/tmp

 

----------------------------------------------------------------------------------------------------------------------------------------------

# vim spark-env.sh

export JAVA_HOME=/usr/local/jdk

export SCALA_HOME=/usr/local/scala

export HADOOP_HOME=/usr/local/hadoop

 

SPARK_LOCAL_DIR="/data/spark/tmp"

 

SPARK_JAVA_OPTS="-Dspark.storage.blockManagerHeartBeatMs=60000 -Dspark.local.dir=$SPARK_LOCAL_DIR -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:$SPARK_HOME/logs/gc.log -XX:+UseConcMarkSweepGC -XX:+UseCMSCompactAtFullCollection -XX:CMSInitiatingOccupancyFraction=60"

 

----------------------------------------------------------------------------------------------------------------------------------------------

# vim slaves

datanode1

datanode2

 

# cd /usr/local/spark && sbin/start-all.sh
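After start-all.sh, a standalone Master should be running on namenode1 and a Worker on each host listed in slaves; a quick check with jps:

# jps | grep -E 'Master|Worker'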

 

=========================================================================================

2. Tests

(1) Local mode

# bin/run-example org.apache.spark.examples.SparkPi local

 

----------------------------------------------------------------------------------------------------------------------------------------------

(2) Standalone cluster mode

# bin/run-example org.apache.spark.examples.SparkPi spark://namenode1:7077

# bin/run-example org.apache.spark.examples.SparkLR spark://namenode1:7077

# bin/run-example org.apache.spark.examples.SparkKMeans spark://namenode1:7077 file:/usr/local/spark/data/kmeans_data.txt 2 1

 

----------------------------------------------------------------------------------------------------------------------------------------------

(3) Cluster mode with HDFS

# hadoop fs -put README.md .

# MASTER=spark://namenode1:7077 bin/spark-shell

scala> val file = sc.textFile("hdfs://namenode1:8020/user/root/README.md")

scala> val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_+_)

scala> count.collect()

scala> :quit

 

----------------------------------------------------------------------------------------------------------------------------------------------

(4) YARN mode

# SPARK_JAR=assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar \
  bin/spark-class org.apache.spark.deploy.yarn.Client \
  --jar examples/target/scala-2.10/spark-examples_2.10-assembly-0.9.0-incubating.jar \
  --class org.apache.spark.examples.SparkPi \
  --args yarn-standalone \
  --num-workers 3 \
  --master-memory 4g \
  --worker-memory 2g \
  --worker-cores 1
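In yarn-standalone mode the driver runs inside the ApplicationMaster on the cluster, so the "Pi is roughly ..." output appears in the ApplicationMaster's stdout log rather than on the local console. The job can be tracked from the command line, and its logs are reachable through the ResourceManager web UI configured above at namenode1:80:

# yarn application -list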

 

----------------------------------------------------------------------------------------------------------------------------------------------

(5) Final directory layout and configuration

Everything is installed under /usr/local (via the jdk, scala, zookeeper, hadoop, and spark symlinks created above), and runtime data lives under /data/hadoop and /data/spark.

 

Environment variable settings in /etc/profile:

 

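Pulling together the variables set in the sections above, the relevant /etc/profile block ends up looking roughly like this:

export JAVA_HOME=/usr/local/jdk
export CLASSPATH=$JAVA_HOME/lib:$JAVA_HOME/jre/lib

export SCALA_HOME=/usr/local/scala

export ZOO_HOME=/usr/local/zookeeper
export ZOO_LOG_DIR=/data/hadoop/zookeeper/logs

export HADOOP_HOME=/usr/local/hadoop
export HADOOP_PID_DIR=/data/hadoop/pids
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HDFS_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native

export SPARK_HOME=/usr/local/spark

export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin:$ZOO_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SPARK_HOME/bin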

 

 
