Spark on YARN development (begin)

Date: 2018-06-04 13:54  Source: unknown

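This follows an experiment from Dong Xicheng's blog. I wrote the code out myself and ran into a few problems along the way, so I am recording them here.

It is really just a very simple WordCount class, but once this skeleton is in place, other code can be filled in bit by bit.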

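package org.apache.spark

import org.apache.spark._
import SparkContext._ // implicit conversions that provide reduceByKey on pair RDDs (needed in Spark 1.2)

object WordCount {
  // Spark assembly jar used here: /apps/spark-1.2.0-bin-hadoop2.4/lib/spark-assembly-1.2.0-hadoop2.4.0.jar
  def main(args: Array[String]): Unit = {
    if (args.length != 2) {
      println("usage is org.apache.spark.WordCount <input> <output>")
      return
    }
    val sparkConf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(sparkConf)
    // Read the input, split each line on whitespace, and count each word.
    val textFile = sc.textFile(args(0))
    val result = textFile
      .flatMap(line => line.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    result.saveAsTextFile(args(1))
    sc.stop() // release the SparkContext once the output has been written
  }
}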
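
To check what the flatMap / map / reduceByKey chain computes without a cluster, the same logic can be sketched on plain Scala collections. This is only an illustration: `LocalWordCount`, `countWords`, and the sample lines are hypothetical names and data, not part of the original job. The `filter` step is an addition that the job above does not have.

```scala
// Plain Scala collections stand in for the RDD here; no Spark dependency is needed.
object LocalWordCount {
  // Mirrors textFile.flatMap(...).map(...).reduceByKey(_ + _) on an in-memory Seq.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split("\\s+"))  // split each line on runs of whitespace
      .filter(_.nonEmpty)                    // added step: drop the empty token produced by leading whitespace
      .map(word => (word, 1))                // pair every word with a count of 1
      .groupBy(_._1)                         // group the pairs by word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum the counts per word

  def main(args: Array[String]): Unit = {
    val sample = Seq("redis hdfs redis", "  hadoop hdfs") // hypothetical sample input
    countWords(sample).toSeq.sortBy(_._1).foreach(println)
  }
}
```

Without the `filter` step, a line that starts with whitespace would contribute an empty-string key to the result, because `split` keeps the empty token in front of the first delimiter.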

 

The shell script is:

export YARN_CONF_DIR=/etc/hadoop/conf
SPARK_JAR=/apps/spark-1.2.0-bin-hadoop2.4/lib/spark-assembly-1.2.0-hadoop2.4.0.jar \
/apps/spark-1.2.0-bin-hadoop2.4/bin/spark-class org.apache.spark.deploy.yarn.Client \
--jar ./RELEASE/spark-test-wordcount.jar \
--class org.apache.spark.WordCount \
--args hdfs://UHVDATA012.uhome.haier.net:8020/yang/word.txt \
--args hdfs://UHVDATA012.uhome.haier.net:8020/yang/output \
--num-workers 1 \
--master-memory 2g \
--worker-memory 2g \
--worker-cores 2

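Enough talk, that is all there is to it. The log from the run follows. Two things in it are worth noting. The client warns that it is deprecated in favor of ./bin/spark-submit with "--master yarn". The final line of the script's output, "--num-workers: command not found", shows that the line continuation before --num-workers was broken in the script as it was run, so that option and the ones after it never reached the client.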

[root@UHVDATA016 yangjingbo]# ./wordcount.sh 
Spark assembly has been built with Hive, including Datanucleus jars on classpath
WARNING: This client is deprecated and will be removed in a future version of Spark. Use ./bin/spark-submit with "--master yarn"
--args is deprecated. Use --arg instead.
--args is deprecated. Use --arg instead.
--args is deprecated. Use --arg instead.
15/01/06 13:27:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/01/06 13:27:33 INFO yarn.Client: Requesting a new application from cluster with 7 NodeManagers
15/01/06 13:27:33 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (13824 MB per container)
15/01/06 13:27:33 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/01/06 13:27:33 INFO yarn.Client: Setting up container launch context for our AM
15/01/06 13:27:33 INFO yarn.Client: Preparing resources for our AM container
15/01/06 13:27:33 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
15/01/06 13:27:33 WARN yarn.ClientBase: SPARK_JAR detected in the system environment. This variable has been deprecated in favor of the spark.yarn.jar configuration variable.
15/01/06 13:27:33 INFO yarn.Client: Uploading resource file:/apps/spark-1.2.0-bin-hadoop2.4/lib/spark-assembly-1.2.0-hadoop2.4.0.jar -> 
15/01/06 13:27:35 INFO yarn.Client: Uploading resource file:/home/yangjingbo/RELEASE/spark-test-wordcount.jar -> 
15/01/06 13:27:35 INFO yarn.Client: Setting up the launch environment for our AM container
15/01/06 13:27:35 WARN yarn.ClientBase: SPARK_JAR detected in the system environment. This variable has been deprecated in favor of the spark.yarn.jar configuration variable.
15/01/06 13:27:35 INFO spark.SecurityManager: Changing view acls to: root
15/01/06 13:27:35 INFO spark.SecurityManager: Changing modify acls to: root
15/01/06 13:27:35 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/01/06 13:27:35 INFO yarn.Client: Submitting application 29 to ResourceManager
15/01/06 13:27:35 INFO impl.YarnClientImpl: Submitted application application_1416218486128_0029
15/01/06 13:27:36 INFO yarn.Client: Application report for application_1416218486128_0029 (state: ACCEPTED)
15/01/06 13:27:36 INFO yarn.Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1420522055830
         final status: UNDEFINED
         user: root
15/01/06 13:27:37 INFO yarn.Client: Application report for application_1416218486128_0029 (state: ACCEPTED)
15/01/06 13:27:38 INFO yarn.Client: Application report for application_1416218486128_0029 (state: ACCEPTED)
15/01/06 13:27:39 INFO yarn.Client: Application report for application_1416218486128_0029 (state: ACCEPTED)
15/01/06 13:27:40 INFO yarn.Client: Application report for application_1416218486128_0029 (state: ACCEPTED)
15/01/06 13:27:41 INFO yarn.Client: Application report for application_1416218486128_0029 (state: ACCEPTED)
15/01/06 13:27:42 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:42 INFO yarn.Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: ###########
         ApplicationMaster RPC port: 0
         queue: default
         start time: 1420522055830
         final status: UNDEFINED
         user: root
15/01/06 13:27:43 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:44 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:45 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:46 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:47 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:48 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:49 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:50 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:51 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:52 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:53 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:54 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:55 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:56 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:57 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:58 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:59 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:28:00 INFO yarn.Client: Application report for application_1416218486128_0029 (state: FINISHED)
15/01/06 13:28:00 INFO yarn.Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: ##############
         ApplicationMaster RPC port: 0
         queue: default
         start time: 1420522055830
         final status: SUCCEEDED
         user: root
./wordcount.sh: line 9: --num-workers: command not found
[root@UHVDATA016 yangjingbo]# hadoop dfs -ls /yang
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.


Found 2 items
drwxr-xr-x   - root hdfs          0 2015-01-06 13:28 /yang/output
-rw-r--r--   3 root hdfs         93 2015-01-06 11:38 /yang/word.txt
[root@UHVDATA016 yangjingbo]# hadoop dfs -cat /yang/output
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.


cat: `/yang/output': Is a directory
[root@UHVDATA016 yangjingbo]# hadoop dfs -ls /yang/output/
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.


Found 3 items
-rw-r--r--   3 root hdfs          0 2015-01-06 13:28 /yang/output/_SUCCESS
-rw-r--r--   3 root hdfs          0 2015-01-06 13:28 /yang/output/part-00000
-rw-r--r--   3 root hdfs         49 2015-01-06 13:28 /yang/output/part-00001
[root@UHVDATA016 yangjingbo]# hadoop dfs -cat /yang/output/part-00000
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.


[root@UHVDATA016 yangjingbo]# hadoop dfs -cat /yang/output/part-00001
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.


(name,2)
(hadoop,2)
(hdfs,3)
(redis,6)
(hbase,2)
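
The listing above shows an empty part-00000 while every word lands in part-00001. A likely explanation, assuming the job ran with two reduce partitions (which the two part files suggest), is hash partitioning: Spark's HashPartitioner places a key by its hashCode modulo the partition count, adjusted to be non-negative. The sketch below mirrors that rule in plain Scala; `PartitionSketch` and `partitionFor` are hypothetical names used only for illustration.

```scala
object PartitionSketch {
  // Mirrors the rule used by Spark's HashPartitioner: key.hashCode modulo the
  // partition count, shifted into the non-negative range.
  def partitionFor(key: String, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  def main(args: Array[String]): Unit = {
    // The five keys that appear in the job output.
    val words = Seq("name", "hadoop", "hdfs", "redis", "hbase")
    // Assumption: 2 reduce partitions, matching the two part files in the listing.
    words.foreach(w => println(s"$w -> partition ${partitionFor(w, 2)}"))
  }
}
```

All five words happen to have an odd hashCode, so with two partitions each of them maps to partition 1, which is consistent with part-00000 being empty.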