Apache Spark version 2.4.4, HDP version 3.0.1.0-187

 

First, build Spark against Hadoop 3:

./dev/make-distribution.sh --pip --tgz -Phadoop-3.1 -Phive -Phive-thriftserver -Pyarn -Pkubernetes

 

After the build completes, deploy the distribution onto the HDP cluster. Make a separate copy of the Hadoop configuration files for this Spark, and in that copy's yarn-site.xml set yarn.timeline-service.enabled to false (shown below); otherwise you will get NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig.
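
A minimal sketch of that override in the copied yarn-site.xml (only this property needs to change; the rest of the file stays as copied from the cluster):

<property>
  <name>yarn.timeline-service.enabled</name>
  <value>false</value>
</property>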

The reason is that both Spark 2.4.4 and YARN depend on jersey-client, but on different versions: spark/jars contains jersey-client-2.22.2.jar, while hadoop-yarn/lib contains jersey-client-1.19.jar. When yarn.timeline-service.enabled is true, a code path in the YARN API is taken that uses certain jersey-client classes (and methods) which no longer exist in jersey-client-2.22.2.jar, hence the error.

HDP's own Spark 2 does not hit this error because its code explicitly sets yarn.timeline-service.enabled to false; see https://www.jianshu.com/p/460f98111d43 for details.

 

Starting spark-on-yarn: error ShimLoader.getMajorVersion: Unrecognized Hadoop major version number: 3.1.0.

The cause is that the hive-exec bundled with Spark 2 is hive-exec-1.2.1.spark2.jar, which does not support Hadoop 3, hence the error.

Fix 1: replace hive-exec-1.2.1.spark2.jar with HDP's hive-exec-1.21.2.3.0.1.0-187.jar (a shell sketch follows below); this resolves the problem.
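
A small shell sketch of Fix 1, assuming $SPARK_HOME points at the freshly built Spark and that the HDP jar sits in the usual /usr/hdp/3.0.1.0-187/spark2/jars/ location (verify this path on your cluster):

# back up the bundled hive-exec, then drop in HDP's patched build
mv $SPARK_HOME/jars/hive-exec-1.2.1.spark2.jar /tmp/
cp /usr/hdp/3.0.1.0-187/spark2/jars/hive-exec-1.21.2.3.0.1.0-187.jar $SPARK_HOME/jars/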

Fix 2: build Spark against a Hadoop version lower than 3, and set the following in spark-defaults.conf:

spark.sql.hive.metastore.jars builtin
spark.sql.hive.metastore.version 1.2.1

 

Starting spark-on-yarn: for several minutes the driver keeps printing Application report for application_1588126420266_0729 (state: ACCEPTED), and it eventually fails with: Exception message: /data/hadoop/yarn/local/usercache/ocsp/appcache/application_1519982778829_0171/container_e37_1519982778829_0171_02_000001/launch_container.sh: line 32: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure:$PWD/__spark_conf__/__hadoop_conf__: bad substitution

Adding the following to spark-defaults.conf solves the problem:

spark.driver.extraJavaOptions -Dhdp.version=3.1.0
spark.yarn.am.extraJavaOptions -Dhdp.version=3.1.0

See https://www.jianshu.com/p/de762c244663 for more on this issue.

Note: Spark must not pick up the HDP cluster's mapred-site.xml; otherwise the problem above cannot be solved even with the configuration added.

 

 

The above are the problems hit when starting Spark on YARN with static resource allocation; once they were resolved, the application started normally.

 

Next: Spark on YARN with dynamic resource allocation.

First, HDP's Spark shuffle service listens on port 7447 by default, not Apache Spark's 7337, so the port must be set to 7447 in spark-defaults.conf. The initial number of executors is left unset, which defaults to 0 (see the sketch below).
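
A minimal spark-defaults.conf sketch for this dynamic-allocation setup; the min/max executor values are illustrative and not from the original configuration:

spark.shuffle.service.enabled true
spark.shuffle.service.port 7447
spark.dynamicAllocation.enabled true
# initialExecutors is left at its default of 0
spark.dynamicAllocation.minExecutors 0
spark.dynamicAllocation.maxExecutors 10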

Then start Spark on YARN with dynamic allocation: it starts successfully, and at this point only the ApplicationMaster is running.

Then run any SQL statement; the console prints

ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged

and then keeps printing the following log line:

 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Looking at the ApplicationMaster log reveals the following error:

INFO YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them.

ERROR YarnAllocator: Failed to launch executor 1 on container container_e55_1588126420266_0777_01_000002

org.apache.spark.SparkException: Exception while starting container container_e55_1588126420266_0777_01_000002 on host online-slave-6

        at org.apache.spark.deploy.yarn.ExecutorRunnable.startContainer(ExecutorRunnable.scala:126)

        at org.apache.spark.deploy.yarn.ExecutorRunnable.run(ExecutorRunnable.scala:65)

        at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$runAllocatedContainers$1$$anon$2.run(YarnAllocator.scala:546)

        at ...

Caused by: org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist

So the executor cannot be launched because the Spark shuffle service cannot be found.

Checking the YARN configuration shows that the Spark shuffle service is registered under the name spark2_shuffle; this is because HDP also bundles Spark 1.x, whose shuffle service uses the name spark_shuffle.

However, the Apache Spark source hard-codes "spark_shuffle" and offers no way to configure the shuffle service name, so I patched the source to add a configuration option, spark.shuffle.service.name, and set it to spark2_shuffle, which solved the problem. If you don't want to modify the Apache Spark source, the only alternative is to change the YARN configuration (see the sketch below) and restart YARN.
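
For the YARN-side alternative, the point is that every NodeManager must expose an aux-service under the name spark_shuffle (the name Spark 2.4 hard-codes) backed by the Spark shuffle service class, roughly as in the yarn-site.xml sketch below; the exact aux-services list and class on an HDP cluster are assumptions to verify in Ambari before applying:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark2_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>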
