How to manually deploy third-party jars for an Apache Spark cluster?

I have an Apache Spark cluster (multiple nodes) and I would like to manually deploy some utility jars to each Spark node. Where should I put these jars? For example: spark-streaming-twitter_2.10-1.6.0.jar

I know that we can use Maven to build a fat (uber) jar that bundles these dependencies, but I would like to deploy these utilities manually, so that application developers do not have to package or deploy them themselves.

Any suggestions?

2 answers

1. Copy the third-party jars to a reserved HDFS directory, e.g. hdfs://xxx-ns/user/xxx/3rd-jars/

2. In spark-submit, reference these jars using the hdfs: scheme; with hdfs: URIs, the executors pull the files and JARs down from HDFS themselves.

--jars hdfs://xxx-ns/user/xxx/3rd-jars/xxx.jar  

3. spark-submit will not upload these jars again, because the source and destination file systems are the same. You will see a log message like:

  Client: Source and destination file systems are the same. Not copying hdfs://xxx-ns/user/xxx/3rd-jars/xxx.jar
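A minimal end-to-end sketch of this approach, assuming a YARN cluster and the xxx-ns namespace from the answer; the application jar and main class names (my-app.jar, com.example.MyApp) are hypothetical:

  # One-time step: upload the third-party jar to the shared HDFS directory
  hdfs dfs -mkdir -p hdfs://xxx-ns/user/xxx/3rd-jars/
  hdfs dfs -put spark-streaming-twitter_2.10-1.6.0.jar hdfs://xxx-ns/user/xxx/3rd-jars/

  # Submit the application, referencing the jar by its hdfs: URI;
  # executors fetch it from HDFS instead of having it shipped from the client
  spark-submit \
    --master yarn \
    --class com.example.MyApp \
    --jars hdfs://xxx-ns/user/xxx/3rd-jars/spark-streaming-twitter_2.10-1.6.0.jar \
    my-app.jar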

spark-submit supports a --jars option. See spark-submit --help for --jars:

  --jars JARS                 Comma-separated list of local jars to include on the driver
                              and executor classpaths.

The Spark documentation gives a similar example for spark-shell:

Or, to also add code.jar to its classpath, use:
$ ./bin/spark-shell --master local[4] --jars code.jar
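The same option works for spark-submit with a comma-separated list of local jars; a sketch with hypothetical jar paths, class name, and master URL (the jars must exist at these paths on the machine running spark-submit):

  # --jars takes a comma-separated list; these local jars are added to the
  # driver and executor classpaths and shipped to the cluster
  spark-submit \
    --class com.example.MyApp \
    --master spark://master-host:7077 \
    --jars /opt/libs/spark-streaming-twitter_2.10-1.6.0.jar,/opt/libs/twitter4j-core-4.0.4.jar \
    my-app.jar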


