How to manually deploy third-party jars for an Apache Spark cluster?

I have an Apache Spark cluster (multiple nodes) and I would like to manually deploy some utility jars to each Spark node. Where should I put these jars? For example: spark-streaming-twitter_2.10-1.6.0.jar

I know that we can use Maven to build a fat (uber) jar that bundles these dependencies, but I would like to deploy these utilities manually, so that application developers do not have to package or deploy them themselves.

Any suggestions?

2 answers

1. Copy the third-party jars to a reserved HDFS directory, e.g. hdfs://xxx-ns/user/xxx/3rd-jars/

2. In spark-submit, reference these jars using the hdfs: scheme; with hdfs: URIs, the executors pull the files and JARs down from HDFS themselves.

--jars hdfs://xxx-ns/user/xxx/3rd-jars/xxx.jar  

3. spark-submit will not upload these jars again, because the source and destination file systems are the same. You will see a log message like:

  Client: Source and destination file systems are the same. Not copying hdfs://xxx-ns/user/xxx/3rd-jars/xxx.jar
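A minimal end-to-end sketch of this approach, assuming a YARN cluster and the xxx-ns namespace from the answer; the application jar and main class names (my-app.jar, com.example.MyApp) are hypothetical:

  # One-time step: upload the third-party jar to the shared HDFS directory
  hdfs dfs -mkdir -p hdfs://xxx-ns/user/xxx/3rd-jars/
  hdfs dfs -put spark-streaming-twitter_2.10-1.6.0.jar hdfs://xxx-ns/user/xxx/3rd-jars/

  # Submit the application, referencing the jar by its hdfs: URI;
  # executors fetch it from HDFS instead of having it shipped from the client
  spark-submit \
    --master yarn \
    --class com.example.MyApp \
    --jars hdfs://xxx-ns/user/xxx/3rd-jars/spark-streaming-twitter_2.10-1.6.0.jar \
    my-app.jar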

spark-submit supports a --jars option. See spark-submit --help for --jars:

  --jars JARS                 Comma-separated list of local jars to include on the driver
                              and executor classpaths.

The Spark documentation gives a similar example for spark-shell:

Or, to also add code.jar to its classpath, use:
$ ./bin/spark-shell --master local[4] --jars code.jar
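The same option works for spark-submit with a comma-separated list of local jars; a sketch with hypothetical jar paths, class name, and master URL (the jars must exist at these paths on the machine running spark-submit):

  # --jars takes a comma-separated list; these local jars are added to the
  # driver and executor classpaths and shipped to the cluster
  spark-submit \
    --class com.example.MyApp \
    --master spark://master-host:7077 \
    --jars /opt/libs/spark-streaming-twitter_2.10-1.6.0.jar,/opt/libs/twitter4j-core-4.0.4.jar \
    my-app.jar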


