EMR spark-shell Does Not Pick Up Jars

I am using spark-shell and I cannot load external jars. I am running Spark on EMR.

I run the following command:

 spark-shell --jars s3://play/emr/release/1.0/code.jar 

I get the following error:

OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Warning: Skip remote jar s3://play/emr/release/1.0/code.jar

Thanks in advance.

+5
3 answers

This is a limitation of Apache Spark itself, not Spark on EMR. When Spark is launched in client deploy mode (which includes all interactive shells such as spark-shell and pyspark, as well as spark-submit without --deploy-mode cluster or --master yarn-cluster), only local jar paths are allowed.

The reason is that, in order to download a remote jar, Spark must already be running Java code, and at that point it is too late to add the jar to its own classpath.

The workaround is to download the jar to the local filesystem first (for example with the AWS CLI), then pass the local path when starting spark-shell or spark-submit.
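For example, a minimal sketch of that workaround run on the EMR master node (the /home/hadoop staging path is just an assumption; use whatever local directory suits you):

 # copy the jar from S3 to the local filesystem
 aws s3 cp s3://play/emr/release/1.0/code.jar /home/hadoop/code.jar
 # then pass the local path to --jars
 spark-shell --jars /home/hadoop/code.jar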

+3

You can do this from the command line on the EMR box itself:

 spark-submit --verbose --deploy-mode cluster --class com.your.package.and.Class s3://bucket/path/to/thejar.jar 10

You can also invoke this command using the AWS EMR client library for Java or the AWS CLI. The key is to use '--deploy-mode cluster'.
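For instance, a rough sketch of submitting the same command as an EMR step with the AWS CLI (the cluster id, step name, class, and jar path are placeholders):

 aws emr add-steps --cluster-id j-XXXXXXXXXXXXX \
   --steps 'Type=Spark,Name=RunMyJar,ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--class,com.your.package.and.Class,s3://bucket/path/to/thejar.jar,10]'

EMR turns a Type=Spark step into a spark-submit invocation on the cluster, so the Args list mirrors the spark-submit arguments above.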

+2

If you hit the same problem, you can add the arguments --master yarn --deploy-mode cluster, which will let you run jars stored in S3 remotely.
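A sketch of such an invocation, reusing the jar path from the question (the class name is a placeholder):

 spark-submit --master yarn --deploy-mode cluster --class com.your.package.and.Class s3://play/emr/release/1.0/code.jar

In cluster mode the driver runs on the cluster and the application jar is fetched there, rather than being added to an already-running local shell, which is why the S3 path is accepted.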

0

Source: https://habr.com/ru/post/1243881/

