This is a limitation of Apache Spark itself, not of Spark on EMR. When Spark runs in client deploy mode (all interactive shells such as `spark-shell` or `pyspark`, as well as `spark-submit` without `--deploy-mode cluster` or `--master yarn-cluster`), only local jar paths are allowed.
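For example, an invocation like the following is rejected in client mode (the bucket and jar names here are hypothetical placeholders):

```sh
# Fails in client deploy mode: remote jar paths are not resolved
spark-shell --jars s3://my-bucket/libs/my-lib.jar
```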
The reason is that in order for Spark to download a remote jar, it must already be running Java code, and by that point it is too late to add the jar to its own classpath.
The workaround is to download the jar to the local filesystem first (using the AWS S3 CLI), then pass the local path when starting the Spark shell or `spark-submit`.
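A minimal sketch of that workaround, again with a hypothetical bucket and jar name (the local path assumes the default `hadoop` user on an EMR node):

```sh
# Copy the jar from S3 to the local filesystem first
aws s3 cp s3://my-bucket/libs/my-lib.jar /home/hadoop/my-lib.jar

# Then reference the local path when starting the shell
spark-shell --jars /home/hadoop/my-lib.jar
```

The same local path works with `spark-submit --jars` when running in client mode.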