Why doesn't spark-submit in cluster mode YARN find python packages for executors?

I am running a boo.py script on AWS EMR using spark-submit (Spark 2.0).

The script completes successfully when I run

python boo.py

However, when executed

spark-submit --verbose --deploy-mode cluster --master yarn  boo.py

The log from yarn logs -applicationId ID_number displays:

Traceback (most recent call last):
  File "boo.py", line 17, in <module>
    import boto3
ImportError: No module named boto3

I am using python and the boto3 module:

$ which python
/usr/bin/python
$ pip install boto3
Requirement already satisfied (use --upgrade to upgrade): boto3 in /usr/local/lib/python2.7/site-packages

How do I add this library path so that spark-submit can find the boto3 module?

1 answer

When you use Spark, part of the code runs on the driver and part runs on the executors.

Did you install boto3 on the driver only, or on the driver plus all the executor nodes that can run your code?
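To illustrate, here is a minimal sketch (not the original boo.py; the bucket and key names are invented): the top-level import runs in the driver process, while a function passed to map() is serialized and executed in executor processes on the worker nodes, so every one of those nodes needs boto3 installed.

from pyspark.sql import SparkSession

import boto3  # runs in the driver process; with --deploy-mode cluster
              # the driver itself lives on a cluster node, not on the
              # machine you submitted from

spark = SparkSession.builder.appName("boo").getOrCreate()

def object_size(key):
    # this function is shipped to the executors; the import below raises
    # ImportError on any worker node where boto3 is not installed
    import boto3
    s3 = boto3.client("s3")
    return s3.head_object(Bucket="my-bucket", Key=key)["ContentLength"]

sizes = spark.sparkContext.parallelize(["a.txt", "b.txt"]).map(object_size)
print(sizes.collect())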

One solution is to install boto3 on all of the nodes (executors).

See this question on how to install python modules on Amazon EMR:

How to bootstrap installation of Python modules on Amazon EMR?
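For completeness, a minimal sketch of that approach for a new cluster (the bucket name, script name, and instance settings are placeholders; emr-5.0.0 is the EMR release that ships Spark 2.0): upload a one-line bootstrap script to S3 and point the cluster at it, so the package is installed on the master and every worker node at startup.

# install_boto3.sh, uploaded to s3://my-bucket/install_boto3.sh
#!/bin/bash
sudo pip install boto3

$ aws emr create-cluster \
    --name "spark-boto3" \
    --release-label emr-5.0.0 \
    --applications Name=Spark \
    --instance-type m3.xlarge \
    --instance-count 3 \
    --use-default-roles \
    --bootstrap-actions Path=s3://my-bucket/install_boto3.sh

On a cluster that is already running, the equivalent is to ssh into each node and run sudo pip install boto3 there.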
