Unable to load main class from JAR file in Spark Submit

I am trying to get Spark running. This is my shell script, which is located at /home/full/path/to/file/shell/my_shell_script.sh:

confLocation=../conf/my_config_file.conf &&
executors=8 &&
memory=2G &&
entry_function=my_function_in_python &&
dos2unix $confLocation &&
spark-submit \
    --master yarn-client \
    --num-executors $executors \
    --executor-memory $memory \
    --py-files /home/full/path/to/file/python/my_python_file.py $entry_function $confLocation

When I run this, I get an error message:

Error: unable to load main class from JAR file: /home/full/path/to/file/shell/my_function_in_python

My impression is that it is looking in the wrong place (the Python file is in the python directory, not in the shell directory).

3 answers

What worked for me was to simply pass the Python file directly, without the --py-files option. It looks like this:

confLocation=../conf/my_config_file.conf &&
executors=8 &&
memory=2G &&
entry_function=my_function_in_python &&
dos2unix $confLocation &&
spark-submit \
    --master yarn-client \
    --num-executors $executors \
    --executor-memory $memory \
    /home/full/path/to/file/python/my_python_file.py $entry_function $confLocation

The --py-files flag is for additional Python file dependencies used by your program. As you can see in SparkSubmit.scala, spark-submit uses the so-called "primary argument", meaning the first non-flag argument, to decide whether to run in submit-a-JAR mode or in submit-a-Python-main mode.

This is why it tries to load your "$entry_function" as a JAR file that does not exist: it assumes you are running Python only if the primary argument ends with ".py"; otherwise it assumes you have given it a .jar file.
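The rule described above can be illustrated with a small Python sketch. This is a simplified illustration only, not Spark's actual code: the helper names `find_primary_resource` and `classify` are hypothetical, the set of value-taking options covers only the flags used in the question, and Spark's real argument handling knows more options and resource types.

```python
# Simplified illustration only; this is NOT Spark's actual implementation.
# Hypothetical set: just the value-taking options that appear in the question.
OPTIONS_WITH_VALUE = {
    "--master",
    "--num-executors",
    "--executor-memory",
    "--py-files",
}


def find_primary_resource(args):
    """Return the first argument that is neither an option nor an option's value."""
    i = 0
    while i < len(args):
        if args[i] in OPTIONS_WITH_VALUE:
            i += 2  # skip the option together with its value
        elif args[i].startswith("--"):
            i += 1  # skip a bare switch
        else:
            return args[i]
    return None


def classify(primary):
    """Treat a primary argument ending in '.py' as Python, anything else as a JAR."""
    if primary is not None and primary.endswith(".py"):
        return "python"
    return "jar"


# Arguments from the question: the .py path is consumed as the value of --py-files.
question_args = [
    "--master", "yarn-client",
    "--num-executors", "8",
    "--executor-memory", "2G",
    "--py-files", "/home/full/path/to/file/python/my_python_file.py",
    "my_function_in_python",
    "../conf/my_config_file.conf",
]

primary = find_primary_resource(question_args)
print(primary, classify(primary))  # my_function_in_python jar
```

Because the .py path is swallowed as the value of --py-files, the first remaining argument is the function name, which does not end in ".py" and is therefore treated as a JAR.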

Instead of using --py-files, just make /home/full/path/to/file/python/my_python_file.py the primary argument. Then you can either have your Python program take the "entry function" as a program argument, or simply call the entry function from the main function inside the Python file itself.
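The first option might look roughly like the sketch below for my_python_file.py. It assumes the script receives the entry function name and the config path as its two program arguments, as in the question. The `ENTRY_POINTS` table and the `run` helper are hypothetical names, and the function body is a placeholder where the real Spark job would go.

```python
import sys


def my_function_in_python(conf_location):
    # Placeholder body (assumption): the real job would create its
    # SparkContext or SparkSession here and read the config file.
    return "would run the job with config " + conf_location


# Hypothetical lookup table from command-line names to callables.
ENTRY_POINTS = {
    "my_function_in_python": my_function_in_python,
}


def run(entry_name, conf_location):
    """Look up the requested entry function by name and call it."""
    if entry_name not in ENTRY_POINTS:
        raise ValueError("unknown entry function: " + entry_name)
    return ENTRY_POINTS[entry_name](conf_location)


if __name__ == "__main__":
    # spark-submit passes everything after the .py file as program arguments.
    if len(sys.argv) == 3:
        print(run(sys.argv[1], sys.argv[2]))
    else:
        print("usage: my_python_file.py <entry_function> <conf_location>",
              file=sys.stderr)
```

An explicit table is used rather than looking the name up in `globals()`, so only functions that are meant to be entry points can be started from the command line.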

Alternatively, you can still use --py-files: create a new main .py file that calls your entry function, and pass that main .py file as the primary argument.
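To make the argument order for this alternative concrete, here is a small Python sketch that only assembles the command line and does not run it. The `build_spark_submit` helper and the main.py file name are hypothetical; the flags and paths are the ones from the question. `shlex.join` needs Python 3.8 or later.

```python
import shlex


def build_spark_submit(main_script, app_args, py_files=(), executors=8, memory="2G"):
    """Assemble a spark-submit argument list: options, then the primary .py, then program arguments."""
    cmd = [
        "spark-submit",
        "--master", "yarn-client",
        "--num-executors", str(executors),
        "--executor-memory", memory,
    ]
    if py_files:
        # Dependencies are passed as one comma-separated value, without spaces.
        cmd += ["--py-files", ",".join(py_files)]
    cmd.append(main_script)  # first non-flag argument: the primary resource
    cmd.extend(app_args)     # everything after it goes to the Python program
    return cmd


cmd = build_spark_submit(
    main_script="/home/full/path/to/file/python/main.py",  # hypothetical new main file
    app_args=["../conf/my_config_file.conf"],
    py_files=["/home/full/path/to/file/python/my_python_file.py"],
)
print(shlex.join(cmd))
```

The point of the ordering is that the main .py file comes after all options and their values, so it is the first non-flag argument, while my_python_file.py is shipped only as a dependency.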


When adding items to --py-files, separate them with commas and do not leave any spaces. Try this:

confLocation=../conf/my_config_file.conf &&
executors=8 &&
memory=2G &&
entry_function=my_function_in_python &&
dos2unix $confLocation &&
spark-submit \
    --master yarn-client \
    --num-executors $executors \
    --executor-memory $memory \
    --py-files /home/full/path/to/file/python/my_python_file.py,$entry_function,$confLocation

Source: https://habr.com/ru/post/1237873/

