I have a Python package with many modules built into an .egg file, and I want to use it inside a Zeppelin notebook. According to the Zeppelin documentation, to ship this package to Zeppelin's Spark interpreter you can pass it via the --files option in SPARK_SUBMIT_OPTIONS in conf/zeppelin-env.sh. I have the following questions about this:
In the pyspark shell, the .egg file works when specified via --py-files (i.e. I can import a module from the package inside the pyspark shell), while the same .egg file passed via the --files option does not work (ImportError: No module named XX.xx).
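For reference, this is roughly what works in the plain pyspark shell (the real module name is elided as XX.xx; the .egg path is the one from my config below):

    # launching pyspark with the egg on --py-files
    $SPARK_HOME/bin/pyspark --py-files /home/me/models/Churn-zeppelin/package/build/dist/fly_libs-1.1-py2.7.egg
    >>> import XX.xx   # succeeds here, but not when the same egg is passed via --files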
Adding the .egg file via the --py-files option to SPARK_SUBMIT_OPTIONS in Zeppelin causes an error: Error: --py-files given but primary resource is not a Python script. My understanding is that everything in SPARK_SUBMIT_OPTIONS is passed through to the spark-submit command, so why does --py-files throw an error here?
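Concretely, the interpreter fails to start with something like the following in zeppelin-env.sh (same egg as above, only --files changed to --py-files):

    export SPARK_SUBMIT_OPTIONS="--py-files /home/me/models/Churn-zeppelin/package/build/dist/fly_libs-1.1-py2.7.egg"
    # -> Error: --py-files given but primary resource is not a Python script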
When I add the .egg file through the --files option to SPARK_SUBMIT_OPTIONS, the Zeppelin notebook does not throw an error, but I cannot import the module inside the notebook.
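This is what the failing paragraph looks like in the notebook (again with the module name elided):

    %pyspark
    import XX.xx
    # -> ImportError: No module named XX.xx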
What is the correct way to ship an .egg file to the Zeppelin Spark interpreter?
Spark version is 1.6.2 and Zeppelin version is 0.6.0.
The zeppelin-env.sh file contains the following:
    export SPARK_HOME=/home/me/spark-1.6.1-bin-hadoop2.6
    export SPARK_SUBMIT_OPTIONS="--jars /home/me/spark-csv-1.5.0-s_2.10.jar,/home/me/commons-csv-1.4.jar --files /home/me/models/Churn-zeppelin/package/build/dist/fly_libs-1.1-py2.7.egg"