Why isn't --py-files supported in Zeppelin?

I have a Python package with many modules built into an .egg file, and I want to use it inside a Zeppelin notebook. According to the Zeppelin documentation, to pass this package to the Zeppelin Spark interpreter, you can export it via the --files option in SPARK_SUBMIT_OPTIONS in conf/zeppelin-env.sh. I have the following questions about this:

  • In the pyspark shell, the .egg file works when specified with --py-files (i.e. I can import a module from the package inside the pyspark shell), while the same .egg file passed with the --files option does not work (ImportError: No module named XX.xx).

  • Adding the .egg file via the --py-files option to SPARK_SUBMIT_OPTIONS in Zeppelin causes an error: Error: --py-files given but primary resource is not a Python script. As I understand it, everything in SPARK_SUBMIT_OPTIONS is passed to the spark-submit command, so why does --py-files throw an error?

  • When I add the .egg through the --files option in SPARK_SUBMIT_OPTIONS, the Zeppelin notebook does not throw an error, but I cannot import the module inside the notebook.
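
A possible explanation for the first and third observations, offered as a sketch rather than a confirmed diagnosis: --files only places the file in the working directory, whereas --py-files also puts it on the Python search path. The standalone Python snippet below imitates that difference without Spark. The archive name demo_libs.egg and the package demo_pkg_xx are hypothetical stand-ins for the real .egg, and the two "cases" are only an approximation of what spark-submit does.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build a tiny zip archive standing in for the .egg (an .egg is a zip file).
# "demo_libs.egg" and "demo_pkg_xx" are hypothetical names for illustration.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "demo_libs.egg")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("demo_pkg_xx/__init__.py", "")
    zf.writestr("demo_pkg_xx/xx.py", "def answer():\n    return 42\n")


def can_import(name):
    """Return True if `name` can be imported with the current sys.path."""
    importlib.invalidate_caches()
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


# Case 1: the archive merely exists on disk (roughly the --files situation).
before = can_import("demo_pkg_xx.xx")

# Case 2: the archive itself is on sys.path (roughly what --py-files sets up).
sys.path.insert(0, archive)
after = can_import("demo_pkg_xx.xx")

print(before, after)  # prints: False True
```

Inside a Zeppelin %pyspark paragraph, the analogous step might be sc.addPyFile with the path to the .egg, which is a standard SparkContext method; whether that is the right fix for this particular setup is an assumption and not verified here.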

What is the correct way to pass an .egg file to the Zeppelin Spark interpreter?

Spark version 1.6.2 and Zeppelin version 0.6.0.

The zeppelin-env.sh file contains the following:

 export SPARK_HOME=/home/me/spark-1.6.1-bin-hadoop2.6
 export SPARK_SUBMIT_OPTIONS="--jars /home/me/spark-csv-1.5.0-s_2.10.jar,/home/me/commons-csv-1.4.jar --files /home/me/models/Churn-zeppelin/package/build/dist/fly_libs-1.1-py2.7.egg"

Source: https://habr.com/ru/post/1263617/

