My goal is to import my own .py file into my Spark application and call some of the functions it contains.
Here is what I tried:
I have a test file called Test.py that looks like this:
def func():
    print "Import is working"
Inside the Spark application, I do the following (as described in the docs):
sc = SparkContext(conf=conf, pyFiles=['/[AbsolutePathTo]/Test.py'])
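For context, the surrounding setup in app.py looks roughly like this (the app name and master shown here are placeholders, not my exact configuration):

from pyspark import SparkConf, SparkContext

# illustrative configuration; my real conf sets a few more options
conf = SparkConf().setAppName("app").setMaster("local[*]")

# ship Test.py to the executors when the context is created
sc = SparkContext(conf=conf, pyFiles=['/[AbsolutePathTo]/Test.py'])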
I also tried this instead (after creating the Spark context):
sc.addFile("/[AbsolutePathTo]/Test.py")
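As I understand the API, a file shipped with addFile can then be located on each node via SparkFiles; this is a sketch of what I expected to be able to do (the variable name is just illustrative):

from pyspark import SparkFiles

sc.addFile("/[AbsolutePathTo]/Test.py")
# local path of the shipped copy on each node
local_path = SparkFiles.get("Test.py")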
I even tried the following when submitting my Spark application:
./bin/spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0-M2 --py-files /[AbsolutePath]/Test.py ../Main/Code/app.py
However, I always get a name error:
NameError: name 'func' is not defined
when I call func() inside app.py. (I get the same error with "Test" if I try to call Test.func().)
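To be concrete, here is a simplified version of the call site in app.py (surrounding logic omitted; Test.py was shipped as shown above):

func()        # NameError: name 'func' is not defined
Test.func()   # NameError: name 'Test' is not defined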
Finally, I also tried to import the file inside the pyspark shell using the same command as above:
sc.addFile("/[AbsolutePathTo]/Test.py")
Strangely, I do not get an error while adding the file, but I still cannot call func() without getting an error. Also, not sure if it matters, but I am running Spark locally on a single machine.
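For reference, the shell session looks roughly like this (path shortened as above):

>>> sc.addFile("/[AbsolutePathTo]/Test.py")
>>> func()
NameError: name 'func' is not defined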
I have tried everything I can think of, but I still can't get it to work. I am probably missing something very simple. Any help would be appreciated.