PySpark: importing a .py file is not working

My goal is to import my own .py file into my Spark application and call some of the functions included in that file.

Here is what I tried:

I have a test file called Test.py that looks like this:

 def func():
     print("Import is working")

Inside the Spark application, I do the following (as described in the docs):

 sc = SparkContext(conf=conf, pyFiles=['/[AbsolutePathTo]/Test.py']) 

I also tried this instead (after creating the Spark context):

 sc.addFile("/[AbsolutePathTo]/Test.py") 

I even tried the following when submitting my spark application:

 ./bin/spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0-M2 --py-files /[AbsolutePath]/Test.py ../Main/Code/app.py 

However, I always get a name error:

 NameError: name 'func' is not defined 

when I call func() inside app.py. (I get the same error with "Test" if I try to call Test.func().)

Finally, I also tried to load the file inside the pyspark shell using the same command as above:

 sc.addFile("/[AbsolutePathTo]/Test.py") 

Strangely, I do not get an error when adding the file, but I still cannot call func() without getting the NameError. Also, not sure if it matters, but I run Spark locally on the same machine.

I have tried everything I could think of, but I still can't get it to work. I am probably missing something very simple. Any help would be appreciated.

1 answer

Well, actually my question was pretty dumb. After executing:

 sc.addFile("/[AbsolutePathTo]/Test.py") 

I still need to import the Test.py file, just as I would import a regular Python module, using

 import Test 

Then I can call

 Test.func() 

and it works. I thought that an explicit "import Test" was not needed since I am adding the file to the Spark context, but apparently adding a file does not have the same effect as importing it. Thanks to mark91 for pointing me in the right direction.
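For what it's worth, sc.addFile only distributes a copy of the file; nothing gets imported on its own. A minimal sketch of the distinction, using the same placeholder path as above (SparkFiles is the PySpark helper for locating distributed files):

 from pyspark import SparkFiles

 sc.addFile("/[AbsolutePathTo]/Test.py")

 # addFile only ships the file; SparkFiles.get returns where the copy landed
 print(SparkFiles.get("Test.py"))

 # the module still has to be imported explicitly before func() exists
 import Test   # assumes Test.py is also on sys.path, e.g. in the working directory
 Test.func()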

UPDATE 10/28/2017:

As requested in the comments, here is the full app.py:

 from pyspark import SparkContext
 from pyspark.conf import SparkConf

 conf = SparkConf()
 conf.setMaster("local[4]")
 conf.setAppName("Spark Stream")
 sc = SparkContext(conf=conf)

 sc.addFile("Test.py")
 import Test

 Test.func()
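Note that sc.addFile only copies the file around. If you also want to call functions from Test.py inside tasks running on the executors, sc.addPyFile is the more fitting API, since it additionally puts the shipped file on the executors' Python path. A minimal sketch, assuming Test.py sits in the current working directory as above:

 from pyspark import SparkContext
 from pyspark.conf import SparkConf

 conf = SparkConf().setMaster("local[4]").setAppName("Spark Stream")
 sc = SparkContext(conf=conf)

 # addPyFile ships Test.py and also makes it importable inside tasks
 sc.addPyFile("Test.py")

 def call_on_worker(x):
     import Test   # resolved on the executor, where the shipped copy is on sys.path
     Test.func()
     return x

 sc.parallelize([1, 2, 3]).map(call_on_worker).collect()

Submitting with --py-files Test.py should behave the same way, since the listed files are likewise added to the search path for the job.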
