Writing and running PySpark in IntelliJ IDEA

I am trying to work with PySpark in IntelliJ, but I cannot figure out how to install and configure it properly. I can work with Python in IntelliJ, and I can use the pyspark shell, but I cannot tell IntelliJ how to find the Spark files (import pyspark results in "ImportError: No module named pyspark").

Any tips on how to include/import Spark so that IntelliJ can work with it are welcome.

Thanks.

UPDATE:

I tried this piece of code:

    from pyspark import SparkContext, SparkConf

    spark_conf = SparkConf().setAppName("scavenge some logs")
    spark_context = SparkContext(conf=spark_conf)
    # Raw strings avoid '\t' in "C:\test.txt" being interpreted as a tab.
    address = r"C:\test.txt"
    log = spark_context.textFile(address)
    my_result = log.filter(lambda x: 'foo' in x).saveAsTextFile(r'C:\my_result')

which failed with the following error message:

    Traceback (most recent call last):
      File "C:/Users/U546816/IdeaProjects/sparktestC/.idea/sparktestfile", line 2, in <module>
        spark_conf = SparkConf().setAppName("scavenge some logs")
      File "C:\Users\U546816\Documents\Spark\lib\spark-assembly-1.3.1-hadoop2.4.0.jar\pyspark\conf.py", line 97, in __init__
      File "C:\Users\U546816\Documents\Spark\lib\spark-assembly-1.3.1-hadoop2.4.0.jar\pyspark\context.py", line 221, in _ensure_initialized
      File "C:\Users\U546816\Documents\Spark\lib\spark-assembly-1.3.1-hadoop2.4.0.jar\pyspark\java_gateway.py", line 35, in launch_gateway
      File "C:\Python27\lib\os.py", line 425, in __getitem__
        return self.data[key.upper()]
    KeyError: 'SPARK_HOME'

    Process finished with exit code 1
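The KeyError shows that PySpark reads the SPARK_HOME environment variable when it launches the Java gateway, and the run configuration does not define it. Besides setting it in IntelliJ (see the answers below), a minimal workaround sketch is to set it in the script itself before importing pyspark; the install directory below is taken from the traceback and must match your own Spark location:

    # Sketch: define SPARK_HOME and put Spark's Python sources on sys.path
    # before importing pyspark. The directory is an assumed example.
    import glob
    import os
    import sys

    os.environ["SPARK_HOME"] = r"C:\Users\U546816\Documents\Spark"
    sys.path.append(os.path.join(os.environ["SPARK_HOME"], "python"))
    # Depending on the Spark version, the bundled py4j zip is also needed:
    for zip_path in glob.glob(os.path.join(os.environ["SPARK_HOME"],
                                           "python", "lib", "py4j-*.zip")):
        sys.path.append(zip_path)

    # After this, the import and SparkConf() no longer raise KeyError.
    from pyspark import SparkContext, SparkConf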
+5
2 answers

Set the environment variables SPARK_HOME (the root of the Spark distribution) and PYTHONPATH (Spark's python subdirectory) in your Run/Debug configuration.

For instance:

    SPARK_HOME=/Users/<username>/javalibs/spark-1.5.0-bin-hadoop2.4
    PYTHONPATH=/Users/<username>/javalibs/spark-1.5.0-bin-hadoop2.4/python

See the attached snapshot in IntelliJ IDEA:

[Screenshot: Run/Debug configuration for PySpark]
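To confirm that the configuration actually reaches the interpreter, a quick sanity check (run from the same Run/Debug configuration) is:

    import os
    # Both should print the paths entered in the Run/Debug configuration;
    # None means the configuration was not picked up.
    print(os.environ.get("SPARK_HOME"))
    print(os.environ.get("PYTHONPATH"))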

+3

For example, something like this:

    from pyspark import SparkContext, SparkConf

    spark_conf = SparkConf().setAppName("scavenge some logs")
    spark_context = SparkContext(conf=spark_conf)
    address = "/path/to/the/log/on/hdfs/*.gz"
    log = spark_context.textFile(address)
    my_result = (log
                 # ...here go your actions and transformations...
                 ).saveAsTextFile('my_result')
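As a concrete illustration, with hypothetical transformations (the ERROR filter and the strip map are stand-ins for whatever your job actually needs), the skeleton could be filled in like this:

    from pyspark import SparkContext, SparkConf

    spark_conf = SparkConf().setAppName("scavenge some logs")
    spark_context = SparkContext(conf=spark_conf)

    log = spark_context.textFile("/path/to/the/log/on/hdfs/*.gz")
    # Hypothetical pipeline: keep lines containing ERROR, trim whitespace,
    # and write the result back out.
    my_result = (log
                 .filter(lambda line: "ERROR" in line)
                 .map(lambda line: line.strip())
                 ).saveAsTextFile('my_result')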
+1
