Goal. I am trying to get pyspark (Apache Spark) to be interpreted correctly in my PyCharm development environment.
Problem: I am currently getting the following error:
ImportError: cannot import name accumulators
I followed this blog to help me through the process: http://renien.imtqy.com/blog/accessing-pyspark-pycharm/
Because my code was hitting the except path, I removed the try:/except: block just to find out what the exact error is.
Before that I got the following error:
ImportError: No module named py4j.java_gateway
This was fixed simply by entering '$ sudo pip install py4j' in bash.
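For reference, an alternative I considered (just a sketch, not something from the blog) is to point sys.path at the py4j source zip that ships inside the Spark distribution instead of pip-installing it; I am assuming the Spark 1.2.0 layout where the zip lives under python/lib, and I match the filename with a glob since the exact py4j version in the name may differ:

import glob
import os
import sys

# Assumed Spark install location; adjust to your own SPARK_HOME
spark_home = "[MY_HOME_DIR]/spark-1.2.0"

# Spark bundles py4j as a source zip under python/lib; pick up
# whatever version is there rather than hard-coding the filename.
py4j_zips = glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip"))
if py4j_zips:
    sys.path.append(py4j_zips[0])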
Currently my code is as follows:
import os
import sys

# Path for spark source folder
os.environ['SPARK_HOME'] = "[MY_HOME_DIR]/spark-1.2.0"

# Append pyspark to Python Path
sys.path.append("[MY_HOME_DIR]/spark-1.2.0/python/")

try:
    from pyspark import SparkContext
    print ("Successfully imported Spark Modules")
except ImportError as e:
    print ("Can not import Spark Modules", e)
    sys.exit(1)
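As a quick diagnostic (my own sketch, nothing Spark-specific), I can locate the pyspark package on the path without importing it, since the import itself is what raises the accumulators error, and check whether accumulators.py actually sits next to it:

import imp
import os
import sys

sys.path.append("[MY_HOME_DIR]/spark-1.2.0/python/")

# Locate the pyspark package on sys.path without importing it
# (importing pyspark is what fails with the accumulators error).
_, pkg_dir, _ = imp.find_module("pyspark")
print("pyspark found at:", pkg_dir)
print("accumulators.py present:",
      os.path.exists(os.path.join(pkg_dir, "accumulators.py")))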
My questions:
1. What is the cause of this error?
2. How do I fix the problem so that I can run pyspark in my PyCharm editor?
NOTE: The current interpreter I'm using in PyCharm is Python 2.7.8 (~/anaconda/bin/python).
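In case it helps, here is the small sanity check (again just my own sketch) that I can run as the script inside PyCharm to confirm which interpreter and search path the IDE is actually using:

import sys

# Confirm PyCharm is running the anaconda interpreter mentioned above
print("interpreter:", sys.executable)

# Confirm the Spark python directory actually made it onto sys.path
for p in sys.path:
    print("path entry:", p)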
Thanks in advance!
Don