PySpark ImportError: cannot import name accumulators

Goal: I am trying to get pyspark (apache-spark) to be interpreted correctly in my PyCharm development environment.

Problem: I am currently getting the following error:

ImportError: cannot import name accumulators 

I followed this blog post to help me through the process: http://renien.imtqy.com/blog/accessing-pyspark-pycharm/

Because my code was hitting the except path, I removed the try:/except: block just to find out what the exact error was.

Before that I got the following error:

 ImportError: No module named py4j.java_gateway 

This was fixed simply by entering '$ sudo pip install py4j' in bash.

Currently my code is as follows:

 import os
 import sys

 # Path for spark source folder
 os.environ['SPARK_HOME'] = "[MY_HOME_DIR]/spark-1.2.0"

 # Append pyspark to Python Path
 sys.path.append("[MY_HOME_DIR]/spark-1.2.0/python/")

 try:
     from pyspark import SparkContext
     print ("Successfully imported Spark Modules")
 except ImportError as e:
     print ("Can not import Spark Modules", e)
     sys.exit(1)

My questions:
1. What is the cause of this error?
2. How can I fix the problem so that I can run pyspark in my PyCharm editor?

NOTE: The current interpreter I'm using in PyCharm is Python 2.7.8 (~/anaconda/bin/python).

Thanks in advance!

Don

+5
11 answers

First, set up your environment variables:

 export SPARK_HOME=/home/.../Spark/spark-2.0.1-bin-hadoop2.7
 export PYTHONPATH=$SPARK_HOME/python/:$SPARK_HOME/python/lib/py4j-0.10.3-src.zip:$PYTHONPATH
 PATH="$PATH:$JAVA_HOME/bin:$SPARK_HOME/bin:$PYTHONPATH"

Make sure you substitute your own version numbers and paths.

Then restart your shell or IDE so the new variables take effect. It is important to verify the installation.
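For example, a quick sanity check after restarting (assuming the variables above are exported for the session that PyCharm or python runs in) is to retry the import that originally failed:

 import sys

 # With PYTHONPATH set as above, both of these imports should succeed.
 from pyspark import SparkContext, accumulators

 # Show which sys.path entries actually point at the Spark install.
 print([p for p in sys.path if "spark" in p.lower()])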

+1

It revolves around the PYTHONPATH variable, which defines the Python module search path.

The reason pyspark works well from the shell is that the pyspark shell script sets up PYTHONPATH itself; if you look inside that script, the PYTHONPATH it uses looks like the one shown below.

PYTHONPATH=/usr/lib/spark/python/lib/py4j-0.8.2.1-src.zip:/usr/lib/spark/python

My environment is the Cloudera Quickstart VM 5.3.
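To reproduce this inside PyCharm without relying on the shell environment, a minimal sketch (assuming the same Cloudera Quickstart VM layout shown above; adjust the paths for your own install) would be:

 import sys

 # Mirror what the pyspark shell script does on the Quickstart VM.
 sys.path.insert(0, "/usr/lib/spark/python")
 sys.path.insert(0, "/usr/lib/spark/python/lib/py4j-0.8.2.1-src.zip")

 from pyspark import SparkContext  # should now resolve without the ImportError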

Hope this helps.

+7

It looks like a circular dependency error.

In [MY_HOME_DIR]/spark-1.2.0/python/pyspark/context.py, delete or comment out the line

from pyspark import accumulators

It is about 6 lines of code from the top of the file.

I have filed an issue about this with the Spark project here:

https://issues.apache.org/jira/browse/SPARK-4974

+4

I came across the same error. I just installed py4j.

 sudo pip install py4j 

There is no need to set anything in bashrc.

+2

I ran into the same problem using CDH 5.3.

In the end, it turned out to be pretty easy to solve. I noticed that the script /usr/lib/spark/bin/pyspark has variables defined for IPython.

I installed Anaconda in /opt/anaconda:

 export PATH=/opt/anaconda/bin:$PATH
 # note that the default port 8888 is already in use, so I used a different port
 export IPYTHON_OPTS="notebook --notebook-dir=/home/cloudera/ipython-notebook --pylab inline --ip=* --port=9999"

then, finally, I ran

 /usr/bin/pyspark 

which now functions properly.

+1

I ran into this problem too. To solve it, I commented out line 28 in ~/spark/spark/python/pyspark/context.py, the file that was causing the error:

 # from pyspark import accumulators
 from pyspark.accumulators import Accumulator

Since the accumulator import seems to be covered by the next line (29), the problem does not arise. Spark is working fine now (after pip install py4j).

+1

In PyCharm, before running the script, make sure you have unzipped the py4j*.zip file and added its location to your script with sys.path.append("spark path*/python/lib").

It worked for me.
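A rough sketch of that idea (the SPARK_HOME fallback below is only a placeholder, and it assumes py4j has already been unzipped into python/lib as described above):

 import os
 import sys

 # Placeholder path: point this at your own Spark installation.
 spark_home = os.environ.get("SPARK_HOME", "/path/to/spark")

 # Add the pyspark sources and the folder containing the unzipped py4j package.
 sys.path.append(os.path.join(spark_home, "python"))
 sys.path.append(os.path.join(spark_home, "python", "lib"))

 from pyspark import SparkContext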

+1
To get rid of ImportError: No module named py4j.java_gateway, you need to add the following lines:

 import os
 import sys

 os.environ['SPARK_HOME'] = "D:\python\spark-1.4.1-bin-hadoop2.4"
 sys.path.append("D:\python\spark-1.4.1-bin-hadoop2.4\python")
 sys.path.append("D:\python\spark-1.4.1-bin-hadoop2.4\python\lib\py4j-0.8.2.1-src.zip")

 try:
     from pyspark import SparkContext
     from pyspark import SparkConf
     print ("success")
 except ImportError as e:
     print ("error importing spark modules", e)
     sys.exit(1)
+1

I was able to find a fix for this on Windows, but am not quite sure of its root cause.

If you open the accumulators.py file, you will see that there is first a header comment, then the module docstring, and then the import statements. Move one or more of the import statements to immediately after the comment block and before the docstring. This worked on my system and I was able to import pyspark without any problems.
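Schematically, the rearranged top of the file looks something like the sketch below; the exact contents of accumulators.py differ by Spark version, so treat this as an illustration rather than the literal source:

 # ... Apache license header comment ...

 import sys  # one of the imports, moved up before the module docstring

 """
 Module docstring / doctest examples that previously sat between the
 header comment and the import statements.
 """

 # ... remaining imports and the rest of the module ...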

0

If you have just upgraded to a new Spark version, make sure the new py4j version is on your path, as each new Spark version comes with a new py4j version.

In my case it is "$SPARK_HOME/python/lib/py4j-0.10.3-src.zip" for Spark 2.0.1 instead of the old "$SPARK_HOME/python/lib/py4j-0.10.1-src.zip" for Spark 2.0.0.
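One way to avoid hardcoding the py4j version at all is to glob for whatever py4j-*-src.zip the current Spark ships with; a sketch, assuming SPARK_HOME is set:

 import glob
 import os
 import sys

 spark_home = os.environ["SPARK_HOME"]

 # Pick up whichever py4j source zip this Spark version bundles.
 py4j_zip = glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip"))[0]

 sys.path.append(os.path.join(spark_home, "python"))
 sys.path.append(py4j_zip)

 from pyspark import SparkContext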

0

The only thing that worked for me was to go to Spark's base folder and open the accumulators.py file.

At the beginning of the file there was a malformed multi-line command; delete all of it.

Then you are good to go!

0

Source: https://habr.com/ru/post/1209737/

