Answers to previous questions about this error say that all you have to do is update your version of Spark. I uninstalled my previous version of Spark and installed Spark 1.6.3 for Hadoop 2.6.0.
I tried this:
```python
s_df = sc.createDataFrame(pandas_df)
```
And got this error:
```
AttributeError                            Traceback (most recent call last)
<ipython-input-8-4e8b3fc80a02> in <module>()
----> 1 s_df = sc.createDataFrame(pandas_df)

AttributeError: 'SparkContext' object has no attribute 'createDataFrame'
```
Does anyone know why? I also tried uninstalling and reinstalling the same version (1.6), but that did not help.
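One way to narrow this down is to ask the PySpark classes directly which of them defines `createDataFrame`. This is a minimal sketch, assuming it runs in the same interpreter as the notebook; the helper name `which_has_create_dataframe` is hypothetical. It only inspects class attributes, so no SparkContext or JVM is started. For reference, in PySpark 1.x `createDataFrame` is defined on `SQLContext` (and `HiveContext`), not on `SparkContext`; `SparkSession` only appears in Spark 2.0.

```python
def which_has_create_dataframe():
    """Report which PySpark classes define createDataFrame.

    Returns None if pyspark cannot be imported in this interpreter.
    Only class attributes are checked, so no JVM is launched.
    """
    try:
        import pyspark
        from pyspark import sql as pyspark_sql
    except ImportError:
        return None

    report = {
        # pyspark.__version__ is missing in older releases, hence the default.
        "version": getattr(pyspark, "__version__", "unknown"),
        "SparkContext": hasattr(pyspark.SparkContext, "createDataFrame"),
        "SQLContext": hasattr(pyspark_sql.SQLContext, "createDataFrame"),
    }

    # SparkSession does not exist before Spark 2.0, so look it up defensively.
    session_cls = getattr(pyspark_sql, "SparkSession", None)
    report["SparkSession"] = session_cls is not None and hasattr(
        session_cls, "createDataFrame"
    )
    return report


if __name__ == "__main__":
    print(which_has_create_dataframe())
```

If the report shows `SQLContext` as `True` and `SparkContext` as `False`, the 1.6-era pattern would be to wrap the existing context, e.g. `SQLContext(sc).createDataFrame(pandas_df)`.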
Here are the environment variables I adjusted to get PySpark working:
```shell
PATH="/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin"
export PATH
PATH="/Library/Frameworks/Python.framework/Versions/2.7/bin:${PATH}"
export PATH
export PATH="/Users/pr/anaconda:$PATH"
export JAVA_HOME=$(/usr/libexec/java_home)
export SPARK_HOME="/Users/pr/spark"
export PATH=$PATH:$SPARK_HOME/bin
export PYSPARK_SUBMIT_ARGS="--master local[2]"
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
```
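To rule out a path problem with the settings above, the sketch below checks that `SPARK_HOME` points at a real directory and that `PYTHONPATH` contains both `$SPARK_HOME/python` and the py4j zip bundled with that Spark install. It is a minimal sketch under stated assumptions: `check_pyspark_paths` is a hypothetical helper, and it globs for `py4j-*-src.zip` rather than hard-coding `py4j-0.9-src.zip`, since the bundled py4j version differs between Spark releases.

```python
import glob
import os


def check_pyspark_paths(env=None):
    """Sanity-check SPARK_HOME / PYTHONPATH as used for PySpark.

    Returns a list of human-readable problems (empty if none were found).
    """
    env = os.environ if env is None else env
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        return ["SPARK_HOME is not set"]

    problems = []
    if not os.path.isdir(spark_home):
        problems.append("SPARK_HOME does not exist: " + spark_home)

    # Normalise entries so a trailing slash (as in $SPARK_HOME/python/) still
    # matches; empty entries from a trailing ':' are skipped.
    entries = [
        os.path.normpath(p)
        for p in env.get("PYTHONPATH", "").split(os.pathsep)
        if p
    ]
    spark_python = os.path.normpath(os.path.join(spark_home, "python"))
    if spark_python not in entries:
        problems.append("PYTHONPATH is missing " + spark_python)

    # Spark ships py4j as a source zip under python/lib.
    lib_dir = os.path.join(spark_python, "lib")
    zips = glob.glob(os.path.join(lib_dir, "py4j-*-src.zip"))
    if not zips:
        problems.append("no py4j-*-src.zip found under " + lib_dir)
    elif not any(os.path.normpath(z) in entries for z in zips):
        problems.append(
            "PYTHONPATH does not include the bundled py4j zip: " + ", ".join(zips)
        )
    return problems
```

Running it from inside the notebook kernel, e.g. `print(check_pyspark_paths())`, checks the environment Jupyter was actually launched with rather than the current shell's.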
Do I need to install Hadoop separately? I skipped that step because the code I am running does not need it.