Configure IPython/Jupyter with PySpark on AWS EMR v4.0.0

I am trying to use an IPython notebook with Apache Spark 1.4.0. I followed the two tutorials below to set up the configuration:

Installing an IPython notebook with PySpark 1.4 on AWS

and

Configure IPython Notebook Support for Pyspark

After configuring, here is the relevant code in the related files:

1. ipython_notebook_config.py

c = get_config()
c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False
c.NotebookApp.port = 8193

2. 00-pyspark-setup.py

import os
import sys
spark_home = os.environ.get('SPARK_HOME', None)
sys.path.insert(0, spark_home + "/python")

# Add the py4j to the path.
# You may need to change the version number to match your install

sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.8.2.1-src.zip'))
# Initialize PySpark to predefine the SparkContext variable 'sc'
execfile(os.path.join(spark_home, 'python/pyspark/shell.py'))

I also added the following line to my .bash_profile and then sourced it:

export SPARK_HOME='/home/hadoop/spark'
source ~/.bash_profile
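
As a sanity check that SPARK_HOME resolves to a real Spark install, the following can be run in a plain Python shell before launching the notebook (a small sketch using only the paths mentioned above; it is not part of either tutorial):

import os

# Confirm SPARK_HOME is set and points at a directory that contains PySpark.
spark_home = os.environ.get('SPARK_HOME')
print("SPARK_HOME = %r" % spark_home)
if spark_home:
    print("python dir exists:   %s" % os.path.isdir(os.path.join(spark_home, 'python')))
    print("pyspark shell found: %s" % os.path.isfile(os.path.join(spark_home, 'python/pyspark/shell.py')))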

However, when I run

ipython notebook --profile=pyspark

it shows a warning: unrecognized alias '--profile=pyspark', it will probably have no effect

It seems that the notebook is not being configured with the pyspark profile. Does anyone know how to solve this? Many thanks.

Here are the relevant software versions:

ipython / jupyter: 4.0.0

spark: 1.4.0

AWS EMR: 4.0.0

python: 2.7.9


Profiles were removed in Jupyter (the successor to the IPython notebook). Instead, you can point Jupyter at an alternative configuration directory:

JUPYTER_CONFIG_DIR=~/alternative_jupyter_config_dir jupyter notebook

See jupyter/notebook issue #309, which covers getting Jupyter working with PySpark.
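
As a small check (assuming jupyter_core is installed, which it is with Jupyter 4.x), you can confirm which configuration directory Jupyter will actually read:

import os
from jupyter_core.paths import jupyter_config_dir

# Shows the config directory Jupyter resolves; it honors JUPYTER_CONFIG_DIR if set.
print("JUPYTER_CONFIG_DIR env = %r" % os.environ.get('JUPYTER_CONFIG_DIR'))
print("resolved config dir    = %s" % jupyter_config_dir())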

+4

...

Add the following to your ~/.bashrc:

export SPARK_HOME="<your location of spark>"
export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"

(adjust these for your own pyspark setup)

Launch ipython and check that it works. Then create a pyspark profile:

ipython profile create pyspark

Create (or edit) ~/.ipython/profile_pyspark/startup/00-pyspark-setup.py:

import os
import sys

spark_home = os.environ.get('SPARK_HOME', None)
sys.path.insert(0, spark_home + "/python")
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.9-src.zip'))

# For Spark 1.6, make sure PYSPARK_SUBMIT_ARGS ends with "pyspark-shell"
# before the shell is initialized.
spark_release_file = spark_home + "/RELEASE"
if os.path.exists(spark_release_file) and "Spark 1.6" in open(spark_release_file).read():
    pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
    if "pyspark-shell" not in pyspark_submit_args:
        pyspark_submit_args += " pyspark-shell"
        os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args

# Initialize PySpark; this defines the SparkContext variable 'sc'.
filename = os.path.join(spark_home, 'python/pyspark/shell.py')
exec(compile(open(filename, "rb").read(), filename, 'exec'))

(adjust the py4j version in the path above to match your Spark install)
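
If you are not sure which py4j version ships with your Spark, a quick way to find the bundled zip (a sketch, assuming SPARK_HOME is already exported):

import glob
import os

# Locate the bundled py4j zip so the sys.path entry above matches your install.
spark_home = os.environ['SPARK_HOME']
print(glob.glob(os.path.join(spark_home, 'python/lib/py4j-*-src.zip')))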

Run mkdir -p ~/.ipython/kernels/pyspark, then create ~/.ipython/kernels/pyspark/kernel.json with:

{
 "display_name": "pySpark (Spark 1.6.1)",
 "language": "python",
 "argv": [
  "/usr/bin/python",
  "-m",
  "IPython.kernel",
  "--profile=pyspark",
  "-f",
  "{connection_file}"
 ]
}

The pySpark (Spark 1.6.1) kernel will now appear in Jupyter's new-notebook menu. In a notebook started with it, sc is already defined.
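
To verify the kernel works, a first cell along these lines (just a sanity check, not part of the original answer) should run without errors:

# Run in the first cell of a new "pySpark (Spark 1.6.1)" notebook.
# 'sc' is the SparkContext created by pyspark/shell.py at kernel startup.
print(sc.version)
print(sc.parallelize(range(100)).sum())  # expect 4950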

+1

The problem is version 4.0: Jupyter/IPython 4.0 dropped profile support, and the last release that still has it is 3.2.3. Downgrade IPython:

conda install 'ipython<4'

ref: https://groups.google.com/a/continuum.io/forum/#!topic/anaconda/ace9F4dWZTA
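
After downgrading, you can confirm the installed version from Python before trying --profile again (a small check, not part of the original answer):

import IPython

# The --profile flag is honored again once the major version is below 4.
print(IPython.__version__)
assert int(IPython.__version__.split('.')[0]) < 4, "profiles were removed in 4.x"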

0

Here is what got Jupyter working for me. Make sure the jupyter on your PATH is the one you expect (anaconda's, in my case), then add the following to your shell config (zsh here; the same works for bash):

emacs ~/.zshrc
export PATH="/Users/hcorona/anaconda/bin:$PATH"
export SPARK_HOME="$HOME/spark"
export PATH=$SPARK_HOME/bin:$PATH
export PYSPARK_SUBMIT_ARGS="--master local[*,8] pyspark-shell"
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH

It is important to add pyspark-shell to PYSPARK_SUBMIT_ARGS. I found this guide useful, but not completely accurate.

My config runs Spark locally, but it should also work if you change PYSPARK_SUBMIT_ARGS to the arguments you need.
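
With these variables exported, you do not strictly need a profile or a kernel spec: pyspark becomes importable from a regular notebook and the context can be created by hand. A minimal sketch (assuming the PYTHONPATH entries above; the app name is illustrative):

# With SPARK_HOME and PYTHONPATH exported as above, pyspark imports directly.
from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local[*]").setAppName("notebook-test")
sc = SparkContext(conf=conf)
print(sc.parallelize([1, 2, 3, 4]).map(lambda x: x * x).collect())  # [1, 4, 9, 16]
sc.stop()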

0

I had the same problem specifying the --profile kwarg. This seems to be a common problem with the new version and is not specific to Spark. If you downgrade to IPython 3.2.1, you can specify the profile again.

-1

Source: https://habr.com/ru/post/1609085/
