I work with Jupyter notebooks and Python kernels that have a SparkContext. A coworker wrote Python code that bridges Spark events with ipykernel events. When we import their module from a notebook cell, it works in all the combinations we need to support: Python 2.7 and 3.5, Spark 1.6 and 2.x, Linux only.
Now we want to include this code automatically for all Python kernels, so I import it from our sitecustomize.py. This works fine for Spark 2.x, but not for Spark 1.6: kernels with Spark 1.6 no longer get sc, and something gets confused enough that unrelated imports, for example matplotlib.cbook, fail. If I delay the import by a few seconds using a timer, it works. Apparently the code in sitecustomize.py runs too early to import the module that connects Spark to ipykernel.
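For reference, the timer workaround in sitecustomize.py looks roughly like this (a minimal sketch; spark_ipykernel_bridge is a placeholder for our actual module name, and the 5-second delay is arbitrary):

    # sitecustomize.py -- sketch of the delayed import that happens to work
    import threading

    def _deferred_import():
        try:
            import spark_ipykernel_bridge  # placeholder for the coworker's module
        except Exception:
            pass  # don't break kernels that have no Spark at all

    # An arbitrary delay is enough to make Spark 1.6 kernels work again,
    # which is exactly why I want a real "startup finished" hook instead.
    threading.Timer(5.0, _deferred_import).start()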
I am looking for a way to defer this import until Spark and/or ipykernel are fully initialized, but it still needs to run as part of the kernel launch, before any notebook cells are executed. I found this trick to delay code execution until sys.argv is initialized, but I don't think it can work with globals like sc, given that Python globals are still local to their modules. The best I can come up with so far is a timer that checks every second whether certain modules are present in sys.modules, but that is not very reliable, because I don't know how to distinguish a module that is fully initialized from one that is still in the process of loading.
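Concretely, the polling idea would look something like this (sketch only; the module names are just examples, and the "fully initialized" check is the part I don't know how to do):

    # sitecustomize.py -- sketch of the sys.modules polling idea
    import sys
    import threading

    def _poll():
        # Presence in sys.modules does not mean the module has finished loading,
        # which is why this feels fragile.
        if 'pyspark.context' in sys.modules and 'ipykernel.kernelapp' in sys.modules:
            import spark_ipykernel_bridge  # placeholder for our module
        else:
            threading.Timer(1.0, _poll).start()

    threading.Timer(1.0, _poll).start()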
Any ideas on how to hook into startup code that runs at the end of the kernel launch? A solution specific to pyspark and/or ipykernel would meet my needs.
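To make the question concrete, the kind of hook I am hoping for would have roughly this shape (hypothetical sketch using IPython's events API; I don't know whether anything like this can be registered early enough, since get_ipython() is presumably still None while sitecustomize.py runs):

    from IPython import get_ipython

    ip = get_ipython()
    if ip is not None:
        def _run_once(*args, **kwargs):
            # Fire just before the first cell executes, then remove ourselves.
            ip.events.unregister('pre_run_cell', _run_once)
            import spark_ipykernel_bridge  # placeholder for our module

        ip.events.register('pre_run_cell', _run_once)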