I am writing Python code to develop some Spark applications. I'm really curious how Python interacts with a running JVM, so I started reading the Spark source code.
I see that, in the end, all Spark transformations/actions end up calling specific JVM methods, like the following:
self._jvm.java.util.ArrayList(),
self._jvm.PythonAccumulatorParam(host, port))
self._jvm.org.apache.spark.util.Utils.getLocalDir(self._jsc.sc().conf())
self._jvm.org.apache.spark.util.Utils.createTempDir(local_dir, "pyspark") \
.getAbsolutePath()
...
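For example, in a pyspark shell I can poke at this _jvm object directly (this is just my own experiment; it assumes the usual SparkContext named sc, and the printed values are illustrative):

>>> type(sc._jvm)
<class 'py4j.java_gateway.JVMView'>
>>> sc._jvm.java.lang.System.currentTimeMillis()   # calls a static Java method from Python
1700000000000
>>> lst = sc._jvm.java.util.ArrayList()            # instantiates a Java object in the JVM
>>> lst.add("hello")
True
>>> lst.toString()
'[hello]'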
As a Python programmer, I'm really curious about what is going on with this _jvm object. However, even after briefly reading through all of the source code under pyspark, I only found _jvm as an attribute of the SparkContext class; beyond that, I couldn't find where _jvm's attributes or methods are actually defined.
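What I've gathered so far is that _jvm seems to come from Py4J rather than from the Scala code: pyspark apparently sets it up through a gateway (something like launch_gateway() in pyspark/java_gateway.py, if I'm reading it right). A minimal standalone Py4J sketch of the same pattern would look like the code below; note that it assumes a Java GatewayServer is already listening on Py4J's default port, so it won't run without that JVM-side process:

from py4j.java_gateway import JavaGateway

# connect to a GatewayServer that must already be running on the JVM side
# (pyspark launches one for you; here it would have to be started separately)
gateway = JavaGateway()
jvm = gateway.jvm                   # the same kind of JVMView object as SparkContext._jvm

rnd = jvm.java.util.Random()        # instantiates java.util.Random inside the JVM
print(rnd.nextInt(10))              # each method call is forwarded over a local socket to the JVM

names = jvm.java.util.ArrayList()   # any class on the JVM's classpath is reachable this way
names.add("from Python")
print(names.toString())

If that is roughly right, then the attributes of _jvm are just package and class names resolved dynamically on the JVM side, which would explain why I can't find them anywhere in the Python source.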
Can someone help me understand how pyspark translates these calls into JVM operations? Should I read the Scala code to see whether _jvm is defined there, or is this handled entirely on the Py4J side?