I am trying to run my own HDFS reader class in PySpark. The class is written in Java, and I need to access it from PySpark, either from the shell or via spark-submit.
In PySpark, I retrieve the Py4J `JavaGateway` from the `SparkContext` (`sc._gateway`).
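For context, this is how I get at the gateway and its JVM view; a minimal sketch (assuming the usual shell `SparkContext`):

```python
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

# Py4J gateway exposed by PySpark (note: underscore-prefixed, i.e. internal API).
gateway = sc._gateway
jvm = gateway.jvm  # same object as sc._jvm
```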
Say I have a class:
```java
package org.foo.module;

public class Foo {
    public int fooMethod() {
        return 1;
    }
}
```
I tried packaging it in a jar and passing it with `--jars` to pyspark, and then running:
```python
from py4j.java_gateway import java_import

jvm = sc._gateway.jvm
java_import(jvm, "org.foo.module.*")
foo = jvm.org.foo.module.Foo()
```
But I get the error:
```
Py4JError: Trying to call a package.
```
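To rule out a classpath problem on my side, here is a minimal check I can run; a sketch assuming client mode, with `/path/to/foo.jar` as a placeholder for my actual jar path:

```python
from pyspark import SparkConf, SparkContext

# Attach the jar at startup; "spark.jars" is the config equivalent of --jars.
# The path below is a placeholder.
conf = SparkConf().set("spark.jars", "/path/to/foo.jar")
sc = SparkContext(conf=conf)

# Roughly the lookup Py4J performs when resolving jvm.org.foo.module.Foo;
# a ClassNotFoundException here would mean the jar is not visible to the
# driver JVM's classloader.
cls = sc._jvm.java.lang.Class.forName("org.foo.module.Foo")
print(cls.getName())
```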
Can anyone help with this? Thanks.