Logging inside a UDF in PySpark

I call an API inside my UDF and try to write its output to a logger, but I get a serialization error.

This is how I initialize the logger:

log4jLogger = spark._jvm.org.apache.log4j
LOGGER = log4jLogger.LogManager.getLogger(__name__)

Inside my UDF I log with

LOGGER.info("Message")

But I get this error:

pickle.PicklingError: Could not serialize object: Py4JError: An error occurred while calling o31.__getnewargs__. Trace:
py4j.Py4JException: Method __getnewargs__([]) does not exist

It happens when I register the UDF:

distance_udf = udf(distfunc, DoubleType())

Could you please tell me what I need to change in my logging setup, and also what to do if I want to write to a separate log file?

Thanks
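
For context, the traceback suggests that the LOGGER obtained through spark._jvm is a Py4J proxy, and PySpark cannot pickle it when it serializes distfunc for the executors. One possible workaround, sketched below under assumptions, is to build a plain Python logger inside the function instead. The names get_worker_logger, make_distance_udf and LOG_PATH, the four-argument signature of distfunc, and the distance calculation standing in for the real API call are all hypothetical and not taken from the original code.

```python
import logging
import math
import os
import tempfile

# Hypothetical location; on a cluster this file is created on each executor's local disk.
LOG_PATH = os.path.join(tempfile.gettempdir(), "distance_udf.log")


def get_worker_logger(name="distance_udf", path=LOG_PATH):
    """Return a Python logger with its own file handler, configured once per process."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking handlers when a worker process is reused
        handler = logging.FileHandler(path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False  # keep these messages out of the root logger
    return logger


def distfunc(x1, y1, x2, y2):
    """Placeholder for the real API call: Euclidean distance between two points."""
    # The logger is looked up inside the function, so no JVM object is captured
    # when Spark serializes distfunc.
    logger = get_worker_logger()
    result = math.hypot(x2 - x1, y2 - y1)
    logger.info("distance(%s, %s, %s, %s) = %s", x1, y1, x2, y2, result)
    return float(result)


def make_distance_udf():
    """Wrap distfunc as a Spark UDF; pyspark is imported lazily (assumes it is installed)."""
    from pyspark.sql.functions import udf
    from pyspark.sql.types import DoubleType

    return udf(distfunc, DoubleType())
```

With this sketch, distance_udf = make_distance_udf() could then be applied to four numeric columns of a DataFrame. Note that the messages would end up in a file on whichever executor ran the task, not in the driver's log4j output, so collecting them in one place would need extra work.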


Source: https://habr.com/ru/post/1692733/

