In any case, this is more an argument against Python UDFs than against PySpark in general, and, to a lesser extent, a similar argument can be made against native (JVM-implemented) UDFs.
It is also worth noting that vectorized UDFs are on the Spark roadmap, so the concern that "the real cost is serializing data in Python" may no longer be a problem in the future.
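To make the intuition concrete, here is a toy sketch of why handing data over in batches can reduce the overhead of crossing the JVM/Python boundary. It is only an illustration under loud assumptions: it uses `pickle` from the standard library as a stand-in for the transfer step, it does not model how Spark actually moves data, and the names `apply_per_value`, `apply_per_batch`, and `batch_size` are hypothetical.

```python
import pickle


def apply_per_value(values, func):
    """Toy model: every value makes its own serialize/deserialize round trip."""
    results = []
    round_trips = 0
    for value in values:
        payload = pickle.dumps(value)  # one round trip per value
        results.append(func(pickle.loads(payload)))
        round_trips += 1
    return results, round_trips


def apply_per_batch(values, func, batch_size=1000):
    """Toy model: values are grouped, so one round trip covers a whole batch."""
    results = []
    round_trips = 0
    for start in range(0, len(values), batch_size):
        payload = pickle.dumps(values[start:start + batch_size])  # one round trip per batch
        batch = pickle.loads(payload)
        results.extend(func(v) for v in batch)
        round_trips += 1
    return results, round_trips


values = list(range(10_000))
row_results, row_trips = apply_per_value(values, lambda x: x + 1)
batch_results, batch_trips = apply_per_batch(values, lambda x: x + 1)

# Same answers either way; only the number of round trips differs.
print(row_results == batch_results, row_trips, batch_trips)
```

Both functions compute the same results; the batched version simply pays the fixed per-transfer cost far fewer times, which is the general idea a vectorized UDF relies on.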