Using functions from Python packages in udf() for a Spark DataFrame

For a Spark DataFrame in PySpark, we can use pyspark.sql.functions.udf to create a user-defined function (UDF).

I wonder whether I can use any function from a Python package in udf(), e.g. np.random.normal from numpy?

1 answer

Assuming you want to add a column named new to your DataFrame df, with each value produced by a call to numpy.random.normal, you can do:

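    import numpy
    from pyspark.sql.functions import UserDefinedFunction
    from pyspark.sql.types import DoubleType

    # Wrap numpy.random.normal as a UDF that returns a double
    udf = UserDefinedFunction(numpy.random.normal, DoubleType())
    # The UDF takes no column arguments, so it is called as udf()
    df_with_new_column = df.withColumn('new', udf())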
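
If the package function needs arguments taken from other columns, one option is to wrap it in a small Python function first. The following is only a sketch under assumptions: normal_sample and add_normal_column are hypothetical helper names, the column names are placeholders, and asNondeterministic() is assumed to be available (Spark 2.3 or later). The float() call makes sure the UDF returns a built-in float, which is what DoubleType expects, rather than a NumPy scalar type.

```python
import numpy as np


def normal_sample(mean, stddev):
    """Draw one normal sample and return it as a built-in float."""
    return float(np.random.normal(mean, stddev))


def add_normal_column(df, name, mean_col, stddev_col):
    """Return df with a new column of normal samples (hypothetical helper)."""
    # Imported here so normal_sample can be used without Spark installed.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import DoubleType

    # Mark the UDF as nondeterministic so the optimizer does not assume
    # that repeated calls return the same value.
    sample_udf = udf(normal_sample, DoubleType()).asNondeterministic()
    return df.withColumn(name, sample_udf(col(mean_col), col(stddev_col)))
```

For example, add_normal_column(df, 'new', 'mu', 'sigma') would draw each value of new from the mean and standard deviation stored in that row's mu and sigma columns, assuming both columns exist and contain non-null doubles.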

Source: https://habr.com/ru/post/984762/

