Functions from Python packages for udf() of a Spark DataFrame

Question

For a Spark DataFrame in PySpark, we can use pyspark.sql.functions.udf to create a user-defined function (UDF).

I wonder whether I can use any function from a Python package in udf(), e.g. np.random.normal from numpy?

+6

python apache-spark pyspark

Jie chen Apr 6 '15 at 21:18

1 answer

karlson · Answer 1 · 2015-04-14T12:43:48+0000

Assuming you want to add a column named new to your DataFrame df, created by calling numpy.random.normal repeatedly, you can do:

    import numpy
    from pyspark.sql.functions import UserDefinedFunction
    from pyspark.sql.types import DoubleType

    udf = UserDefinedFunction(numpy.random.normal, DoubleType())
    df_with_new_column = df.withColumn('new', udf())
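A variant of the same idea, as a minimal sketch rather than a definitive implementation: numpy.random.normal returns a numpy.float64, and some Spark versions may fail to serialize NumPy scalar types, so wrapping the call and converting to a built-in float is a cautious choice. The names draw_normal and add_normal_column are hypothetical helpers introduced here for illustration, and the sketch assumes a PySpark version where pyspark.sql.functions.udf accepts a function and a return type.

```python
import numpy as np


def draw_normal():
    """Return one standard-normal sample as a built-in Python float."""
    # np.random.normal() returns numpy.float64; converting to float avoids
    # handing a NumPy scalar to Spark for a DoubleType column.
    return float(np.random.normal())


def add_normal_column(df, name="new"):
    """Return df with an extra column filled by draw_normal (hypothetical helper)."""
    # Imported here so draw_normal() can be used without Spark installed.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import DoubleType

    normal_udf = udf(draw_normal, DoubleType())
    # The UDF takes no input columns, so it is called with no arguments.
    return df.withColumn(name, normal_udf())
```

With an existing DataFrame df, this would be used as df_with_new_column = add_normal_column(df). The same wrapping pattern should work for other package functions, provided the package is installed on every worker and the return value matches the declared Spark type.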
