How do I register a UDF for use in both SQL and the DataFrame API?

From what I have seen, you need to:

  • write the UDF as a normal function
  • register the function with the SQLContext for use in SQL

    spark.sqlContext.udf.register("myUDF", myFunc)
    
  • turn it into a UserDefinedFunction for use with DataFrames

    def myUDF = udf(myFunc)
    

Is there no way to combine these into one step and make the UDF accessible to both? Also, for cases where a function exists for the DataFrame API but not for SQL, how can you register it without copying the code again?

2 answers

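The UDFRegistration.register variants that take a scala.FunctionN return a UserDefinedFunction, so you can register the SQL function and get a UDF usable in the DataFrame DSL in a single step: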

    val timesTwoUDF = spark.udf.register("timesTwo", (x: Int) => x * 2)
    spark.sql("SELECT timesTwo(1)").show
    +---------------+
    |UDF:timesTwo(1)|
    +---------------+
    |              2|
    +---------------+
    spark.range(1, 2).toDF("x").select(timesTwoUDF($"x")).show
    +------+
    |UDF(x)|
    +------+
    |     2|
    +------+
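
For the second part of the question (a UDF that already exists for the DataFrame API but not for SQL), here is a minimal sketch rather than a definitive recipe. It assumes Spark 2.2 or later, where spark.udf.register also accepts an existing UserDefinedFunction. The names myFunc, plusOne, the view t, and the object RegisterOnce are made up for illustration, and the session runs in local mode only so the example is self-contained.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object RegisterOnce {
  // Hypothetical stand-in for the question's myFunc, written as a function value.
  val myFunc: Int => Int = _ * 2

  // Returns the SQL result, the DataFrame result, and the result of the
  // re-registered DataFrame UDF, each collected to the driver and sorted.
  def run(spark: SparkSession): (Seq[Int], Seq[Int], Seq[Int]) = {
    import spark.implicits._

    // One call: registers "myUDF" for SQL and returns a UserDefinedFunction.
    val myUDF = spark.udf.register("myUDF", myFunc)

    val df = Seq(1, 2, 3).toDF("col1")
    df.createOrReplaceTempView("t")

    val viaSql = spark.sql("SELECT myUDF(col1) FROM t")
      .collect().map(_.getInt(0)).toSeq.sorted
    val viaDsl = df.select(myUDF($"col1"))
      .collect().map(_.getInt(0)).toSeq.sorted

    // Reverse direction (assumes Spark 2.2+): a UDF built with udf() for the
    // DataFrame API is registered for SQL without repeating its body.
    val plusOne = udf((x: Int) => x + 1)
    spark.udf.register("plusOne", plusOne)
    val viaReRegistered = spark.sql("SELECT plusOne(col1) FROM t")
      .collect().map(_.getInt(0)).toSeq.sorted

    (viaSql, viaDsl, viaReRegistered)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("register-once")
      .getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```

Either way the function body is written once: register returns the DataFrame-side handle in the first case, and takes it as input in the second.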

You can register the function as follows and still apply it to a DataFrame:

spark.sqlContext.udf.register("myUDF", myFunc)

Then call it through selectExpr on the DataFrame:

df.selectExpr("myUDF(col1) as modified_col1")
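
A self-contained sketch of this approach, in case it helps. The answer does not show myFunc or df, so the upper-casing function, the sample data, and the object name SelectExprSketch below are assumptions made purely for illustration.

```scala
import org.apache.spark.sql.SparkSession

object SelectExprSketch {
  // Hypothetical function; the answer does not show the body of myFunc.
  val myFunc: String => String = _.toUpperCase

  // Registers the function under a SQL name, then calls it by that name
  // from the DataFrame API via selectExpr.
  def run(spark: SparkSession): Seq[String] = {
    import spark.implicits._

    spark.udf.register("myUDF", myFunc)

    // Hypothetical sample data with a single string column named col1.
    val df = Seq("a", "b").toDF("col1")

    // selectExpr parses its argument as a SQL expression, so the
    // registered name "myUDF" resolves here without a UserDefinedFunction value.
    df.selectExpr("myUDF(col1) as modified_col1")
      .collect().map(_.getString(0)).toSeq.sorted
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("select-expr-sketch")
      .getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```

The trade-off is that the call is a string, so a misspelled function or column name only surfaces when Spark analyzes the query, not at compile time.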

Source: https://habr.com/ru/post/1675107/

