Spark withColumn() to perform a power function

I have a data frame df with columns "col1" and "col2". I want to create a third column that uses one of the columns as an exponent:

df = df.withColumn("col3", 100**(df("col1")))*df("col2")

However, this always leads to:

TypeError: unsupported operand type(s) for ** or pow(): 'float' and 'Column'

I understand that this is because df("col1") is treated as a Column object rather than the element value in each row.

If I do

results = df.map(lambda x: (100 ** x["col1"]) * x["col2"])

this works, but I cannot add the result back to the original data frame.

Any thoughts?

This is my first post, so I apologize for any formatting issues.


Since Spark 1.4 you can use the built-in pow function:

from pyspark.sql import Row
from pyspark.sql.functions import pow, col

row = Row("col1", "col2")
df = sc.parallelize([row(1, 2), row(2, 3), row(3, 3)]).toDF()

df.select("*", pow(col("col1"), col("col2")).alias("pow")).show()

## +----+----+----+
## |col1|col2| pow|
## +----+----+----+
## |   1|   2| 1.0|
## |   2|   3| 8.0|
## |   3|   3|27.0|
## +----+----+----+

On earlier versions you can use a Python UDF instead:

import math
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

my_pow = udf(lambda x, y: math.pow(x, y), DoubleType())

Source: https://habr.com/ru/post/1612665/
