Casting a new derived column in a DataFrame from boolean to integer

Suppose I have a DataFrame x with this schema:

 xSchema = StructType([
     StructField("a", DoubleType(), True),
     StructField("b", DoubleType(), True),
     StructField("c", DoubleType(), True)])

Then I have a DataFrame:

 DataFrame[a: double, b: double, c: double]

I would like to have an integer column. I can create a boolean column:

 x = x.withColumn('y', (x.a - x.b) / x.c > 1)

My new schema:

 DataFrame[a: double, b: double, c: double, y: boolean]

However, I would like the y column to contain 0 for False and 1 for True.

The cast function only works on a Column, not on a DataFrame, and the withColumn function only works on a DataFrame. How can I add a new column and cast it to integer at the same time?

1 answer

The expression you use evaluates to a Column, so you can cast it directly:

 x.withColumn('y', ((x.a - x.b) / x.c > 1).cast('integer'))  # Or IntegerType()

Source: https://habr.com/ru/post/1234555/

