I would like to rewrite this from R to PySpark; any suggestions?
array <- c(1, 2, 3)
dataset <- filter(dataset, !(column %in% array))
In PySpark you can do it like this:
array = [1, 2, 3]
dataframe.filter(dataframe.column.isin(*array) == False)
The same filter also works with bracket indexing:
df_result = df[df.column_name.isin([1, 2, 3]) == False]
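Here is a minimal self-contained sketch (the SparkSession setup, the toy data, and the column name column_name are illustrative assumptions) showing that the filter() call and the bracket form do the same thing:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Toy DataFrame; "column_name" stands in for your real column
df = spark.createDataFrame([(1,), (2,), (3,), (4,), (5,)], ["column_name"])
array = [1, 2, 3]

kept = df.filter(df.column_name.isin(array) == False)  # filter() form
same = df[df.column_name.isin(array) == False]         # bracket form
kept.show()  # only the rows with 4 and 5 remain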
Use the ~ operator, which negates the condition:
df_filtered = df.filter(~df["column_name"].isin([1, 2, 3]))
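If you prefer not to reference the DataFrame when building the condition, the same negation works with the col() helper from pyspark.sql.functions (column_name is again a placeholder):

from pyspark.sql.functions import col

df_filtered = df.filter(~col("column_name").isin([1, 2, 3]))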
The same approach with slightly different syntax and data:
toGetDates = {'2017-11-09', '2017-11-11', '2017-11-12'}
df = df.filter(df['DATE'].isin(toGetDates) == False)
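Column.isin accepts a list or a set, so passing the set above works directly. A quick runnable sketch with made-up dates:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([("2017-11-09",), ("2017-11-10",), ("2017-11-12",)], ["DATE"])
toGetDates = {'2017-11-09', '2017-11-11', '2017-11-12'}

df = df.filter(df['DATE'].isin(toGetDates) == False)
df.show()  # only 2017-11-10 remains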
You can also loop over the array and filter:
array = [1, 2, 3]
for i in array:
    df = df.filter(df["column"] != i)
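Each .filter() call ANDs another predicate onto the query plan, so the loop ends up equivalent to the single negated isin() filter. A sketch that builds the same combined predicate explicitly with functools.reduce ("column" is a toy name):

from functools import reduce
from pyspark.sql.functions import col

array = [1, 2, 3]
conditions = [col("column") != i for i in array]
df = df.filter(reduce(lambda a, b: a & b, conditions))

For more than a handful of values, the ~isin() form above is shorter and easier to read.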
Source: https://habr.com/ru/post/1258891/