Pyspark "IS NOT IN" DataFrame operator

I would like to rewrite this from R to PySpark. Any suggestions?

 array <- c(1,2,3)
 dataset <- filter(!(column %in% array))
5 answers

In pyspark you can do it like this:

 array = [1, 2, 3]
 dataframe.filter(dataframe.column.isin(*array) == False)

Alternatively, index the DataFrame with the negated condition:

 df_result = df[df.column_name.isin([1, 2, 3]) == False]

Use the ~ operator, which negates the condition:

 df_filtered = df.filter(~df["column_name"].isin([1, 2, 3])) 

Slightly different syntax, using a set of dates:

 toGetDates = {'2017-11-09', '2017-11-11', '2017-11-12'}
 df = df.filter(df['DATE'].isin(toGetDates) == False)

You can also loop over the array and filter out one value at a time:

 array = [1, 2, 3]
 for i in array:
     df = df.filter(df["column"] != i)

Source: https://habr.com/ru/post/1258891/