I would like to rewrite this from R to PySpark; any suggestions?
array <- c(1, 2, 3)
dataset <- filter(dataset, !(column %in% array))
In PySpark you can do it like this:
array = [1, 2, 3]
dataframe.filter(dataframe.column.isin(*array) == False)
The same filter also works with bracket indexing:
df_result = df[df.column_name.isin([1, 2, 3]) == False]
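Here is a minimal self-contained sketch (the SparkSession setup, the toy data, and the column name column_name are illustrative assumptions) showing that the filter() call and the bracket form do the same thing:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Toy DataFrame; "column_name" stands in for your real column
df = spark.createDataFrame([(1,), (2,), (3,), (4,), (5,)], ["column_name"])
array = [1, 2, 3]

kept = df.filter(df.column_name.isin(array) == False)  # filter() form
same = df[df.column_name.isin(array) == False]         # bracket form
kept.show()  # only the rows with 4 and 5 remain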
Use the ~ operator, which negates the condition:
df_filtered = df.filter(~df["column_name"].isin([1, 2, 3]))
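If you prefer not to reference the DataFrame when building the condition, the same negation works with the col() helper from pyspark.sql.functions (column_name is again a placeholder):

from pyspark.sql.functions import col

df_filtered = df.filter(~col("column_name").isin([1, 2, 3]))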
The same approach with slightly different syntax and data:
toGetDates = {'2017-11-09', '2017-11-11', '2017-11-12'}
df = df.filter(df['DATE'].isin(toGetDates) == False)
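Column.isin accepts a list or a set, so passing the set above works directly. A quick runnable sketch with made-up dates:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([("2017-11-09",), ("2017-11-10",), ("2017-11-12",)], ["DATE"])
toGetDates = {'2017-11-09', '2017-11-11', '2017-11-12'}

df = df.filter(df['DATE'].isin(toGetDates) == False)
df.show()  # only 2017-11-10 remains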
You can also loop over the array and filter:
array = [1, 2, 3]
for i in array:
    df = df.filter(df["column"] != i)
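Each .filter() call ANDs another predicate onto the query plan, so the loop ends up equivalent to the single negated isin() filter. A sketch that builds the same combined predicate explicitly with functools.reduce ("column" is a toy name):

from functools import reduce
from pyspark.sql.functions import col

array = [1, 2, 3]
conditions = [col("column") != i for i in array]
df = df.filter(reduce(lambda a, b: a & b, conditions))

For more than a handful of values, the ~isin() form above is shorter and easier to read.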
Source: https://habr.com/ru/post/1258891/