PySpark RDD.filter() with a wildcard

Question

I have a PySpark RDD with a text column that I want to filter on, so I have the following code:

table2 = table1.filter(lambda x: x[12] == "*TEXT*")

Now to the problem: as you can see, I use * to try to tell it to interpret this as a wildcard, but without success. Can anyone help?


Lucas mattos Aug 31 '16 at 18:23

1 answer

David · Accepted Answer · 2016-08-31T18:24:58+0000

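The lambda function is pure Python, so == compares against the literal string "*TEXT*" and the asterisks are not treated as wildcards. Something like the following will work: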

table2 = table1.filter(lambda x: "TEXT" in x[12])
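
If real shell-style wildcards are needed rather than a plain substring test, the standard library's fnmatch module is one option. The sketch below is only an illustration: the sample rows and the helper names contains_text and matches_wildcard are hypothetical, and it filters a plain Python list so that it runs without a Spark cluster. It also assumes the field at index 12 is always a string (a None value would raise an error).

```python
from fnmatch import fnmatchcase

# Hypothetical sample rows; index 12 holds the text field, as in the question.
rows = [
    tuple([None] * 12 + ["some TEXT here"]),
    tuple([None] * 12 + ["nothing relevant"]),
    tuple([None] * 12 + ["TEXT at the start"]),
]

def contains_text(row):
    # Plain substring test, equivalent to the lambda in the answer.
    return "TEXT" in row[12]

def matches_wildcard(row, pattern="*TEXT*"):
    # Case-sensitive shell-style match: * matches any run of characters.
    return fnmatchcase(row[12], pattern)

substring_hits = [r for r in rows if contains_text(r)]
wildcard_hits = [r for r in rows if matches_wildcard(r)]
```

Since RDD.filter accepts any Python callable, the same predicates should carry over, for example table1.filter(contains_text) or table1.filter(lambda x: fnmatchcase(x[12], "*TEXT*")).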