Two problems:
First, you have a module called stop_words, and later you create a variable also called stop_words, which shadows the module. This is bad form.
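Here is a small sketch of why this bites. It uses the standard-library `string` module as a stand-in for `stop_words` (an assumption for illustration, since the original code isn't shown). Once the name is rebound to a list, the module's attributes are no longer reachable under that name.

```python
import string  # stdlib module, standing in for the stop_words module

before = string.ascii_lowercase[:3]  # the module still works here: 'abc'

# Later, a list is bound to the same name, shadowing the module
string = ["de", "het", "een"]

try:
    after = string.ascii_lowercase  # the name now refers to the list
except AttributeError as err:
    after = None
    print("module no longer reachable:", err)
```

Giving the list its own name (for example `nl_stop_words`) avoids the clash.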
Second, the lambda you pass to .apply treats its parameter x as a list, but .apply calls the function on each value of the Series separately, so x is a single word, not the whole list of words.
That is, for a Series s, instead of s.apply(sqrt) you are doing s.apply(lambda x: [sqrt(val) for val in x]).
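A minimal sketch of this behaviour, assuming `usertext` is a pandas Series of individual words (the small Series and the stop-word set below are made up for illustration). `Series.apply` hands the function one element at a time, so a list-style lambda either raises or, with strings, quietly iterates over characters.

```python
import math

import pandas as pd

s = pd.Series([1.0, 4.0, 9.0])

# Series.apply calls the function once per element, so sqrt gets one float at a time
roots = s.apply(math.sqrt).tolist()
print(roots)  # [1.0, 2.0, 3.0]

# A lambda that treats x as a list fails, because each x is a single float
try:
    s.apply(lambda x: [math.sqrt(val) for val in x])
    failed = False
except TypeError:
    failed = True
print("iterating over a float raised TypeError:", failed)

# With strings the same mistake does not fail: it iterates over characters
nl_stop_words = {"de", "het", "een"}  # hypothetical sample stop words
words = pd.Series(["de", "kat", "een"])
wrong = words.apply(lambda x: [item for item in x if item not in nl_stop_words]).tolist()
print(wrong)  # [['d', 'e'], ['k', 'a', 't'], ['e', 'e', 'n']]
```

The string case is the sneaky one: no error, but every "item" is a single letter, so no stop word is ever matched.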
Either do the list processing yourself:
clean = [x for x in usertext if x not in stop_words]
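For example, assuming `usertext` is a plain list of words (iterating over a pandas Series works the same way) and using a made-up set of stop words, the comprehension drops the stop words entirely:

```python
nl_stop_words = {"de", "het", "een", "en"}  # hypothetical sample of Dutch stop words
usertext = ["de", "kat", "en", "het", "huis"]

# Iterate over the words yourself; stop words are left out of the result
clean = [x for x in usertext if x not in nl_stop_words]
print(clean)  # ['kat', 'huis']
```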
Or use .apply with a function that takes one word at a time:
clean = usertext.apply(lambda x: x if x not in stop_words else '')
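A sketch of the `.apply` version under the same assumptions (a pandas Series of single words, hypothetical stop words). Note that it keeps the length of the Series and replaces each stop word with an empty string; if you want them removed instead, you can filter the result with a boolean mask:

```python
import pandas as pd

nl_stop_words = {"de", "het", "een", "en"}  # hypothetical sample of Dutch stop words
usertext = pd.Series(["de", "kat", "en", "het", "huis"])

# The lambda sees one word at a time; stop words become empty strings
clean = usertext.apply(lambda x: x if x not in nl_stop_words else '')
print(clean.tolist())  # ['', 'kat', '', '', 'huis']

# To drop the blanks rather than keep them, select only non-empty entries
clean_dropped = clean[clean != ''].tolist()
print(clean_dropped)  # ['kat', 'huis']
```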
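As @Jean-François Fabre explained in a comment, you can speed things up by storing your stop words in a set rather than a list: checking membership in a set takes constant time on average, while a list has to be scanned element by element: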
from stop_words import get_stop_words
nl_stop_words = set(get_stop_words('dutch'))  # NOTE: set
usertext = ...
clean = usertext.apply(lambda word: word if word not in nl_stop_words else '')
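To check the speedup on your own machine, a rough `timeit` comparison like the one below can help. The vocabulary is synthetic (hypothetical `w0`, `w1`, ... tokens rather than real Dutch stop words) and the timings will vary, but both containers produce the same result; only the cost of each `in` check differs.

```python
import timeit

# Synthetic data: 200 "stop words" and a 4,000-word text, half of it stop words
stop_list = [f"w{i}" for i in range(200)]
stop_set = set(stop_list)
words = [f"w{i % 400}" for i in range(4000)]

def clean_with(stops):
    # Same filtering logic for both; only the container type differs
    return [w for w in words if w not in stops]

# The results are identical regardless of container
assert clean_with(stop_list) == clean_with(stop_set)

list_time = timeit.timeit(lambda: clean_with(stop_list), number=10)
set_time = timeit.timeit(lambda: clean_with(stop_set), number=10)
print(f"list: {list_time:.3f}s  set: {set_time:.3f}s")
```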