Removing rows from a pandas DataFrame where a column contains any string from a list

I know how to delete rows from a pandas DataFrame where a single column ('From') contains a given string, for example, given df and someString:

df = df[~df.From.str.contains(someString)]
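A minimal, self-contained sketch of that single-string filter. The sample rows and the value of some_string are hypothetical; regex=False is an optional choice here so that characters like '.' in the search string are matched literally rather than as regex wildcards.

```python
import pandas as pd

# Hypothetical sample data; the column name 'From' matches the question.
df = pd.DataFrame({"From": [
    "Grey Caulfu <grey.caulfu@ymail.com>",
    "Charlto Youna <youna.charlto4@yahoo.com>",
]})

some_string = "ymail.com"  # hypothetical substring to filter on

# regex=False treats some_string literally, so '.' is not a wildcard.
mask = df["From"].str.contains(some_string, regex=False)
df = df[~mask]  # keep only rows that do NOT contain the substring
print(df)
```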

Now I want to do something similar, but this time I want to delete any rows where the 'From' value contains any of the strings in another list. Without pandas, I would use a for loop with an if ... not ... in check. But how can I use pandas' own functionality to achieve this? Given a list of entries to ignore, ignorethese, read from the file EMAILS_TO_IGNORE of comma-separated values, I tried:

with open(EMAILS_TO_IGNORE) as emails:
    ignorethese = emails.read().split(', ')
df = df[~df.From.isin(ignorethese)]

Am I overcomplicating this by reading the file into a list first? Given that it is a comma-delimited text file, is there a simpler way?
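One likely reason the attempt above has no effect: Series.isin tests whole-value equality, not substring containment, so a cell like "Name <address>" never equals the bare address. The sketch below assumes the file holds bare addresses; io.StringIO stands in for the real EMAILS_TO_IGNORE file, and its contents are hypothetical. It also strips whitespace around each entry so the separator does not have to be exactly ', '.

```python
import io
import pandas as pd

# Stand-in for open(EMAILS_TO_IGNORE); the file contents are hypothetical.
emails_file = io.StringIO("grey.caulfu@ymail.com, deren.e.torcs87@gmail.com\n")

# Split on commas and strip surrounding whitespace/newlines from each entry,
# dropping any empty entries (e.g. from a trailing comma).
ignorethese = [e.strip() for e in emails_file.read().split(",") if e.strip()]

df = pd.DataFrame({"From": [
    "Grey Caulfu <grey.caulfu@ymail.com>",
    "Deren Torculas <deren.e.torcs87@gmail.com>",
    "Charlto Youna <youna.charlto4@yahoo.com>",
]})

# isin compares whole cell values, so "Name <addr>" never equals "addr".
exact = df[~df["From"].isin(ignorethese)]
print(len(exact))  # 3: nothing was removed
```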

Answer

Series.str.contains supports regular expressions, so you can build a single regular expression from your list of emails to ignore by joining them with | (which ORs them together), and then pass it to contains. Example:

df[~df.From.str.contains('|'.join(ignorethese))]
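A slightly more defensive variant of the same idea, offered as a sketch rather than the answer's exact code: email addresses contain '.', which is a regex wildcard, so escaping each entry with re.escape before joining keeps the pattern literal. The sample data is hypothetical and mirrors the demo.

```python
import re
import pandas as pd

df = pd.DataFrame({"From": [
    "Grey Caulfu <grey.caulfu@ymail.com>",
    "Deren Torculas <deren.e.torcs87@gmail.com>",
    "Charlto Youna <youna.charlto4@yahoo.com>",
]})
ignorethese = ["grey.caulfu@ymail.com", "deren.e.torcs87@gmail.com"]

# Escape each entry so regex metacharacters such as '.' match literally,
# then OR the entries together with '|'.
pattern = "|".join(re.escape(s) for s in ignorethese)

result = df[~df["From"].str.contains(pattern)]
print(result)
```

One caveat worth checking: if ignorethese is empty, the joined pattern is the empty string, which matches every row, so every row would be dropped. Guarding with `if ignorethese:` before filtering avoids that.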

Demo:

In [109]: df
Out[109]:
                                         From
0         Grey Caulfu <grey.caulfu@ymail.com>
1  Deren Torculas <deren.e.torcs87@gmail.com>
2    Charlto Youna <youna.charlto4@yahoo.com>

In [110]: ignorelist = ['grey.caulfu@ymail.com','deren.e.torcs87@gmail.com']

In [111]: ignorere = '|'.join(ignorelist)

In [112]: df[~df.From.str.contains(ignorere)]
Out[112]:
                                       From
2  Charlto Youna <youna.charlto4@yahoo.com>

Note that, as stated in the documentation, str.contains uses re.search(), so a pattern matches if it occurs anywhere in the value.
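Because matching follows re.search semantics, two keyword arguments of str.contains can matter in practice: case=False makes the match case-insensitive, and na=False treats missing values as "no match" so the boolean mask can be inverted safely. The sketch below uses hypothetical data, including an upper-cased address and a missing sender.

```python
import re
import pandas as pd

df = pd.DataFrame({"From": [
    "Grey Caulfu <GREY.CAULFU@YMAIL.COM>",  # upper-case variant
    None,                                    # missing sender
    "Charlto Youna <youna.charlto4@yahoo.com>",
]})
ignorethese = ["grey.caulfu@ymail.com"]
pattern = "|".join(re.escape(s) for s in ignorethese)

# The pattern may match anywhere in the cell (re.search semantics).
# case=False ignores letter case; na=False maps missing values to False,
# so those rows are kept instead of leaving NaN in the mask.
mask = df["From"].str.contains(pattern, case=False, na=False)
result = df[~mask]
print(result)
```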


Source: https://habr.com/ru/post/1607960/

