Python Pandas: sum the scores if the description contains a phrase from a list

I have a long list (200,000+) of phrases:

phrase_list = ['some word', 'another example', ...]

And a pandas DataFrame with two columns: a description in the first column and a score in the second:

Description                                    Score
this sentence contains some word in it         6
some word is on my mind                        3
repeat another example of me                   2
this sentence has no matches                   100
another example with some word                 10

The DataFrame has 300,000 rows. For each phrase in the phrase list, I want the total score of all rows whose description contains that phrase. So for “some word” the score is 6 + 3 + 10 = 19, and for “another example” it is 2 + 10 = 12.

The code I have works, but it is very slow:

phrase_score = []

for phrase in phrase_list:
    phrase_score.append([phrase, df['Score'][df['Description'].str.contains(phrase)].sum()])

I would like to return a pandas DataFrame with the phrase in one column and the score in the second (this part is trivial once I have a list of lists). However, I need a faster way to build that list of lists.
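A sketch of the same loop with two small changes, assuming the phrases are meant as plain text rather than regular expressions. Passing regex=False makes str.contains do a literal substring search, which is usually somewhat cheaper and stops characters such as "." or "(" inside a phrase from being read as regex syntax. The list of lists is then turned into the requested DataFrame; the column names Phrase and Score are chosen for illustration. This is still one pass over every row per phrase, so expect a constant-factor gain at best, not better scaling.

```python
import pandas as pd

# Sample data from the question (the real frame has about 300,000 rows).
df = pd.DataFrame({
    'Description': ['this sentence contains some word in it',
                    'some word is on my mind',
                    'repeat another example of me',
                    'this sentence has no matches',
                    'another example with some word'],
    'Score': [6, 3, 2, 100, 10],
})
phrase_list = ['some word', 'another example']

phrase_score = []
for phrase in phrase_list:
    # regex=False: match the phrase as a literal substring, not as a pattern
    mask = df['Description'].str.contains(phrase, regex=False)
    # int() converts NumPy's int64 sum into a plain Python int
    phrase_score.append([phrase, int(df.loc[mask, 'Score'].sum())])

# Build the two-column result frame from the list of lists
result = pd.DataFrame(phrase_score, columns=['Phrase', 'Score'])
print(result)
```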



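For each phrase, build a boolean mask with df.Description.str.contains(phrase), then add up the scores of the matching rows with df.Score[mask].sum(). A dictionary comprehension collects the totals: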

import pandas as pd

df = pd.DataFrame({'Description': ['this sentence contains some word in it', 
                                   'some word is on my mind', 
                                   'repeat another example of me', 
                                   'this sentence has no matches', 
                                   'another example with some word'], 
                   'Score': [6, 3, 2, 100, 10]})

phrase_list = ['some word', 'another example']
# int() turns NumPy's int64 sum into a plain Python int
scores = {phrase: int(df.Score[df.Description.str.contains(phrase)].sum()) 
          for phrase in phrase_list}

>>> scores
{'some word': 19, 'another example': 12}
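The comprehension above still scans all 300,000 descriptions once per phrase, so with 200,000 phrases the work grows with phrases × rows. One alternative, a different technique from the answer's and sketched here under an explicit assumption: if phrases only need to match as whole words (whitespace-separated, no punctuation attached, single spaces inside each phrase), every description can be split into its word n-grams once, and each n-gram found in a set of the phrases collects that row's score. The cost then depends on the total number of words times the length of the longest phrase, not on the number of phrases. This is not identical to str.contains: "some word" would not match inside "some words" or "some word,". The function name phrase_scores is hypothetical.

```python
from collections import defaultdict


def phrase_scores(descriptions, scores, phrase_list):
    """Hypothetical helper: for each phrase, sum the scores of the rows whose
    description contains it as a whole-word sequence. Assumes words are
    separated by whitespace and phrases use single spaces between words."""
    wanted = set(phrase_list)
    # No n-gram longer than the longest phrase can match anything.
    max_len = max((len(p.split()) for p in phrase_list), default=0)
    totals = defaultdict(int)
    for text, score in zip(descriptions, scores):
        words = text.split()
        seen = set()  # count a phrase at most once per row, like a boolean mask
        for i in range(len(words)):
            # Try every word n-gram starting at position i, up to max_len words.
            for n in range(1, min(max_len, len(words) - i) + 1):
                gram = ' '.join(words[i:i + n])
                if gram in wanted and gram not in seen:
                    seen.add(gram)
                    totals[gram] += score
    # Phrases that never occur get 0; keep the order of phrase_list.
    return {p: totals[p] for p in phrase_list}


descriptions = ['this sentence contains some word in it',
                'some word is on my mind',
                'repeat another example of me',
                'this sentence has no matches',
                'another example with some word']
row_scores = [6, 3, 2, 100, 10]

print(phrase_scores(descriptions, row_scores, ['some word', 'another example']))
# {'some word': 19, 'another example': 12}
```

With the DataFrame, the inputs would be df['Description'].tolist() and df['Score'].tolist(), and the returned dict can be turned into the requested frame with pd.DataFrame(list(result.items()), columns=['Phrase', 'Score']). Membership tests against a Python set are constant time on average, which is what keeps the number of phrases out of the inner loop.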



Source: https://habr.com/ru/post/1616462/

