Finding multiple dictionary keys in a Pandas Dataframe & returning multiple values for matches

Question


First post, so apologies in advance if my formatting is off.

Here is my problem:

I created a Pandas DataFrame that contains several rows of text:

d = {'keywords' :['cheap shoes', 'luxury shoes', 'cheap hiking shoes']}
keywords = pd.DataFrame(d,columns=['keywords'])
In [7]: keywords
Out[7]:
             keywords
0         cheap shoes
1        luxury shoes
2  cheap hiking shoes

Now I have a dictionary that contains the following keys / values:

labels = {'cheap' : 'budget', 'luxury' : 'expensive', 'hiking' : 'sport'}

What I would like to do is check whether any of the dictionary keys appears in each row of the DataFrame and, if so, return the corresponding value.

I was able to get partway there using the following:

for k,v in labels.items():
   keywords['Labels'] = np.where(keywords['keywords'].str.contains(k),v,'No Match')

However, the output is missing the first two keys and only catches the last key, "hiking":

    keywords            Labels
0   cheap shoes         No Match
1   luxury shoes        No Match
2   cheap hiking shoes  sport

In addition, I would like to know if there is a way to return multiple matching values from the dictionary, separated by the | character, so the ideal output would look like this:

    keywords            Labels
0   cheap shoes         budget
1   luxury shoes        expensive
2   cheap hiking shoes  budget | sport


+4

python string-matching dictionary python-3.x pandas

J_Win 06 . '18 0:17

5

You can build a regex pattern with "|".join(labels.keys()), and then use re.findall() to pull the matching keys out of each row.

import pandas as pd
import re

d = {'keywords' :['cheap shoes', 'luxury shoes', 'cheap hiking shoes']}
keywords = pd.DataFrame(d,columns=['keywords'])
labels = {'cheap' : 'budget', 'luxury' : 'expensive', 'hiking' : 'sport'}
pattern = "|".join(labels.keys())

def f(s):
    return "|".join(labels[word] for word in re.findall(pattern, s))

keywords.keywords.map(f)
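
For reference, here is a minimal sketch of the same idea with two defensive tweaks. Both are assumptions on my part and not part of the answer above: re.escape is applied to each key so that keys containing regex metacharacters are matched literally, and rows with no hit fall back to 'No Match' (the default used in the question). The function name label_row and the extra 'nothing' row are hypothetical.

```python
import re

import pandas as pd

keywords = pd.DataFrame(
    {'keywords': ['cheap shoes', 'luxury shoes', 'cheap hiking shoes', 'nothing']}
)
labels = {'cheap': 'budget', 'luxury': 'expensive', 'hiking': 'sport'}

# Escape each key so it is matched literally, then OR the keys together.
pattern = re.compile("|".join(re.escape(key) for key in labels))

def label_row(text):
    # findall returns the matched keys in the order they appear in the text.
    found = pattern.findall(text)
    # Assumption: fall back to 'No Match' when no key occurs in the row.
    return " | ".join(labels[key] for key in found) or "No Match"

keywords['Labels'] = keywords['keywords'].map(label_row)
print(keywords)
```

Note that a key occurring twice in one row would produce its label twice; deduplicate the list if that matters for your data.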

+3

HYRY 06 . '18 0:36

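Building on the np.where approach from the question, you can collect the result for every key and then join the matches for each row: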

arr = np.array([np.where(keywords['keywords'].str.contains(k), v, 'No Match') for k, v in labels.items()]).T
keywords["Labels"] = ["|".join(set(item[ind if ind.sum() == ind.shape[0] else ~ind])) for item, ind in zip(arr, (arr == "No Match"))]

Out[97]: 
             keywords        Labels
0         cheap shoes        budget
1        luxury shoes     expensive
2  cheap hiking shoes  sport|budget
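
The loop in the question reassigns the whole Labels column on every pass, so only the result for the last key survives. The snippet above avoids that by keeping every pass, but because it goes through a set the label order is not guaranteed. A minimal sketch of an alternative, assuming plain substring matching is what you want: build one boolean column per key and join the matching values in dictionary order. The name hits is hypothetical.

```python
import pandas as pd

keywords = pd.DataFrame(
    {'keywords': ['cheap shoes', 'luxury shoes', 'cheap hiking shoes']}
)
labels = {'cheap': 'budget', 'luxury': 'expensive', 'hiking': 'sport'}

# One boolean column per key: True where the key occurs in the row.
hits = pd.DataFrame(
    {key: keywords['keywords'].str.contains(key, regex=False) for key in labels}
)

# For each row, join the values of the keys that matched, in dictionary order.
keywords['Labels'] = [
    " | ".join(labels[key] for key, hit in zip(hits.columns, row) if hit)
    or "No Match"
    for row in hits.to_numpy()
]
print(keywords)
```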

+1

erocoar 06 . '18 0:51

Use replace with regex=True to swap the keys for their values, then extract the values and join them.

keywords.assign(
    values=
    keywords.keywords.replace(labels, regex=True)
            .str.findall(f'({"|".join(labels.values())})')
            .str.join(' | ')
)

             keywords          values
0         cheap shoes          budget
1        luxury shoes       expensive
2  cheap hiking shoes  budget | sport

+1

piRSquared 06 . '18 1:47

Use split, stack, and map to look up each word, then groupby to concatenate the labels for each row:

keywords['Labels'] = keywords.keywords.str.split(expand=True).stack()\
                     .map(labels).groupby(level=0)\
                     .apply(lambda x: x.str.cat(sep=' | '))



             keywords          Labels
0         cheap shoes          budget
1        luxury shoes       expensive
2  cheap hiking shoes  budget | sport

0

DJK 06 . '18 1:23

jpp · Accepted Answer · 2018-03-06T00:23:30+0000

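Here is a plain Python approach: map a function over the column that joins the values for every key found in the row. Rows with no match get an empty string.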

import pandas as pd

d = {'keywords': ['cheap shoes', 'luxury shoes', 'cheap hiking shoes', 'nothing']}
labels = {'cheap': 'budget', 'luxury': 'expensive', 'hiking': 'sport'}

df = pd.DataFrame(d)

def matcher(k):
    # Keys from labels that occur as substrings of the row text
    x = (i for i in labels if i in k)
    # Join the corresponding values; empty string if nothing matched
    return ' | '.join(map(labels.get, x))

df['values'] = df['keywords'].map(matcher)

#              keywords          values
# 0         cheap shoes          budget
# 1        luxury shoes       expensive
# 2  cheap hiking shoes  budget | sport
# 3             nothing
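
One thing to be aware of: the `i in k` test above is a substring check, so a row such as 'cheapest shoes' would also be labelled 'budget'. If that is not what you want, a minimal sketch of a whole-word variant follows. It assumes the words are separated by whitespace; the name matcher_words and the 'cheapest shoes' row are hypothetical.

```python
import pandas as pd

labels = {'cheap': 'budget', 'luxury': 'expensive', 'hiking': 'sport'}
df = pd.DataFrame({'keywords': ['cheap hiking shoes', 'cheapest shoes']})

def matcher_words(text):
    # Compare keys against whole whitespace-separated words, not substrings.
    words = set(text.split())
    return ' | '.join(labels[key] for key in labels if key in words)

df['values'] = df['keywords'].map(matcher_words)
print(df)
```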
