I would like to classify parts using pandas DataFrames.
Here is a simplified version of the problem:
import pandas as pd

data = {'col1': ['engine', 'blue engine cover', 'spark plug',
                 'rear panel', 'black rear panel', 'blue engine']}
desc_df = pd.DataFrame(data=data)
catg = {'bodywork': ['engine cover', 'side panel', 'rear panel'],
        'underhood': ['engine', 'spark plug', 'oil filter'],
        'Glass': ['Windscreen', 'window', 'demister']}
catg_df = pd.DataFrame(data=catg)
catg_df
        Glass      bodywork   underhood
0  Windscreen  engine cover      engine
1      window    side panel  spark plug
2    demister    rear panel  oil filter
desc_df
                col1
0             engine
1  blue engine cover
2         spark plug
3         rear panel
4   black rear panel
5        blue engine
I would like to end up with:
                col1   Category
0             engine  underhood
1  blue engine cover   bodywork
2         spark plug  underhood
3         rear panel   bodywork
4   black rear panel   bodywork
5        blue engine  underhood
The closest I have come is:
d = catg_df.apply('|'.join).to_dict()
desc_df['Category'] = desc_df['col1'].apply(
    lambda x: ''.join([z if pd.Series(x).str.contains(y).values else ''
                       for z, y in d.items()]))
But "blue engine cover" contains both "engine" and "engine cover", so that row ends up with both categories joined together:
desc_df
                col1           Category
0             engine          underhood
1  blue engine cover  bodyworkunderhood
2         spark plug          underhood
3         rear panel           bodywork
4   black rear panel           bodywork
5        blue engine          underhood
Is there a method that would first match "engine cover", assign that category, and then not go on to match "engine" as well?
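One possible direction, as a rough sketch rather than a definitive answer: invert the category table into a keyword-to-category lookup, sort the keywords longest first, and build a single regex alternation from them. Because Python's regex alternation tries alternatives in order, "engine cover" would be tried before "engine" at the same position. The names `keyword_to_category`, `keywords`, `pattern`, and `matched` are made up for this illustration, and the sketch assumes the keywords should match case-sensitively anywhere in the description.

```python
import re
import pandas as pd

desc_df = pd.DataFrame({'col1': ['engine', 'blue engine cover', 'spark plug',
                                 'rear panel', 'black rear panel', 'blue engine']})
catg = {'bodywork': ['engine cover', 'side panel', 'rear panel'],
        'underhood': ['engine', 'spark plug', 'oil filter'],
        'Glass': ['Windscreen', 'window', 'demister']}

# Invert the table: each keyword points at its category (hypothetical helper name).
keyword_to_category = {kw: cat for cat, kws in catg.items() for kw in kws}

# Longest keywords first, so "engine cover" is tried before "engine".
keywords = sorted(keyword_to_category, key=len, reverse=True)
pattern = '(' + '|'.join(re.escape(kw) for kw in keywords) + ')'

# Pull out the first matching keyword per row, then look up its category.
# Rows with no matching keyword end up as NaN.
matched = desc_df['col1'].str.extract(pattern, expand=False)
desc_df['Category'] = matched.map(keyword_to_category)

print(desc_df)
```

With the sample data, each description gets exactly one category, and "blue engine cover" lands in bodywork rather than a concatenation of two names. Sorting by length is only a heuristic for "most specific keyword wins"; if two keywords could match at different positions in the same description, the leftmost one would take precedence.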