Search and return the index of a matching substring with pandas

I want to expand on an earlier question.

The solutions to that question return True or False, and those booleans can then be used to subset the matching values.

However, I want to get the search value that matched as a substring.

For example (borrowing from the above question):

s = pd.Series(['cat','hat','dog','fog','pet'])
searchfor = ['og', 'at']

I want to know that "cat" matches "at" and "dog" matches "og".

2 answers

IIUC, you want each value replaced by the index of the item in the list searchfor that matches it. You can start by turning searchfor into a mapping:

m = {'^.*{}.*$'.format(term): str(i) for i, term in enumerate(searchfor)}

This is a dictionary of <pattern : index> pairs. Now call pd.Series.replace with regex=True:

# Keep s intact; non-matching values survive as text and become -1 below.
idx = s.replace(m, regex=True)
idx = pd.to_numeric(idx, errors='coerce').fillna(-1).astype(int)

idx

0    1
1    1
2    0
3    0
4   -1
dtype: int64
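
To see what the <pattern : index> mapping does independently of pandas, here is a minimal sketch using only the standard re module. The helper name first_match_index and the use of re.escape are my own additions (assumptions, not part of the answer above); re.escape only matters if a search term contains regex metacharacters.

```python
import re

searchfor = ['og', 'at']
words = ['cat', 'hat', 'dog', 'fog', 'pet']

# Same <pattern : index> mapping as above; re.escape is an added safeguard.
m = {'^.*{}.*$'.format(re.escape(term)): str(i) for i, term in enumerate(searchfor)}

def first_match_index(word, mapping):
    """Hypothetical helper: index of the first pattern matching word, else -1."""
    for pattern, idx in mapping.items():
        if re.match(pattern, word):
            return int(idx)
    return -1

result = [first_match_index(w, m) for w in words]
print(result)  # [1, 1, 0, 0, -1]
```

Each pattern matches the whole string, so a successful replace swaps the entire word for the index; a word that matches nothing is left as it is, which is why the non-matching row has to be mapped to -1 afterwards.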

Alternatively, to list the values that match each search term, use str.extract + groupby + apply:

p = '(^.*({}).*$)'.format('|'.join(searchfor))

s.str.extract(p, expand=True)\
 .groupby([1])[0]\
 .apply(list)

1
at    [cat, hat]
og    [dog, fog]
Name: 0, dtype: object
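
If the goal is one result per row rather than one per search term, a possible variation is sketched below. It is not from the answer above: the names pattern, matched, position and out are illustrative, and re.escape is an added safeguard for search terms containing regex metacharacters.

```python
import re
import pandas as pd

s = pd.Series(['cat', 'hat', 'dog', 'fog', 'pet'])
searchfor = ['og', 'at']

# A single capture group that matches any of the search terms.
pattern = '({})'.format('|'.join(re.escape(t) for t in searchfor))

# expand=False returns a Series aligned with s; rows without a match are NaN.
matched = s.str.extract(pattern, expand=False)

# Look up each matched term's position in searchfor; -1 marks "no match".
lookup = {term: i for i, term in enumerate(searchfor)}
position = matched.map(lookup).fillna(-1).astype(int)

out = pd.DataFrame({'value': s, 'matched': matched, 'position': position})
print(out)
```

Because the terms are joined with |, a value containing several terms reports only the one that occurs first in the string.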

Another way, using defaultdict + replace:

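import collections

# Strip every search term from each value, leaving only the unmatched remainder.
d = dict(zip(searchfor, [""] * len(searchfor)))
s1 = s.replace(d, regex=True)

# Per row, map that remainder to '' so that removing it leaves the matched term.
d = collections.defaultdict(dict)
for x, y in zip(s1.index, s1):
    d[x][y] = ''

s.to_frame('a').T.replace(dict(d), regex=True).T.a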


Out[765]: 
0    at
1    at
2    og
3    og
4      
Name: a, dtype: object
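
The two replace passes above can be hard to follow, so here is a rough pure-Python sketch of the same idea, using only the re module. The variable names are illustrative and re.escape is an added safeguard; this is meant to show the logic, not to replace the pandas version.

```python
import re

searchfor = ['og', 'at']
words = ['cat', 'hat', 'dog', 'fog', 'pet']

# Step 1: remove every search term, leaving the unmatched remainder of each word.
strip_terms = re.compile('|'.join(re.escape(t) for t in searchfor))
remainders = [strip_terms.sub('', w) for w in words]

# Step 2: remove that remainder from the original word; what is left is the match.
matches = [
    re.sub(re.escape(r), '', w) if r else w
    for w, r in zip(words, remainders)
]

print(remainders)  # ['c', 'h', 'd', 'f', 'pet']
print(matches)     # ['at', 'at', 'og', 'og', '']
```

This trick relies on the remainder being one contiguous piece of the word. For a value such as 'cats' the remainder is 'cs', which does not occur in 'cats', so nothing is removed in the second step and the whole word comes back.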

Source: https://habr.com/ru/post/1693173/