Str.replace starting at the back in a pandas DataFrame

I have two columns:

                                       string                    s
0    the best new york cheesecake new york ny             new york
1               houston public school houston              houston

I want to remove the last occurrence of s in string. For context, my DataFrame has hundreds of thousands of rows. I know about str.replace and str.rfind, but nothing that does the desired combination of both, and my attempts at improvising a solution have not worked.

Thanks in advance for your help!

2 answers

You can use rsplit and join:

df.apply(lambda x: ''.join(x['string'].rsplit(x['s'],1)),axis=1)

Output:

0    the best new york cheesecake  ny
1              houston public school 
dtype: object
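
To make the mechanics of that one-liner easier to follow, here is a minimal sketch on plain Python strings (no DataFrame involved). The variable names text, sub, parts and joined are only illustrative and do not come from the answer.

```python
text = 'the best new york cheesecake new york ny'
sub = 'new york'

# rsplit with maxsplit=1 splits exactly once, at the rightmost occurrence of sub
parts = text.rsplit(sub, 1)
print(parts)

# Joining the two pieces with '' drops that occurrence, but both of the
# spaces that surrounded it are kept, hence the double space in the output
joined = ''.join(parts)
print(repr(joined))

# If sub does not occur at all, rsplit returns a one-item list,
# so the join gives back the original text unchanged
untouched = ''.join('no match here'.rsplit(sub, 1))
print(untouched)
```

This is why the first result above still contains two spaces before ny.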

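Edit: to also collapse the resulting double space and assign the result back to the column:

df['string'] = df.apply(lambda x: ''.join(x['string'].rsplit(x['s'],1)),axis=1).str.replace(r'\s\s', ' ', regex=True)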

print(df)

Output:

                            string         s  third
0  the best new york cheesecake ny  new york      1
1           houston public school    houston      1
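
Since the question mentions str.rfind, the same idea can also be sketched with rfind and slicing. This is only an illustration under stated assumptions: remove_last is a hypothetical helper name, the sample lists stand in for the two columns, and the whitespace cleanup here collapses every run of whitespace, which is more aggressive than replacing a single double space.

```python
def remove_last(text, sub):
    """Remove the last occurrence of sub from text and tidy the spacing."""
    # Guard against an empty substring, for which rfind returns len(text)
    if not sub:
        return text
    idx = text.rfind(sub)
    # rfind returns -1 when sub does not occur; leave the text untouched
    if idx == -1:
        return text
    # Cut out the slice that holds the last occurrence
    trimmed = text[:idx] + text[idx + len(sub):]
    # Collapse all whitespace runs and drop leading/trailing spaces
    return ' '.join(trimmed.split())


# Plain lists standing in for the 'string' and 's' columns
strings = ['the best new york cheesecake new york ny',
           'houston public school houston']
subs = ['new york', 'houston']

cleaned = [remove_last(t, s) for t, s in zip(strings, subs)]
print(cleaned)
```

With the DataFrame from the question, the list comprehension could be fed zip(df['string'], df['s']) and the result assigned back to df['string']. Iterating with zip avoids building a row object for every row, which apply with axis=1 does, though the actual gain on a given dataset would need to be measured.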

Option 1
Vectorized rsplit with a comprehension

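import numpy as np

# Convert both columns to NumPy string arrays
v = df.string.values.astype(str)
s = df.s.values.astype(str)

# np.char.rsplit calls str.rsplit element-wise, pairing each string with its own separator
df.assign(string=[' '.join([x.strip() for x in y]) for y in np.char.rsplit(v, s, 1)])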

                            string         s
0  the best new york cheesecake ny  new york
1           houston public school    houston

Option 2
Use re.sub
A negative lookahead ensures that only the last occurrence of s is matched.

import re

v = df.string.values.astype(str)
s = df.s.values.astype(str)
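# Escape s so that any regex metacharacters in it are matched literally; the
# negative lookahead (?!...) rejects an occurrence that is followed by another one
f = lambda i, j: re.sub(r' *{0} *(?!.*{0}.*)'.format(re.escape(i)), ' ', j).strip()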

df.assign(string=[f(i, j) for i, j in zip(s, v)])

                            string         s
0  the best new york cheesecake ny  new york
1            houston public school   houston
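
To see how the lookahead singles out the last occurrence, the pattern can be tried on single strings. The sketch below is an illustration rather than the answer's exact code: remove_last is a hypothetical wrapper, and re.escape is applied on the assumption that s may contain regex metacharacters.

```python
import re


def remove_last(sub, text):
    # Hypothetical wrapper around the pattern used above; re.escape makes
    # characters such as '+' or '.' in sub match literally
    pattern = r' *{0} *(?!.*{0})'.format(re.escape(sub))
    return re.sub(pattern, ' ', text).strip()


text = 'the best new york cheesecake new york ny'

# The lookahead rejects the first 'new york' because another one follows it,
# so only the final occurrence is reported
spans = [m.span() for m in re.finditer(r'new york(?!.*new york)', text)]
print(spans)

print(remove_last('new york', text))
print(remove_last('houston', 'houston public school houston'))

# A substring with metacharacters: an unescaped 'c++' is not a valid pattern
print(remove_last('c++', 'learn c++ and more c++ today'))
```

Note that . does not match newlines by default, so this sketch assumes single-line strings.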

Source: https://habr.com/ru/post/1684044/

