Avoid regex in pandas str.replace

I have the following pandas DataFrame. For simplicity, suppose it has only two columns: id and search_term

 id     search_term
 37651  inline switch

I do:

 train['search_term'] = train['search_term'].str.replace("in."," in. ") 

expecting that the dataset above will not be affected, but I get this dataset in return:

 id     search_term
 37651  in. in. switch

which means that inl is replaced by in. and ine is replaced by in. as if I were using a regex, where a dot means any character.

How can I rewrite the first command so that the literal string in. is replaced by in. followed by a space, but any in that is not followed by a period is left untouched, as in:

 a = 'inline switch'
 a = a.replace('in.', 'in. ')
 a
 >>> 'inline switch'
3 answers

Try escaping the . (on pandas 2.0 and later, where regex matching is no longer the default, also pass regex=True):

 >>> import pandas as pd
 >>> df = pd.DataFrame({'search_term': ['inline switch', 'in.here']})
 >>> df.search_term.str.replace('in\\.', 'in. ')
 0    inline switch
 1         in. here
 Name: search_term, dtype: object

Here is the answer: write a regular expression that matches a literal dot.

str.replace() in pandas treats the pattern as a regex (this was the default before pandas 2.0), so:

 df['a'] = df['a'].str.replace('in.', ' in. ') 

is not equivalent to:

 a.replace('in.', ' in. ') 

The latter is the plain Python string method, which does not use regex. Therefore, in any pattern that is interpreted as a regular expression, write '\.' instead of '.' when you really mean a period rather than any character.
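To make the difference concrete, here is a minimal sketch using only the standard library `re` module, which follows the same pattern rules as the regex mode of `str.replace()`. The sample string and the variable names are made up for illustration.

```python
import re

# Hypothetical sample text: "inline" must survive, "3in." should be rewritten.
text = "inline switch 3in.pipe"

# Unescaped: "." matches any character, so "inl" and "ine" are rewritten too.
unescaped = re.sub("in.", "in. ", text)

# Escaped: r"in\." matches only the three literal characters "in.".
escaped = re.sub(r"in\.", "in. ", text)

# re.escape builds the escaped pattern from a literal string, which helps
# when the search text comes from a variable.
auto_escaped = re.sub(re.escape("in."), "in. ", text)

print(unescaped)
print(escaped)
print(auto_escaped)
```

Only the unescaped call should alter the word "inline"; the other two should give the same result as each other.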

Regular expression to match a dot


Starting with version 0.23, str.replace() accepts a regex option that switches regular-expression matching on or off. Just turn it off:

 df.search_term.str.replace('in.', 'in. ', regex=False) 

The result will be:

 0    inline switch
 1         in. here
 Name: search_term, dtype: object
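The two approaches from the answers can be compared side by side. The following is a small sketch, assuming pandas 0.23 or later so that the `regex` keyword exists; it passes `regex` explicitly in both calls, since the default is reported to have changed from True to False in pandas 2.0. The variable names `literal` and `escaped` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"search_term": ["inline switch", "in.here"]})

# Literal match: regex=False treats "in." as plain text.
literal = df["search_term"].str.replace("in.", "in. ", regex=False)

# Regex match with the dot escaped; regex=True is spelled out so the
# behavior does not depend on the installed version's default.
escaped = df["search_term"].str.replace(r"in\.", "in. ", regex=True)

print(literal.tolist())
print(escaped.tolist())
```

Stating the flag explicitly keeps the code readable for anyone who does not know which pandas version it was written against.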

Source: https://habr.com/ru/post/1246028/

