How to extract the first two characters from a string using regex

Question


Reference: Pandas DataFrame: remove unnecessary parts from rows in a column

This question follows up on the answer in the link above. I have explored some regular expressions and plan to dive deeper, but in the meantime I could use some help.

My dataframe looks something like this:

DF:

   c_contofficeID
 0           0109
 1           0109
 2           3434
 3         123434
 4         1255N9
 5           0109
 6         123434
 7           55N9
 8           5599
 9           0109

Pseudo code

If the first two characters are 12, delete them. Alternatively, add 12 to the values that do not already start with 12.

The result will look like this:

   c_contofficeID
 0           0109
 1           0109
 2           3434
 3           3434
 4           55N9
 5           0109
 6           3434
 7           55N9
 8           5599
 9           0109

I used the answer from the link above as a starting point:

 df['contofficeID'].replace(regex=True,inplace=True,to_replace=r'\D',value=r'')

I tried the following:

Attempt 1)

 df['contofficeID'].replace(regex=True,inplace=True,to_replace=r'[1][2]',value=r'')

Attempt 2)

 df['contofficeID'].replace(regex=True,inplace=True,to_replace=r'$[1][2]',value=r'')

Attempt 3)

 df['contofficeID'].replace(regex=True,inplace=True,to_replace=r'?[1]?[2]',value=r'')


python pandas regex

david Oct 26 '16 at 22:45


1 answer

piRSquared · Accepted Answer · 2016-10-26T22:51:36+0000

New answer
In response to the comment from @Addison:

 # '^12(?=.{4}$)' matches a leading 12 only when it is followed by exactly 4 more characters
 df.c_contofficeID.str.replace('^12(?=.{4}$)', '', regex=True)
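The lookahead pattern can be tried end to end on the sample data from the question. This is a minimal sketch, assuming the IDs are stored as strings; `regex=True` is passed explicitly because newer pandas versions treat the pattern as a literal string by default. The variable names `cleaned` and `short_id` are only for illustration.

```python
import pandas as pd

# Sample data copied from the question; IDs are kept as strings.
df = pd.DataFrame({'c_contofficeID': [
    '0109', '0109', '3434', '123434', '1255N9',
    '0109', '123434', '55N9', '5599', '0109',
]})

# Remove a leading 12 only when exactly four characters follow it.
cleaned = df['c_contofficeID'].str.replace(r'^12(?=.{4}$)', '', regex=True)

# A four-character ID that happens to start with 12 is left alone,
# because only two characters follow the 12.
short_id = pd.Series(['1234']).str.replace(r'^12(?=.{4}$)', '', regex=True)

print(cleaned.tolist())
# ['0109', '0109', '3434', '3434', '55N9', '0109', '3434', '55N9', '5599', '0109']
print(short_id.tolist())
# ['1234']
```

The lookahead `(?=.{4}$)` checks what follows without consuming it, so only the `12` itself is replaced.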

If the identifier must always be four characters long, it is easier to keep just the last four characters:

 df.c_contofficeID.str[-4:]
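A short sketch of the slicing approach on a few values from the question. The last value, `'993434'`, is a made-up ID added to show the trade-off: slicing drops any prefix, not just `12`.

```python
import pandas as pd

# A few IDs from the question, stored as strings.
ids = pd.Series(['0109', '123434', '1255N9', '55N9'])

# Keep only the last four characters of every value.
last_four = ids.str[-4:]
print(last_four.tolist())
# ['0109', '3434', '55N9', '55N9']

# Hypothetical ID with a different prefix: the 99 is dropped as well.
other_prefix = pd.Series(['993434']).str[-4:]
print(other_prefix.tolist())
# ['3434']
```

This is only safe when every valid identifier is exactly four characters long and any extra leading characters can be discarded.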

Old answer
Use str.replace:

 df.c_contofficeID.str.replace('^12', '', regex=True).to_frame()
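To illustrate why the `^` anchor matters, and why the three patterns tried in the question behave differently, here is a small sketch using Python's built-in `re` module rather than pandas. The test string `'123412'` and the helper name `strip_prefix` are made up for illustration.

```python
import re

def strip_prefix(value, pattern=r'^12'):
    """Remove the first match of `pattern` from `value` (hypothetical helper)."""
    return re.sub(pattern, '', value, count=1)

# Anchored with ^: only the leading 12 is removed.
anchored = strip_prefix('123412')
print(anchored)    # 3412

# Attempt 1, '[1][2]', has no anchor, so every 12 in the string is removed.
unanchored = re.sub(r'[1][2]', '', '123412')
print(unanchored)  # 34

# Attempt 2, '$[1][2]': '$' matches at the end of the string, so no
# characters can follow it and the pattern never matches.
untouched = re.sub(r'$[1][2]', '', '123434')
print(untouched)   # 123434

# Attempt 3, '?[1]?[2]': a '?' with nothing before it to make optional
# is not a valid pattern, so compiling it raises re.error.
try:
    re.compile(r'?[1]?[2]')
    attempt3_compiles = True
except re.error:
    attempt3_compiles = False
print(attempt3_compiles)  # False
```

In short, `^` anchors the match to the start of the string, while `$` anchors it to the end.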
