Pandas dataframe: crop string fields

I have a dataframe and would like to truncate each field to 20 characters. I naively tried the following:

df = df.astype(str).apply(lambda x: x[:20]) 

However, it has no effect. Appending “Y” to each field in the same way, on the other hand, works like a charm:

 df = df.astype(str).apply(lambda x: x+'Y') 

What am I doing wrong?

3 answers

You can use the .str.slice() method:

Demo:

    In [177]: df = pd.DataFrame({
         ...:     'a': pd.util.testing.rands_array(30, 10),
         ...:     'b': pd.util.testing.rands_array(30, 10),
         ...: })
         ...:

    In [178]: df
    Out[178]:
                                    a                               b
    0  Mlf6nOsC8S6vv8OxW5ZOWifg3EoqAb  XSGLdkaewwZlNeZ4uTTivi2nMQFc6S
    1  0E4XCBaYFBTSalUMPGpXmke6dQGbkW  KlHuVhbNgQL9HLHYQq3fEdqEIciOhX
    2  URODJeLA0uLvcKBEXPyrmnnNU40MDl  NaY8LURHjgmT1pRrDnbPAeLZq3ANaL
    3  OYA1ahlwVtEVnDOAkZgxNkbvZ7W8Rf  mIzkeLhM7SqYH17vGDzL6DJjSYftGs
    4  uFC1shE02UfxS0VhDASmF8vh9XxFYX  fQOxjDjFehTNT27seOtCAAPW0as9Up
    5  Ja33vQym6L0Ko2Kcf8cg7OMBKMitg5  iGdCvYTyZlR23NeeTAjG1PoL8mWm3j
    6  iNZdXaVpB4zXClxTLt738DY7i6xs6p  q9VKg5fZdItmUpZiQrR6XW5WHmd33l
    7  WWnViRRMPkbXNQOHeqGmzETDpGPRl9  t3I8Ve3ybCJcXajF8pydnwNZQWslTN
    8  5oMFy2PBe1zUIE3XdraMwlrd5MKcx2  gSLtgXJwiS1HugLORXherFT4l1k5QV
    9  weV8BlyJrtRbWpSCxSbj8cSyZxusFR  ylLWort9o8mHWQQ3JB1Twb0xRbLhot

    In [179]: df.apply(lambda x: x.str.slice(0, 20))
    Out[179]:
                          a                     b
    0  Mlf6nOsC8S6vv8OxW5ZO  XSGLdkaewwZlNeZ4uTTi
    1  0E4XCBaYFBTSalUMPGpX  KlHuVhbNgQL9HLHYQq3f
    2  URODJeLA0uLvcKBEXPyr  NaY8LURHjgmT1pRrDnbP
    3  OYA1ahlwVtEVnDOAkZgx  mIzkeLhM7SqYH17vGDzL
    4  uFC1shE02UfxS0VhDASm  fQOxjDjFehTNT27seOtC
    5  Ja33vQym6L0Ko2Kcf8cg  iGdCvYTyZlR23NeeTAjG
    6  iNZdXaVpB4zXClxTLt73  q9VKg5fZdItmUpZiQrR6
    7  WWnViRRMPkbXNQOHeqGm  t3I8Ve3ybCJcXajF8pyd
    8  5oMFy2PBe1zUIE3XdraM  gSLtgXJwiS1HugLORXhe
    9  weV8BlyJrtRbWpSCxSbj  ylLWort9o8mHWQQ3JB1T
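The demo builds its test data with pd.util.testing.rands_array, which may not be available in recent pandas releases. As a rough, self-contained sketch of the same idea, the snippet below generates the random strings with the standard library instead. The helper name random_strings and the seeds are made up for illustration; they are not part of pandas or of the original answer.

```python
import random
import string

import pandas as pd


def random_strings(length, count, seed=0):
    """Hypothetical helper: return `count` random alphanumeric strings of `length` chars."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits
    return ["".join(rng.choices(alphabet, k=length)) for _ in range(count)]


# Two columns of ten 30-character strings, mirroring the shape of the demo data.
df = pd.DataFrame({
    "a": random_strings(30, 10, seed=1),
    "b": random_strings(30, 10, seed=2),
})

# apply() hands each column to the lambda as a Series;
# .str.slice(0, 20) then truncates every string in that column.
truncated = df.apply(lambda col: col.str.slice(0, 20))
print(truncated)
```

Every value in truncated should be 20 characters long, while df itself is left unchanged.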

I think you need the .str accessor to slice the strings by position:

 df = df.astype(str).apply(lambda x: x.str[:20]) 

Example:

    df = pd.DataFrame({'A':[1,2,3],
                       'B':[4,5,6],
                       'C':[7,8,9],
                       'D':[1,3,5],
                       'E':[5,3,6],
                       'F':[7,4,3]}) * 1000
    print (df)
          A     B     C     D     E     F
    0  1000  4000  7000  1000  5000  7000
    1  2000  5000  8000  3000  3000  4000
    2  3000  6000  9000  5000  6000  3000

    df = df.astype(str).apply(lambda x: x.str[:2])
    print (df)
        A   B   C   D   E   F
    0  10  40  70  10  50  70
    1  20  50  80  30  30  40
    2  30  60  90  50  60  30

Another solution with applymap:

    df = df.astype(str).applymap(lambda x: x[:2])
    print (df)
        A   B   C   D   E   F
    0  10  40  70  10  50  70
    1  20  50  80  30  30  40
    2  30  60  90  50  60  30
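One caveat on applymap: as far as I know, pandas 2.1 renamed it to DataFrame.map and deprecated the old name. A minimal sketch that works under either name could look like the following; the small frame and the variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1000, 2000], "B": [4000, 5000]})
as_text = df.astype(str)

# Pick whichever element-wise method this pandas version provides:
# DataFrame.map on newer releases, DataFrame.applymap on older ones.
elementwise = as_text.map if hasattr(pd.DataFrame, "map") else as_text.applymap

# The lambda receives one cell at a time (a plain Python str),
# so ordinary string slicing does what you expect here.
short = elementwise(lambda value: value[:2])
print(short)
```

Unlike apply, the element-wise method passes single values rather than whole columns, which is why x[:2] truncates characters here.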

The problem with your solution is that apply passes each column to the lambda as a Series, so x[:20] selects only the first 20 rows of each column, not the first 20 characters of each value.

You can check this with a custom function:

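    def f(x):
        print (x)       # x is a whole column (a Series), not a single value
        print (x[:2])   # so this prints the first 2 rows of that column
        return x[:2]    # return the slice so apply() has a result to assemble

    df = df.astype(str).apply(f)
    print (df)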
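To make the difference concrete, here is a small sketch on a made-up one-column frame. It also shows why the x+'Y' attempt from the question works: adding a string to a Series is vectorized over its values, whereas slicing a Series with [] acts on its rows.

```python
import pandas as pd

df = pd.DataFrame({"A": ["abcdef", "ghijkl", "mnopqr"]})

# Slicing the column itself keeps the first 2 rows and leaves the strings intact.
rows = df.apply(lambda col: col[:2])

# Slicing through the .str accessor keeps the first 2 characters of every string.
chars = df.apply(lambda col: col.str[:2])

# "+" is applied element-wise, so every value gets the suffix.
suffixed = df.apply(lambda col: col + "Y")

print(rows)
print(chars)
print(suffixed)
```

With fewer than 20 rows, x[:20] returns every row untouched, which is why the original attempt appeared to have no effect.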

A simple one-liner to trim a long string field in a pandas DataFrame:

 df['short_str'] = df['long_str'].str.slice(0,3) 
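A quick sketch of how this one-liner behaves on edge cases, using made-up data: strings shorter than the limit come back unchanged, and missing values stay missing instead of raising an error.

```python
import pandas as pd

df = pd.DataFrame({"long_str": ["abcdefgh", None, "xy"]})

# .str.slice(0, 3) truncates each string to at most 3 characters;
# it skips missing values and leaves shorter strings as they are.
df["short_str"] = df["long_str"].str.slice(0, 3)
print(df)
```

This differs from the astype(str) approaches above, which convert every value to text before slicing.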

Source: https://habr.com/ru/post/1265361/

