Pandas DataFrame: add and remove a prefix / suffix from all cell values of the entire DataFrame

To add a prefix / suffix to every value in a DataFrame, I usually do the following.

For example, to add the suffix '@':

 df = df.astype(str) + '@' 

This converts every cell to a string and appends '@' to it.
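As a minimal sketch of the same idea for both a suffix and a prefix (the column names and the '#' prefix are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data with mixed numeric and string columns.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# astype(str) converts every cell to text so that + means concatenation.
as_text = df.astype(str)

suffixed = as_text + "@"   # string on the right: suffix
prefixed = "#" + as_text   # string on the left: prefix

print(suffixed)
print(prefixed)
```

Note that after `astype(str)` the numeric column holds strings, so it has to be converted back if numbers are needed later.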

I would like to know how to remove this suffix. Is there a method available with the pandas.DataFrame class that removes a specific prefix / suffix character from the entire DataFrame?

I tried iterating over the rows (each one as a Series) and calling rstrip('@') on them, as follows:

 for index in range(df.shape[0]):
     row = df.iloc[index]
     row = row.str.rstrip('@')

Then, to build a DataFrame from this Series:

 new_df = pd.DataFrame(columns=list(df))
 new_df = new_df.append(row)

However, this does not work: it gives an empty DataFrame.

Is there anything really basic that I'm missing?

2 answers

You can use applymap to apply your string method to each element:

 df = df.applymap(lambda x: str(x).rstrip('@')) 
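A self-contained version of this, as a sketch on made-up sample data. One assumption worth flagging: in pandas 2.1 and later `applymap` is deprecated in favor of `DataFrame.map`, so the snippet picks whichever of the two the installed version provides.

```python
import pandas as pd

# Hypothetical sample data where every cell carries the '@' suffix.
df = pd.DataFrame({"a": ["dog@", "lazy@"], "b": ["quick@", "fox@"]})

def strip_at(value):
    """Remove trailing '@' characters from a single cell."""
    return str(value).rstrip("@")

# Prefer DataFrame.map (pandas >= 2.1); fall back to applymap on older versions.
elementwise = getattr(df, "map", None) or df.applymap
cleaned = elementwise(strip_at)

print(cleaned)
```

Because the function is called once per cell in Python, this is convenient rather than fast; on large frames the column-wise `.str` methods are usually the better choice.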

You can use apply together with the str.strip method:

 In [13]: df
 Out[13]:
        a       b      c
 0    dog   quick    the
 1   lazy    lazy    fox
 2  brown   quick    dog
 3  quick     the   over
 4  brown    over   lazy
 5    fox   brown  quick
 6  quick     fox    the
 7    dog  jumped    the
 8   lazy   brown    the
 9    dog    lazy    the
 In [14]: df = df + "@"
 In [15]: df
 Out[15]:
         a        b       c
 0    dog@   quick@    the@
 1   lazy@    lazy@    fox@
 2  brown@   quick@    dog@
 3  quick@     the@   over@
 4  brown@    over@   lazy@
 5    fox@   brown@  quick@
 6  quick@     fox@    the@
 7    dog@  jumped@    the@
 8   lazy@   brown@    the@
 9    dog@    lazy@    the@
 In [16]: df = df.apply(lambda S:S.str.strip('@'))
 In [17]: df
 Out[17]:
        a       b      c
 0    dog   quick    the
 1   lazy    lazy    fox
 2  brown   quick    dog
 3  quick     the   over
 4  brown    over   lazy
 5    fox   brown  quick
 6  quick     fox    the
 7    dog  jumped    the
 8   lazy   brown    the
 9    dog    lazy    the
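One caveat with `strip('@')` is that it removes every leading and trailing '@', not just the one suffix that was added. The sketch below contrasts it with a regex-based `DataFrame.replace`, which works on the whole frame in a single call and removes exactly one trailing '@'. The sample values are invented to make the difference visible.

```python
import pandas as pd

# Hypothetical data: some cells have a leading '@' or a doubled suffix.
df = pd.DataFrame({"a": ["@dog@", "lazy@@"], "b": ["quick@", "fox"]})

# strip('@') removes all '@' characters from both ends of each string.
stripped = df.apply(lambda s: s.str.strip("@"))

# The pattern '@$' matches a single '@' at the end of the string only.
one_suffix_removed = df.replace(r"@$", "", regex=True)

print(stripped)
print(one_suffix_removed)
```

If the installed pandas is 1.4 or newer, `Series.str.removesuffix('@')` and `Series.str.removeprefix('@')` should be another option for the exact-match case, applied column by column in the same way as `str.strip` above.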

Please note: your approach does not work, because when you perform the following assignment in your for-loop:

 row = row.str.rstrip('@') 

It simply binds the result of row.str.rstrip to the name row, without mutating the DataFrame. The same holds for every Python object: plain assignment only rebinds a name.

 In [18]: rows = [[1,2,3],[4,5,6],[7,8,9]]
 In [19]: print(rows)
 [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
 In [20]: for row in rows:
     ...:     row = ['look','at','me']
     ...:
 In [21]: print(rows)
 [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

To change the underlying data structure, you need to use a mutator method:

 In [22]: rows
 Out[22]: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
 In [23]: for row in rows:
     ...:     row.append("LOOKATME")
     ...:
 In [24]: rows
 Out[24]: [[1, 2, 3, 'LOOKATME'], [4, 5, 6, 'LOOKATME'], [7, 8, 9, 'LOOKATME']]

Note that slice assignment is just syntactic sugar for a mutator method:

 In [26]: rows
 Out[26]: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
 In [27]: for row in rows:
     ...:     row[:] = ['look','at','me']
     ...:
     ...:
 In [28]: rows
 Out[28]: [['look', 'at', 'me'], ['look', 'at', 'me'], ['look', 'at', 'me']]

This is similar to assigning through pandas loc or iloc, which writes into the DataFrame itself.
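Applied to the loop from the question, that could look like the following sketch (the sample data is invented). The only change is that the stripped row is written back through `iloc` instead of being bound to a local name:

```python
import pandas as pd

# Hypothetical sample data with the '@' suffix on every cell.
df = pd.DataFrame({"a": ["dog@", "lazy@"], "b": ["quick@", "the@"]})

for index in range(df.shape[0]):
    # Assigning through iloc mutates df; `row = ...` would only rebind a name.
    df.iloc[index] = df.iloc[index].str.rstrip("@")

print(df)
```

This is shown only to illustrate the mutation point. A row-by-row loop is slower than the `apply` and `applymap` approaches above, so those remain the more practical way to do it.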


Source: https://habr.com/ru/post/1261164/
