Parse error when updating multiple columns in one line

Input to pd.read_clipboard():

    Ratanhia ,30c x2, 200c x2
    Aloe ,30c x2, 200c x2
    Nitric Acid ,30c x2, 200c x 2
    Sedum Acre ,200c x2, 30c x2
    Paeonia ,200c x2, 30c x2
    Sulphur ,200c x2, 30c x2
    Hamamelis ,30c x1, 200c x1
    Aesculus ,30c x1, 200c x1⁠⁠⁠⁠

Code:

    import pandas as pd

    df = pd.read_clipboard(header=None, sep=',')
    df.columns = ['Medicine','power30c','power200c']
    df.power30c=df.power30c.apply(lambda x: x[-1])
    df.power200c=df.power200c.apply(lambda x: x[-1])
    print df

Output:

          Medicine  power30c  power200c
    0     Ratanhia         2          2
    1         Aloe         2          2
    2  Nitric Acid         2          2
    3   Sedum Acre         2          2
    4      Paeonia         2          2
    5      Sulphur         2          2
    6    Hamamelis         1          1
    7     Aesculus         1

Questions:

  • Why is the power200c value blank in the last row?
  • How can I change more than one column in a single line?

    df[['power30c','power200c']] = df[['power30c','power200c']].apply(lambda x: x[-1])

This throws an error: ValueError: Length mismatch: Expected axis has 1 elements, new values have 3 elements

Python Version: 2.7, Pandas: 0.19, IPython: 4

4 answers

You need the skipinitialspace parameter:

    df = pd.read_clipboard(sep=',',
                           names=['Medicine','power30c','power200c'],
                           skipinitialspace=True)
    print (df)
          Medicine  power30c  power200c
    0     Ratanhia    30c x2    200c x2
    1         Aloe    30c x2    200c x2
    2  Nitric Acid    30c x2   200c x 2
    3   Sedum Acre   200c x2     30c x2
    4      Paeonia   200c x2     30c x2
    5      Sulphur   200c x2     30c x2
    6    Hamamelis    30c x1    200c x1
    7     Aesculus    30c x1    200c x1

And then take the last character by indexing with .str:

    df[['power30c','power200c']] = df[['power30c','power200c']].apply(lambda x: x.str[-1])
    print (df)
          Medicine  power30c  power200c
    0     Ratanhia         2          2
    1         Aloe         2          2
    2  Nitric Acid         2          2
    3   Sedum Acre         2          2
    4      Paeonia         2          2
    5      Sulphur         2          2
    6    Hamamelis         1          1
    7     Aesculus         1          1
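
For anyone who wants to try this without a clipboard, here is a minimal sketch of the same two steps. It assumes Python 3 and a recent pandas, and it swaps pd.read_clipboard for pd.read_csv on an in-memory string (the RAW sample and the variable names are mine, not from the question). As far as I can tell, the reason x[-1] fails in the question is that DataFrame.apply passes each whole column to the function, so x[-1] is looked up in the column's index instead of taking the last character of each string; .str[-1] works element by element.

```python
import io

import pandas as pd

# Stand-in for the clipboard contents (clean copy, no trailing invisible characters).
RAW = """Ratanhia ,30c x2, 200c x2
Hamamelis ,30c x1, 200c x1
"""

# read_csv takes the same sep/names/skipinitialspace arguments used above.
df = pd.read_csv(
    io.StringIO(RAW),
    sep=",",
    names=["Medicine", "power30c", "power200c"],
    skipinitialspace=True,
)

# DataFrame.apply hands each column to the function as a Series,
# so .str[-1] takes the last character of every string in that column.
cols = ["power30c", "power200c"]
df[cols] = df[cols].apply(lambda col: col.str[-1])

print(df)
```

Note that this still relies on the count being the very last character of each cell, so it only works on input that has no trailing invisible characters.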

There are some invisible characters at the end of the text you posted. If you copy the text from here instead, it works:

    Ratanhia ,30c x2, 200c x2
    Aloe ,30c x2, 200c x2
    Nitric Acid ,30c x2, 200c x 2
    Sedum Acre ,200c x2, 30c x2
    Paeonia ,200c x2, 30c x2
    Sulphur ,200c x2, 30c x2
    Hamamelis ,30c x1, 200c x1
    Aesculus ,30c x1, 200c x1

When I use the text you sent, you can see additional characters:

    In [88]: df.power200c[6]
    Out[88]: '200c x1'

    In [89]: df.power200c[7]
    Out[89]: '200c x1\xe2\x81\xa0\xe2\x81\xa0\xe2\x81\xa0\xe2\x81\xa0'
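
The bytes \xe2\x81\xa0 are the UTF-8 encoding of U+2060 (WORD JOINER), an invisible format character, which is why the last cell looks blank after taking x[-1]. Below is a rough sketch of one way to strip such characters before parsing. It assumes Python 3, and strip_format_chars is a hypothetical helper name, not something from pandas.

```python
import unicodedata

# The last cell as it appears in the question: '200c x1' followed by four
# U+2060 WORD JOINER characters (bytes e2 81 a0 in UTF-8).
cell = "200c x1" + "\u2060" * 4


def strip_format_chars(text):
    """Drop characters in Unicode category 'Cf' (invisible format characters)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


print(repr(cell[-1]))  # the "last character" is the invisible joiner
cleaned = strip_format_chars(cell)
print(repr(cleaned), repr(cleaned[-1]))
```

The same function could be applied to the whole pasted text before it is handed to pandas.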

This is a syntax error.

    Ratanhia ,30c x2, 200c x2
    Aloe ,30c x2, 200c x2
    Nitric Acid ,30c x2, 200c x2
    Sedum Acre ,200c x2, 30c x2
    Paeonia ,200c x2, 30c x2
    Sulphur ,200c x2, 30c x2
    Hamamelis ,30c x1, 200c x1
    Aesculus ,30c x1, 200c x1

The last characters in your table are invisible non-ASCII characters. Please use the corrected text above. Once this is fixed, the length mismatch is resolved as well.


I think the solutions above do not take into account that the data in the "power*" columns is mixed up:

          Medicine  power30c  power200c
    0     Ratanhia    30c x2    200c x2
    1         Aloe    30c x2    200c x2
    2  Nitric Acid    30c x2   200c x 2
    3   Sedum Acre   200c x2     30c x2   # / NOTE: mixed up values
    4      Paeonia   200c x2     30c x2   # < "200c" is in the "power30c" column
    5      Sulphur   200c x2     30c x2   # \ and "30c" is in the "power200c" column
    6    Hamamelis    30c x1    200c x1
    7     Aesculus    30c x1    200c x1

Here is another solution:

    In [34]: df
    Out[34]:
          Medicine  power30c  power200c
    0     Ratanhia    30c x2    200c x2
    1         Aloe    30c x2    200c x2
    2  Nitric Acid    30c x2   200c x 2
    3   Sedum Acre   200c x2     30c x2
    4      Paeonia   200c x2     30c x2
    5      Sulphur   200c x2     30c x2
    6    Hamamelis    30c x1    200c x1
    7     Aesculus    30c x1    200c x1⁠⁠⁠⁠

    In [35]: (df.set_index('Medicine')
        ...:    .stack()
        ...:    .str.extract(r'(\d+)c\s+x\s*(\d+)', expand=True)
        ...:    .reset_index(level=1, drop=1)
        ...:    .pivot(columns=0, values=1)
        ...:    .add_prefix('power')
        ...:    .add_suffix('c')
        ...:    .reset_index()
        ...: )
        ...:
    Out[35]:
    0     Medicine  power200c  power30c
    0     Aesculus          1         1
    1         Aloe          2         2
    2    Hamamelis          1         1
    3  Nitric Acid          2         2
    4      Paeonia          2         2
    5     Ratanhia          2         2
    6   Sedum Acre          2         2
    7      Sulphur          2         2
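
To see what the regular expression in that chain does on its own, here is a small sketch using only the standard re module (Python 3 assumed; parse_cells is a hypothetical helper for illustration, not part of the answer). The pattern captures the potency and the count from each cell, so the result is keyed by what the cell says rather than by the column it was pasted into, and anything after the count, including invisible characters, is ignored.

```python
import re

# Same pattern as in the answer: potency digits, "c", whitespace,
# "x", optional whitespace, then the count.
PATTERN = re.compile(r"(\d+)c\s+x\s*(\d+)")


def parse_cells(cells):
    """Map each cell such as '200c x 2' to a 'power200c' entry, whatever column it came from."""
    result = {}
    for cell in cells:
        match = PATTERN.search(cell)
        if match:
            potency, count = match.groups()
            result["power" + potency + "c"] = count
    return result


print(parse_cells(["200c x2", "30c x2"]))    # values sitting in swapped columns
print(parse_cells(["30c x2", "200c x 2"]))   # stray space before the count
```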

Source: https://habr.com/ru/post/1262406/

