Reading a text file with pandas where some lines have blank elements

I have a dataset in a text file that looks like this.

    0    0CF00400 X       8  66  7D  91  6E  22  03  0F  7D        0.021650 R
    0    18EA0080 X       3  E9  FE  00                            0.022550 R
    0    00000003 X       8  D5  64  22  E1  FF  FF  FF  F0        0.023120 R

I read it using

    file_pandas = pd.read_csv(fileName, delim_whitespace=True, header=None, engine='python')

and got this output:

    0  0  0CF00400  X  8  66  7D  91        6E  22    03    0F    7D  0.02165
    1  0  18EA0080  X  3  E9  FE   0  0.022550   R  None  None  None      NaN
    2  0  00000003  X  8  D5  64  22        E1  FF    FF    FF    F0  0.02312

But I want it read as:

    0  0  0CF00400  X  8  66  7D  91  6E  22  03  0F  7D  0.021650  R
    1  0  18EA0080  X  3  E9  FE  00                      0.022550  R
    2  0  00000003  X  8  D5  64  22  E1  FF  FF  FF  F0  0.023120  R

I tried replacing delim_whitespace=True with delimiter=" ", but that merged the first four columns of the output shown above. The rest of the data was parsed correctly: the remaining columns matched the original text file, apart from NaN values where the blanks were.

I am not sure how to proceed from here.

Side note: 00 is parsed as just 0. Is there a way to keep it as 00 instead?

1 answer

Your data appears to use fixed-width columns, so you can try pandas.read_fwf():

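    from io import StringIO
    import pandas as pd

    # Sample rows; each field is right-aligned within the widths given below.
    data = """\
    0    0CF00400 X       8  66  7D  91  6E  22  03  0F  7D        0.021650 R
    0    18EA0080 X       3  E9  FE  00                            0.022550 R
    0    00000003 X       8  D5  64  22  E1  FF  FF  FF  F0        0.023120 R
    """

    # widths lists the character width of each column; blank fields become NaN.
    df = pd.read_fwf(StringIO(data), header=None,
                     widths=[1, 12, 2, 8, 4, 4, 4, 4, 4, 4, 4, 4, 16, 2])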
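On the side note about 00: here is a minimal sketch, assuming the same column widths as above and that it is acceptable to read every column as text. The `dtype=str` and `fillna("")` parts are my additions for illustration, and the exact spacing of the sample rows is an assumption about the file's layout.

```python
from io import StringIO

import pandas as pd

# Sample rows, right-aligned within the widths below.
# The exact spacing is an assumption; the real file's layout may differ.
data = (
    "0    0CF00400 X       8  66  7D  91  6E  22  03  0F  7D        0.021650 R\n"
    "0    18EA0080 X       3  E9  FE  00                            0.022550 R\n"
    "0    00000003 X       8  D5  64  22  E1  FF  FF  FF  F0        0.023120 R\n"
)
widths = [1, 12, 2, 8, 4, 4, 4, 4, 4, 4, 4, 4, 16, 2]

# dtype=str stops pandas from converting "00" to the integer 0.
# fillna("") shows the blank fields as empty strings instead of NaN.
df = pd.read_fwf(StringIO(data), header=None, widths=widths, dtype=str).fillna("")

print(df.to_string())
```

Reading everything as strings also keeps the trailing zero in timestamps such as 0.021650. If you need numbers later, you could convert individual columns afterwards, for example with `pd.to_numeric`.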


Source: https://habr.com/ru/post/1258488/

