I am trying to read a text file with read_csv from pandas in Python. My text file looks like this (all values are numbers):
35 61 7 1 0 # with leading white spaces
0 1 1 1 1 1 # with leading white spaces
33 221 22 0 1 # without leading white spaces
233 2 # without leading white spaces
1(01-02),2(02-03),3(03-04) # this line causes 'Error tokenizing data. C error: Expected 1 fields in line 5, saw 3'
My Python code is as follows:
import pandas as pd
df = pd.read_csv('example.txt', header=None)
df
The output looks like this:
CParserError: 'Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
Before handling the leading spaces, I first need to deal with the "Error tokenizing data" issue. So I changed the code as follows:
import pandas as pd
df = pd.read_csv('example.txt', header=None, error_bad_lines=False)
df
I get the lines with their leading spaces, as expected, but line 5 has disappeared. The output is as follows:
b'Skipping line 5: expected 1 fields, saw 3\n
35 61 7 1 0
0 1 1 1 1 1
33 221 22 0 1
233 2
So I changed my code as below to get the 5th line.
import pandas as pd
df = pd.read_csv('example.txt', header=None, sep=':::', engine='python')
df
I successfully got the data in line 5, but the leading white spaces in lines 1 and 2 were stripped, as follows:
35 61 7 1 0 # without leading white spaces (not my intention)
0 1 1 1 1 1 # without leading white spaces (not my intention)
33 221 22 0 1 # without leading white spaces
233 2 # without leading white spaces
1(01-02),2(02-03),3(03-04) # I successfully got this line as intended.
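For reference, one workaround that seems to keep both the leading spaces and line 5 is to bypass the read_csv tokenizer and read the raw lines directly. This is only a minimal sketch: the helper name read_raw_lines and the in-memory sample (including how many leading spaces the first two lines have) are assumptions for illustration, not taken from the real example.txt.

```python
import io

import pandas as pd


def read_raw_lines(handle):
    """Return a one-column DataFrame with one row per line of text."""
    # Strip only the line terminator, so leading white space is preserved
    # and commas inside a line are never treated as field separators.
    rows = [line.rstrip("\r\n") for line in handle]
    return pd.DataFrame(rows)


# Hypothetical stand-in for example.txt; the leading spaces are assumed.
sample = io.StringIO(
    "  35 61 7 1 0\n"
    " 0 1 1 1 1 1\n"
    "33 221 22 0 1\n"
    "233 2\n"
    "1(01-02),2(02-03),3(03-04)\n"
)

df = read_raw_lines(sample)
print(df)
```

With a real file, the same helper could be called as read_raw_lines(open('example.txt')), ideally inside a with block so the file is closed afterwards.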