How do I keep leading whitespace in a pandas Series in Python?

I am trying to read a text file with read_csv from pandas in Python. My text file looks like this (all values are numbers):

 35 61  7 1 0              # with leading white spaces
  0 1 1 1 1 1              # with leading white spaces
33 221 22 0 1              # without leading white spaces
233   2                    # without leading white spaces
1(01-02),2(02-03),3(03-04) # this line causes 'Error tokenizing data. C error: Expected 1 fields in line 5, saw 3'

My Python code is as follows:

import pandas as pd
df = pd.read_csv('example.txt', header=None)
df

The output looks like this:

CParserError: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3

Before dealing with the leading spaces, I first need to get past the "Error tokenizing data" issue. So I changed the code as follows:

import pandas as pd
df = pd.read_csv('example.txt', header=None, error_bad_lines=False)
df

I can get the data with leading spaces, as I expected, but the data in line 5 has disappeared. The output is as follows:

b'Skipping line 5: expected 1 fields, saw 3\n
 35 61  7 1 0              # with leading white spaces as intended
  0 1 1 1 1 1              # with leading white spaces as intended
33 221 22 0 1              # without leading white spaces
233   2                    # without leading white spaces
                           # 5th line disappeared (not my intention).
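A note for newer pandas versions: error_bad_lines was deprecated in pandas 1.3 and removed in 2.0, where on_bad_lines='skip' plays the same role. The following is only a minimal sketch, assuming pandas 1.3 or later; the sample string is a hypothetical in-memory stand-in for example.txt, not part of the original code.

```python
import io
import pandas as pd

# Hypothetical stand-in for example.txt (the "# ..." annotations shown in
# the question are not part of the file itself).
sample = (
    " 35 61  7 1 0\n"
    "  0 1 1 1 1 1\n"
    "33 221 22 0 1\n"
    "233   2\n"
    "1(01-02),2(02-03),3(03-04)\n"
)

# on_bad_lines="skip" silently drops rows with too many fields, so line 5
# (three comma-separated fields) is still lost, just as with
# error_bad_lines=False.
df = pd.read_csv(io.StringIO(sample), header=None, on_bad_lines="skip")
print(df)
```

The leading spaces in the first two rows survive, but the fifth line is still skipped, so this only reproduces the behaviour described above on a current pandas release.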

So I changed my code as shown below to get the 5th line.

import pandas as pd
df = pd.read_csv('example.txt', header=None, sep=':::', engine='python')
df

I successfully got the data in line 5, but the leading whitespace in lines 1 and 2 was stripped, as follows:

35 61  7 1 0               # without leading white spaces(not my intention)
0 1 1 1 1 1                # without leading white spaces(not my intention)
33 221 22 0 1              # without leading white spaces
233   2                    # without leading white spaces
1(01-02),2(02-03),3(03-04) # I successfully got this line as intended.


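Set sep to a character that does not appear in the data, such as ^, so that each line is read as a single field.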

s = pd.read_csv('example.txt', header=None, sep='^', squeeze=True)

s

0                  35 61  7 1 0
1                   0 1 1 1 1 1
2                 33 221 22 0 1
3                       233   2
4    1(01-02),2(02-03),3(03-04)
Name: 0, dtype: object

s[1]
'  0 1 1 1 1 1'
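
On pandas 2.0 and later, the squeeze argument of read_csv is no longer available; calling .squeeze("columns") on the resulting DataFrame should give the same Series. A minimal sketch, assuming that ^ never occurs in the data; the sample string is a hypothetical in-memory stand-in for example.txt.

```python
import io
import pandas as pd

# Hypothetical stand-in for example.txt.
sample = (
    " 35 61  7 1 0\n"
    "  0 1 1 1 1 1\n"
    "33 221 22 0 1\n"
    "233   2\n"
    "1(01-02),2(02-03),3(03-04)\n"
)

# "^" does not occur in the data, so every line becomes a single field
# and the leading spaces are left untouched.
df = pd.read_csv(io.StringIO(sample), header=None, sep="^")

# Collapse the one-column DataFrame into a Series
# (replacement for the removed squeeze=True argument).
s = df.squeeze("columns")
print(repr(s[1]))
```

If no single character can be guaranteed absent from the file, reading the lines directly and building the Series by hand, for example pd.Series(open('example.txt').read().splitlines()), may be a simpler option.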

Source: https://habr.com/ru/post/1692184/