Pandas: read_csv ignore lines after empty line

There is a strange .csv file that looks something like this:

header1,header2,header3
val11,val12,val13
val21,val22,val23
val31,val32,val33

So far so good, but after these lines there is always an empty line followed by a lot of useless lines. The whole file looks like this:


header1,header2,header3
val11,val12,val13
val21,val22,val23
val31,val32,val33

dhjsakfjkldsa
fasdfggfhjhgsdfgds
gsdgffsdgfdgsdfgs
gsdfdgsg

The number of lines at the bottom is completely random; the only constant is the empty line in front of them.

Pandas has a skipfooter parameter, but it only helps when the number of footer lines is known in advance.

Any idea how to ignore these lines without first opening the file (with open(...)) and deleting them by hand?

2 answers

If you are using the csv module, it is pretty simple to detect an empty row.

import csv

with open(filename, newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        if not row:        # a blank line comes back as an empty list -> stop
            break
        # otherwise, process the row
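
Since the question is ultimately about pandas, the rows collected this way can be handed straight to a DataFrame. A minimal sketch, assuming the file is called file.csv (a placeholder name) and the first non-empty line is the header:

import csv
import pandas as pd

rows = []
with open('file.csv', newline='') as f:    # 'file.csv' is an assumed name
    reader = csv.reader(f)
    for row in reader:
        if not row:                        # blank line -> empty list -> stop reading
            break
        rows.append(row)

# the first collected row is the header, the rest is data
df = pd.DataFrame(rows[1:], columns=rows[0])
print(df)

Unlike read_csv, this keeps every column as plain strings, which is fine for the sample data above; convert columns with pd.to_numeric afterwards if needed.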

read_csv itself has no way to say "stop at the first blank line", so you either limit what is read up front or clean up the DataFrame after reading.

Two options (both staying within pandas):

  • Limit the read itself, if you know how many data rows or footer lines there are:

    pd.read_csv('file.csv', nrows=3)
    pd.read_csv('file.csv', skipfooter=4)

  • Read the whole file and drop the junk rows from the DataFrame afterwards; they only fill the first column, so they carry NaN in the rest (see the full sketch after the output below):

    df.dropna(axis=0, how='any', inplace=True)

Result:

  header1 header2 header3
0   val11   val12   val13
1   val21   val22   val23
2   val31   val32   val33
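
Putting the second option together, a minimal end-to-end sketch (again assuming the file is called file.csv): read_csv skips the blank line by default (skip_blank_lines=True), and the junk lines fill only the first column, so dropna removes them.

import pandas as pd

df = pd.read_csv('file.csv')      # blank lines are skipped by default
# junk rows have a value only in the first column and NaN elsewhere
df.dropna(axis=0, how='any', inplace=True)
print(df)

This does rely on the junk lines never containing more commas than there are header columns; if they can, read_csv will raise a parsing error and the csv-module approach in the first answer is safer.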

Source: https://habr.com/ru/post/1663260/

