Filter rows from CSV before loading into pandas dataframe

I have a large CSV file that I cannot load into a DataFrame with read_csv() due to memory issues. However, there is a {0,1} flag in the first CSV column, and I only need the rows flagged with "1", which should easily be small enough to fit into a DataFrame. Is there a way to load the data with a condition, or to manipulate the CSV before loading it (similar to grep)?


You can use the comment parameter of pd.read_csv and set it to '0'. Every line that begins with 0 (i.e. every row whose flag is 0) is then treated as a comment and skipped while parsing.

import pandas as pd
from io import StringIO

txt = """col1,col2
1,a
0,b
1,c
0,d"""

pd.read_csv(StringIO(txt), comment='0')

   col1 col2
0     1    a
1     1    c
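Applied to the actual file, this is a one-liner; the path and column layout below are placeholders, not from the original question. Note that pandas honors the comment character anywhere in a line, not just at the start, so this trick is only safe if the character '0' cannot occur in the values of the other columns.

import pandas as pd

# 'large.csv' is a placeholder path.
# Lines starting with '0' (the 0-flag rows) are skipped at parse time,
# so only the flagged rows are ever loaded into memory.
df = pd.read_csv('large.csv', comment='0')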

Alternatively, combine the chunksize parameter of pd.read_csv with query and pd.concat: read the file in chunks, keep only the rows where the flag column equals 1, and concatenate the filtered chunks. That way only the matching rows are held in memory.

pd.concat([df.query('col1 == 1') for df in pd.read_csv(StringIO(txt), chunksize=1)])
# equivalent to, but slower than, plain boolean indexing; for better
# performance use the commented line below instead of query:
# pd.concat([df[df.col1 == 1] for df in pd.read_csv(StringIO(txt), chunksize=1)])

   col1 col2
0     1    a
2     1    c
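For the real file you would use a much larger chunk size. A minimal sketch, assuming the file is called large.csv and the flag column is named col1 (both are assumptions, not from the original question):

import pandas as pd

# Read the file 100,000 rows at a time; only the rows whose flag is 1
# are kept from each chunk, so peak memory usage stays bounded.
chunks = pd.read_csv('large.csv', chunksize=100_000)
df = pd.concat(chunk[chunk.col1 == 1] for chunk in chunks)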
