Reading a large CSV file into pandas

I am trying to load a CSV file (about 250 MB) into a DataFrame with pandas. On my first attempt I used the usual read_csv call, but I got a memory error. I then tried the chunked approach described in "Big, Persistent DataFrame in pandas":

x = pd.read_csv('myfile.csv', iterator=True, chunksize=1000)
xx = pd.concat([chunk for chunk in x], ignore_index=True)

but when I try to concatenate, I get the following error: Exception: "All objects passed were None". In fact, I cannot access the chunks at all.

I am using 32-bit WinPython 3.3.2.1 with pandas 0.11.0.

2 answers

I suggest installing the 64-bit version of WinPython. Then you should be able to load a 250 MB file without any problems.


I'm late, but the problem with the posted code is that pd.concat([chunk for chunk in x]) throws away any advantage of chunking: it merges all of the chunks back into one big DataFrame, so you end up needing the full amount of memory anyway, and during the concatenation it may even briefly require roughly twice as much. The better approach is to process or aggregate each chunk as it is read and keep only the results.
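A minimal sketch of that idea, using a tiny in-memory CSV as a stand-in for the 250 MB file (the column name "value" is hypothetical): each chunk is reduced to running aggregates, so only one chunk plus the totals are in memory at any time.

```python
import io
import pandas as pd

# Tiny in-memory CSV standing in for the large file on disk.
csv_data = "value\n1\n2\n3\n4\n5\n"

# Iterate over the file in chunks and aggregate per chunk,
# instead of concatenating everything back into one DataFrame.
rows = 0
total = 0
for chunk in pd.read_csv(io.StringIO(csv_data), chunksize=2):
    rows += len(chunk)                 # running row count
    total += int(chunk["value"].sum())  # running column sum

print(rows, total)
```

For a real file you would pass the path ('myfile.csv') instead of the StringIO object; the chunksize can be much larger (e.g. 100,000 rows), since only one chunk is resident at a time.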


Source: https://habr.com/ru/post/1494305/
