Reading Large Text Files Using Pandas

I am trying to read several large text files (around 1.4 GB - 2 GB each) with Pandas' read_csv function, to no avail. These are the versions I'm using:

  • Python 2.7.6
  • Anaconda 1.9.2 (64-bit) (default, November 11, 2013, 10:49:15) [MSC v.1500 64 bit (AMD64)]
  • IPython 1.1.0
  • Pandas 0.13.1

I tried the following:

 df = pd.read_csv('data.txt') 

and it crashed IPython with the message: Kernel died, restarting.

Then I tried using an iterator:

 tp = pd.read_csv('data.txt', iterator=True, chunksize=1000) 

but I got the same Kernel died, restarting error.
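For reference, this is roughly how I intended to consume the reader once it worked; the row count is just a stand-in for my real per-chunk processing:

 import pandas as pd

 # iterator=True/chunksize returns a TextFileReader, not a DataFrame;
 # chunks are read lazily, one at a time
 tp = pd.read_csv('data.txt', iterator=True, chunksize=1000)

 rows = 0
 for chunk in tp:        # each chunk is a DataFrame of up to 1000 rows
     rows += len(chunk)  # stand-in for whatever I actually do with each piece
 print(rows)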

Any ideas? Or any other way to read large text files?

Thanks!

python pandas large-files csv ipython
May 01 '14 at 16:09
1 answer

A solution to a similar question was given here some time after this question was posted. Basically, it suggests reading the file in chunks:

 chunksize = 10 ** 6
 for chunk in pd.read_csv(filename, chunksize=chunksize):
     process(chunk)

Choose the chunksize according to your machine's memory, i.e. make sure a single chunk fits comfortably in RAM.
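As a minimal sketch of that pattern, assuming the file from the question ('data.txt') and using an every-100th-row sample as a stand-in for whatever process() actually needs to do:

 import pandas as pd

 chunksize = 10 ** 6    # rows per chunk; lower this if memory is still tight
 filename = 'data.txt'  # the file from the question

 total_rows = 0
 pieces = []

 for chunk in pd.read_csv(filename, chunksize=chunksize):
     total_rows += len(chunk)
     # Reduce each chunk before keeping anything, so memory stays bounded;
     # here we keep every 100th row as an illustrative sample.
     pieces.append(chunk.iloc[::100])

 sample = pd.concat(pieces, ignore_index=True)
 print(total_rows)

Only the reduced pieces are concatenated at the end, so the full file never has to fit in memory at once.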

Jun 26 '17 at 21:51


