Python Pandas: How to read only the first n lines of CSV files?

I have a very large data set and cannot afford to read all of it. So I'm thinking of reading only part of it to train on, but I have no idea how to do that. Any thoughts would be appreciated.

+27
python pandas
May 25 '14 at 8:50
1 answer

If you only want to read the first 999,999 lines (not counting the header):

read_csv(..., nrows=999999) 
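As a small sketch of what nrows does, the snippet below reads the first 3 data rows of a 10-row CSV. The in-memory csv_text and the column names id and value are made up for illustration; with a real file you would pass its path instead of io.StringIO.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a large file on disk.
csv_text = "id,value\n" + "".join(f"{i},{i * 10}\n" for i in range(1, 11))

# nrows counts data rows; the header line is read in addition to them.
first_rows = pd.read_csv(io.StringIO(csv_text), nrows=3)

print(first_rows.shape)          # (3, 2)
print(list(first_rows.columns))  # ['id', 'value']
```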

If you want to read only lines 1,000,000 ... 1,999,999:

 read_csv(..., skiprows=1000000, nrows=999999) 

nrows : int, default None The number of lines of the file to read. Useful for reading chunks of large files

skiprows : list or integer Line numbers to skip (0-indexed) or number of lines to skip (int) at the beginning of the file
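One thing worth checking on a small example: an integer skiprows also skips the header line, so the first line after the skipped block is taken as the header. A rough sketch of the difference between the integer and list forms follows; csv_text and its columns are invented for the demo, and the range-based call is just one way to keep the original header.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV: a header line plus rows with id 1..10.
csv_text = "id,value\n" + "".join(f"{i},{i * 10}\n" for i in range(1, 11))

# Integer form: drops the first 4 physical lines (header + ids 1-3),
# so the line "4,40" is consumed as the header.
shifted = pd.read_csv(io.StringIO(csv_text), skiprows=4, nrows=3)
print(list(shifted.columns))  # ['4', '40']

# List-like form: skip lines 1..4 only, which keeps line 0 as the header.
kept_header = pd.read_csv(io.StringIO(csv_text), skiprows=range(1, 5), nrows=3)
print(list(kept_header.columns))   # ['id', 'value']
print(kept_header["id"].tolist())  # [5, 6, 7]
```

If the column names matter, the list-like form (or passing header=None together with names=[...]) avoids losing them.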

For large files, you probably also want to use chunksize:

chunksize : int, default None Returns a TextFileReader object to iterate
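A minimal sketch of iterating with chunksize, assuming the same kind of toy CSV (csv_text and its columns are hypothetical). Each iteration yields an ordinary DataFrame with at most chunksize rows, so only one chunk needs to be in memory at a time.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV: a header line plus rows with id 1..10.
csv_text = "id,value\n" + "".join(f"{i},{i * 10}\n" for i in range(1, 11))

total = 0
chunk_sizes = []

# With chunksize set, read_csv returns an iterator of DataFrames.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    chunk_sizes.append(len(chunk))
    total += int(chunk["value"].sum())

print(chunk_sizes)  # [4, 4, 2]
print(total)        # 550
```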

pandas.io.parsers.read_csv documentation

+38
May 25 '14 at 8:52