Using @Anton Protopopov's sample file: reading a small slice of the file and the header in separate operations is much cheaper than reading the entire file.
Just read the last lines. The file has 1,000,000 data rows plus a header line, so skiprows=990001 skips the header and the first 990,000 rows, and nrows=10000 keeps the last 10,000.
In [22]: df = read_csv("file.csv", nrows=10000, skiprows=990001, header=None, index_col=0)

In [23]: df
Out[23]:
               1         2         3
0
990000 -0.902507 -0.274718  1.155361
990001 -0.591442 -0.318853 -0.089092
990002 -1.461444 -0.070372  0.946964
990003  0.608169 -0.076891  0.431654
990004  1.149982  0.661430  0.456155
...          ...       ...       ...
999995  0.057719  0.370591  0.081722
999996  0.157751 -1.204664  1.150288
999997 -2.174867 -0.578116  0.647010
999998 -0.668920  1.059817 -2.091019
999999 -0.263830 -1.195737 -0.571498

[10000 rows x 3 columns]
And it's very fast
In [24]: %timeit read_csv("file.csv", nrows=10000, skiprows=990001, header=None, index_col=0)
1 loop, best of 3: 262 ms per loop
It's also pretty cheap to determine the file length a priori
In [25]: %timeit sum(1 for l in open('file.csv'))
10 loops, best of 3: 104 ms per loop
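If the file length isn't known in advance, that same line count can be used to compute skiprows instead of hard-coding it. A minimal sketch under that assumption (the count includes the header line, so skipping total_lines - n lines leaves exactly the last n data rows):

import pandas as pd

n = 10000  # number of trailing rows to keep

# Count every line in the file, header included.
with open('file.csv') as f:
    total_lines = sum(1 for _ in f)

# Skipping total_lines - n lines drops the header and all but the
# last n data rows (for this file: 1,000,001 - 10,000 = 990,001).
tail = pd.read_csv('file.csv', nrows=n, skiprows=total_lines - n,
                   header=None, index_col=0)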
Reading in the column names from the header
In [26]: df.columns = read_csv('file.csv', header=0, nrows=1, index_col=0).columns

In [27]: df
Out[27]:
               a         b         c
0
990000 -0.902507 -0.274718  1.155361
990001 -0.591442 -0.318853 -0.089092
990002 -1.461444 -0.070372  0.946964
990003  0.608169 -0.076891  0.431654
990004  1.149982  0.661430  0.456155
...          ...       ...       ...
999995  0.057719  0.370591  0.081722
999996  0.157751 -1.204664  1.150288
999997 -2.174867 -0.578116  0.647010
999998 -0.668920  1.059817 -2.091019
999999 -0.263830 -1.195737 -0.571498

[10000 rows x 3 columns]
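Putting the pieces together, here is a minimal sketch of a reusable helper; the function name read_tail_csv is mine, and it assumes the index is the first column, as in the example above:

import pandas as pd

def read_tail_csv(path, n):
    """Read only the last n data rows of a CSV, restoring the real column names."""
    # Cheap first pass: count all lines, header included.
    with open(path) as f:
        total_lines = sum(1 for _ in f)

    # Second pass: skip everything except the last n data rows.
    df = pd.read_csv(path, nrows=n, skiprows=total_lines - n,
                     header=None, index_col=0)

    # Small extra read: recover the column names from the header row.
    df.columns = pd.read_csv(path, header=0, nrows=1, index_col=0).columns
    return df

df = read_tail_csv('file.csv', 10000)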