Pandas: how to get the status of lines read when using read_csv?

I am reading a very large CSV file (for example, 10 million records) with pandas read_csv, and I would like to know whether there is a way to show the progress of the read, for example:

    100,000 lines read
    150,000 lines read

Thanks.

1 answer

To show progress like this:

    Completed 1 %
    Completed 2 %
    ...
    Completed 99 %
    Completed 100 %

you can try the following:

    import pandas

    filename = "VeryLong.csv"

    # Count the data rows up front; subtract 1 for the header line.
    # (Assumes no field contains an embedded newline.)
    with open(filename) as f:
        lines_number = sum(1 for line in f) - 1

    lines_in_chunk = 500  # I don't know what size is better
    counter = 0
    completed = 0

    reader = pandas.read_csv(filename, chunksize=lines_in_chunk)
    for chunk in reader:
        # < ... process the chunk somehow ... >

        # showing progress:
        counter += len(chunk)  # the last chunk may be shorter than lines_in_chunk
        new_completed = int(round(counter / lines_number * 100))
        if new_completed > completed:
            completed = new_completed
            print("Completed", completed, "%")
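If the running line count from the question is enough, and a percentage is not needed, the extra pass that counts lines can probably be skipped. Below is a minimal sketch of that variant. The function name read_csv_with_progress and its report parameter are made up for illustration and are not part of pandas; only read_csv with chunksize and concat are real pandas calls. The demo data is a small in-memory CSV rather than a real file.

```python
import io

import pandas as pd


def read_csv_with_progress(source, chunksize=100_000, report=print):
    """Read a CSV in chunks, reporting the running row count after each chunk.

    `report` is a hypothetical hook: any callable that accepts a string.
    """
    chunks = []
    lines_read = 0
    for chunk in pd.read_csv(source, chunksize=chunksize):
        lines_read += len(chunk)  # count actual rows, not the nominal chunk size
        report(f"{lines_read:,} lines read")
        chunks.append(chunk)
    # Stitch the chunks back into a single DataFrame.
    return pd.concat(chunks, ignore_index=True)


# Small in-memory demo: 10 data rows read 4 at a time.
demo = io.StringIO("a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(10)))
messages = []
df = read_csv_with_progress(demo, chunksize=4, report=messages.append)
```

With a real file you would pass the path instead of the StringIO object and leave report at its default so the messages are printed. Note that keeping every chunk in a list still loads the whole file into memory; if that is a problem, process each chunk inside the loop instead of collecting it.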

Source: https://habr.com/ru/post/1266120/

