I am trying to read only the first 100 lines of a csv.gz file that contains over 4 million lines, in Python. I also need the number of columns and the header of each. How can I do this?
I looked at "python: read lines from compressed text files" to figure out how to open the file, but I'm struggling to figure out how to actually print the first 100 lines and get some metadata about the columns.
I also found "Reading the first N lines of a file in python", but I'm not sure how to combine it with opening the csv.gz file and reading it without saving the uncompressed CSV file.
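For reference, the combination described above (a streamed gzip read plus a "first N lines" cutoff) might look roughly like the sketch below. It uses only the standard library. The helper name head_csv_gz, the comma delimiter, the UTF-8 encoding, and the demo file are assumptions for illustration, not details of the real dataset.

```python
import csv
import gzip
import itertools
import os
import tempfile


def head_csv_gz(path, n=100, delimiter=","):
    """Return (header, first n data rows) from a gzip-compressed CSV.

    The file is decompressed as a stream, so no uncompressed copy is
    written to disk and reading stops after the requested rows.
    """
    # "rt" opens the gzip stream in text mode; newline="" is what the csv
    # module recommends for file objects it reads from.
    with gzip.open(path, "rt", newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh, delimiter=delimiter)
        header = next(reader, [])                 # first line: column names
        rows = list(itertools.islice(reader, n))  # stop after n data rows
    return header, rows


# Demo on a small throwaway file (hypothetical data, not the real dataset).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv.gz")
with gzip.open(demo_path, "wt", newline="", encoding="utf-8") as fh:
    writer = csv.writer(fh)
    writer.writerow(["id", "name", "score"])
    for i in range(1000):
        writer.writerow([i, "item%d" % i, i * 2])

header, rows = head_csv_gz(demo_path, n=100)
print("number of columns:", len(header))
print("column headers:", header)
print("rows read:", len(rows))
```

Because itertools.islice stops pulling from the reader after n rows, the rest of the multi-million-line file is not decompressed.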
I wrote this code:
import gzip
import csv
import json
import pandas as pd
df = pd.read_csv('google-us-data.csv.gz', compression='gzip', header=0, sep=' ', quotechar='"', error_bad_lines=False)
for i in range (100):
    print df.next()
The output I get (excerpt):
Skipping line 63: expected 3 fields, saw 7
Skipping line 64: expected 3 fields, saw 7
Skipping line 65: expected 3 fields, saw 7
Skipping line 66: expected 3 fields, saw 7
Skipping line 67: expected 3 fields, saw 7
Skipping line 68: expected 3 fields, saw 7
Skipping line 69: expected 3 fields, saw 7
Skipping line 70: expected 3 fields, saw 7
Skipping line 71: expected 3 fields, saw 7
Skipping line 72: expected 3 fields, saw 7
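Messages like "expected 3 fields, saw 7" usually mean that the chosen separator does not split every line into the same number of fields, so sep=' ' may simply be the wrong delimiter for this file. One way to check is to split the first 100 lines with a few candidate delimiters and tally the field counts. The sketch below does that with the standard library only. The function name field_count_profile, the candidate delimiters, and the demo data are assumptions for illustration.

```python
import csv
import gzip
import itertools
import os
import tempfile
from collections import Counter


def field_count_profile(path, delimiters=(",", "\t", " ", ";"), n=100):
    """For each candidate delimiter, tally how many of the first n lines
    split into each number of fields."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as fh:
        sample = list(itertools.islice(fh, n))  # only the first n lines
    profile = {}
    for delim in delimiters:
        # csv.reader accepts any iterable of strings, not just file objects.
        reader = csv.reader(sample, delimiter=delim, quotechar='"')
        profile[delim] = Counter(len(row) for row in reader)
    return profile


# Demo on a small throwaway file (hypothetical data, not the real dataset).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv.gz")
with gzip.open(demo_path, "wt", newline="", encoding="utf-8") as fh:
    writer = csv.writer(fh)
    writer.writerow(["id", "query", "count"])
    writer.writerow([1, "cheap flights", 10])
    writer.writerow([2, "weather in new york today", 5])

profile = field_count_profile(demo_path)
for delim, counts in profile.items():
    print(repr(delim), dict(counts))
```

A delimiter that gives a single field count for every sampled line is the likely separator. With pandas, passing that separator together with nrows=100 to read_csv limits parsing to the first 100 rows, after which df.head(100) and df.columns give the rows and the headers (a DataFrame has no next() method). Note also that error_bad_lines was deprecated in pandas 1.3 in favor of on_bad_lines.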