Open csv.gz file in Python and print the first 100 lines

Question


I am trying to get only the first 100 lines of a csv.gz file that contains over 4 million lines in Python. I also need the number of columns and the header of each column. How can I do this?

I looked at python: read lines from compressed text files to figure out how to open the file, but I'm struggling to figure out how to actually print the first 100 lines and get some metadata about the columns.

I also found Reading the first N lines of a file in python, but I'm not sure how to combine it with opening the csv.gz file and reading it without saving the uncompressed csv file.

I wrote this code:

import gzip
import csv
import json
import pandas as pd


df = pd.read_csv('google-us-data.csv.gz', compression='gzip', header=0, sep=' ', quotechar='"', error_bad_lines=False)
for i in range(100):
    print df.next()

Running it produces the following output (excerpt):

Skipping line 63: expected 3 fields, saw 7
Skipping line 64: expected 3 fields, saw 7
Skipping line 65: expected 3 fields, saw 7
Skipping line 66: expected 3 fields, saw 7
Skipping line 67: expected 3 fields, saw 7
Skipping line 68: expected 3 fields, saw 7
Skipping line 69: expected 3 fields, saw 7
Skipping line 70: expected 3 fields, saw 7
Skipping line 71: expected 3 fields, saw 7
Skipping line 72: expected 3 fields, saw 7

+4

python csv

SizzyNini Sep 22 '16 at 17:55

4

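The first question you linked suggests using gzip.GzipFile - this gives you a file-like object that decompresses on the fly, so nothing has to be written to disk.

Next you need something that parses csv data from a file-like object ... such as csv.reader.

The first row that csv.reader returns is the header, so it tells you the column names and how many columns there are.

After that, take the next 100 csv rows, exactly as in the second question you linked; each of those 100 rows is a list of fields.

Apart from the csv module itself, all of this is covered by the questions you linked.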
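The steps above can be sketched with the standard library alone. This is a minimal sketch, not the answerer's code: the helper name `head_csv_gz`, the comma delimiter, and the generated sample file are assumptions made for illustration.

```python
import csv
import gzip
import os
import tempfile
from itertools import islice


def head_csv_gz(path, n_rows=100, delimiter=","):
    """Return (header, first n_rows data rows) without writing an uncompressed copy."""
    # Text mode ("rt") makes gzip.open yield str lines, which csv.reader expects.
    with gzip.open(path, "rt", newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader, [])            # first row holds the column names
        rows = list(islice(reader, n_rows))  # reads at most n_rows more rows
    return header, rows


# Hypothetical sample data so the sketch runs on its own.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.csv.gz")
with gzip.open(sample_path, "wt", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "name", "score"])
    writer.writerows([i, f"user{i}", i * 2] for i in range(250))

header, rows = head_csv_gz(sample_path)
print(len(header), header)  # number of columns and their names
for row in rows:
    print(row)
```

Because `islice` stops after `n_rows`, only the beginning of the archive is decompressed, which matters for a file with millions of lines.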

+1

Useless Sep 22 '16 at 18:06

Your code is OK. From the pandas read_csv documentation:

warn_bad_lines : boolean, defaults to True

If error_bad_lines is False, and warn_bad_lines is True, 
a warning for each "bad line" will be output. (Only valid with C parser).
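Note for newer pandas versions: `error_bad_lines` and `warn_bad_lines` were deprecated in pandas 1.3 in favor of a single `on_bad_lines` parameter (`'error'`, `'warn'` or `'skip'`) and removed in pandas 2.0. A minimal sketch of the equivalent call, assuming pandas 1.3 or later; the helper name `read_head` and the sample file are made up for illustration.

```python
import gzip
import os
import tempfile

import pandas as pd


def read_head(path, n_rows=100):
    """Parse at most n_rows rows, skipping malformed lines without a warning."""
    # on_bad_lines="skip" roughly corresponds to error_bad_lines=False plus
    # warn_bad_lines=False; use on_bad_lines="warn" to keep the messages.
    return pd.read_csv(path, compression="gzip", nrows=n_rows, on_bad_lines="skip")


# Hypothetical sample file containing one line with too many fields.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.csv.gz")
with gzip.open(sample_path, "wt", newline="") as f:
    f.write("a,b,c\n1,2,3\n4,5,6,7,8\n9,10,11\n")

head = read_head(sample_path)
print(len(head.columns), list(head.columns))  # column count and header names
print(head)
```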

+1

Cab Sep 22 '16 at 18:21


I think you could do something like this (from the gzip module examples):

import gzip
with gzip.open('/home/joe/file.txt.gz', 'rb') as f:
    header = f.readline()
    # Read lines any way you want now.

0

Stats4224 Sep 22 '16 at 18:02


HEADLESS_0NE · Accepted Answer · 2016-09-22T18:25:14+0000

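This is pretty much what you've already done, except that read_csv also has an nrows parameter, which lets you specify how many rows to read from the data set.

Additionally, to avoid the errors you were getting, you can set error_bad_lines to False. You will still get warnings (if those bother you, set warn_bad_lines to False as well). They are there to indicate inconsistencies in how your data set is filled in.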

import pandas as pd
data = pd.read_csv('google-us-data.csv.gz', nrows=100, compression='gzip',
                   error_bad_lines=False)
print(data)

Alternatively, if you prefer the csv module, you can read the rows in a for loop and stop after the first 100.
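A csv-module loop of that kind might look like the sketch below. It also records which lines have a different number of fields than the header, which is the kind of inconsistency behind the "expected 3 fields, saw 7" messages. The function name `scan_head`, the comma delimiter, and the sample data are assumptions, not part of the original answer.

```python
import csv
import gzip
import os
import tempfile


def scan_head(path, n_rows=100, delimiter=","):
    """Loop over the first n_rows data rows; return (header, good_rows, bad_line_numbers)."""
    good, bad = [], []
    with gzip.open(path, "rt", newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader, [])
        for i, row in enumerate(reader):
            if i >= n_rows:
                break  # stop early instead of reading millions of lines
            if len(row) == len(header):
                good.append(row)
            else:
                # line_num is the 1-based line in the file that was just read
                bad.append(reader.line_num)
    return header, good, bad


# Hypothetical sample file with one malformed line (line 3).
sample_path = os.path.join(tempfile.mkdtemp(), "sample.csv.gz")
with gzip.open(sample_path, "wt", newline="") as f:
    f.write("a,b,c\n1,2,3\n4,5,6,7,8\n9,10,11\n")

header, good, bad = scan_head(sample_path)
print(header)
print(good)
print(bad)
```

If most lines end up in the bad list, the delimiter passed to the reader is probably not the one the file actually uses.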
