Problem reading a CSV file in Python

I am trying to read a very simple but fairly large (800 MB) CSV file using the csv library in Python. The separator is a single tab, and each line consists of some numbers. Each line is a record, and the file has 20681 lines. My calculations on this file always stopped at one specific line, which made me suspicious about the number of lines in the file. I used the code below to count them:

tfdf_Reader = csv.reader(open('v2-host_tfdf_en.txt'),delimiter=' ')
c = 0
for row in tfdf_Reader:
  c = c + 1
print c

To my surprise, c is printed as 61722! Why is this happening? What am I doing wrong?

2 answers

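800 MB divided by 20681 lines is about 38 KB per line on average. Is that plausible? How many numbers are there per line? How do you know that the file has 20681 lines? That it is 800 MB?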
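Both figures can be checked directly rather than taken on trust. The following is a minimal sketch, written for Python 3, that reports a file's size in bytes and the number of newline bytes it contains. The helper name `size_and_line_count` and the small stand-in file are hypothetical; for the real data, pass the path of 'v2-host_tfdf_en.txt' instead.

```python
import os
import tempfile


def size_and_line_count(path, chunk_size=1024 * 1024):
    """Return (size in bytes, number of newline bytes) for the file at path."""
    newlines = 0
    with open(path, "rb") as f:
        # Read fixed-size chunks so a very large file never sits in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            newlines += chunk.count(b"\n")
    return os.path.getsize(path), newlines


# Tiny stand-in file (hypothetical data) so the sketch runs on its own.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"1\t2\t3\n4\t5\t6\n7\t8\t9\n")
    demo_path = tmp.name

size, lines = size_and_line_count(demo_path)
print("bytes:", size, "lines:", lines, "average bytes per line:", size / lines)
os.remove(demo_path)
```

If the newline count differs from the number of rows the csv reader reports, the reader is not splitting records where the physical lines end.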

61722 is roughly 3 times 20681 (3 × 20681 = 62043). Does the number 3 mean anything for this file?

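To find out what is really in the file, do not rely on how it looks on screen. Python's repr() function shows the contents unambiguously.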

Are you on Windows? If so, open the file in binary mode: open(filename, 'rb').

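Also, you say the fields are tab-separated, yet the code passes delimiter=" " (which, as posted, looks like a space rather than a tab). Use delimiter="\t".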
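The effect of the delimiter is easy to see on a couple of lines. This is a small sketch for Python 3 using made-up tab-separated data held in memory (the `sample` string is an assumption, not the real file):

```python
import csv
import io

sample = "1\t2\t3\n4\t5\t6\n"  # hypothetical tab-separated data

# With a space as the delimiter, each line comes back as one unsplit field.
rows_space = list(csv.reader(io.StringIO(sample), delimiter=" "))
# With a tab as the delimiter, each line is split into its three numbers.
rows_tab = list(csv.reader(io.StringIO(sample), delimiter="\t"))

print(rows_space)
print(rows_tab)
```

Note that on this sample both readers return the same number of rows, so a wrong delimiter on its own would not be expected to inflate the row count.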

Try adding some debug output, like this:

import csv

DEBUG = True
f = open('v2-host_tfdf_en.txt', 'rb')  # binary mode, as advised above
if DEBUG:
    rawdata = f.read(200)
    f.seek(0)  # rewind so the reader starts from the beginning
    print 'rawdata', repr(rawdata)
    # what is the delimiter between fields? between rows?
tfdf_Reader = csv.reader(f, delimiter='\t')  # a real tab, not a space
c = 0
for row in tfdf_Reader:
    c = c + 1
    if DEBUG and c <= 10:
        print "row", c, repr(row)
        # Are you getting rows like you expect?
print "rowcount", c

Update: the message "Error: field larger than field limit (131072)" means that the reader met a single field longer than 128 KB (131072 = 128 × 1024).
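The limit in that message can be inspected with csv.field_size_limit(). The sketch below, for Python 3, reproduces the error on made-up in-memory data by building one field that is a single character longer than the current limit; the variable names are illustrative only.

```python
import csv
import io

limit = csv.field_size_limit()   # with no argument, returns the current limit
huge_field = "x" * (limit + 1)   # one character more than the reader accepts
data = "1\t" + huge_field + "\n"  # hypothetical record with an oversized field

try:
    list(csv.reader(io.StringIO(data), delimiter="\t"))
    error_message = None
except csv.Error as exc:
    error_message = str(exc)

print("limit:", limit)
print("error:", error_message)
```

Since each line of the file is only supposed to hold some numbers, a field this large more likely points to the reader not splitting the data as intended than to a limit that needs raising.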

This suggests one of two things:

(a) - ; TEXT. , , , .

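(b) The fields are separated by something other than a tab (one or more spaces, say). Check the file in an editor that can display invisible characters (for example Notepad++: View/Show Symbol/Show all characters). If that is the case, you do not need csv at all; something like this is enough: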

f = open('v2-host_tfdf_en.txt', 'r') # NOT 'rb'
rows = [line.split() for line in f]

. , ? , ? (, , , , - ).

Also, the delimiter for a tab is "\t".

Alternatively, try the excel-tab dialect, which is meant for tab-delimited files.
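A minimal sketch of the excel-tab dialect in Python 3, run on made-up in-memory data (the `sample` string is an assumption). For a real file in Python 3, the csv documentation recommends opening it with newline="" and passing the file object to csv.reader in the same way.

```python
import csv
import io

sample = "1\t2\t3\n4\t5\t6\n"  # hypothetical tab-separated data

# "excel-tab" is a dialect registered by the csv module; it uses a tab delimiter.
rows = list(csv.reader(io.StringIO(sample), dialect="excel-tab"))
print(rows)
```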


Source: https://habr.com/ru/post/1750380/

