From a large text file to a sparse matrix with Python

I am trying to find an efficient way to read a very large text file (about 2,000,000 lines). Most of these lines (about 90%) are in three-column format and store a sparse matrix.

Here is what I did. First, I process the first 10% of the file:

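import fileinput
import shlex

i = 1        # 1-based number of the current line
cpt = 0      # number of lines consumed before the matrix starts
skip = 0     # set to 1 while the first line is being skipped
finnum = 0   # becomes 1 once a line containing ' 0' has been seen
vec = []     # non-zero integers collected from the first part of the file
for line in fileinput.input("MY_TEXT_FILE.TXT"):
    if i == 1:
        # skipping the first line
        skip = 1
    if finnum == 0 and skip == 0:
        # special reading operation for the first 10% (approximately):
        # keep every non-zero integer found on the line
        for token in shlex.split(line):
            if int(token) != 0:
                vec.append(int(token))
    if finnum == 1 and skip == 0:
        # the previous line ended the first part; stop before counting this one
        print('finnum = 1')
        break
    if ' 0' in line:
        # a line containing ' 0' is taken as the last line of the first part
        finnum = 1
    skip = 0
    i = i + 1
    cpt = cpt + 1
fileinput.close()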

Then I extract the remaining 90% to the list:

matrix = []
with open('MY_TEXT_FILE.TXT') as f:
    # skip the cpt lines counted above
    for i in range(cpt):
        next(f)
    # keep every remaining line as a raw string
    for line in f:
        matrix.append(line)

This reads the text file very quickly and with low memory consumption. The drawback is that matrix is a list of raw text rows, each of which looks something like this:

>>> matrix[23]
'           5          11  8.320234929063493E-008\n'
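For a row in this format, the built-in str.split is enough to separate the three fields, because it splits on any run of whitespace. The sketch below is only an illustration: parse_row is a hypothetical helper name, and it assumes the first two columns are integer indices and the third is a float.

```python
def parse_row(row):
    """Split one 'i  j  value' text row into (int, int, float).

    parse_row is a hypothetical helper, not part of the original code.
    """
    # str.split() with no argument splits on any run of whitespace
    # and ignores the trailing newline.
    i, j, value = row.split()
    return int(i), int(j), float(value)


row = '           5          11  8.320234929063493E-008\n'
print(parse_row(row))
```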

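Each row still has to be split into its three values. Is there a more efficient way to do that?

At the moment I split every row with shlex.split and fill three lists: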

A = [0] * len(matrix)
B = [0] * len(matrix)
C = [0] * len(matrix)
for i in range(len(matrix)):
    # split the raw row into its three fields
    line = shlex.split(matrix[i])
    A[i] = float(line[0])
    B[i] = float(line[1])
    C[i] = float(line[2])
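Storing the raw lines and then splitting them can also be folded into a single pass. This is a minimal sketch, assuming the number of leading lines to skip (cpt above) is already known and that whitespace splitting is sufficient for these rows. read_triplets is a hypothetical name, and io.StringIO stands in for the real file:

```python
import io
from itertools import islice


def read_triplets(f, n_header):
    """Skip n_header lines of f, then parse 'i j value' rows.

    Hypothetical helper: returns three lists (rows, cols, values).
    """
    rows, cols, values = [], [], []
    # islice(f, n_header, None) drops the first n_header lines lazily.
    for line in islice(f, n_header, None):
        parts = line.split()
        if len(parts) != 3:
            continue  # assumption: ignore blank or malformed lines
        rows.append(int(parts[0]))
        cols.append(int(parts[1]))
        values.append(float(parts[2]))
    return rows, cols, values


# Stand-in for the real file: two leading lines, then two matrix rows.
sample = io.StringIO(
    "header\n"
    "1 2 3 0\n"
    "           5          11  8.320234929063493E-008\n"
    "           6          12  1.5\n"
)
A, B, C = read_triplets(sample, 2)
print(A, B, C)
```

Here the indices are kept as int rather than float, which is a choice of this sketch and not something the original code does.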


One option is to parse each row with numpy instead of shlex:

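import numpy as np

A = [0] * len(matrix)
B = [0] * len(matrix)
C = [0] * len(matrix)
for i in range(len(matrix)):
    # parse the whitespace-separated numbers of one row into a float array
    full_array = np.fromstring(matrix[i], dtype=float, sep=" ")
    A[i] = full_array[0]
    B[i] = full_array[1]
    C[i] = full_array[2]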

This drops shlex, but it still converts the rows one at a time in a Python loop.


Another option is to let NumPy read the columns directly with numpy.loadtxt. By default every value is read as a float:

A, B, C = np.loadtxt('MY_TEXT_FILE.TXT', skiprows = cpt, unpack = True)

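All three arrays come back as floats. Note that dtype = (int, int, float) is not a valid dtype specification as written, so one way to get integer indices is to convert A and B afterwards with astype(int).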
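The index columns can also be read as integers directly by giving numpy.loadtxt a structured dtype. This is a sketch under assumptions, not a recipe checked against the real file: the field names row, col and val are made up for illustration, and io.StringIO stands in for MY_TEXT_FILE.TXT with a single line to skip.

```python
import io

import numpy as np

# Stand-in for the real file: one leading line, then two matrix rows.
sample = io.StringIO(
    "header line\n"
    "           5          11  8.320234929063493E-008\n"
    "           6          12  1.5\n"
)

# Field names are illustrative; the formats give int, int, float columns.
triplet = np.dtype([("row", np.int64), ("col", np.int64), ("val", np.float64)])

data = np.loadtxt(sample, skiprows=1, dtype=triplet)

A = data["row"]   # integer row indices
B = data["col"]   # integer column indices
C = data["val"]   # float values
print(A, B, C)
```

With the real file, skiprows would be cpt, as in the call above.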


Source: https://habr.com/ru/post/1534841/

