Reading huge .csv files in MATLAB is not working well

I have several .csv files that I read in MATLAB using textscan, because csvread and xlsread do not support files of this size (200 MB - 600 MB).

I use this line to read it:

C = textscan(fileID,'%s%d%s%f%f%d%d%d%d%d%d%d','delimiter',',');

The problem I found is that sometimes the data is not in this format, and then textscan stops reading at that line without raising any error.

So instead I read it this way:

C = textscan(fileID,'%s%d%s%f%f%s%s%s%s%s%s%s%s%s%s%s','delimiter',',');

That way I can see that 2 lines out of 3 million deviate from the format.

I want to read all lines except the bad / differently formatted ones. In addition, is it possible to read only the lines whose first field is "PAA"?

I also tried loading the file directly into MATLAB, but that is super slow and sometimes gets stuck, and for a really big file it reports an out-of-memory error.

Any recommendations?

2 answers

For large files that are still small enough to fit in memory, reading all the lines at once is the best choice.

f = fopen('data.txt');             
g = textscan(f,'%s','delimiter','\n');
fclose(f);

In the next step, you need to identify the lines starting with PAA using strncmp.



MATLAB is not ideal for this kind of filtering. It is much faster to preprocess the file with command-line tools such as grep/awk (from bash/cmd) before loading it into MATLAB. On Linux:

awk '/^PAA/' yourfile.csv > yourNewFile.csv   # new file with all the lines that start with PAA (NOTE: case sensitive)

Similarly, to drop the lines that do not have the expected number of fields:

awk -F ',' 'NF == 12 {print}' yourfile.csv > yourNewFile.csv

This keeps only the lines that have exactly 12 comma-separated (",") fields.
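The two filters can also be combined into a single pass (a sketch; the `PAA` first field and the 12-fields-per-line rule are taken from the question):

```shell
# Keep only lines whose first field is exactly PAA and that have 12 fields
awk -F ',' '$1 == "PAA" && NF == 12' yourfile.csv > yourNewFile.csv
```

The resulting file can then be read with the original 12-specifier textscan format string, since every surviving line matches it.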

