The fastest way to count the number of lines?

The easiest way to count the lines in a file is something like this:

while(!feof(fp))
{
  ch = fgetc(fp);
  if(ch == '\n')
  {
    lines++;
  }
}

But now the requirement is that I have to count the number of lines in large files. This will have an impact on performance.

Is there a better approach?

+4
4 answers

For fast I/O, you usually want to read/write in a multiple of the block size of your file system/OS.

You can query the block size by calling statfs or fstatfs on your file or file descriptor (read the man pages).

struct statfs has a field f_bsize, and sometimes also f_iosize:

optimal transfer block size

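The f_bsize field exists on all POSIX systems, AFAIK. Mac OS X and iOS also have f_iosize, which is the one to prefer there (f_bsize works on Mac OS X/iOS too and should usually be the same as f_iosize, IIRC).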

struct statfs fsInfo = {0};
int fd = fileno(fp); // Get file descriptor from FILE*.
long optimalSize;

if (fstatfs(fd, &fsInfo) == -1) {
    // Querying failed! Fall back to a sane value, for example 8kB or 4MB.
    optimalSize = 4 * 1024 * 1024;
} else {
    optimalSize = fsInfo.f_bsize;
}

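Now allocate a buffer of that size and read blocks of that size (using read or fread). Iterate over each in-memory block and count the newlines. Repeat until EOF.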
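The block-wise loop described above could look roughly like the sketch below. The function name count_newlines_buffered and its buf_size parameter are made up for illustration; it uses fread plus memchr, counts only '\n' bytes, and assumes the stream was opened in binary mode.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts '\n' bytes in fp, reading buf_size bytes
 * at a time. Returns -1 if buf_size is 0, the buffer cannot be
 * allocated, or a read error occurs. */
long long count_newlines_buffered(FILE *fp, size_t buf_size)
{
    if (buf_size == 0)
        return -1;

    char *buf = malloc(buf_size);
    if (buf == NULL)
        return -1;

    long long lines = 0;
    size_t n;

    /* fread returns the number of bytes actually read; 0 means EOF or error. */
    while ((n = fread(buf, 1, buf_size, fp)) > 0) {
        const char *p = buf;
        const char *end = buf + n;

        /* memchr jumps to the next newline in the block, if there is one. */
        while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
            lines++;
            p++; /* continue searching after the newline just found */
        }
    }

    int failed = ferror(fp);
    free(buf);
    return failed ? -1 : lines;
}
```

The value obtained from fstatfs (or a multiple of it) would be passed as buf_size, for example count_newlines_buffered(fp, (size_t)optimalSize). Whether that beats a plain fgetc loop depends on the platform, so it is worth measuring.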

A different approach: mmap the file and iterate over the mapped memory.

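+5

"Is there a better approach?"

Do not use !feof(fp) as the loop condition. Test the return value of fgetc instead:

while ((ch = fgetc(fp)) != EOF)

feof only becomes true after a read has already failed, so the original loop runs one extra time with ch equal to EOF. Note that ch must be declared as int (not char) for the comparison with EOF to be reliable.

See: http://faq.cprogramming.com/cgi-bin/smartfaq.cgi?answer=1046476070&id=1043284351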
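For completeness, here is the question's loop with that condition applied, as a minimal sketch. The function name count_newlines_fgetc is made up for illustration, and it still counts only '\n' characters.

```c
#include <stdio.h>

/* Hypothetical helper: counts '\n' characters by testing the return
 * value of fgetc directly instead of calling feof. */
long count_newlines_fgetc(FILE *fp)
{
    long lines = 0;
    int ch; /* int, not char, so EOF stays distinct from every valid byte */

    while ((ch = fgetc(fp)) != EOF) {
        if (ch == '\n')
            lines++;
    }
    return lines;
}
```

This fixes the correctness issue only; it still reads one character per call, so it is not a performance improvement by itself.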

+3

I would recommend using memory-mapped I/O so the OS can optimize the disk I/O (probably your biggest bottleneck) while you just count the lines. Also consider that a line can be terminated by any of four possibilities: \r, \n, \r\n, or end of file.
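A memory-mapped version might look like the sketch below. It assumes a POSIX system (open, fstat, mmap, munmap) and a regular file small enough to map in one piece. The function name count_newlines_mmap is made up for illustration, and it counts only '\n' bytes.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: maps the whole file read-only and counts '\n'
 * bytes in the mapping. Returns -1 on any error. */
long long count_newlines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) { /* mmap rejects a zero-length mapping */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    void *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (map == MAP_FAILED)
        return -1;

    const char *p = map;
    const char *end = p + len;
    long long lines = 0;

    /* memchr jumps from one newline to the next inside the mapping. */
    while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        lines++;
        p++;
    }

    munmap(map, len);
    return lines;
}
```

The OS pages the file in as the scan touches it, so no explicit buffer size has to be chosen. Whether this is faster than block-wise reads depends on the system and the file, so treat it as something to benchmark.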

+1

If the file does not contain a header with metadata, such as the line count, finding this number is linear in complexity. Also keep in mind that "\n" is not a universal newline character.
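If the input may mix line-ending conventions, one possible way to treat \r, \n, \r\n and an unterminated last line uniformly is sketched below. The function name count_lines_any_eol is made up for illustration, and the stream is assumed to be opened in binary mode so the C library does not translate line endings first.

```c
#include <stdio.h>

/* Hypothetical helper: counts lines terminated by "\r", "\n" or "\r\n",
 * plus a final line that ends at EOF without any terminator. */
long count_lines_any_eol(FILE *fp)
{
    long lines = 0;
    int ch;
    int prev = EOF; /* previous byte; EOF means nothing has been read yet */

    while ((ch = fgetc(fp)) != EOF) {
        if (ch == '\r') {
            lines++;            /* "\r" alone, or the first half of "\r\n" */
        } else if (ch == '\n' && prev != '\r') {
            lines++;            /* "\n" that is not part of "\r\n" */
        }
        prev = ch;
    }

    /* Non-empty input whose last byte is not a terminator:
     * count the trailing unterminated line. */
    if (prev != EOF && prev != '\r' && prev != '\n')
        lines++;

    return lines;
}
```

The same two-state check (was the previous byte a \r?) can be carried across block boundaries if it is combined with a buffered or memory-mapped reader.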

0

Source: https://habr.com/ru/post/1545985/

