I tried the other answers listed above, but they fall far short of being decent solutions when working with large files, especially when a single line takes up more than about a quarter of the available RAM.
Both bash and awk slurp the entire line, which is not necessary for this problem. bash will fail if the line is too long, even if you have enough memory.
I implemented an extremely simple, fairly unoptimized Python script that, when tested with huge files (~4 GB per line), does not choke, and is a much better solution for this case than the ones given.
If this is not throwaway code but performance-critical production code, you can rewrite the idea in C or optimize the reads (instead of reading a single byte at a time), after verifying that this really is the bottleneck.
The code assumes the line terminator is a newline (LF) character, which is a safe guess on Unix, but YMMV on Mac OS / Windows. Make sure the file ends with a newline, otherwise the length of the last line will not be reported.
from sys import stdin, exit

counter = 0
while True:
    byte = stdin.buffer.read(1)   # read one byte at a time so no full line is ever held in memory
    counter += 1
    if not byte:                  # EOF
        exit()
    if byte == b'\x0a':           # newline: print the line length (excluding the newline) and reset
        print(counter - 1)
        counter = 0
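If the single-byte reads turn out to be the bottleneck, one way to speed things up in pure Python before rewriting in C is to read fixed-size chunks and scan them for newlines. This is a minimal sketch of that idea (my own addition, not part of the original answer); the 64 KiB chunk size is an arbitrary choice:

from sys import stdin

CHUNK = 64 * 1024  # arbitrary block size; tune as needed
counter = 0        # bytes of the current line seen so far
while True:
    chunk = stdin.buffer.read(CHUNK)
    if not chunk:  # EOF; an unterminated last line is skipped, as in the script above
        break
    start = 0
    while True:
        nl = chunk.find(b'\n', start)
        if nl == -1:                      # no newline left in this chunk
            counter += len(chunk) - start
            break
        print(counter + nl - start)       # length of the line, excluding the newline
        counter = 0
        start = nl + 1

Like the original script, this only ever holds one chunk in memory at a time, regardless of line length.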
user2875414, Feb 11 '15 at 21:08