I ran some tests a while ago to see what would be a good way to load a text file. The fastest approach was to read the file in blocks of text and then iterate over them using String#lines.
As the baseline, reading a 188,593,869-byte text file one line at a time:
    IO.foreach(ARGV.shift) do |li|
      print li
    end

    time ruby test.rb root.mbox > /dev/null
I redirect the output to /dev/null to take screen I/O out of the timing.
Instead of reading only one line at a time, load the file as one large chunk, then iterate over the lines:
    File.read(ARGV.shift).lines do |l|
      print l
    end

    time ruby test.rb root.mbox > /dev/null

    real    0m3.492s
    user    0m3.281s
    sys     0m0.209s
That is a saving of about 0.5 seconds. It also slurped 188 MB of data into memory, which is unlikely to scale well if you have large files. The nice part is that you can tell read() to load the entire file, as I did, or tell it to limit the size of each read.
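A minimal sketch of the size-limited variant, in case it helps. The helper name each_line_chunked and the 1 MB default chunk size are assumptions for illustration, not something from the tests above, and this version has not been timed against the two runs. It reads a fixed number of bytes per call to read() and carries any incomplete trailing line over to the next chunk, so memory use stays near the chunk size rather than the file size.

```ruby
# Hypothetical helper (not part of Ruby's standard library): yields each line
# of the file at +path+, reading +chunk_size+ bytes at a time instead of
# loading the whole file into memory.
def each_line_chunked(path, chunk_size = 1 << 20)
  File.open(path, 'rb') do |file|
    carry = String.new # partial line left over from the previous chunk

    # IO#read(length) returns nil at end of file, which ends the loop.
    while (chunk = file.read(chunk_size))
      carry << chunk
      lines = carry.lines

      # If the buffer does not end with a newline, its last element is an
      # incomplete line; hold it back until the next chunk arrives.
      carry = carry.end_with?("\n") ? String.new : lines.pop

      lines.each { |line| yield line }
    end

    # Emit a final line that has no trailing newline, if there is one.
    yield carry unless carry.empty?
  end
end
```

Calling it as each_line_chunked(ARGV.shift) { |l| print l } would mirror the two scripts above. Because read() with a length returns binary data, the yielded strings are binary-encoded; call force_encoding on them if the encoding matters for your use.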
Here is the cleaned-up output from wc for the text file, for reference:
lines: 2,465,369 words: 26,466,463 bytes: 188,593,869