The most efficient way to read input in Ruby

Question

In Ruby, what is the most efficient way to read gigantic text files? The input is on the order of 10⁷ lines, at about 89 bytes per line. Is one method significantly better than the others?

ruby

mbm Dec 14 '10 at 22:50

1 answer

the tin man · Accepted Answer · 2010-12-15T00:20:32+0000

I ran some tests a while back to see what would be a good way to load a text file. The fastest approach was to read in blocks of text and then iterate over them using String#lines.

Reading a text file that is 188,593,869 bytes, as the baseline:

    IO.foreach(ARGV.shift) do |li|
      print li
    end

    time ruby test.rb root.mbox > /dev/null
    #
    # real    0m3.949s
    # user    0m3.709s
    # sys     0m0.182s

I redirect the output to /dev/null to take screen I/O out of the timing.

Instead of reading a single line at a time, load the file in one big chunk, then iterate over the lines:

    File.read(ARGV.shift).lines do |l|
      print l
    end

    time ruby test.rb root.mbox > /dev/null
    #
    # real    0m3.492s
    # user    0m3.281s
    # sys     0m0.209s

That is a saving of about 0.5 seconds. It also slurped 188 MB of data into memory, which is unlikely to scale well if you have large files. The nice part is that you can tell read() to load the entire file, as I did, or limit the size of each read.
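One way to limit the size of each read while still iterating over whole lines might look like the sketch below. It is not part of the original test: the helper name each_line_chunked and the 1 MiB default are assumptions made for illustration, and the only library calls it relies on are File.open, IO#read(length) and String#lines. The partial line at the end of each chunk is carried over to the next read.

```ruby
# Hypothetical helper (not from the original answer): yields each line of
# the file at `path`, reading at most `chunk_size` bytes per read instead
# of slurping the whole file. Lines are yielded as binary strings.
def each_line_chunked(path, chunk_size = 1 << 20)
  return enum_for(:each_line_chunked, path, chunk_size) unless block_given?

  File.open(path, "rb") do |file|
    leftover = String.new
    while (chunk = file.read(chunk_size))
      buffer = leftover << chunk
      lines = buffer.lines
      # The last piece may be a partial line; hold it back for the next chunk.
      leftover = buffer.end_with?("\n") ? String.new : lines.pop
      lines.each { |line| yield line }
    end
    # Emit a final line that has no trailing newline, if there is one.
    yield leftover unless leftover.empty?
  end
end
```

Calling each_line_chunked(ARGV.shift) { |l| print l } would then keep memory use bounded by roughly the chunk size plus the longest line. Whether it beats IO.foreach on a given file would need to be measured.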

Here is the reformatted output from wc for the text file, for your reference:

    lines:   2,465,369
    words:  26,466,463
    bytes: 188,593,869
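To repeat the comparison on your own file without relying on the shell's time, a small harness along these lines could be used. It is a sketch, not the script used for the numbers above: the helper names elapsed, count_with_foreach and count_with_read are made up for illustration, and the loops count lines instead of printing them so that output does not affect the timing. It uses String#each_line, the block-iterating counterpart of String#lines.

```ruby
# Hypothetical helper: returns the wall-clock seconds taken by the block.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Counts lines by reading one line at a time.
def count_with_foreach(path)
  count = 0
  IO.foreach(path) { count += 1 }
  count
end

# Counts lines by loading the whole file, then iterating over its lines.
def count_with_read(path)
  count = 0
  File.read(path).each_line { count += 1 }
  count
end

# Runs only when a file name is given on the command line.
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  path = ARGV[0]
  puts format("IO.foreach: %.3fs", elapsed { count_with_foreach(path) })
  puts format("File.read:  %.3fs", elapsed { count_with_read(path) })
end
```

Timings from a single run are noisy, and the second method may benefit from the file already being in the operating system's cache, so it is worth running each variant several times.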
