Ruby large file compression with Zlib for gzip

I have a very large file, approx. 200 million rows of data.

I would like to compress it using the Zlib library, in particular with Zlib::GzipWriter.

Reading it one line at a time seems like it will take quite a while. Is there a better way to do this?

Here is what I have right now:

require 'zlib'

Zlib::GzipWriter.open('compressed_file.gz') do |gz|
  File.open(large_data_file).each do |line|
    gz.write line
  end
  gz.close
end
1 answer

You can use IO#read to read a chunk of arbitrary length from the file.

require 'zlib'

Zlib::GzipWriter.open('compressed_file.gz') do |gz|
  File.open(large_data_file) do |fp|
    # read returns nil at end of file, which ends the loop
    while chunk = fp.read(16 * 1024)
      gz.write chunk
    end
  end
end

This reads the source file in 16 KB chunks and appends each one to the compressed output stream. Adjust the chunk size to suit your environment.
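The chunk size is not the only knob: Zlib::GzipWriter.open also accepts an optional compression level as its second argument, and on a file this large a faster level can noticeably cut the runtime at the cost of a somewhat larger output. A hedged sketch of both adjustments (large_data_file is assumed to hold the source path, as in the question):

require 'zlib'

# Zlib::BEST_SPEED favors throughput, Zlib::BEST_COMPRESSION favors size;
# the default level sits in between.
Zlib::GzipWriter.open('compressed_file.gz', Zlib::BEST_SPEED) do |gz|
  File.open(large_data_file, 'rb') do |fp|
    # larger chunks mean fewer read/write calls; measure to find a sweet spot
    while chunk = fp.read(64 * 1024)
      gz.write chunk
    end
  end
end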

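If you would rather not manage the read loop at all, IO.copy_stream can pump the file straight into the writer and does its own internal buffering; Zlib::GzipWriter responds to write, so it can act as the destination. A minimal sketch under the same assumption about large_data_file:

require 'zlib'

Zlib::GzipWriter.open('compressed_file.gz') do |gz|
  # copy_stream handles chunking internally, so there is no manual buffer
  IO.copy_stream(large_data_file, gz)
end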

Source: https://habr.com/ru/post/1546602/

