Slowdown when processing a large number of files in Ruby

I am trying to build a large array of about 64,000 objects, each a truncated SHA256 file digest.

The files live in 256 subdirectories (named 00 through ff), each of which contains roughly 256 files (the exact count varies slightly per directory). Each file is between 1.5 KB and 2 KB in size.

The code is as follows:

    require 'digest'
    require 'cfpropertylist'

    # Collect an 8-byte truncated SHA256 digest of every .bin file.
    A = Array.new

    Dir.glob('files/**') do |dir|
      puts "Processing dir #{dir}"
      Dir.glob("#{dir}/*.bin") do |file|
        sha256 = Digest::SHA256.file file
        A.push(CFPropertyList::Blob.new(sha256.digest[0..7]))
      end
    end

    plist = A.to_plist({:plist_format => CFPropertyList::List::FORMAT_XML, :formatted => true})
    File.write('hashes.plist', plist)

If I process only 16 of the directories (replacing 'files/**' with 'files/0*' in the code above), it takes 0m0.340s on my machine.

But if I try to process all of them, processing slows down sharply after about 34 directories.

This is the latest version of OS X with the stock Ruby. The machine is a mid-2011 iMac with 12 GB of RAM and a 3.4 GHz Intel Core i7.

The limiting factor apparently isn't the size of the array: if I remove the SHA256 processing and just store the file names, there is no slowdown.
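For reference, the name-only test was essentially this (a sketch; the names array and names.plist output are just placeholders for the test):

    require 'cfpropertylist'

    # Same traversal as above, but storing file names instead of digests.
    # This version shows no slowdown, so the array itself is not the limit.
    names = Array.new

    Dir.glob('files/**') do |dir|
      Dir.glob("#{dir}/*.bin") do |file|
        names.push(file)
      end
    end

    plist = names.to_plist({:plist_format => CFPropertyList::List::FORMAT_XML, :formatted => true})
    File.write('names.plist', plist)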

Is there anything I can do better, or any way to track down the problem? At the moment I don't have another OS or machine to check whether the issue is OS X or this specific machine.

1 answer

This was a disk cache / filesystem issue. After running the script to completion once and then re-running it, the slowdown essentially disappeared. Running it on another computer with an SSD also showed no slowdown at any point.
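A quick way to confirm this (a minimal sketch, assuming the same files/ layout as in the question) is to time a pass that only reads the files, with no hashing at all, and run it twice in a row:

    require 'benchmark'

    # Read every .bin file without hashing it. A slow first run followed
    # by a fast second run points at the disk cache, not Digest::SHA256.
    elapsed = Benchmark.realtime do
      Dir.glob('files/**/*.bin') { |file| File.binread(file) }
    end
    puts "read pass: #{elapsed.round(3)}s"

The first pass goes to the (slow, spinning) disk; the second hits the warm cache, which matches the behavior described above.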


Source: https://habr.com/ru/post/1241168/

