I am very new to Ruby (I'm using 1.9.1), and everything I know about it I've learned through Google. I'm trying to compare two arrays of hashes, and because of their size the comparison takes a very long time and comes close to running out of memory. Any help would be appreciated.
I have a class (ParseCSV) with several methods (initialize, open, compare, strip, output). My current workflow looks like this (it passes the tests I wrote, though only against a much smaller dataset):

    file1 = ParseCSV.new("some_file")
    file2 = ParseCSV.new("some_other_file")
    file1.open
    file1.strip
    file2.open
    # pass file1's parsed records, not the string "file1.storage"
    file2.compare(file1.storage)
    file2.output

What I'm struggling with now is the comparison method. On smaller data sets it isn't a problem; it runs fast enough. In this case, however, I'm comparing about 400,000 records (all read into an array of hashes) against a file that contains about 450,000 records, and I'm trying to speed that up. Also, I cannot run the strip method on file2. Here's how I do the comparison now:

    def compare(x)
      puts "Comparing and leaving behind non matching entries"
      x.each do |row|
        # delete_if removes every match safely; calling delete_at inside
        # each_index shifts the array and skips the element after each deletion
        @storage.delete_if { |rec| rec[@opts[:field]] == row[@opts[:field]] }
      end
    end

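For reference, a common way to avoid the nested loop is to put the key values of one array into a hash-based `Set`, so each lookup is roughly constant time and the comparison becomes two linear passes instead of 400,000 × 440,000 comparisons. This is a minimal sketch, assuming both arrays hold hashes and that `field` is the comparison key; `remove_matching` is a hypothetical helper, not part of ParseCSV:

```ruby
require 'set'

# Hypothetical helper (not part of ParseCSV): removes from `storage` every
# record whose `field` value also appears in `other`.
# Assumes both arguments are arrays of hashes.
def remove_matching(storage, other, field)
  # One pass over `other`: collect its key values into a hash-based Set
  seen = Set.new
  other.each { |row| seen << row[field] }
  # One pass over `storage`: delete_if mutates the array in place
  storage.delete_if { |rec| seen.include?(rec[field]) }
end

storage = [{ "id" => "1", "name" => "a" },
           { "id" => "2", "name" => "b" },
           { "id" => "3", "name" => "c" }]
other   = [{ "id" => "2" }, { "id" => "4" }]

remove_matching(storage, other, "id")
p storage.map { |r| r["id"] } # prints ["1", "3"]
```

Inside the class, the same idea would be roughly `keys = Set.new(x.map { |row| row[@opts[:field]] })` followed by `@storage.delete_if { |rec| keys.include?(rec[@opts[:field]]) }`; this has not been tried against the real ParseCSV code.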
Hope this makes sense. I know this will be slow simply because, for each of the 400,000 rows, it has to scan roughly 440,000 stored records. But do you have any other ideas on how to speed it up and possibly reduce memory consumption?
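On the memory side, one option (again only a sketch, not tested on the real files) is to avoid holding both files as arrays of hashes: keep just a `Set` of key values from the first file, then stream the second file with the standard `csv` library and handle each unmatched row as it is read. `each_unmatched` is a hypothetical name; the sketch assumes both files have a header row containing the key column, and uses `StringIO` only so the demo is self-contained:

```ruby
require 'csv'
require 'set'
require 'stringio'

# Hypothetical sketch: keep only the key column of the reference input
# (a Set of strings rather than full hashes), then stream the candidate
# input row by row, yielding rows whose key is not in the reference set.
# Assumes both inputs have a header row that includes `field`.
def each_unmatched(reference_io, candidate_io, field)
  keys = Set.new
  CSV.new(reference_io, headers: true).each { |row| keys << row[field] }

  CSV.new(candidate_io, headers: true).each do |row|
    yield row.to_hash unless keys.include?(row[field])
  end
end

ref  = StringIO.new("id,name\n1,a\n2,b\n")
cand = StringIO.new("id,name\n2,b\n3,c\n")
each_unmatched(ref, cand, "id") { |h| puts h["id"] } # prints 3
```

With real files you would pass `File` objects instead, e.g. `File.open("some_file") { |ref| File.open("some_other_file") { |cand| each_unmatched(ref, cand, "id") { |h| ... } } }`, where `"id"` stands in for whatever `@opts[:field]` holds. Peak memory is then roughly one set of keys plus one row at a time, and unmatched rows can be written out as they arrive instead of being collected.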