How to count duplicate elements in a Ruby array
I have a sorted array:
```ruby
[
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]
```

I would like to get something like this, but it doesn't have to be a hash:
```ruby
[
  {:error => 'FATAL <error title="Request timed out.">', :count => 2},
  {:error => 'FATAL <error title="There is insufficient system memory to run this query.">', :count => 1}
]
```

The following code prints what you asked for. I'll leave it to you to decide how to use it to build the structure you are looking for:
```ruby
# sample array
a = ["aa", "bb", "cc", "bb", "bb", "cc"]

# make the hash default to 0 so that += will work correctly
b = Hash.new(0)

# iterate over the array, counting duplicate entries
a.each do |v|
  b[v] += 1
end

b.each do |k, v|
  puts "#{k} appears #{v} times"
end
```

Note: I just noticed that you said the array is already sorted. The code above does not require sorting; taking advantage of that property could lead to faster code.
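Because the input is sorted, equal elements are adjacent, so the counts can be taken from runs of consecutive elements in one pass, without a hash. A minimal sketch, assuming Ruby 2.3+ for `Enumerable#chunk_while`, that also produces the array-of-hashes shape from the question:

```ruby
# Assumes the input is already sorted, so equal elements sit next to each other.
errors = [
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]

# chunk_while splits the array into runs of consecutive equal elements;
# each run's length is the count for that element.
result = errors
  .chunk_while { |prev, curr| prev == curr }
  .map { |run| { error: run.first, count: run.size } }

result.each { |h| puts "#{h[:count]}: #{h[:error]}" }
```

This only gives correct totals for sorted input: on an unsorted array such as `["a", "b", "a"]`, the two `"a"` entries would end up in separate runs.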
You can do this very succinctly (in one line) using inject:
```ruby
a = ['FATAL <error title="Request timed out.">',
     'FATAL <error title="Request timed out.">',
     'FATAL <error title="There is insufficient ...">']

b = a.inject(Hash.new(0)) { |h, i| h[i] += 1; h }

b.to_a.each { |error, count| puts "#{count}: #{error}" }
```

This will produce:
```
2: FATAL <error title="Request timed out.">
1: FATAL <error title="There is insufficient ...">
```

If you have an array like this:
```ruby
words = ["aa", "bb", "cc", "bb", "bb", "cc"]
```

and you need to count the duplicate elements, here is a one-line solution:
```ruby
result = words.each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }
```

Another approach, as an alternative to the answers above, uses Enumerable#group_by:
```ruby
[1, 2, 2, 3, 3, 3, 4].group_by(&:itself).map { |k, v| [k, v.count] }.to_h
# {1=>1, 2=>2, 3=>3, 4=>1}
```

Breaking this into separate method calls:
```ruby
a = [1, 2, 2, 3, 3, 3, 4]
a = a.group_by(&:itself)          # {1=>[1], 2=>[2, 2], 3=>[3, 3, 3], 4=>[4]}
a = a.map { |k, v| [k, v.count] } # [[1, 1], [2, 2], [3, 3], [4, 1]]
a = a.to_h                        # {1=>1, 2=>2, 3=>3, 4=>1}
```

Enumerable#group_by was added in Ruby 1.8.7. Note that Object#itself requires Ruby 2.2 and Array#to_h requires Ruby 2.1.
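The grouped hash can also be mapped straight into the array-of-hashes shape from the question instead of a count hash. A minimal sketch, assuming Ruby 2.2+ (for `itself`); the variable name `errors` is only for illustration:

```ruby
errors = [
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]

# group_by(&:itself) gathers identical strings under one key, in order of
# first appearance; the size of each group is that string's count.
result = errors.group_by(&:itself).map { |error, occurrences| { error: error, count: occurrences.size } }

p result
```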
How about the following:
```ruby
things = [1, 2, 2, 3, 3, 3, 4]
things.uniq.map { |t| [t, things.count(t)] }.to_h
```

It looks cleaner and describes more clearly what we are actually trying to do.
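I suspect this will also perform better on large collections than the approaches that iterate over each value in a Ruby block. Keep in mind, though, that it calls count once per distinct element, so it scans the whole array once for every unique value; the advantage shrinks as the number of distinct values grows.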
Performance test:

```ruby
a = (1...1000000).map { rand(100) }
```

```
                      user     system      total        real
inject            7.670000   0.010000   7.680000 (  7.985289)
array count       0.040000   0.000000   0.040000 (  0.036650)
each_with_object  0.210000   0.000000   0.210000 (  0.214731)
group_by          0.220000   0.000000   0.220000 (  0.218581)
```

So on this input (100 distinct values), the count-based version is roughly five times faster than each_with_object and group_by.
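The benchmark code itself is not shown. The following is a minimal sketch of how a comparison like this could be set up with Ruby's standard benchmark library; the harness, labels and input size are assumptions, the inject variant is the one from the earlier answer (which may not be the code timed above), and timings will vary by machine and Ruby version.

```ruby
require "benchmark"

# Illustrative input: 1,000,000 random integers drawn from 100 distinct values.
a = Array.new(1_000_000) { rand(100) }

results = {}
Benchmark.bm(16) do |x|
  x.report("inject")           { results[:inject] = a.inject(Hash.new(0)) { |h, i| h[i] += 1; h } }
  x.report("array count")      { results[:count]  = a.uniq.map { |t| [t, a.count(t)] }.to_h }
  x.report("each_with_object") { results[:ewo]    = a.each_with_object(Hash.new(0)) { |i, h| h[i] += 1 } }
  x.report("group_by")         { results[:group]  = a.group_by(&:itself).map { |k, v| [k, v.count] }.to_h }
end

# Sanity check: every approach should produce the same counts
# (Hash#== compares contents and ignores key order and default values).
puts "all approaches agree: #{results.values.all? { |r| r == results[:inject] }}"
```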
Personally, I would do it as follows:
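```ruby
# myprogram.rb
a = ['FATAL <error title="Request timed out.">',
     'FATAL <error title="Request timed out.">',
     'FATAL <error title="There is insufficient system memory to run this query.">']
puts a
```

Then run the program and pipe its output to uniq -c. This works because uniq only collapses adjacent duplicate lines and the array is already sorted; for unsorted data, pipe it through sort first.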
```
ruby myprogram.rb | uniq -c
```

Output:
```
2 FATAL <error title="Request timed out.">
1 FATAL <error title="There is insufficient system memory to run this query.">
```

From Ruby 2.4 onward (Object#itself arrived in 2.2, Hash#transform_values in 2.4), you can use:

```ruby
array.group_by(&:itself).transform_values(&:count)
```
In more detail:
```ruby
array = [
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]
array.group_by(&:itself).transform_values(&:count)
# => {
#   "FATAL <error title=\"Request timed out.\">"=>2,
#   "FATAL <error title=\"There is insufficient system memory to run this query.\">"=>1
# }
```

```ruby
a = [1, 1, 1, 2, 2, 3]
a.uniq.inject([]) { |r, i| r << { :error => i, :count => a.select { |b| b == i }.size } }
# => [{:error=>1, :count=>3}, {:error=>2, :count=>2}, {:error=>3, :count=>1}]
```

If you want to use this often, I suggest doing this:
```ruby
# lib/core_extensions/array/duplicates_counter
module CoreExtensions
  module Array
    module DuplicatesCounter
      # Returns a hash of element => count, ordered by descending count.
      def count_duplicates
        each_with_object(Hash.new(0)) { |element, counter| counter[element] += 1 }
          .sort_by { |_element, count| -count }
          .to_h
      end
    end
  end
end
```

Include it with:
```ruby
Array.include CoreExtensions::Array::DuplicatesCounter
```

Then use it anywhere:
```ruby
the_ar = %w(aaaaaaa chao chao chao hola hola mundo hola chao cachacho hola)
the_ar.count_duplicates
# => {"chao"=>4, "hola"=>4, "aaaaaaa"=>1, "mundo"=>1, "cachacho"=>1}
#    (order among equal counts is not guaranteed, since sort_by is not stable)
```

Simple implementation:
```ruby
(errors_hash = {}).default = 0
array_of_errors.each { |error| errors_hash[error] += 1 }
```

Here is a sample array:
```ruby
a = ["aa", "bb", "cc", "bb", "bb", "cc"]
```

- Select all unique keys.
- For each key, accumulate its occurrences in a hash, to get something like this:

```ruby
{'bb' => ['bb', 'bb', 'bb']}
```

```ruby
res = a.uniq.inject({}) { |accu, uni| accu.merge({ uni => a.select { |i| i == uni } }) }
# => {"aa"=>["aa"], "bb"=>["bb", "bb", "bb"], "cc"=>["cc", "cc"]}
```
Now you can do things like:
```ruby
res['aa'].size
```

In Ruby 2.7 and later, there is Enumerable#tally.
For example:
```ruby
["a", "b", "c", "b"].tally # => {"a"=>1, "b"=>2, "c"=>1}
```
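Since tally only exists on Ruby 2.7+, here is a minimal sketch that uses it when available, otherwise falls back to the Hash.new(0) counting from an earlier answer, and then builds the array-of-hashes shape from the question. The variable names are only for illustration:

```ruby
errors = [
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]

# Prefer Enumerable#tally (Ruby 2.7+); otherwise count with a Hash.new(0) counter.
counts =
  if errors.respond_to?(:tally)
    errors.tally
  else
    errors.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
  end

# Both branches return a Hash keyed in order of first appearance.
result = counts.map { |error, count| { error: error, count: count } }

result.each { |h| puts "#{h[:count]}: #{h[:error]}" }
```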