
How to count duplicate elements in a Ruby array

I have a sorted array:

[ 'FATAL <error title="Request timed out.">', 'FATAL <error title="Request timed out.">', 'FATAL <error title="There is insufficient system memory to run this query.">' ] 

I would like to get something like this (it does not have to be a hash, though):

 [ {:error => 'FATAL <error title="Request timed out.">', :count => 2}, {:error => 'FATAL <error title="There is insufficient system memory to run this query.">', :count => 1} ] 
+65
arrays ruby
Feb 20 '09 at 14:17
12 answers

The following code prints the counts you asked for. I will let you decide how to use it to generate the hash you are looking for:

 # sample array
 a = ["aa","bb","cc","bb","bb","cc"]
 # make the hash default to 0 so that += will work correctly
 b = Hash.new(0)
 # iterate over the array, counting duplicate entries
 a.each do |v|
   b[v] += 1
 end
 b.each do |k, v|
   puts "#{k} appears #{v} times"
 end

Note: I just noticed that you said the array is already sorted. The code above does not require sorting; taking advantage of that property might produce faster code.
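
As a rough sketch of how the sorted property could be used (my own assumption, not benchmarked): equal elements sit next to each other in a sorted array, so each run can be counted in one pass with Enumerable#chunk_while, which needs Ruby 2.3 or later. The names errors and counts are only illustrative.

```ruby
# Assumes the array is sorted, so identical strings are adjacent.
errors = [
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]

# chunk_while keeps consecutive elements together while the block is true,
# so every chunk is one run of identical strings.
counts = errors.chunk_while { |x, y| x == y }.map do |run|
  { :error => run.first, :count => run.size }
end

counts.each { |c| puts "#{c[:count]}: #{c[:error]}" }
```

This yields the array-of-hashes shape from the question directly. On unsorted input it would report the same element several times, once per run.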

+124
Feb 20 '09 at 14:39

You can do this very succinctly (one line) using inject:

 a = ['FATAL <error title="Request timed out.">', 'FATAL <error title="Request timed out.">', 'FATAL <error title="There is insufficient ...">']
 b = a.inject(Hash.new(0)) {|h,i| h[i] += 1; h }
 b.to_a.each {|error,count| puts "#{count}: #{error}" }

Will produce:

 1: FATAL <error title="There is insufficient ...">
 2: FATAL <error title="Request timed out.">
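
If you need the array-of-hashes shape from the question rather than printed lines, one possible follow-up step is to map over the counting hash. This is only a sketch; the variable name result is my own.

```ruby
a = ['FATAL <error title="Request timed out.">',
     'FATAL <error title="Request timed out.">',
     'FATAL <error title="There is insufficient ...">']

# Count occurrences, returning the hash from the block on every step.
b = a.inject(Hash.new(0)) { |h, i| h[i] += 1; h }

# Turn each [error, count] pair into the hash shape shown in the question.
result = b.map { |error, count| { :error => error, :count => count } }

p result
```

From Ruby 1.9 on, a Hash keeps insertion order, so the entries come out in order of first appearance; on Ruby 1.8 the order is unspecified.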
+68
Feb 21 '09 at 2:17

If you have an array like this:

 words = ["aa","bb","cc","bb","bb","cc"] 

and you need to count the duplicate elements, here is a single-line solution:

 result = words.each_with_object(Hash.new(0)) { |word,counts| counts[word] += 1 } 
+29
May 28 '14 at 11:35

Another take on the answers above, using Enumerable#group_by.

 [1, 2, 2, 3, 3, 3, 4].group_by(&:itself).map { |k,v| [k, v.count] }.to_h
 # {1=>1, 2=>2, 3=>3, 4=>1}

Breaking this down into its individual method calls:

 a = [1, 2, 2, 3, 3, 3, 4]
 a = a.group_by(&:itself)           # {1=>[1], 2=>[2, 2], 3=>[3, 3, 3], 4=>[4]}
 a = a.map { |k,v| [k, v.count] }   # [[1, 1], [2, 2], [3, 3], [4, 1]]
 a = a.to_h                         # {1=>1, 2=>2, 3=>3, 4=>1}

Enumerable#group_by was added in Ruby 1.8.7.

+16
Jan 24 '17 at 18:04

How about the following:

 things = [1, 2, 2, 3, 3, 3, 4]
 things.uniq.map{|t| [t,things.count(t)]}.to_h

It looks cleaner and more clearly describes what we are actually trying to do.

I suspect it will also perform better on large collections than the approaches that iterate over each value.

Performance test:

 a = (1...1000000).map { rand(100)}

                        user     system      total        real
 inject             7.670000   0.010000   7.680000 (  7.985289)
 array count        0.040000   0.000000   0.040000 (  0.036650)
 each_with_object   0.210000   0.000000   0.210000 (  0.214731)
 group_by           0.220000   0.000000   0.220000 (  0.218581)

So it is quite a bit faster on this input.
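
The script behind those timings was not posted, so the following is only a hypothetical reconstruction using the standard Benchmark module. The method names and the smaller N are my own choices, and the method bodies are copied from the answers on this page.

```ruby
require 'benchmark'

# Assumption: smaller than the 1,000,000 elements above so it finishes quickly.
N = 100_000
DATA = Array.new(N) { rand(100) }

def count_inject(a)
  a.inject(Hash.new(0)) { |h, i| h[i] += 1; h }
end

def count_array_count(a)
  # One full pass over the array per distinct value.
  a.uniq.map { |t| [t, a.count(t)] }.to_h
end

def count_each_with_object(a)
  a.each_with_object(Hash.new(0)) { |v, counts| counts[v] += 1 }
end

def count_group_by(a)
  a.group_by(&:itself).map { |k, v| [k, v.count] }.to_h
end

# The argument to bm is the label column width.
Benchmark.bm(16) do |x|
  x.report('inject')           { count_inject(DATA) }
  x.report('array count')      { count_array_count(DATA) }
  x.report('each_with_object') { count_each_with_object(DATA) }
  x.report('group_by')         { count_group_by(DATA) }
end
```

Note that things.count(t) rescans the whole array once per distinct value, so the result depends heavily on how many distinct values there are. With rand(100) there are at most 100; replacing it with something like rand(N) should show a very different picture.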

+14
Apr 05 '17 at 13:42

Personally, I would do it as follows:

 # myprogram.rb
 a = ['FATAL <error title="Request timed out.">', 'FATAL <error title="Request timed out.">', 'FATAL <error title="There is insufficient system memory to run this query.">']
 puts a

Then run the program and pipe its output to uniq -c:

 ruby myprogram.rb | uniq -c 

Output:

   2 FATAL <error title="Request timed out.">
   1 FATAL <error title="There is insufficient system memory to run this query.">
+8
May 03 '12 at 10:03 PM

From Ruby >= 2.2 you can use itself (Hash#transform_values, used here as well, requires Ruby >= 2.4): array.group_by(&:itself).transform_values(&:count)

In more detail:

 array = [
   'FATAL <error title="Request timed out.">',
   'FATAL <error title="Request timed out.">',
   'FATAL <error title="There is insufficient system memory to run this query.">'
 ]
 array.group_by(&:itself).transform_values(&:count)
 # => { "FATAL <error title=\"Request timed out.\">"=>2,
 #      "FATAL <error title=\"There is insufficient system memory to run this query.\">"=>1 }
+7
Sep 24 '18 at 20:54
 a = [1,1,1,2,2,3]
 a.uniq.inject([]){|r, i| r << { :error => i, :count => a.select{ |b| b == i }.size } }
 # => [{:count=>3, :error=>1}, {:count=>2, :error=>2}, {:count=>1, :error=>3}]
+3
Feb 20 '09 at 14:56

If you want to use this often, I suggest doing this:

 # lib/core_extensions/array/duplicates_counter
 module CoreExtensions
   module Array
     module DuplicatesCounter
       def count_duplicates
         self.each_with_object(Hash.new(0)) { |element, counter| counter[element] += 1 }.sort_by{|k,v| -v}.to_h
       end
     end
   end
 end

Load it with

 Array.include CoreExtensions::Array::DuplicatesCounter 

And then use it from anywhere:

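 the_ar = %w(aaaaaaa chao chao chao hola hola mundo hola chao cachacho hola)
 # the method defined above is count_duplicates
 the_ar.count_duplicates
 # => { "chao" => 4, "hola" => 4, "aaaaaaa" => 1, "mundo" => 1, "cachacho" => 1 }
 # (the order among elements with equal counts is not guaranteed, since sort_by is not stable)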
+1
Jul 28 '18 at 3:39

Simple implementation:

 (errors_hash = {}).default = 0
 array_of_errors.each { |error| errors_hash[error] += 1 }
0
Feb 21 '09 at 2:24

Here is a sample array:

 a=["aa","bb","cc","bb","bb","cc"] 
  • Select all unique keys.
  • For each key, accumulate its occurrences in a hash to get something like this: {'bb' => ['bb', 'bb']}
     res = a.uniq.inject({}) { |accu, uni| accu.merge({ uni => a.select { |i| i == uni } }) }
     # => {"aa" => ["aa"], "bb" => ["bb", "bb", "bb"], "cc" => ["cc", "cc"]}

Now you can do things like:

 res['aa'].size 
0
Nov 13 '12 at 18:14

Ruby >= 2.7 provides Enumerable#tally.

For example:

 ["a", "b", "c", "b"].tally
 # => {"a"=>1, "b"=>2, "c"=>1}
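
If the code also has to run on older Rubies, one option is to check for tally at run time and fall back to a manual count. This is just a sketch under that assumption; the names errors and counts are illustrative.

```ruby
errors = [
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]

# Enumerable#tally exists from Ruby 2.7; otherwise count by hand.
counts =
  if errors.respond_to?(:tally)
    errors.tally
  else
    errors.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
  end

counts.each { |error, count| puts "#{count}: #{error}" }
```

Both branches return a Hash mapping each element to its count, so the rest of the code does not need to know which one ran.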
0
Sep 01 '19 at 21:06
