How to count duplicate elements in a Ruby array

Question


I have a sorted array:

[ 'FATAL <error title="Request timed out.">', 'FATAL <error title="Request timed out.">', 'FATAL <error title="There is insufficient system memory to run this query.">' ]

I would like to get something like this, but it does not have to be a hash:

 [ {:error => 'FATAL <error title="Request timed out.">', :count => 2}, {:error => 'FATAL <error title="There is insufficient system memory to run this query.">', :count => 1} ]

+65

arrays ruby

Željko Filipin Feb 20 '09 at 14:17


12 answers

You can do this very succinctly (one line) using inject:

 a = ['FATAL <error title="Request timed out.">', 'FATAL <error title="Request timed out.">', 'FATAL <error title="There is insufficient ...">']
 b = a.inject(Hash.new(0)) {|h,i| h[i] += 1; h }
 b.to_a.each {|error,count| puts "#{count}: #{error}" }

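This will produce the following (the order of the lines may differ; since Ruby 1.9, hashes iterate in insertion order):

 1: FATAL <error title="There is insufficient ...">
 2: FATAL <error title="Request timed out.">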
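To get from the counts hash to the array-of-hashes layout shown in the question, one more `map` is enough. This is only a sketch; the variable names `errors`, `counts` and `report` are made up for illustration.

```ruby
# Sample data taken from the question.
errors = [
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]

# Count occurrences with a hash whose default value is 0.
counts = errors.inject(Hash.new(0)) { |h, e| h[e] += 1; h }

# Reshape each key/value pair into the layout from the question.
report = counts.map { |error, count| { :error => error, :count => count } }

report.each { |row| puts "#{row[:count]}: #{row[:error]}" }
```

On Ruby 1.9 or later the entries come out in the order in which each error was first seen.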

+68

vladr Feb 21 '09 at 2:17


If you have an array like this:

 words = ["aa","bb","cc","bb","bb","cc"]

and you need to count the duplicate elements, a one-line solution is:

 result = words.each_with_object(Hash.new(0)) { |word,counts| counts[word] += 1 }
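If only the elements that actually repeat are of interest, the counts can be filtered afterwards. A minimal sketch, assuming "duplicate" means "appears more than once"; the names `counts` and `duplicates` are illustrative.

```ruby
words = ["aa", "bb", "cc", "bb", "bb", "cc"]

# Count every element, starting each counter at 0.
counts = words.each_with_object(Hash.new(0)) { |word, acc| acc[word] += 1 }

# Keep only the entries that occur more than once.
duplicates = counts.select { |_word, n| n > 1 }

p counts      # {"aa"=>1, "bb"=>3, "cc"=>2}
p duplicates  # {"bb"=>3, "cc"=>2}
```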

+29

Manish Shrivastava May 28 '14 at 11:35


Another approach, in addition to the answers above, uses Enumerable#group_by:

 [1, 2, 2, 3, 3, 3, 4].group_by(&:itself).map { |k,v| [k, v.count] }.to_h
 # {1=>1, 2=>2, 3=>3, 4=>1}

Breaking this into various method calls:

 a = [1, 2, 2, 3, 3, 3, 4]
 a = a.group_by(&:itself)         # {1=>[1], 2=>[2, 2], 3=>[3, 3, 3], 4=>[4]}
 a = a.map { |k,v| [k, v.count] } # [[1, 1], [2, 2], [3, 3], [4, 1]]
 a = a.to_h                       # {1=>1, 2=>2, 3=>3, 4=>1}

Enumerable#group_by was added in Ruby 1.8.7; Object#itself requires Ruby 2.2 or later.

+16

Kaoru Jan 24 '17 at 18:04


How about the following:

 things = [1, 2, 2, 3, 3, 3, 4]
 things.uniq.map{|t| [t,things.count(t)]}.to_h

It looks cleaner and describes more clearly what we are actually trying to do.

I suspect this will also perform better with large collections than the approaches that iterate over each value.

Performance test:

 a = (1...1000000).map { rand(100)}

                        user     system      total        real
 inject             7.670000   0.010000   7.680000 (  7.985289)
 array count        0.040000   0.000000   0.040000 (  0.036650)
 each_with_object   0.210000   0.000000   0.210000 (  0.214731)
 group_by           0.220000   0.000000   0.220000 (  0.218581)

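So in this test it is roughly five to six times faster than each_with_object and group_by. Note that the test data has only 100 distinct values; this approach scans the whole array once per distinct value, so it slows down as the number of distinct values grows.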
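The timings depend heavily on the Ruby version and the data, so it is worth re-running the comparison locally. Below is a rough sketch using the standard Benchmark module; the array size (100,000), the report labels and the `results` hash are my own choices, not the setup that produced the numbers in this answer.

```ruby
require 'benchmark'

# Assumed test data: 100,000 random integers drawn from 100 distinct values.
data = Array.new(100_000) { rand(100) }

# Keep each result so the four approaches can be compared afterwards.
results = {}

Benchmark.bm(17) do |x|
  x.report('inject') do
    results[:inject] = data.inject(Hash.new(0)) { |h, i| h[i] += 1; h }
  end
  x.report('array count') do
    results[:count] = data.uniq.map { |t| [t, data.count(t)] }.to_h
  end
  x.report('each_with_object') do
    results[:each_with_object] = data.each_with_object(Hash.new(0)) { |i, h| h[i] += 1 }
  end
  x.report('group_by') do
    results[:group_by] = data.group_by(&:itself).map { |k, v| [k, v.count] }.to_h
  end
end

# Sanity check: all four approaches should produce the same counts.
same = results.values.all? { |counts| counts == results[:inject] }
puts "all approaches agree: #{same}"
```

Changing `rand(100)` to a much larger range is a quick way to see how each approach behaves when there are many distinct values.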

+14

Carpela Apr 05 '17 at 13:42


Personally, I would do it as follows:

 # myprogram.rb
 a = ['FATAL <error title="Request timed out.">', 'FATAL <error title="Request timed out.">', 'FATAL <error title="There is insufficient system memory to run this query.">']
 puts a

Then run the program and pipe its output to uniq -c:

 ruby myprogram.rb | uniq -c

Output:

  2 FATAL <error title="Request timed out.">
  1 FATAL <error title="There is insufficient system memory to run this query.">

+8

dan May 03 '12 at 10:03 PM


From Ruby >= 2.2 you can use itself (transform_values additionally requires Ruby >= 2.4):

 array.group_by(&:itself).transform_values(&:count)

In more detail:

 array = [
   'FATAL <error title="Request timed out.">',
   'FATAL <error title="Request timed out.">',
   'FATAL <error title="There is insufficient system memory to run this query.">'
 ]
 array.group_by(&:itself).transform_values(&:count)
 # => { "FATAL <error title=\"Request timed out.\">"=>2,
 #      "FATAL <error title=\"There is insufficient system memory to run this query.\">"=>1 }

+7

Ana María Martínez Gómez Sep 24 '18 at 20:54


 a = [1,1,1,2,2,3]
 a.uniq.inject([]){|r, i| r << { :error => i, :count => a.select{ |b| b == i }.size } }
 # => [{:count=>3, :error=>1}, {:count=>2, :error=>2}, {:count=>1, :error=>3}]

+3

Milan Novota Feb 20 '09 at 14:56


If you want to use this often, I suggest doing this:

 # lib/core_extensions/array/duplicates_counter
 module CoreExtensions
   module Array
     module DuplicatesCounter
       # Returns a hash of element => count, sorted by descending count.
       def count_duplicates
         self.each_with_object(Hash.new(0)) { |element, counter| counter[element] += 1 }.sort_by { |k, v| -v }.to_h
       end
     end
   end
 end

Load it with

 Array.include CoreExtensions::Array::DuplicatesCounter

And then use it from anywhere:

 the_ar = %w(a a a a a a a chao chao chao hola hola mundo hola chao cachacho hola)
 the_ar.count_duplicates
 # => { "a" => 7, "chao" => 4, "hola" => 4, "mundo" => 1, "cachacho" => 1 }

+1

Arnold Roa Jul 28 '18 at 3:39


Simple implementation:

 (errors_hash = {}).default = 0
 array_of_errors.each { |error| errors_hash[error] += 1 }

0

Evan Senter Feb 21 '09 at 2:24


Here is a sample array:

 a=["aa","bb","cc","bb","bb","cc"]

Select all the unique keys, then for each key collect its occurrences in a hash to get something like {'bb' => ['bb', 'bb']}:

 res = a.uniq.inject({}) { |accu, uni| accu.merge(uni => a.select { |i| i == uni }) }
 # => {"aa" => ["aa"], "bb" => ["bb", "bb", "bb"], "cc" => ["cc", "cc"]}

Now you can do things like:

 res['aa'].size

0

metakungfu Nov 13 '12 at 18:14


As of Ruby 2.7 there is Enumerable#tally.

For example:

 ["a", "b", "c", "b"].tally # => {"a"=>1, "b"=>2, "c"=>1}

0

Santhosh Sep 01 '19 at 21:06


nimrodm · Accepted Answer · 2009-02-20 14:39

The following code prints what you asked for. I'll let you decide how to actually use it to generate the hash you are looking for:

 # sample array
 a=["aa","bb","cc","bb","bb","cc"]

 # make the hash default to 0 so that += will work correctly
 b = Hash.new(0)

 # iterate over the array, counting duplicate entries
 a.each do |v|
   b[v] += 1
 end

 b.each do |k, v|
   puts "#{k} appears #{v} times"
 end

Note: I just noticed that you said the array is already sorted. The code above does not require sorting. Taking advantage of that property may lead to faster code.
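One possible way to take advantage of a sorted array is to group runs of equal neighbours, which is essentially what `uniq -c` does on the command line. The sketch below uses Enumerable#chunk_while (Ruby 2.3 or later); the variable names are illustrative, and whether it is actually faster than the hash-based count has not been measured here.

```ruby
# Sorted sample data taken from the question.
sorted = [
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]

# Split the array into runs of adjacent, equal elements.
runs = sorted.chunk_while { |a, b| a == b }

# Each run becomes one entry in the layout from the question.
report = runs.map { |run| { :error => run.first, :count => run.size } }

report.each { |row| puts "#{row[:error]} appears #{row[:count]} times" }
```

This only counts adjacent duplicates, so it gives correct totals only when equal elements sit next to each other, as they do in a sorted array.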
