While testing whether a Set is really faster than an Array when checking membership via include?, I found a performance anomaly involving strings and symbols inside the collection.
First, the script I used for benchmarking. It creates an array of 50 random 50-character strings, takes a sample of 20, and checks whether each sampled value is included. The same data is used to create a set of strings, an array of symbols, and a set of symbols.
    require 'benchmark/ips'
    require 'set'

    collection_size = 50
    element_length = 50
    sample_size = 20

    Benchmark.ips do |x|
      # Build collection_size random lowercase strings of exactly element_length characters.
      letters = ('a'..'z').to_a
      array_of_strings = Array.new(collection_size) do
        Array.new(element_length) { letters.sample }.join
      end

      # Derive the other three collections from the same data.
      array_of_symbols = array_of_strings.map(&:to_sym)
      set_of_strings = Set.new(array_of_strings)
      set_of_symbols = Set.new(array_of_symbols)

      # Every sampled element is known to be present in its collection.
      sample_of_strings = array_of_strings.sample(sample_size)
      sample_of_symbols = array_of_symbols.sample(sample_size)

      x.report("array_of_strings: #{collection_size} elements with length #{element_length}, sample size #{sample_of_strings.length}") do
        sample_of_strings.each do |s|
          array_of_strings.include? s
        end
      end

      x.report("set_of_strings: #{collection_size} elements with length #{element_length}, sample size #{sample_of_strings.length}") do
        sample_of_strings.each do |s|
          set_of_strings.include? s
        end
      end

      x.report("array_of_symbols: #{collection_size} elements with length #{element_length}, sample size #{sample_of_symbols.length}") do
        sample_of_symbols.each do |s|
          array_of_symbols.include? s
        end
      end

      x.report("set_of_symbols: #{collection_size} elements with length #{element_length}, sample size #{sample_of_symbols.length}") do
        sample_of_symbols.each do |s|
          set_of_symbols.include? s
        end
      end

      x.compare!
    end
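Before comparing timings, it can help to confirm that the four collections answer the same membership questions, so that the benchmark compares equivalent work. The following is a minimal sketch using small hand-written data in place of the random strings; the names `strings`, `symbols`, `all_included?` and `cross_type` are illustrative and not part of the script above.

```ruby
require 'set'

# Hypothetical stand-in data; the benchmark script uses random strings instead.
strings = %w[alpha beta gamma delta]
symbols = strings.map(&:to_sym)

# True when every sampled element is found in the collection.
def all_included?(collection, sample)
  sample.all? { |element| collection.include?(element) }
end

results = {
  array_of_strings: all_included?(strings, strings.sample(2)),
  set_of_strings:   all_included?(Set.new(strings), strings.sample(2)),
  array_of_symbols: all_included?(symbols, symbols.sample(2)),
  set_of_symbols:   all_included?(Set.new(symbols), symbols.sample(2))
}

# A symbol is never equal to a string, so a cross-type lookup misses.
cross_type = strings.include?(:alpha)

puts results.inspect
puts "cross-type lookup found: #{cross_type}"
```

If any of the four entries came back false, the timings would be measuring misses for that collection rather than hits.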
The test system is a 2011 MacBook Pro running OS X 10.10.4; the ruby versions were installed using rvm 1.26.11.
Results with ruby 2.2.2:
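Since the comparison spans more than one interpreter, it may be useful to print the interpreter details at the top of each run so results cannot be mixed up. This is only a sketch: `RUBY_VERSION`, `RUBY_DESCRIPTION` and `RUBY_PLATFORM` are built-in constants, while `run_label` is a hypothetical name introduced here.

```ruby
# Built-in constants describing the interpreter that is running the script.
puts "version:     #{RUBY_VERSION}"
puts "description: #{RUBY_DESCRIPTION}"
puts "platform:    #{RUBY_PLATFORM}"

# Hypothetical label that could be prefixed to each benchmark report name.
run_label = "ruby #{RUBY_VERSION} (#{RUBY_PLATFORM})"
puts run_label
```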
set_of_strings: 145878.6 i/s
set_of_symbols: 100100.1 i/s - 1.46x slower
array_of_symbols: 81680.0 i/s - 1.79x slower
array_of_strings: 43545.9 i/s - 3.35x slower
, , , , . , , , , , , . script , .
The same script on ruby 2.1.6:
set_of_symbols: 202362.3 i/s
set_of_strings: 145844.1 i/s - 1.39x slower
array_of_symbols: 39158.1 i/s - 5.17x slower
array_of_strings: 24687.8 i/s - 8.20x slower
, , , ruby 2.2.2, , 2.1.6 .
- . , , 2.2.2 2.1.6, . , , 2.1.6. 2.2.2!
script, . i/s 2.1.6 2.2.2, 2.2.2 .
1:
, Hash, Set. 1 1000 / Hash [k] .
Ruby 2.2.2:
h_string: 1000 keys, sample size 200: 29374.4 i/s
h_symbol: 1000 keys, sample size 200: 10604.7 i/s - 2.77x slower
Ruby 2.1.6:
h_symbol: 1000 keys, sample size 200: 31561.9 i/s
h_string: 1000 keys, sample size 200: 25589.7 i/s - 1.23x slower
- 2.2.2 , script:
    require 'benchmark/ips'

    collection_size = 1000
    sample_size = 200

    Benchmark.ips do |x|
      h_string = {}
      h_symbol = {}

      # Same keys in both hashes: "1".."1000" as strings and as symbols.
      (1..collection_size).each { |k| h_string[k.to_s] = 1 }
      (1..collection_size).each { |k| h_symbol[k.to_s.to_sym] = 1 }

      # Sample the string keys once, then convert so both reports look up the same keys.
      sample_of_string_keys = h_string.keys.sample(sample_size)
      sample_of_symbol_keys = sample_of_string_keys.map(&:to_sym)

      x.report("h_string: #{collection_size} keys, sample size #{sample_of_string_keys.length}") do
        sample_of_string_keys.each do |s|
          h_string[s]
        end
      end

      x.report("h_symbol: #{collection_size} keys, sample size #{sample_of_symbol_keys.length}") do
        sample_of_symbol_keys.each do |s|
          h_symbol[s]
        end
      end

      x.compare!
    end
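A Hash lookup computes the key's hash value before comparing keys, so one way to narrow the difference down might be to time the `#hash` calls on their own. The following is a rough sketch rather than a rigorous benchmark, and it makes no claim about what result to expect on any particular ruby version; `time_block`, `rounds`, `string_keys` and `symbol_keys` are illustrative names.

```ruby
# Returns the elapsed wall-clock seconds for the given block.
def time_block
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Same key shapes as the Hash script: "1".."1000" as strings and as symbols.
string_keys = (1..1000).map(&:to_s)
symbol_keys = string_keys.map(&:to_sym)
rounds = 200 # arbitrary repeat count, chosen only to get a measurable duration

string_time = time_block { rounds.times { string_keys.each(&:hash) } }
symbol_time = time_block { rounds.times { symbol_keys.each(&:hash) } }

printf("String#hash: %.4fs\n", string_time)
printf("Symbol#hash: %.4fs\n", symbol_time)
```

If the two timings diverge in the same direction as the Hash results, the hash computation would be a reasonable place to look next; if they do not, the difference more likely lies elsewhere in the lookup.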
2:
ruby 2.3.0dev (2015-07-26 trunk 51391) [x86_64-darwin14], , collection_size sample_size ruby 2.1.6
, 10000 100 , 2.1.6 ( 3 , 2.2.2). , , , , .
3:
comment by @cremno 2.2 2.2 , 2.1.6
- , , . , 50 -, .
- ruby 2.3 backported code, , 2.2.3, ,
- , " " "Set.include? , Array.include? '
- Symbol , .