How arrays sets and sorted sets are implemented under the hood in Ruby

Question

How arrays sets and sorted sets are implemented under the hood in Ruby

Typically, arrays are implemented as chunks of memory, installed as hash cards and sorted sets in the form of skip lists. Is this true in Ruby?

I am trying to evaluate the use of various containers in Ruby in terms of performance and memory size.

+5

ruby

Eugenemi Nov 05 '14 at 17:45

source share

2 answers

The previous answer did not actually reflect the performance of SortedSet.

Array - O (1) for adding, deleting and accessing an element by index
Hash (uses a hash table) - O (1) (more than 1, of course, than arrays) to add, delete, and access using the key
Install - Hash implemented the same
SortedSet:

So, after reading the code and testing manually using a class that counts each method of the comparison method, here is the result. It tries to load the "rbtree" module and uses regular dialing as a backup.

So, 2 options:

RBTree is available, everything is fast and, as expected, O (logN) for each addition, O (1) the first time O (N) is received at soeted_set.to_a.
RBTree is not available. It uses Set (Hash underneath) and reorders if it is dirty when reading.

From Ruby 2.0.0 set.rb:

 def to_a (@keys = @hash.keys).sort! unless @keys @keys end

The meaning of this code is:

 X.times{ sorted_set << num; sorted_set.first }

will be O (XNlog (N)) since ordering is an approximate Nlog (N).

So basically using SortedSet without guaranteeing that RBTree is available is confusing and inefficient, as it does not use the fact that it was sorted (binary search) on insertion. Therefore, in this case, using the usual Set and array methods can be faster.

The same example using a regular Set:

 X.times{ set << num; set.min }

will be O (xn)

On the bottom line, if you need the SortedSet function, also add 'rbtree' to the Gemfile.

+2

Kaplan ilya Oct 22 '15 at 13:37

source share

Jörg W Mittag · Accepted Answer · 2014-11-05T19:06:22+0000

Arrays are part of the Ruby core library. Each Ruby implementation has its own array implementation. The Ruby language specification only indicates the behavior of Ruby arrays; it does not specify any specific implementation strategy. It does not even indicate any performance limitations that could force or at least offer a specific implementation strategy.

However, most rubists have some expectation of performance characteristics of arrays that would force an implementation that does not meet them into the unknown, because no one actually used it:

Inserting, adding or adding, as well as deleting an element has the worst complexity of phasing O (n) and amortization of the worst stage - complexity O (1)
Accessing an element has worse O step complexity (1)
iterating over all elements has the worst degree of complexity O (n)

This means that arrays must be implemented as dynamic arrays with exponential resizing, otherwise these performance guarantees cannot be fulfilled. You can leave with a very wide and small tree, but an AFAIK implementation without Ruby does this.

Here's an implementation of the Rubinius array , which I personally find the easiest to read for all Ruby implementations. (Note: in C ++, only the basic requirements are implemented, most of the array methods are implemented in Ruby, for example, in kernel/common/array.rb ).

Set and SortedSet are part of the Set library in stdlib. Normally stdlib does not change between Ruby implementations (at least the parts that are actually written in Ruby, obviously the parts written in other languages cannot be separated), and since Set written in Ruby, you can expect that it will be the same in all Ruby implementations.

Set is implemented as a Hash , where only keys are used, the values are always always true : see Set#add in lib/set.rb

SortedSet supported by a red-black tree that is not implemented in Ruby.

How arrays sets and sorted sets are implemented under the hood in Ruby

More articles: