Erlang: distributed work on an array

I am working on a project where we have an array of atoms that acts like a hash table. Whenever a user connects to the server, a certain value is hashed, and that hash is used as an index to look up an element in the array, which is then returned. "External forces" (handled by a long-running gen_server) can change this array, so I can't simply hard-code it. My problem is where to "put" this array.

My first implementation was a simple gen_server that stored the array and sent a copy of it to whoever requested it. The requesting process could then traverse the copy and pick out the element at the index it needed. That implementation used an excessive amount of memory, which I attributed to the many copies of the same array floating around.
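Roughly, that first version looked like the sketch below (module and function names are made up for illustration, not my actual code): a single gen_server owns the list and every caller gets a full copy via get_list/0, then traverses it locally.

%% Simplified sketch of the first approach (illustrative names only).
-module(hash_holder).
-behaviour(gen_server).
-export([start_link/1, get_list/0, set_list/1]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link(List) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, List, []).

get_list() ->
    gen_server:call(?MODULE, get_list).      %% every call copies the whole list to the caller

set_list(NewList) ->
    gen_server:cast(?MODULE, {set_list, NewList}).

init(List) ->
    {ok, List}.

handle_call(get_list, _From, List) ->
    {reply, List, List}.                     %% the reply carries a copy of the list

handle_cast({set_list, NewList}, _OldList) ->
    {noreply, NewList}.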

In my current implementation, there is a central gen_server that holds the state of this array and child processes that handle the actual requests. When the state changes, the central gen_server updates the children. When a process wants to look up a hash result, it sends the index to the central gen_server, which forwards the request to one of the children. The child traverses its "local" list and sends the resulting atom back to the original process.
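The message flow is roughly the following (only the relevant callback clauses, with made-up names and state shapes, not the actual code):

%% In the central gen_server: state is {Children, Next}, a list of child pids
%% plus a round-robin counter. It only forwards the request.
handle_call({lookup, Hash}, From, {Children, Next}) ->
    Child = lists:nth(Next, Children),
    gen_server:cast(Child, {lookup, Hash, From}),        %% the child will answer From directly
    {noreply, {Children, (Next rem length(Children)) + 1}}.

%% In a child gen_server (a separate module): its state is its own copy of the list.
handle_cast({lookup, Hash, From}, List) ->
    gen_server:reply(From, lists:nth(Hash, List)),       %% send the atom back to the original caller
    {noreply, List}.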

The problem with the current implementation is that it bogs down under high traffic. I have tried adding more and more children, but I am fairly sure the central gen_server is the bottleneck.

Does anyone have any ideas for a better solution to my problem?

EDIT: %s/array/list/g — read "list" wherever I wrote "array" above.

+4
2 answers

I suggest using ETS tables. I believe the array approach is not efficient enough. With an ETS table created as public by your backend application, any process can look up an element whenever it needs it. ETS tables in newer Erlang versions support concurrent access.

%% Let's create a record structure
%% whereby the key will be a value
%% in the array.
%% For now, I do not know what to
%% put in the field: 'other'
-record(element, {key, other}).

create_table(TableName) ->
    Options = [
        named_table, set, public,
        {keypos, 2},                 %% because we are using a record, NOT a plain tuple
        {write_concurrency, true}
    ],
    case ets:new(TableName, Options) of
        TableName -> {success, true};
        Error     -> {error, Error}
    end.

lookup_by_hash(TableName, HashValue) ->
    try ets:lookup(TableName, HashValue) of
        Value -> {value, Value}
    catch
        X:Y -> {error, {X, Y}}
    end.
With this arrangement, you avoid the single point of failure that comes from one gen_server holding the data. The data is needed by many processes and therefore should not be owned and served by a single process. The table is available to any process at any time that it needs to do a lookup.

The values in the list would be converted into #element{} records and then inserted into the ETS table, for example as sketched below.
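Assuming the record and the table from create_table/1 above, loading the list could look something like this (load_list/2 is just a name I am using here, and I key on the position in the list; adjust the key to whatever you actually hash on):

%% Turn each value in the list into an #element{} record, keyed by its
%% position, and insert it into the ETS table.
load_list(TableName, List) ->
    Indexed = lists:zip(lists:seq(1, length(List)), List),
    lists:foreach(
        fun({Index, Value}) ->
            ets:insert(TableName, #element{key = Index, other = Value})
        end,
        Indexed),
    ok.

After that, any process can call lookup_by_hash(TableName, Hash) directly, with no message round trip through a server.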

The advantages of this approach

1. We can create as many ETS tables as we want.
2. An ETS table can handle many more elements than a data structure such as a list or array, with comparatively less memory overhead.
3. ETS tables can be accessed concurrently by any process that can reach them, so you do not need a central process or server to serve the data.
4. Having one process or gen_server store this data means that if it is compromised (goes down, say, due to a full mailbox), the data becomes unavailable, so the processes that need the array would have to wait until that one server restarts, or I do not know....
5. Accessing the array data by sending request messages, plus creating copies of the same array for each process that needs it, is not an "Erlangic" design.
6. Finally, ownership of ETS tables can be transferred from process to process. When the owning process is going down (only gen_servers can detect that they are dying [note this]), it can give the ETS table away to another process, which takes over; see the sketch after this list. Check here: ETS Give Away
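A rough sketch of the hand-over (names here are illustrative; see ets:give_away/3 and the {heir, Pid, Data} option to ets:new/2):

%% Option 1: set an heir when creating the table, so ownership moves to a
%% standby process automatically if the owner dies:
%%   ets:new(TableName, [named_table, set, public, {heir, StandbyPid, handover}]).

%% Option 2: the owning gen_server gives the table away explicitly, e.g. from
%% its terminate/2 callback (here the state is assumed to be a map holding the
%% table and the standby pid):
terminate(_Reason, #{table := Table, standby := StandbyPid}) ->
    true = ets:give_away(Table, StandbyPid, handover),
    ok.

%% The process receiving the table gets a message it can handle:
handle_info({'ETS-TRANSFER', Table, _FromPid, handover}, State) ->
    {noreply, State#{table => Table}}.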

That is what I think.
+6

Not sure if this helps, but could you keep the central structure in a distributed hash table (independent of your own hashing scheme), like any other value? That way multiple processes can share the load instead of a single central one.

From what I read, your array does not really seem to need to be an array.

+1
