Does Erlang implement record copy-and-modify in any smart way?

Given:

-record(foo, {a, b, c}). 

I am doing something like this:

 Thing = #foo{a={1,2}, b={3,4}, c={5,6}},
 Thing1 = Thing#foo{a={7,8}}.

Semantically, Thing and Thing1 are distinct objects. From an implementation standpoint, however, making a full copy of Thing to create Thing1 would be very wasteful. For example, if the record were a megabyte in size and I made a thousand "copies", each changing a couple of bytes, I would have just burned a gigabyte. If the internal structure kept track of a representation of the parent structure, and each derivative marked up that parent in a way that indicated its own change while preserving everyone else's versions, the derivatives could be created with minimal memory overhead.

My question is whether Erlang does something smart, internally, to keep the overhead of the usual Erlang idiom:

 Thing = #ridiculously_large_record,
 Thing1 = make_modified_copy(Thing),
 Thing2 = make_modified_copy(Thing1),
 Thing3 = make_modified_copy(Thing2),
 Thing4 = make_modified_copy(Thing3),
 Thing5 = make_modified_copy(Thing4)

... to a minimum?

I ask because, if that is the case, it would change the way I do cross-process communication.

+4
4 answers

The exact workings of garbage collection and memory allocation are known to only a few people. Fortunately, they are very happy to share their knowledge, and the following is based on what I have learned from the erlang-questions mailing list and from discussions with OTP developers.

When messages are exchanged between processes, the content is always copied, because processes do not share a heap. The only exception is binaries larger than 64 bytes, for which only a reference is copied.

When code executes within a single process, only the affected parts are updated. Let us look at tuples, since that is the example you provided.

A tuple is actually a structure that stores references to the actual data somewhere on the heap (with the exception of small integers and possibly one other data type that I cannot remember). When you update a tuple using, for example, setelement/3, a new tuple is created with the given element replaced; for all other elements, only the reference is copied. There is one exception, which I have never been able to take advantage of.

The garbage collector keeps track of each tuple and knows when it is safe to reclaim a tuple that is no longer in use. The data referenced by the tuple may still be in use, in which case that data itself is not collected.
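As a rough illustration of the reference copying described above, the following sketch checks whether an untouched element of the old and the new tuple is the very same heap object. The module name `same_demo` is made up for this example, and `erts_debug:same/2` is an undocumented internal helper that may change between releases, so treat this as an experiment rather than supported API usage.

```erlang
-module(same_demo).
-export([run/0]).

run() ->
    %% Built at run time, so the list lives on the process heap.
    Big = lists:seq(1, 1000),
    T0 = {Big, x, y},
    %% New 3-element tuple; element 2 is replaced, the rest are reused.
    T1 = setelement(2, T0, z),
    %% erts_debug:same/2 (undocumented) compares terms by pointer identity.
    Shared = erts_debug:same(element(1, T0), element(1, T1)),
    {element(2, T0), element(2, T1), Shared}.
```

Calling `same_demo:run()` should show that T0 still holds `x`, T1 holds `z`, and both tuples point at the same list.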

As always, Erlang gives you tools to understand exactly what is going on. The Efficiency Guide describes how to use erts_debug:size/1 and erts_debug:flat_size/1 to find out how large a data structure is when used inside a process and when it is copied. The tracing tools can also tell you when, what, and how much was garbage collected.
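A minimal sketch of that measurement, assuming a made-up module name `size_demo`: `erts_debug:size/1` counts shared subterms once, while `erts_debug:flat_size/1` counts what the term would occupy if every reference were expanded, which is what happens when the term is sent to another process.

```erlang
-module(size_demo).
-export([run/0]).

run() ->
    Big = lists:seq(1, 1000),       % built at run time on the process heap
    T0 = {Big, x, y},
    T1 = setelement(2, T0, z),      % T1 reuses the reference to Big
    Both = {T0, T1},
    %% Sizes are in machine words.
    {erts_debug:size(Both), erts_debug:flat_size(Both)}.
```

If the list is shared between T0 and T1, the first number should be noticeably smaller than the second.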

+9

The record foo has an arity of 4 (it holds four words), but the whole structure is 14 words in size. Any immediate (pids, ports, small integers, atoms, catch, and nil) can be stored directly in the tuple array. Any other term that cannot fit into a word, such as another tuple, is not stored directly but is referenced through a boxed pointer (a boxed pointer is an Erlang term holding the address of the real eterm ... internals only).

In your case, a new tuple of the same arity is created, and the atom foo and all pointers are copied from the previous tuple, except for index two, a, which points to the new tuple {7,8}, which takes 3 words. In all, 5 + 3 new words are created on the heap and only 3 words are copied from the old tuple; the remaining 9 words are not touched.

Excessively large tuples are not recommended. When a tuple is updated, the whole tuple (the array, not its deep content) has to be copied and then updated in order to keep the data structure persistent. This also generates more garbage, which makes the garbage collector work harder and further hurts performance. For this reason, the dict and array modules avoid large tuples and use a shallow tree of tuples instead.
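The word counts above can be checked with a small experiment. The module name `words_demo` and the argument `N` are made up for this sketch; `N` is passed at run time only so that the terms are built on the process heap instead of being stored as compile-time literals.

```erlang
-module(words_demo).
-export([run/1]).

-record(foo, {a, b, c}).

run(N) ->
    Thing  = #foo{a = {N, N + 1}, b = {N + 2, N + 3}, c = {N + 4, N + 5}},
    %% Only the top-level tuple and the new value of a are allocated here.
    Thing1 = Thing#foo{a = {N + 6, N + 7}},
    Pair   = {Thing, Thing1},
    #{thing_words => erts_debug:flat_size(Thing),  % 5 + 3 * 3 words
      pair_shared => erts_debug:size(Pair),        % shared parts counted once
      pair_flat   => erts_debug:flat_size(Pair)}.  % as if fully copied
```

By the accounting above, `thing_words` should be 14, and `pair_shared` should come to 3 + 14 + 8 = 25 words against a flat size of 3 + 14 + 14 = 31.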
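For a large indexed collection, a sketch using the standard array module might look like the following (the module name `array_demo` is made up). An update returns a new array value and leaves the old one intact, without copying one huge flat tuple.

```erlang
-module(array_demo).
-export([run/0]).

run() ->
    %% Fixed-size array of 100000 entries, all defaulting to 0.
    A0 = array:new(100000, {default, 0}),
    %% Returns a new array; A0 is still valid and unchanged.
    A1 = array:set(42, hello, A0),
    {array:get(42, A0), array:get(42, A1)}.
```

Here `array_demo:run()` returns `{0, hello}`: the old version still sees the default value, the new one sees the update.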

+5

I can definitely verify what people have already pointed out:

  • a record is just a tuple with the record name as the first element and all the fields as the following elements of the tuple.
  • when a tuple element changes (in your case, when a field of the record is updated), only the top-level tuple is new; all the elements are simply reused.

This only works because data is immutable. So in your example, every time you update a value in the #foo record, none of the data in the elements is copied and only a new 4-element tuple (5 words) is created. Erlang never makes a deep copy in this kind of operation, nor when passing arguments in a function call.
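The two points above can be seen directly from the language. This is a small sketch with a made-up module name, `record_demo`, using the record from the question.

```erlang
-module(record_demo).
-export([run/0]).

-record(foo, {a, b, c}).

run() ->
    Thing  = #foo{a = {1,2}, b = {3,4}, c = {5,6}},
    Thing1 = Thing#foo{a = {7,8}},
    {Thing =:= {foo, {1,2}, {3,4}, {5,6}},  % a record is a tagged tuple
     tuple_size(Thing),                     % record name plus three fields
     Thing1,                                % the updated "copy"
     Thing#foo.a}.                          % the original is unchanged
```

`record_demo:run()` returns `{true, 4, {foo,{7,8},{3,4},{5,6}}, {1,2}}`.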

+2

Finally:

 Thing = #foo{a={1,2}, b={3,4}, c={5,6}},
 Thing1 = Thing#foo{a={7,8}}.

Here, if Thing is not used again, it will probably be updated in place and copying the tuple will be avoided, as the Efficiency Guide says. (I think the tuple and record update syntax compiles into something like setelement/3.)
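At the language level, a record update and an explicit setelement/3 call give the same result. The sketch below (module name `update_demo` is made up) only shows that equivalence; it does not show what the compiler actually emits or whether an in-place update happens.

```erlang
-module(update_demo).
-export([run/0]).

-record(foo, {a, b, c}).

run() ->
    Thing = #foo{a = {1,2}, b = {3,4}, c = {5,6}},
    ViaRecord     = Thing#foo{a = {7,8}},
    %% #foo.a is the tuple index of field a (the record name is index 1).
    ViaSetelement = setelement(#foo.a, Thing, {7,8}),
    {#foo.a, ViaRecord =:= ViaSetelement}.
```

`update_demo:run()` returns `{2, true}`.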

 Thing = #ridiculously_large_record,
 Thing1 = make_modified_copy(Thing),
 Thing2 = make_modified_copy(Thing1),
 ...

Here the tuples are actually copied every time.

I suppose an interesting optimization is theoretically possible. If the compiler could perform escape analysis on the return value of make_modified_copy and found that the only reference to it is the one being returned, it could record that information about the function. When it later encountered a call to that function, it would know that it is safe to modify the returned value in place.

This could only be done for calls within a single module, because of the code replacement feature.

Maybe one day we will get it.

0

Source: https://habr.com/ru/post/1369195/

