Can mutable data structures be used purely functionally?

This is a general question about functional programming, but I'm also interested in how it is answered in particular languages.

I only have a basic knowledge of functional languages, so bear with me.

I understand that functional languages focus on different data structures than imperative languages do, because they favor immutability: persistent data structures.

For example, they all have the concept of an immutable list, where you can create new lists x :: l and y :: l from an existing list l and two new elements x and y, without copying any of the elements of l. This is typically implemented by having the new list object internally point to the old one as its tail.
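
For instance, here is a minimal Haskell sketch of that sharing (Haskell spells cons as : rather than ::); xs and ys are new cells that both point at the same l:

    -- Both xs and ys are single new cons cells whose tail is the same l;
    -- none of l's elements are copied.
    l :: [Int]
    l = [2, 3, 4]

    xs, ys :: [Int]
    xs = 1 : l
    ys = 0 : l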

In imperative languages such a data structure is rarely used, because unlike C-style arrays it has poor locality of reference.

In general, finding data structures that support a functional style is a challenge in its own right, so it would be great if it were not always necessary.

Now, here is an idea for how you could use all the classic data structures in functional programming, given the right language support.

In general, a data structure in an imperative language has modifying operations defined on it (in pseudocode):

 data.modify(someArgument) 

The functional way of writing this:

 newData = modified(data, someArgument) 

The well-known problem is that this usually requires copying the data structure, unless the language can prove that data is not used by anyone else: in that case the modification can be performed as a mutation of the original, and no one could tell the difference.
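
For concreteness, a minimal Haskell sketch of the functional style, using a hypothetical SomeData type invented purely for illustration. Note that here structural sharing already keeps the "copy" cheap, but that is not possible for every data structure:

    -- Hypothetical record type for illustration: 'modified' returns a
    -- fresh value and leaves the original intact (it may still be
    -- referenced elsewhere). The new record shares, rather than copies,
    -- the old payload as its tail.
    data SomeData = SomeData { payload :: [Int] } deriving Show

    modified :: SomeData -> Int -> SomeData
    modified d x = d { payload = x : payload d }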

There is a large class of cases where the language can infer this "never used elsewhere" property: when the argument to modified is an unnamed temporary, as in this example:

 newData = modified(modified(data, someArgument), someOtherArgument) 

Here, data may well be used elsewhere, but modified(data, someArgument) clearly is not.

This is what C++ calls an "rvalue", and in recent revisions of C++ (C++11 and later) you can overload on rvalue references.

For example, you can write:

    Data modified(Data const& data) {
        // returns a modified copy
    }

    Data modified(Data&& data) {
        // returns the modified original
    }

This means that in C++ you can take any efficient mutable data structure and wrap it in an immutable API that can be used in a purely functional style, yet is as efficient as the imperative version.

(One caveat: in C++ a cast is sometimes still needed, i.e., you must use std::move to force the rvalue overload. And of course you must take care of implementing such data structures yourself, i.e., of writing the rvalue overloads.)

Now my question is:

Do actual functional languages have a similar mechanism? Or is it unnecessary for some other reason?

(I tagged some specific languages that I'm especially interested in.)

+5
4 answers

I am pretty sure that a feature such as alias analysis (checking whether data is referenced elsewhere) is not part of the Scala compiler (nor of other FP languages like Haskell and Clojure). The Scala collections API, for example, is explicitly split into immutable and mutable packages. The immutable data structures use structural sharing to avoid the need to copy data, and thus reduce the overhead (in terms of the amount of temporary data) of working with immutable structures.

As already mentioned, operations like cons (::) create a new immutable structure which, under the hood, holds references to the existing immutable data instead of making a copy.

Conversions between mutable and immutable types (in Scala, for example) make a copy of the mutable data (often lazily), rather than using any mechanism such as checking whether the mutable structure is referenced anywhere else and mutating it in place if not.
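
For illustration, the same boundary is explicit in Haskell, assuming the vector package: thaw and freeze each make a copy, rather than checking whether the original is referenced elsewhere:

    import qualified Data.Vector as V
    import qualified Data.Vector.Mutable as MV

    -- Crossing the immutable/mutable boundary copies in both directions:
    -- thaw copies the immutable vector into a mutable one, and freeze
    -- copies back, so the original immutable vector is never disturbed.
    example :: IO (V.Vector Int)
    example = do
        mv <- V.thaw (V.fromList [1, 2, 3])
        MV.write mv 0 99
        V.freeze mv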

When I first switched from Java to Scala, I too thought that the (often large amount of) temporary data created when working with immutable structures could be a performance limitation, and that there might be some clever techniques that actually allow mutation where it is safe. But that is not how it works. The key idea is that immutable data never points to younger values: a younger value does not yet exist when an older value is created, so it cannot be pointed to at creation time, and since values never change, it cannot be pointed to later either. As a result, FP languages such as Scala/Haskell can afford to generate all this temporary data, because generational garbage collectors can reclaim it very cheaply.

In a nutshell: Scala/Haskell (I am not sure about F#) do not allow mutation of immutable structures; instead they rely on runtimes such as the modern JVM having very efficient garbage collection, so that short-lived temporary data is reclaimed very quickly. Of course, as I am sure you know, an immutable structure containing mutable elements is quite possible in FP languages such as Scala; the mutable elements can be changed, but the immutable container cannot be: elements cannot be added or removed.

+3

It is true that persistent data structures are slower than their mutable counterparts; nobody argues about that. Sometimes the difference is negligible (iterating over a linked list vs. an array), sometimes it can be large (iterating in reverse order), but that is not the point. Choosing immutable data is (or should be) a conscious decision: we are paying some performance for reliability.

Consider this: for most (not all) modern programs, local performance is not the concern. For today's programs, the real performance bottleneck is parallelization, both on a local machine with shared memory and across different machines. With the amount of data we process these days, squeezing every last bit out of memory locality and branch prediction is not going to cut it; we need to scale. And guess what the number one source of bugs in parallel programs is? That's right: mutation.

Another big concern for modern programs is reliability. Long gone are the days when a program could just crash on you and you would restart it and carry on. Today, programs are expected to run on headless servers, without human intervention, for months or years. A program can no longer simply throw up its digital hands and expect a human to figure out what went wrong. In this setting, local performance matters much less than reliability and parallelization: it is far cheaper to buy (or rent) another ten servers than to hire a person to restart the program from time to time.

It is true that a parallelizable, reliable program can be written with mutation. It is theoretically possible. It is just much harder. With immutable data, you have to actually aim in order to shoot yourself in the foot.

And here is some perspective: we have been here before. How often do you use goto in your code? Have you thought about why that is? You can do all kinds of neat performance tricks with goto, and yet we choose not to. At some point in the history of programming, we decided that goto caused more problems than it was worth. The same thing happened with raw pointers: many languages do not have them at all, others tightly control them, and even in languages that offer unrestricted access to raw pointers, using them is now considered bad form. Today we are in the middle of the next step: first we gave up goto, then raw pointers, and now we are slowly giving up mutation.

However, if you really are pushing the local performance envelope for a legitimate reason, and you have determined that immutable data really is the bottleneck (remember: measure first, then optimize), then most functional languages (Haskell and Elm excepted) will let you resort to mutation, however reluctantly. Like raw pointers in C#, you can have mutation, you just have to be explicit (and careful) about it. In F#, for example, you can have mutable variables, raw arrays, mutable record fields, classes, interfaces, and so on. It is possible, just not encouraged. And the general consensus so far is that mutation is acceptable as long as it stays localized (i.e., does not leak outside), you really know what you are doing, you have documented it, and you have tested it to death.

A common case for this is "building a value": a function that ultimately produces an immutable value, but does all kinds of messy things along the way. One example is how the core F# library implements List.map: normally, because lists can only be iterated front to back but constructed back to front, you would have to build the transformed list (getting it in reverse) and then reverse it. So the F# library cheats here and mutates the list as it is being built, avoiding the unnecessary reversal.
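
F# itself isn't shown in this thread, but here is a mutation-free Haskell sketch of the same pattern, building the result in reverse with an accumulator and paying for one final reverse; the F# library's private mutation exists precisely to skip that last step:

    -- The portable, mutation-free version of the trick: accumulate the
    -- transformed elements in reverse order, then reverse once at the end.
    mapAcc :: (a -> b) -> [a] -> [b]
    mapAcc f = go []
      where
        go acc []       = reverse acc
        go acc (x : xs) = go (f x : acc) xs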

And one more note about "locality". Remember how I said you can do all kinds of neat performance tricks with goto? Well, that is no longer entirely true. Once programmers stopped writing programs with goto, branches in the compiled code became more predictable, because jumps are now generated by compilers rather than written by hand. That allowed CPUs to predict them accurately and to optimize execution based on those predictions. The end result is that today you are likely to get worse performance using goto than using the accepted higher-level tools such as loops. Back in the day, CPUs could not afford to be that smart, so avoiding goto was purely a matter of reliability. But now it has turned out to be genuinely useful for performance, who would have thought?

I claim the same thing will happen with immutability. I do not know exactly how it will happen, but I am sure it will. Even today, without special hardware, some optimizations can already be done at compile time: for example, if the compiler knows that a value is immutable, it can decide to cache it in a register for a long stretch, or even constant-fold it entirely. It is true that most real compilers today do not perform all of these possible optimizations (although they do perform some), but they will. We are just getting started. :-)

+6

1) Functional programming languages support persistent data structures. When one data structure is transformed into another, or any operation on a data structure produces a new one, the unchanged parts of the structure are reused by reference, especially in the case of lists.

In computing, a persistent data structure is a data structure that always preserves the previous version of itself when it is modified. Such data structures are effectively immutable, because their operations do not (visibly) update the structure in place, but instead always yield a new, updated structure.
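
A small Haskell sketch of persistence, using Data.Map from the standard containers package: the old version remains fully usable after the "update", and the unchanged subtrees are shared between the two versions:

    import qualified Data.Map.Strict as Map

    -- insert yields a new map; the old map still has exactly two entries.
    old, new :: Map.Map Int String
    old = Map.fromList [(1, "a"), (2, "b")]
    new = Map.insert 3 "c" old

    main :: IO ()
    main = print (Map.toList old, Map.toList new)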

2) In pure lazy functional languages, computation is deferred, and an expression is evaluated only when its result is actually needed for the final value. This mechanism avoids unnecessary computation.
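
A minimal Haskell sketch of this: the expensive value below is never computed, because nothing ever demands it, so the program prints instantly:

    -- 'expensive' stays an unevaluated thunk: snd never forces the
    -- first component of the pair.
    expensive :: Integer
    expensive = sum [1 .. 10 ^ 9]

    main :: IO ()
    main = print (snd (expensive, 42))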

+1

The ST (state thread) monad in Haskell is a way of guaranteeing that certain stateful actions are executed in a single sequential thread of state, with no mutation observable outside that sequence. Inside ST you can use imperative, mutable data structures in Haskell. Note that Haskell is considered one of the few purely functional languages.
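
A classic minimal sketch: an imperative-looking loop over a mutable reference, wrapped in runST so that the mutation cannot leak out and the function as a whole remains pure:

    import Control.Monad.ST
    import Data.STRef

    -- Sums a list by repeatedly mutating an STRef; runST's type ensures
    -- the reference cannot escape, so sumST is an ordinary pure function.
    sumST :: [Int] -> Int
    sumST xs = runST $ do
        ref <- newSTRef 0
        mapM_ (\x -> modifySTRef' ref (+ x)) xs
        readSTRef ref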

0

Source: https://habr.com/ru/post/1258662/

