Differences between Storable and Unboxed Vectors

So ... I have been using unboxed vectors (from the vector package) by default, without giving it much thought. vector-th-unbox makes creating instances for them a breeze, so why not.

Now I have run into a case where I cannot derive these instances automatically: a data type with phantom type parameters (as in Vector (s :: Nat) a, where s encodes the length).

This made me think about the differences between Storable and Unboxed vectors. Things I figured out on my own:

  • Unboxed stores, for example, tuples as separate vectors, which gives better cache locality because no bandwidth is wasted when only one of the values is needed.
  • Storable still compiles down to simple (and probably efficient) readArray# calls that return unboxed values (as seen by reading the Core).
  • Storable allows direct pointer access, which makes interoperating with foreign code possible. Unboxed does not.
  • [edit] Storable instances are actually easier to write by hand than Unbox ones (that is, the Vector and MVector instances).
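The first point above can be illustrated with a minimal sketch. The names pairs and sumFirsts are made up for this example; U.unzip is from Data.Vector.Unboxed and is cheap because the two components are already stored as separate vectors.

```haskell
import qualified Data.Vector.Unboxed as U

-- A vector of pairs. The Unbox instance for tuples keeps one underlying
-- vector per component rather than an array of pair records.
pairs :: U.Vector (Int, Double)
pairs = U.fromList [(1, 1.5), (2, 2.5), (3, 3.5)]

-- Only the Int component is traversed here; the Double component
-- is never read.
sumFirsts :: U.Vector (Int, Double) -> Int
sumFirsts = U.sum . fst . U.unzip

main :: IO ()
main = print (sumFirsts pairs)
```

Whether this layout actually pays off depends on the access pattern: code that always needs both components together may not benefit.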

That alone does not make it clear to me why Unboxed even exists; there seems to be little benefit to it. Am I missing something?

1 answer

Cribbed from https://haskell-lang.org/library/vector

Storable and unboxed vectors both store their data in a byte array, avoiding pointer indirection. This is more memory efficient and makes better use of caches. The distinction between storable and unboxed vectors is subtle:

  • Storable vectors require data that is an instance of the Storable type class. This data is stored in malloc'ed memory, which is pinned (the garbage collector cannot move it). This can lead to memory fragmentation, but it allows the data to be shared over the C FFI.
  • Unboxed vectors require data that is an instance of the Prim type class. This data is stored in unpinned memory managed by the GC, which helps to avoid memory fragmentation. However, this data cannot be shared over the C FFI.
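As a rough sketch of what sharing over the FFI looks like from the Haskell side, S.unsafeWith hands out a pointer to a storable vector's buffer. The function sumViaPtr below is a hypothetical stand-in written in Haskell; in real code it would typically be a foreign import of a C function that takes a length and a pointer.

```haskell
import qualified Data.Vector.Storable as S
import Foreign.C.Types (CDouble)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekElemOff)

-- Hypothetical stand-in for a foreign function: sums n doubles
-- starting at the given pointer.
sumViaPtr :: Int -> Ptr CDouble -> IO CDouble
sumViaPtr n p = go 0 0
  where
    go i acc
      | i >= n = pure acc
      | otherwise = do
          x <- peekElemOff p i
          go (i + 1) (acc + x)

-- unsafeWith exposes the vector's buffer as a raw pointer for the
-- duration of the callback. The callback must not write through it,
-- since the vector is supposed to be immutable.
sumStorable :: S.Vector CDouble -> IO CDouble
sumStorable v = S.unsafeWith v (sumViaPtr (S.length v))

main :: IO ()
main = sumStorable (S.fromList [1, 2, 3]) >>= print
```

Data.Vector.Unboxed offers no comparable function, which matches the limitation described in the second bullet.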

Both the Storable and Prim type classes provide a way to store a value as bytes and to load bytes back into a value. The distinction is which type of byte array is used.
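To make "store a value as bytes and load it back" concrete, here is a sketch of a hand-written Storable instance. The type V2 and its layout (two consecutive Doubles) are assumptions made for this example only, not something taken from the vector documentation.

```haskell
import qualified Data.Vector.Storable as S
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (Storable (..))

-- Hypothetical example type: a pair of strict Doubles.
data V2 = V2 !Double !Double
  deriving (Eq, Show)

instance Storable V2 where
  -- Assumed layout: two Doubles back to back, no padding.
  sizeOf _ = 2 * sizeOf (0 :: Double)
  alignment _ = alignment (0 :: Double)
  -- Load both components from consecutive Double slots.
  peek p = do
    let q = castPtr p :: Ptr Double
    x <- peekElemOff q 0
    y <- peekElemOff q 1
    pure (V2 x y)
  -- Store both components into consecutive Double slots.
  poke p (V2 x y) = do
    let q = castPtr p :: Ptr Double
    pokeElemOff q 0 x
    pokeElemOff q 1 y

main :: IO ()
main = print (S.toList (S.fromList [V2 1 2, V2 3 4]))
```

With this instance in scope, S.Vector V2 works like any other storable vector.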

As usual, benchmarking is the only true measure of performance. However, as a general guideline:

  • If you do not need to pass values to a C FFI, and you have a Prim instance, use unboxed vectors.
  • If you have a Storable instance, use a storable vector.
  • Otherwise, use a boxed vector.
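Since the final choice is best settled by benchmarking, one option is to write functions against Data.Vector.Generic so that the representation can be swapped without touching the code. The function mean below is a made-up example of this approach, not part of the quoted guideline.

```haskell
{-# LANGUAGE FlexibleContexts #-}

import qualified Data.Vector as V
import qualified Data.Vector.Generic as G
import qualified Data.Vector.Storable as S
import qualified Data.Vector.Unboxed as U

-- Works for any vector type that can hold Doubles.
-- Note: the result is NaN for an empty vector.
mean :: G.Vector v Double => v Double -> Double
mean xs = G.sum xs / fromIntegral (G.length xs)

main :: IO ()
main = do
  print (mean (V.fromList [1, 2, 3, 4])) -- boxed
  print (mean (S.fromList [1, 2, 3, 4])) -- storable
  print (mean (U.fromList [1, 2, 3, 4])) -- unboxed
```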

There are also other issues to consider, such as the fact that boxed vectors are instances of Functor, while storable and unboxed vectors are not.
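A short sketch of what this means in practice: fmap is available for boxed vectors, while unboxed (and storable) vectors offer a module-level map that carries a constraint on the element type, which the Functor class cannot express. The names boxedDoubled and unboxedDoubled are made up for this example.

```haskell
import qualified Data.Vector as V
import qualified Data.Vector.Unboxed as U

-- Boxed vectors are a Functor, so fmap works directly.
boxedDoubled :: V.Vector Int
boxedDoubled = fmap (* 2) (V.fromList [1, 2, 3])

-- Unboxed vectors have no Functor instance; U.map requires Unbox
-- for both the input and the output element type.
unboxedDoubled :: U.Vector Int
unboxedDoubled = U.map (* 2) (U.fromList [1, 2, 3])

main :: IO ()
main = do
  print boxedDoubled
  print unboxedDoubled
```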


Source: https://habr.com/ru/post/1258604/

