How to do Lazy Map deserialization in Haskell

Similar to this question from @Gabriel Gonzalez: How to do fast data deserialization in Haskell

I have a big Map full of integers and text, which I serialize using cereal. The file is about 10 MB.
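For concreteness, here is a minimal sketch of the setup (the Cache type and file name are made up for illustration; cereal has no built-in Serialize instance for Text, so the values here are ByteStrings):

    {-# LANGUAGE OverloadedStrings #-}
    import qualified Data.ByteString as BS
    import qualified Data.Map as M
    import Data.Serialize (encode, decode)

    type Cache = M.Map Int BS.ByteString

    main :: IO ()
    main = do
      -- Build and serialize the map once ...
      let m = M.fromList [(i, "some text") | i <- [1 .. 100000]] :: Cache
      BS.writeFile "cache.bin" (encode m)
      -- ... then every run deserializes the whole thing to look up a few keys.
      bytes <- BS.readFile "cache.bin"
      case decode bytes of
        Left err -> putStrLn ("decode failed: " ++ err)
        Right m' -> print (M.lookup 42 (m' :: Cache))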

Every time I run my program, I deserialize the whole thing just to look up a few elements. Deserialization takes about 500 ms, which isn't a dealbreaker, but I always like to profile on Fridays.

It seems wasteful to always deserialize 100k to 1M elements when I only need a few of them.

I tried decodeLazy and also changed the Map to Data.Map.Lazy (not quite understanding how a Map could be lazy, but fine, it's there), and neither affects the time, except perhaps making it a little slower.
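For reference, that attempt looks roughly like this (reusing the Cache contents from above, but with Data.Map.Lazy). It can't help much: cereal's Map instance still has to parse every key/value pair to build the tree, so the lazy input ByteString gets forced anyway:

    import qualified Data.ByteString as BS
    import qualified Data.ByteString.Lazy as BL
    import qualified Data.Map.Lazy as ML
    import Data.Serialize (decodeLazy)

    -- Lazy file input plus a lazy Map: decoding still builds the whole tree.
    loadCache :: FilePath -> IO (Either String (ML.Map Int BS.ByteString))
    loadCache cacheFile = do
      ser <- BL.readFile cacheFile
      return (decodeLazy ser)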

I am wondering if there is something a bit smarter that would load and decode only what's needed. Of course, an sqlite database can be huge, yet it loads only what it needs to answer a query. I'd like to find something similar without having to create a database schema.

Update

You know what would be great? Some Mongo-meets-sqlite hybrid. Just as you might have a file-backed database of JSON documents ... and of course, someone did this: https://github.com/hamiltop/MongoLiteDB ... in Ruby :(

Thought mmap might help. Tried the mmap library and segfaulted GHCi for the first time ever. No idea how to even report that bug.

The bytestring-mmap library works, but brings no improvement. It just replaces this:

    ser <- BL.readFile cacheFile

with this:

    ser <- unsafeMMapFile cacheFile
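In context, the swap is just the input function; here is a sketch using the lazy variant from bytestring-mmap, whose type lines up with decodeLazy. Pages are now faulted in on demand, but decoding still has to touch all of them, which would explain the lack of improvement:

    import qualified Data.ByteString as BS
    import qualified Data.Map.Lazy as ML
    import Data.Serialize (decodeLazy)
    import System.IO.Posix.MMap.Lazy (unsafeMMapFile)

    -- Same loader as before, with the file mmapped instead of read.
    loadCacheMMap :: FilePath -> IO (Either String (ML.Map Int BS.ByteString))
    loadCacheMMap cacheFile = do
      ser <- unsafeMMapFile cacheFile
      return (decodeLazy ser)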

Update 2

keyvaluehash may be just the ticket. Performance looks really good, but the API is strange and the documentation is missing, so it will take some time.

Update 3: I'm an idiot

Clearly, lazy deserialization of a Map is not what I want here. What I need is a key-value database, and there are several options available, such as dvm, tokyo-cabinet, and this leveldb thing I'd never seen before.

Keyvaluehash looks like a native-Haskell key-value database, which I like, but I still have doubts about its quality. For example, you cannot ask the database for a list of all keys or all values (the only real operations are readKey, writeKey, and deleteKey), so if you need those, you have to store them somewhere else, as sketched below. Another drawback is that you must specify a size when creating the database. I used a size of 20M so I'd have plenty of room, but the database actually created on disk takes up 266M. No idea why, since there is no documentation.
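One workaround for the missing key enumeration is a sidecar file that records every key written. A rough sketch; writeKey' below is a stand-in for the store's writeKey, since I can't vouch for keyvaluehash's exact types:

    import qualified Data.ByteString.Char8 as BS8

    -- Append each key to a sidecar file alongside the store, so the
    -- key list can be recovered later despite the store's limited API.
    writeWithIndex :: (BS8.ByteString -> BS8.ByteString -> IO ())  -- writeKey stand-in
                   -> FilePath -> BS8.ByteString -> BS8.ByteString -> IO ()
    writeWithIndex writeKey' keyFile k v = do
      writeKey' k v
      BS8.appendFile keyFile (k `BS8.snoc` '\n')

    -- Enumerate all keys ever written (assumes keys contain no newlines;
    -- deleted keys would need the same bookkeeping).
    listKeys :: FilePath -> IO [BS8.ByteString]
    listKeys keyFile = BS8.lines <$> BS8.readFile keyFile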

1 answer

One way I've done this in the past is to simply create a directory in which each file is named by a serialized key. You can use unsafeInterleaveIO to "thunk" the deserialized contents of each file read, so that values are only forced when actually read...
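A minimal sketch of that idea, assuming Int keys rendered with show and raw ByteString values (the names here are illustrative, not from any library):

    import Control.Monad (forM)
    import qualified Data.ByteString as BS
    import qualified Data.Map as M
    import System.Directory (createDirectoryIfMissing, getDirectoryContents)
    import System.FilePath ((</>))
    import System.IO.Unsafe (unsafeInterleaveIO)

    -- One file per key, named by the key itself.
    saveMap :: FilePath -> M.Map Int BS.ByteString -> IO ()
    saveMap dir m = do
      createDirectoryIfMissing True dir
      mapM_ (\(k, v) -> BS.writeFile (dir </> show k) v) (M.toList m)

    -- Listing the directory is strict, but each file's contents are
    -- wrapped in unsafeInterleaveIO, so a value is only read from disk
    -- when it is actually forced.
    loadMap :: FilePath -> IO (M.Map Int BS.ByteString)
    loadMap dir = do
      names <- filter (`notElem` [".", ".."]) <$> getDirectoryContents dir
      pairs <- forM names $ \name -> do
        v <- unsafeInterleaveIO (BS.readFile (dir </> name))
        return (read name :: Int, v)
      return (M.fromList pairs)

The usual unsafeInterleaveIO caveat applies: if a file changes or disappears before its value is forced, the failure surfaces at an unpredictable point.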


Source: https://habr.com/ru/post/1205481/

