What is the best technique for creating a random access data structure lazily?

Question

What is the best technique for creating a random access data structure lazily?

In Haskell, I would like to generate a list of random integers of indefinite length. (However, less than 1 million.)

I’m unlikely to need all the elements of the list at once, so I would like to generate it lazily. However, after the generation, I will need to access the items in the random access list. So, I believe that the best method would be to copy an infinite list to an array. However, I don’t know if arrays can be "swapped" with lists very easily - for example, as soon as I want to generate more list elements, I want to start where I left off but expand the array.

In any case, maybe I should solve for some tree structure log (n) instead of an array? What do you think?

Does Haskell have a tree structure for sequential items that can be accessed using a list-like API?

+4

haskell

Steve Feb 10 '11 at 2:54

source share

5 answers

This seems to be useful, but you will need to generate larger trees as the list grows: http://www.codeproject.com/KB/recipes/persistentdatastructures.aspx#RandomAccessLists (it is a log (n) structure, although). Maybe something like http://research.janelia.org/myers/Papers/applicative.stack.pdf , but I don’t see how to do this efficiently (with a general structure) when the list is generated dynamically.

One possible approach would be to use a pseudo-random number generator that allows “slipping” or going forward steps n in sublinear time. One method (slow, but works in constant time) is to use block encryption in counter mode; see http://en.wikipedia.org/wiki/Cryptographically_secure_pseudorandom_number_generator#Designs_based_on_cryptographic_primitives . See http://cs.berkeley.edu/~mhoemmen/cs194/Tutorials/prng.pdf for details.

+4

Jeremiah willcock Feb 10 '11 at 3:12

source share

Not an answer, but a consideration: note that any standard technique for creating a list of random values will include threads through the seed. Therefore, if I create a lazy random infinite list and force the value of the 100th element, this forces not only the spine of the cells minus 0-99, but also their values.

In any case, I would recommend lazy persistent vectors: http://hackage.haskell.org/package/storablevector

You still have O (n) access, but you reduced it by the size factor of your chunk, so in practice it is much faster than a simple list. And you still get the modulo size, laziness properties that interest you.

+2

sclv Feb 10 '11 at 4:01

source share

What about Data.Sequence.Seq ? It is not lazy, but supports O (1) from both ends, and lookups - O (log (min (i, ni))), which should be better than most other tree structures. You can save Seq and generate / add additional outputs as needed.

There is also a ListLike instance here if you prefer this API over the one provided by Seq .

0

John l Feb 10 '11 at 11:31

source share

You can only create a list of arrays. Therefore, if each array stores the elements M your sequence, access to element n is O(n/M) . This can be a good environment.

For instance:

 import Data.Array import Data.List fibsList :: (Num a) => [a] fibsList = 1 : 1 : zipWith (+) fibsList (tail fibsList) chunkSize :: (Num a, Ix a) => a chunkSize = 100000 fibsChunks :: (Num i, Ix i, Num e) => [Array ie] fibsChunks = mkChunks fibsList where mkChunks fs = listArray (0,chunkSize-1) xs : mkChunks ys where (xs,ys) = splitAt chunkSize fs lookupFib :: (Integral i, Ix i, Num e) => i -> e lookupFib n = fibsChunks `genericIndex` q ! r where (q,r) = n `divMod` chunkSize

0

rampion Feb 10 '11 at 15:39

source share

porges · Accepted Answer · 2011-02-10T04:09:30+0000

If you have good PRNG, then the values that are generated by seeds that are close to each other should be independent, so you can just select the initial seed, then each cell i in the array will be prng(seed+i) .

This essentially just uses it as a hash function: P

If you do it this way, you don’t even need an array, you can just have the function getRandomValue seed i = prng (seed+i) . Whether this is better than an array of lazy values will depend on the complexity of your PRNG, but in any case you will get O (1) access.

What is the best technique for creating a random access data structure lazily?

More articles: