How to use `getBounds' with STArray?

Question

How to use `getBounds' with STArray?

I am trying to write a Shuffle Fisher-Yates algorithm using STArray. Unlike all the other examples I found on the net, I try to avoid using my own lists. I just want to shuffle the array in place.

This is what I have:

randShuffleST arr gen = runST $ do _ <- getBounds arr return (arr, gen)

arr is STArray, and gen will be the state of a type generator (RandomGen g).

I was hoping I could rely on (MArray (STArray s) e (ST s)) instance declaration defined in MArray to be able to use MArray getBounds , but GHCi cannot randShuffleST type randShuffleST . He does not work:

 Could not deduce (MArray ae (ST s)) arising from a use of `getBounds' from the context (Ix i) bound by the inferred type of randShuffleST :: Ix i => aie -> t -> (aie, t) at CGS/Random.hs:(64,1)-(66,25) Possible fix: add (MArray ae (ST s)) to the context of a type expected by the context: ST s (aie, t) or the inferred type of randShuffleST :: Ix i => aie -> t -> (aie, t) or add an instance declaration for (MArray ae (ST s)) In a stmt of a 'do' block: _ <- getBounds arr In the second argument of `($)', namely `do { _ <- getBounds arr; return (arr, gen) }' In the expression: runST $ do { _ <- getBounds arr; return (arr, gen) }

I wonder if I remove the `runST 'call like this:

 randShuffleST arr gen = do _ <- getBounds arr return (arr, gen)

it compiles fine, with a signature like

 randShuffleST :: (Ix i, MArray aem) => aie -> t -> m (aie, t)

. I am using GHC 7.4.2 on Arch Linux.

Please give explicit type signatures in your answers to help me understand your code, thanks.

EDIT: I really like Antal SZ's answer, but I can't select it because, to be honest, I don't quite understand it. Perhaps when I better understand my problem, I will answer my question in the future ... thanks.

+4

arrays random haskell shuffle

Linus arver Oct 11 '12 at 19:50

source share

3 answers

You should probably not use runST in your function. runST should be used once, outside of some computations that use mutation internally but have a clean interface. You probably need a shuffle function that moves the array in place to have a type of type STArray sie -> ST s () (or perhaps a more general type), and then have another function that uses runST to present a clean interface if you want (this function probably should have copied the values, though). In general, the purpose of ST is that STRef and STArray never leave one runST call and be used in another.

The type deduced for your function without runST is good, just more polymorphic (it will work for IO arrays, ST arrays, STM arrays, unpacked arrays, etc.). You will have an easier time with output errors if you specify explicit type signatures.

+7

shachaf Oct 11 '12 at 21:13

source share

This is because the runST rank 2 runST prevents you from giving a meaningful randShuffleST type. (There is another problem with your code as it is written: mutable ST arrays cannot essentially exist outside the ST monad, so returning from inside runST not possible, and building one to go into a pure function is unlikely at best. This is "uninteresting", but it may turn out to be confusing in itself, see the bottom of this answer on how to solve it.)

So let's see why you cannot write a type signature. It is worth saying that I agree with shachaf about how best to write functions such as the one you are writing: stay inside ST and use runST > only once, at the very end. If you do this, I included example code in the sample response that shows how to write code successfully. But I think it’s interesting to understand why you get the mistake you make; errors like the ones you get are some of the reasons why you don’t want to write your code this way!

To begin with, consider a simplified version of a function that produces the same error message:

 bounds arr = runST (getBounds arr)

Now try specifying the bounds type. The obvious choice is

 bounds :: (MArray ae (ST s), Ix i) => aie -> (i,i) bounds arr = runST (getBounds arr)

We know that arr must be a MArray , and we don't care what elements or type of index it has (as long as its indexes are in Ix ), but we know that it must live inside the ST monad. So this should work, right? Not so fast!

 ghci> :set -XFlexibleContexts +m ghci> :module + Control.Monad.ST Data.Array.ST ghci> let bounds :: (MArray ae (ST s), Ix i) => aie -> (i,i) ghci| bounds arr = runST (getBounds arr) ghci| <interactive>:8:25: Could not deduce (MArray ae (ST s1)) arising from a use of `getBounds' from the context (MArray ae (ST s), Ix i) bound by the type signature for bounds :: (MArray ae (ST s), Ix i) => aie -> (i, i) at <interactive>:7:5-38 ...

Wait a minute: Could not deduce (MArray ae (ST s1)) ? Where did this s1 come from. We do not mention such a type variable anywhere! The answer is that it comes from runST in the definition of bounds . In general, runST is of type (renaming some type variables for convenience) runST :: (forall σ. ST σ α) -> α ; when we use it here, we limited it to the type (forall σ. ST σ (i,i)) -> (i,i) . What happens here is that forall looks like a lambda (actually it is a lambda), binding σ locally inside parentheses. Therefore, when getBounds arr returns something like ST s (i,i) , we can combine α with (i,i) ---, but we cannot combine σ with s because σ isn 't. In GHC type variables for runST are s and a , not σ and α , so it renames s to s1 to remove the ambiguity, and it is a type variable, re form.

So, the error is true: we claim that for some s , MArray ae (ST s) is executed. But runST requires this to be true for every s . The error, however, is very unclear, as it introduces a new variable of the type that you cannot actually refer to (therefore, a “possible correction” is pointless, although it will never be useful in any case).

Now, the obvious question: "So can I write the correct type signature?" The answer is "... sort of." (But you probably don't want this.) The desired type would be something like this:

 ghci> :set -XConstraintKinds -XRank2Types ghci> let bounds :: (forall s. MArray ae (ST s), Ix i) => aie -> (i,i) ghci| bounds arr = runST (getBounds arr) ghci| <interactive>:170:25: Could not deduce (MArray ae (ST s)) arising from a use of `getBounds' from the context (forall s. MArray ae (ST s), Ix i) ...

This restriction says that MArray ae (ST s) is satisfied for each s , but we still get a type error. It seems that “GHC does not support polymorphic restrictions to the left of the arrow” - and in fact, while googling around trying to find this information, I found an excellent blog post in the section “Home is usually a function” that runs into that the same problem as you explains the error and provides the following workaround. (They also get an excellent "class invalid statement" error message stating clearly that such a thing is not possible, probably due to differences in GHC versions.)

The idea is that, as it usually happens, when we want more of the type class constraints that we can get from the GHC built-in system to provide explicit evidence for the existence of such a type of class (ab) using GADT:

 ghci> :set -XNoFlexibleContexts -XNoConstraintKinds ghci> -- We still need -XRank2Types, though ghci> :set -XGADTs ghci> data MArrayE aem where ghci| MArrayE :: MArray aem => MArrayE aem ghci| ghci>

Now that we have a value of type MArrayE aem , we know that the value must be constructed using the MArrayE constructor; this constructor can only be called when the MArray aem , and therefore matching patterns by MArrayE will make this constraint available again. (The only other possibility is that your value of this type was undefined, so pattern matching is actually needed.) Now we can provide this as an explicit argument to the bounds function, so we call it bounds MArrayE arr :

 ghci> :set -XScopedTypeVariables ghci> let bounds :: forall ae i. ghci| Ix i => (forall s. MArrayE ae (ST s)) -> aie -> (i,i) ghci| bounds evidence arr = runST (go evidence) ghci| where go :: MArrayE ae (ST s) -> ST s (i,i) ghci| go MArrayE = getBounds arr ghci| ghci> -- Hooray!

Note the oddity in which we have to decompose the body into its own function and match the pattern. What happens is that if you match patterns in the bounds argument list, s from evidence is fixed to a specific value too soon, and so we need to defer this; and (I think because outputting with types of higher rank is difficult), we also need to provide an explicit type for go , which requires variables of type region.

And finally, back to the source code:

 ghci> let randShuffleST :: forall aei g. Ix i => (forall s. MArrayE ae (ST s)) ghci| -> aie ghci| -> g ghci| -> (aie, g) ghci| randShuffleST evidence arr gen = runST $ go evidence ghci| where go :: MArrayE ae (ST s) -> ST s (aie,g) ghci| go MArrayE = do _ <- getBounds arr ghci| return (arr, gen) ghci| ghci> -- Hooray again! But...

Now, as I said at the beginning, there was one problem. In the above code, there will never be a way to build a value of type forall s. MArrayE ae (ST s) forall s. MArrayE ae (ST s) , because the restriction is contraint forall s. MArray ae (ST s) forall s. MArray ae (ST s) is unsatisfactory. For the same reason, you could not write randShuffleST in the source code even without the type error you receive, because you cannot write a function that returns a STArray outside of ST .

The cause of both of these problems is the same: a STArray first parameter is the state in which it lives . The MArray instance for STArray is equal to instance MArray (STArray s) e (ST s) , and therefore you will always have type types ST s (STArray sie) . Since runST :: (forall s. ST sa) -> a , running runST mySTArrayAction will cause the s to "leak" in an illegal way. Take a look

runSTArray :: Ix i => (forall s. ST s (STArray sie)) -> Array ie

and his unboxed friend

runSTUArray :: Ix i => (forall s. ST s (STUArray sie)) -> UArray ie .

You can also use

unsafeFreeze :: (Ix i, MArray aem, IArray be) => aie -> m (bie)

do the same if you promise that the last function you ever call in your mutable array; the freeze function loosens this restriction, but should copy the array. Similarly, if you want to pass an array, not a list, to a clean version of your function, you will also need

thaw :: (Ix i, IArray ae, MArray bem) => aie -> m (bie) ;

using unsafeThaw is likely to be a disaster here, as you are passing an immutable array in which you are not in control! All this together will give us something like:

 ghci> :set -XNoRank2Types -XNoGADTs ghci> -- We still need -XScopedTypeVariables for our use of `thaw` ghci> import Data.Array.IArray ghci> let randShuffleST :: forall ia ie g. (Ix i, IArray ia e) ghci| => ia ie ghci| -> g ghci| -> (Array ie, g) ghci| randShuffleST iarr gen = runST $ do ghci| marr <- thaw iarr :: ST s (STArray sie) ghci| _ <- getBounds marr ghci| iarr' <- unsafeFreeze marr ghci| return (iarr', gen) ghci| ghci> randShuffleST (listArray (0,2) "abc" :: Array Int Char) "gen" (array (0,2) [(0,'a'),(1,'b'),(2,'c')],"gen")

This is O (n) time to copy the input immutable array, but with optimization, it is O (1) time to freeze the modified array for output, since STArray and Array match under the hood.

Applying this to your problem, in particular, we have the following:

 {-# LANGUAGE FlexibleContexts #-} import System.Random import Control.Monad import Control.Applicative import Control.Monad.ST import Data.Array.ST import Data.STRef import Data.Array.IArray updateSTRef :: STRef sa -> (a -> (b,a)) -> ST sb updateSTRef rf = do (b,a) <- f <$> readSTRef r writeSTRef ra return b swapArray :: (MArray aem, Ix i) => aie -> i -> i -> m () swapArray arr ij = do temp <- readArray arr i writeArray arr i =<< readArray arr j writeArray arr j temp shuffle :: (MArray ae (ST s), Ix i, Random i, RandomGen g) => aie -> g -> ST sg shuffle arr gen = do rand <- newSTRef gen bounds@ (low,_) <- getBounds arr when (rangeSize bounds > 1) . forM_ (reverse . tail $ range bounds) $ \i -> swapArray arr i =<< updateSTRef rand (randomR (low,i)) readSTRef rand -- Two different pure wrappers -- We need to specify a specific type, so that GHC knows *which* mutable array -- to work with. This replaces our use of ScopedTypeVariables. thawToSTArray :: (Ix i, IArray ae) => aie -> ST s (STArray sie) thawToSTArray = thaw shufflePure :: (IArray ae, Ix i, Random i, RandomGen g) => aie -> g -> (aie, g) shufflePure iarr g = runST $ do marr <- thawToSTArray iarr g' <- shuffle marr g iarr' <- freeze marr return (iarr',g') shufflePure' :: (IArray ae, Ix i, Random i, RandomGen g) => aie -> g -> (Array ie, g) shufflePure' iarr g = let (g',g'') = split g iarr' = runSTArray $ do marr <- thaw iarr -- `runSTArray` fixes the type of `thaw` void $ shuffle marr g' return marr in (iarr',g'')

Again, you can replace the freeze with Data.Array.Unsafe.unsafeFreeze in shufflePure ; this is likely to lead to acceleration, since it would not have to copy the array to return it if it was Array ie . The runSTArray function safely terminates unsafeFreeze , so the problem is not shufflePure' . (These two equivalents, modulo some details about the separation of PRNG.)

What do we see here? It is important to note that only mutable code ever refers to mutable arrays, and it remains mutable (i.e. returns something inside ST s ). Since shuffle performs a random move in place, it does not need to return an array, just PRNG. To build a clean interface, we thaw immutable array into a mutable array, shuffle it into place, and then return the resulting array to freeze into an immutable one. This is important: it prevents the leakage of volatile data back into the clean world. You cannot directly shuffle the array passed because it is immutable; contrariwise, you cannot directly return a mutable shuffled array as an immutable array because it is modified, and what if someone can change it?

This does not contradict any of the errors that we saw above, because all these errors occur due to improper use of runST . If we restrict ourselves to using runST , run it only after we have collected a clean result, all internal threads can happen automatically. Since runST is the only function with a rank-2 type, this is the only place where serious type-oddity can be produced; everything else just requires your justification based on the standard type, although perhaps with a lot of thought to support the state parameter s state-thread.

And here it is:

 *Main> let arr10 = listArray (0,9) [0..9] :: Array Int Int *Main> elems arr10 [0,1,2,3,4,5,6,7,8,9] *Main> elems . fst . shufflePure arr10 <$> newStdGen [3,9,0,5,1,2,8,7,6,4] *Main> elems . fst . shufflePure arr10 <$> newStdGen [3,1,0,5,9,8,4,7,6,2] *Main> elems . fst . shufflePure' arr10 <$> newStdGen [3,9,2,6,8,4,5,0,7,1] *Main> elems . fst . shufflePure' arr10 <$> newStdGen [8,5,2,1,9,4,3,0,7,6]

Success finally! (Too long, really. Sorry, length of answer).

+6

Antal spector-zabusky Oct 12 '12 at 1:03

source share

HaskellElephant · Accepted Answer · 2012-10-11T22:45:12+0000

Below is one way to implement Fisher-Yates in place (I think it's called Durstenfeld or Knuth Shuffle). Note that runST never called, and runSTArray instead, and it is called only once.

 import Data.Array import Data.Array.ST import Control.Monad.ST import Control.Monad import System.Random fisherYates :: (RandomGen g,Ix ix, Random ix) => g -> Array ix e -> Array ix e fisherYates gen a' = runSTArray $ do a <- thaw a' (bot,top) <- getBounds a foldM (\gi -> do ai <- readArray ai let (j,g') = randomR (bot,i) g aj <- readArray aj writeArray ai aj writeArray aj ai return g') gen (range (bot,top)) return a

Note that although the algorithm runs in place, the function first copies the array specified in the input (the result of using the thaw function) before executing the algorithm on the copy. To avoid copying an array, you have at least two options:

Use unsafeThaw , which (as the name implies) is unsafe and can only be used if you are sure that the input array will never be used again. This is not a trivial guarantee due to lazy pricing.
Let fisherYates be of type (RandomGen g,Ix ix, Random ix) => g -> STArray s ix e -> ST s (STArray s ix e) and performs the whole operation that requires a local fishing-yates algorithm inside the ST monad and gives only final answer runST .

How to use `getBounds' with STArray?

More articles: