How atomic are HPA?

How does the GHC handle thunks accessed by multiple threads (either explicit threads or internal ones that evaluate sparks)? Could it be that multiple threads evaluate the same thunk while duplicating work? Or, if the synchronization is synchronized, how, so that the performance does not suffer?

+44
multithreading haskell lazy-evaluation ghc thunk
Aug 12 '15 at 18:30
source share
3 answers

From the blog post related @Lambdageek, GHC Comment and GHC User Guide I am collecting the following:

The GHC tries to prevent revaluation of thunks, but since true blocking between threads is expensive, and thunks are usually clean and so harmless to reevaluate, it usually does it carelessly, with little chance of duplicating work anyway.

The method that he uses to avoid work is to replace the thunks with a black hole, a special marker that tells other threads (or sometimes the thread itself, as it happens when <<loop>> detected), which is evaluated by thunk.

Given this, there are at least three options:

  • By default, it uses the "lazy black checkholding", where it is done only before the thread is about to stop. Then he “walks” on his stack and creates “true” black holes for new tricks, using a lock to ensure that each thread receives only one thread scooping it, and to abort its own evaluation if it finds another thread, already scooped thunk. It’s cheaper because you don’t need to consider tricks whose evaluation time is so short that it completely coincides between two pauses.

  • Using -feager-blackholing-flag instead creates black holes as soon as the thunk evaluation starts, and the User Guide recommends this if you do a lot of parallelism. However, since locking on every thunk will be too expensive, these black holes are cheaper "impatient" that do not synchronize with other threads (although other threads can still see them if there is no race condition). Only when streams stop, do they turn into “true” black holes.

  • The third case, which the blog was especially worried about, is used for special functions, such as unsafePerformIO , for which it is harmful to evaluate thunk more than once. In this case, the stream uses a “true” black hole with a real lock, but creates it immediately by inserting an artificial thread into the pause before a real assessment.

+40
Aug 12 '15 at 19:48
source share

To summarize the article related to the comments: thunks in the GHC are not strictly atomic between multiple threads: in race condition for multiple threads you can evaluate the same thunk, duplicating the work. But this does not matter much in practice, because:

  • Guaranteed cleanliness means that it never matters whether thunk is evaluated twice in terms of program semantics. unsafePerformIO is a case where this may be a problem, but it turns out that unsafePerformIO should avoid re-running the same I / O action.
  • Since all values ​​are thunks, most thunks turn out to be quite small, so sometimes duplicating work to force it does not really matter in terms of program speed. You can imagine that it is expensive to duplicate, for example, last [1,2..10000000] , because the whole calculation is expensive. But, of course, indeed the farthest storm just solves a different tone, something like:

     case [1,2..10000000] of [x] -> x (_:xs) -> last xs [] -> error "empty list" 

    and if two threads duplicate the work in order to transfer the call from last to use case , it is pretty cheap in the great scheme of things.

+21
Aug 12 '15 at 19:40
source share

Yes, sometimes the same thunk can be evaluated by multiple threads. GHC runtime tries to minimize the likelihood of duplication of work, so this is rare in practice. Please see "Haskell on a shared multiprocessor" for low level information, mainly in the "Free Access Lock" section. (I would recommend a document for every professional haskell btw developer.)

+13
Aug 12 '15 at 19:43
source share



All Articles