Haskell framework for parallelizing a non-thread-safe C++ library

I have a closed-source, non-thread-safe C++ library that provides a single function f :: ByteString -> ByteString. The runtime of this function can range from one second to a couple of hours.

I am looking for a way to distribute the computation across multiple cores/servers (SIMD).

In short, I am looking for a framework that provides a function

g :: Strategy b -> (a -> b) -> a -> b 

to lift a function that can only be called sequentially into one that behaves like any other pure function in Haskell.

For example, I want to be able to write:

  parMap rwhnf f args -- will not work 

Since f calls a C function in a non-thread-safe library via the FFI, this will not work. I could therefore replace f with a function g that holds a job queue and dispatches the tasks to N separate processes. The processes could run locally or be distributed:

  parMap rwhnf g args -- should work
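The job-queue idea can be sketched locally with nothing but GHC's boot packages (`base` and `process`). This is only a rough illustration under several assumptions: the worker command is hypothetical and is assumed to read its input on stdin and write the result of f to stdout; `runJob` and `processMap` are made-up names; `String` is used for brevity where the real function works on `ByteString`; and the result lives in `IO` rather than being a pure function, so it is not the `g` asked for above.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.QSem (newQSem, signalQSem, waitQSem)
import Control.Exception (SomeException, bracket_, try)
import System.Process (readProcess)

-- | Run one job in a fresh worker process.
-- Assumption: the (hypothetical) worker reads stdin and writes f's result to stdout.
runJob :: FilePath -> String -> IO String
runJob workerCmd input = readProcess workerCmd [] input

-- | Run all jobs with at most n worker processes alive at a time.
-- Results come back in input order; a failed job yields a Left.
processMap :: Int -> FilePath -> [String] -> IO [Either SomeException String]
processMap n workerCmd inputs = do
  slots <- newQSem n                      -- n free worker slots
  boxes <- mapM (spawnJob slots) inputs   -- one result box per job
  mapM takeMVar boxes                     -- block until every job has reported
  where
    spawnJob slots input = do
      box <- newEmptyMVar
      _ <- forkIO $ do
        -- Hold a slot only while the worker process is running.
        result <- try (bracket_ (waitQSem slots) (signalQSem slots)
                                (runJob workerCmd input))
        putMVar box result
      return box
```

Because every call runs in its own OS process, the library is never entered concurrently within one address space. Distributing jobs to other machines, restarting crashed workers, and persisting the queue are exactly the parts this sketch leaves out, and they are what the frameworks below would have to provide.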

Potential frameworks that I have already looked at:

  • MPI: Client (Haskell) <-MPI-> Broker (C++) <-MPI-> Worker (C++) <-> Lib (C++)

  • ZeroMQ: Client (Haskell) <-ZeroMQ-> Broker (C++) <-ZeroMQ-> Worker (C++) <-> Lib (C++)

  • Cloud Haskell: Client (Haskell) <-CloudHaskell-> Worker (Haskell) <-FFI-> Lib (C++)

  • Gearman

  • Erlang: Client (Haskell) <-Erlang-> Broker (Erlang) <-Erlang C Node-> Worker (C++)

Each approach has its advantages and disadvantages.

  • MPI would create a lot of security issues and is a fairly heavyweight solution.

  • ZeroMQ is a nice solution, but I would have to write the broker, load balancer, etc. all by myself (getting reliability right in particular is not trivial).

  • CloudHaskell doesn't look very mature.

  • Gearman does not run on Windows and has no Haskell bindings. I know about java-gearman-service, but it is much less mature than the C daemon and has some other problems (for example, it has no documentation and shuts down if there is no incoming task flow for some time).

  • Erlang is similar to the MPI option and requires the use of a third language.

Thanks!

1 answer

Since the library you are using is not thread-safe, you want a solution that uses processes as the abstraction for parallelism. The example you would like to write, in the style of the Par monad, uses a parallelism model based on sparks or tasks, where many sparks can live in the same thread. Clearly, that is not what you are looking for.
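To make the distinction concrete, here is a minimal sketch of the in-process alternative: guarding the library with a global lock. Everything in it is illustrative. `unsafeF` is a hypothetical stand-in for the real FFI call (here it merely reverses a string), and `libLock` and `lockedF` are made-up names, not part of any library.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, withMVar)
import System.IO.Unsafe (unsafePerformIO)

-- | One global lock for the whole library.
-- NOINLINE keeps GHC from duplicating the MVar.
{-# NOINLINE libLock #-}
libLock :: MVar ()
libLock = unsafePerformIO (newMVar ())

-- | Hypothetical stand-in for the non-thread-safe FFI call.
unsafeF :: String -> IO String
unsafeF = pure . reverse

-- | Safe to call from any number of threads,
-- but only one call runs at a time.
lockedF :: String -> IO String
lockedF x = withMVar libLock (\_ -> unsafeF x)
```

The lock makes concurrent calls safe, but it also serializes them, so sparks or threads inside one process gain nothing. Each call needs its own process to run in parallel.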

Never fear!

Haskell has several paradigms that work this way, and you mentioned one of them in your post: Cloud Haskell. Although Cloud Haskell is not "mature," it could solve your problem, though it may be a little heavyweight for your needs. If you just need to use many local cores through a process-level abstraction for parallelism, take a look at the Eden library:

http://www.mathematik.uni-marburg.de/~eden/

With Eden, you can express exactly what you need. Here is a very simple example along the lines of your Par-monad-based version:

 f $# args 

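Or, in the case of many arguments, you can just pull out good old map:

 map f $# args 

Note that this parses as (map f) $# args, so the whole map is evaluated in a single child process. If you want one process per element, the skeleton library that accompanies Eden offers parallel map variants; check its documentation for the exact names.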

For more information on the $# syntax and for tutorials on Eden, see:

http://www.mathematik.uni-marburg.de/~eden/paper/edenCEFP.pdf

YMMV, since most of the more mature parallel paradigms in Haskell assume that you have some level of thread safety or that the parallel work can be done purely.

Good luck and happy hacking!

Source: https://habr.com/ru/post/915577/

