forkIO threads and OS threads

If I create a thread with forkIO, I have to provide a function to start it with, and I get back an identifier (ThreadId). Then I can communicate with this animal via, for example, MVars, channels, etc. However, as I understand it, the created thread is very limited and can only work SIMD-fashion, where the function provided at thread creation is "the instruction". I cannot change the function that I provided when starting the thread. I understand that these user threads are ultimately mapped to OS threads.
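For concreteness, a minimal sketch of the usage I mean (the names, like box, are just my own):

```haskell
import Control.Concurrent

main :: IO ()
main = do
  box <- newEmptyMVar                   -- an MVar as the channel back to us
  _tid <- forkIO (putMVar box (21 * 2)) -- the fixed action this thread runs
  answer <- takeMVar box                -- blocks until the child has written
  print (answer :: Int)
```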

I would like to understand how Haskell threads and OS threads interact. Why can Haskell threads that do completely different things be mapped onto the same OS thread? Why is there no need to initialize an OS thread with a fixed instruction (as is necessary in forkIO)? How does the scheduler(?) recognize the user threads in an application that can be distributed? In other words, why are OS threads so flexible?

Finally, is there a way to dump the heap of a selected thread from a running application?

2 answers

First, let me address one quick misconception:

I understand that these user threads are ultimately OS mapped to OS threads.

In fact, the Haskell runtime is in charge of choosing which Haskell thread a particular OS thread from its pool is used to run.

Now, the questions, one at a time.

Why are Haskell threads that do completely different things mapped to the same OS thread?

Ignoring the FFI for the moment, all OS threads actually run the Haskell runtime, which keeps track of a list of ready Haskell threads. The runtime chooses a Haskell thread to run and jumps into its code, executing until the thread yields control back to the runtime. At that moment, the runtime can continue executing the same thread or pick a different one.

In short: many Haskell threads can be mapped to a single OS thread because that OS thread really only does one thing: run the Haskell runtime.
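As a rough illustration (a sketch, not how you would normally synchronize), spawning a thousand Haskell threads is cheap, and getNumCapabilities reports how many OS threads ("capabilities") the runtime is scheduling them onto; by default that is far fewer than a thousand:

```haskell
import Control.Concurrent
import Control.Monad (forM_)

main :: IO ()
main = do
  caps <- getNumCapabilities  -- size of the runtime's OS-thread pool
  done <- newMVar (0 :: Int)
  forM_ [1 .. 1000 :: Int] $ \_ ->
    forkIO (modifyMVar_ done (pure . (+ 1)))  -- 1000 lightweight Haskell threads
  let wait = do n <- readMVar done            -- crude busy-wait for the children
                if n < 1000 then yield >> wait else pure ()
  wait
  putStrLn ("capabilities: " ++ show caps ++ ", threads run: 1000")
```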

Why is there no need to initiate an OS thread with a fixed instruction (as necessary in forkIO)?

I don't understand this question (and I think it stems from a second misconception). You start OS threads with a fixed instruction in exactly the same sense that you start Haskell threads with a fixed instruction: in each case, you just give a chunk of code to execute, and that's what it does.

How does the scheduler(?) recognize user threads in an application that can be distributed?

"Distributed" is a dangerous word: it usually refers to spreading code over multiple machines (presumably not what you had in mind here). As for how the Haskell runtime can tell when there are multiple threads, well, that's easy: you tell it when you call forkIO .

In other words, why are OS threads so flexible?

It's not clear to me that OS threads are more flexible than Haskell threads, so this question is a bit strange.

Finally, is there a way to dump the heap of a selected thread from a running application?

I don't know of any tools for dumping the Haskell heap at all, in multi-threaded applications or otherwise. If you like, you can dump a representation of the part of the heap reachable from a particular object using a package such as vacuum . I have used vacuum-cairo to visualize such dumps with great success in the past.

For more information, you might read the middle two sections, "Conventions" and "Foreign Imports", of my introduction to multi-threaded gtk2hs programming, and possibly also the bit about the "Runtime Thread" section.


Instead of answering your question directly, I will try to present a conceptual model of how multi-threaded Haskell programs are implemented. I will ignore many details and complications.

Operating systems implement preemptive multithreading, using hardware interrupts to allow multiple "threads" of computation to run logically on the same core at the same time.

Threads provided by operating systems are heavyweight. They are well suited to certain kinds of multithreaded applications, and on systems such as Linux they are essentially the same tool that allows several programs to run at once (a task at which they succeed).

But these threads are too heavyweight for many uses in high-level languages such as Haskell. In essence, the GHC runtime acts as a mini-OS, implementing its own "threads" on top of OS threads, just as an OS implements threads on top of cores.

It is conceptually easy to imagine that a language such as Haskell would be implemented this way. Evaluating Haskell consists of "forcing thunks", where a thunk is a unit of computation that may 1. depend on another value (thunk) and/or 2. create new thunks.

Thus, one can imagine several threads, each evaluating thunks at the same time. One could build a queue of thunks to evaluate: each thread pops the top of the queue and evaluates it to completion, then picks a new thunk from the queue. The operation par and its ilk can "spark off" new computation by adding a thunk to this queue.
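A toy, base-only model of that queue (nothing like GHC's actual spark pool; here a thunk is faked as a function from (), and a single "thread" drains the queue):

```haskell
import Data.IORef

-- A thunk here is just a suspended Integer computation.
type Thunk = () -> Integer

-- One "thread" of the model: pop a thunk, force it, repeat.
runQueue :: IORef [Thunk] -> IO [Integer]
runQueue queue = do
  thunks <- readIORef queue
  case thunks of
    []         -> pure []
    (t : rest) -> do
      writeIORef queue rest
      let v = t ()                 -- "force" the thunk
      (v :) <$> runQueue queue

main :: IO ()
main = do
  queue <- newIORef [\_ -> 1 + 1, \_ -> product [1 .. 5], \_ -> sum [1 .. 10]]
  runQueue queue >>= print   -- prints [2,120,55]
```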

Extending this model to I/O is not particularly hard to imagine either. Instead of each spark simply forcing a pure thunk, we imagine the Haskell unit of computation being somewhat more complicated. Pseudo-Haskell for such a runtime:

 type Spark = (ThreadId, Action)
 data Action = Compute Thunk | Perform IOAction

Note: this is only for conceptual understanding; do not think that things are actually implemented this way.

When we run a Spark, we check for exceptions thrown to that thread ID. Assuming there are none, executing it means either forcing the thunk or performing the IO action.
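Continuing the pseudo-Haskell, here is a toy round-robin interpreter for that model (again purely conceptual; the names and types are my own stand-ins, and "exceptions" are modeled as a list of killed thread IDs):

```haskell
-- Toy names, not GHC's real types: TId stands in for ThreadId,
-- Thunk for a suspended computation, the IO () in Perform for IOAction.
type TId = Int
type Thunk = () -> Integer

data Action = Compute Thunk | Perform (IO ())
type Spark = (TId, Action)

-- Run sparks round-robin; a spark whose thread has a pending
-- exception (here: appears in the "killed" list) is discarded.
run :: [TId] -> [Spark] -> IO ()
run _ [] = pure ()
run killed ((tid, action) : rest)
  | tid `elem` killed = run killed rest
  | otherwise = do
      case action of
        Compute t   -> print (t ())   -- force the thunk
        Perform act -> act            -- perform the IO action
      run killed rest

main :: IO ()
main = run [2]
  [ (1, Compute (\_ -> 6 * 7))
  , (2, Perform (putStrLn "never runs"))
  , (3, Perform (putStrLn "hello from spark 3"))
  ]
```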

Obviously, my explanation here has been very hand-wavy and ignores some complexity. For more, the GHC team has written excellent papers, such as "Runtime Support for Multicore Haskell" by Marlow et al. You might also look at an operating-systems textbook, since they often go into some detail on how to build a scheduler.


Source: https://habr.com/ru/post/1433872/

