First, let me address one quick misconception:
I understand that these user threads are ultimately mapped to OS threads.
In fact, the Haskell runtime is in charge of choosing which Haskell thread a particular OS thread from its pool executes, and that choice can change over time.
Now for the questions, one at a time.
Why are Haskell threads that do completely different things mapped to the same OS thread?
Ignoring the FFI for the moment, every OS thread is actually running the Haskell runtime, which keeps track of a list of ready Haskell threads. The runtime chooses a Haskell thread to run and jumps into its code, which executes until the thread yields control back to the runtime. At that point, the runtime can either continue executing the same thread or pick a different one.
In short: many Haskell threads can be mapped to a single OS thread because that OS thread is really doing only one thing, namely running the Haskell runtime.
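A minimal sketch of this multiplexing, assuming GHC and only the `base` package. If it is compiled without `-threaded`, the non-threaded RTS runs all Haskell code on a single OS thread, yet it still schedules the ten thousand forked threads below. Each thread calls `yield` to hand control back to the scheduler before reporting in. The name `runMany` is illustrative, not from any library.

```haskell
import Control.Concurrent (forkIO, rtsSupportsBoundThreads, yield)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM)

-- Fork n lightweight Haskell threads. Each one yields control back to
-- the runtime scheduler once, then reports its index through its own MVar.
runMany :: Int -> IO Int
runMany n = do
  boxes <- forM [1 .. n] $ \i -> do
    box <- newEmptyMVar
    _ <- forkIO $ do
      yield          -- return control to the scheduler
      putMVar box i  -- report completion
    return box
  results <- mapM takeMVar boxes  -- wait for every thread
  return (sum results)

main :: IO ()
main = do
  -- False when linked against the non-threaded RTS (one OS thread for Haskell code)
  putStrLn ("Threaded RTS: " ++ show rtsSupportsBoundThreads)
  total <- runMany 10000
  print total
```

The program behaves the same with or without `-threaded`; only the number of OS threads the runtime draws on changes.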
Why is there no need to start an OS thread with a fixed instruction (as is necessary with forkIO)?
I don't understand this question (and I think it stems from a second misconception). You start OS threads with a fixed instruction in exactly the same sense that you start Haskell threads with a fixed instruction: in each case, you just hand over a chunk of code to execute, and that is what the thread does.
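To make the symmetry concrete, here is a small sketch (assuming GHC's `base`) that hands the very same `IO` action to both `forkIO` and `forkOS`. Both take an `IO ()` and return a `ThreadId`; the only difference is whether the new Haskell thread is bound to a dedicated OS thread. `forkOS` requires linking with `-threaded`, so the sketch checks `rtsSupportsBoundThreads` first. The names `reportBound` and `compareForks` are hypothetical helpers.

```haskell
import Control.Concurrent
  (forkIO, forkOS, isCurrentThreadBound, rtsSupportsBoundThreads)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)

-- The "fixed instruction" for a new thread is just an IO action.
-- This one records whether the thread running it is bound to an OS thread.
reportBound :: MVar Bool -> IO ()
reportBound box = isCurrentThreadBound >>= putMVar box

-- Run the same action via forkIO and, if the RTS supports it, via forkOS.
-- Result: (bound under forkIO, Just bound under forkOS or Nothing).
compareForks :: IO (Bool, Maybe Bool)
compareForks = do
  lightBox <- newEmptyMVar
  _ <- forkIO (reportBound lightBox)
  light <- takeMVar lightBox
  bound <-
    if rtsSupportsBoundThreads
      then do
        osBox <- newEmptyMVar
        _ <- forkOS (reportBound osBox)
        Just <$> takeMVar osBox
      else return Nothing  -- non-threaded RTS: forkOS would fail
  return (light, bound)

main :: IO ()
main = compareForks >>= print
```

With `-threaded` this should print `(False,Just True)`; without it, `(False,Nothing)`.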
How does the scheduler(?) recognize user threads in an application that could be distributed?
"Distributed" is a dangerous word: it usually refers to spreading code across multiple machines (presumably not what you meant here). As for how the Haskell runtime can tell when there are multiple threads, well, that's easy: you tell it when you call forkIO.
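A minimal sketch of "you tell it when you call forkIO", assuming only `base`: every `forkIO` call registers a new thread with the runtime and returns a distinct `ThreadId`, which is the handle the scheduler uses. `getNumCapabilities` reports how many capabilities (roughly, OS threads running Haskell code in parallel) the runtime has available. The helper `forkAndCollect` is hypothetical.

```haskell
import Control.Concurrent
  (ThreadId, forkIO, getNumCapabilities, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (replicateM, replicateM_)
import Data.List (nub)

-- Each forkIO call tells the runtime about one more thread and
-- hands back its ThreadId. We wait for all of them to finish.
forkAndCollect :: Int -> IO [ThreadId]
forkAndCollect n = do
  done <- newEmptyMVar
  tids <- replicateM n (forkIO (putMVar done ()))
  replicateM_ n (takeMVar done)
  return tids

main :: IO ()
main = do
  caps <- getNumCapabilities
  putStrLn ("Capabilities: " ++ show caps)
  tids <- forkAndCollect 5
  print (length (nub tids))  -- number of distinct threads registered
```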
In other words, why are OS threads so flexible?
It's not clear to me that OS threads are more flexible than Haskell threads, so this question is a bit strange.
Finally, is there a way to dump the heap of a selected thread from within the application?
I don't actually know of any tools for dumping the Haskell heap at all, in multi-threaded applications or otherwise. If you like, you can dump a representation of the part of the heap that is reachable from a particular object using a package such as vacuum. I have used vacuum-cairo to visualize these dumps with great success in the past.
For more information, you may want to read the middle two sections, “Conventions” and “Foreign Imports”, of my introduction to multi-threaded gtk2hs programming, and possibly also parts of the “Threaded Runtime” section.