Enthusiasm for learning about threading is great; don't get me wrong. Enthusiasm for using lots of threads, by contrast, is a symptom of what I call "Thread Happiness Disease".
Developers who have just learned about the power of threads start asking questions such as "how many threads can I create in one program?" This is rather like an English major asking "how many words can I use in a sentence?" The usual advice to writers is to keep sentences short and to the point, rather than cramming as many words and ideas into one sentence as possible. Threads are the same: the right question is not "how many threads can I get away with creating?" but rather "how can I write this program so that it uses the minimum number of threads necessary to get the job done?"
Threads solve many problems, it's true, but they also introduce huge problems:
- Performance analysis of multi-threaded programs is often extremely difficult and deeply counterintuitive. I have seen real-world examples in heavily multi-threaded programs where making one function faster reduced the total throughput of the system, even though no other function got slower and no extra memory was used. Why? Because threads are often like streets downtown. Imagine magically making every street shorter without retiming the traffic lights. Would traffic jams get better or worse? In a multi-threaded program, writing faster functions can push the processors into congestion sooner.
What you want is for threads to be like interstate highways: no traffic lights, highly parallel, and intersecting at a small number of clearly defined, carefully engineered points. That is very difficult to do. Most heavily multi-threaded programs are more like dense city cores with traffic lights on every corner.
- Writing your own custom thread management is insanely difficult to get right. In a regular, well-designed single-threaded program, the amount of "global state" you have to reason about is usually small. Ideally, you write objects that have well-defined boundaries and that do not care about the control flow that invokes their members. If you want to invoke an object in a loop, or a switch, or anything else, you go right ahead.
Multi-threaded programs with custom thread management require a global understanding of everything a thread will do that could affect data visible from another thread. You more or less have to hold the entire program in your head. You have to understand every possible way that two threads could interact in order to get it right and prevent deadlocks or data corruption. That is a large cost to pay, and it is highly error-prone.
- In essence, threads make your methods lie. Let me give you an example. Suppose you have:
if (!queue.IsEmpty) queue.RemoveWorkItem().Execute();
Is this code correct? If it is single-threaded, probably. If it is multi-threaded, what stops another thread from removing the last remaining item after the call to IsEmpty? Nothing, that's what. This code looks perfectly fine locally, but it is a bomb waiting to go off in a multi-threaded program. In effect, this code really says:
if (queue.WasNotEmptyAtSomePointInThePast) ...
which is obviously pretty useless.
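Here is a minimal sketch of one fix. It assumes .NET's `System.Collections.Concurrent.ConcurrentQueue<T>` in place of the hypothetical queue type above. Its `TryDequeue` method tests for an item and removes it in a single atomic step, which leaves no gap between the check and the removal for another thread to slip into. The work items and the `executed` counter are illustrative only.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Illustrative setup: 1000 work items that each bump a shared counter atomically.
var queue = new ConcurrentQueue<Action>();
int executed = 0;
for (int i = 0; i < 1000; i++)
    queue.Enqueue(() => Interlocked.Increment(ref executed));

// Four consumers drain the same queue concurrently.
Task[] consumers = Enumerable.Range(0, 4)
    .Select(_ => Task.Run(() =>
    {
        // TryDequeue checks and removes in one atomic operation; it returns
        // false (rather than throwing) when the queue is empty at that instant.
        while (queue.TryDequeue(out var work))
            work();
    }))
    .ToArray();
Task.WaitAll(consumers);

Console.WriteLine($"executed {executed} work items"); // prints "executed 1000 work items"
```

Every item runs exactly once no matter how the consumers interleave. The "is it empty?" question is never asked separately from "give me one".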
So suppose you decide to fix the problem by locking the queue. Is this correct?
lock(queue) {if (!queue.IsEmpty) queue.RemoveWorkItem().Execute(); }
This is not necessarily correct either. Suppose executing the work item runs code that waits on a resource currently locked by another thread. Suppose also that the other thread is waiting for the lock on the queue. What happens? Both threads wait forever. Putting a lock around a piece of code requires you to know everything that code could possibly do with any shared resource, so that you can work out whether there will be any deadlocks. Again, that is an extremely heavy burden to put on someone writing what ought to be very simple code. (The right thing to do here is probably to extract the work item inside the lock and then execute it outside the lock. But... what if the items are in the queue because they must be executed in a particular order? Then that code is wrong too, because other threads can execute later jobs first.)
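Below is a sketch of the "extract inside the lock, execute outside it" approach from the parenthetical above. It uses a plain `Queue<Action>` and a private lock object. The names `queueGate`, `resourceGate`, `TakeOrNull` and `Drain` are hypothetical, and `resourceGate` stands in for the other resource a work item might wait on. The producer deliberately takes the two locks in the opposite order (resource, then queue). Because no consumer ever holds the queue lock while running a work item, the circular wait described above cannot form.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

var queue = new Queue<Action>();
object queueGate = new object();    // guards only the queue
object resourceGate = new object(); // hypothetical "other resource" that work items lock
int executed = 0;

// Hold the queue lock just long enough to remove one item (or learn there is none).
Action? TakeOrNull()
{
    lock (queueGate)
    {
        return queue.Count > 0 ? queue.Dequeue() : null;
    }
}

// Run each item after the queue lock has been released, so a work item that
// blocks on resourceGate is never holding queueGate at the same time.
void Drain()
{
    while (TakeOrNull() is Action work)
        work();
}

Action job = () => { lock (resourceGate) { executed++; } };
for (int i = 0; i < 100; i++)
    queue.Enqueue(job); // no other threads exist yet, so no lock is needed here

// A producer that takes the locks in the opposite order: resource, then queue.
// Combined with "lock the queue, then Execute()", this ordering could deadlock.
Task producer = Task.Run(() =>
{
    for (int i = 0; i < 100; i++)
        lock (resourceGate)
            lock (queueGate)
                queue.Enqueue(job);
});

Task[] consumers = Enumerable.Range(0, 4).Select(_ => Task.Run(() => Drain())).ToArray();
Task.WaitAll(consumers.Append(producer).ToArray());
Drain(); // pick up anything enqueued after the consumers saw an empty queue

Console.WriteLine($"executed {executed} of 200 items"); // prints "executed 200 of 200 items"
```

This sketch gives up ordering. With four consumers the jobs no longer run strictly in FIFO order, which is exactly the caveat raised above. If order matters, one option is a single consumer, but that is a design decision this sketch does not make for you.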
- It gets worse. The C# language specification guarantees that a single-threaded program will have observable behavior exactly as the program specifies. That is, if you have something like "if (M(ref x)) b = 10;", then you know the generated code will behave as though x is accessed by M before b is written. Now, the compiler, the jitter and the processor are all free to optimize this. Suppose one of them can determine that M will return true, and knows that on this thread the value of b is not read after the call to M. Then b can be assigned before x is accessed. All that is guaranteed is that the single-threaded program appears to work the way it was written.
Multi-threaded programs get no such guarantee. If you examine b and x on another thread while this one is running, you can see b change before x is accessed, if that optimization is performed. Reads and writes can logically move forwards and backwards in time relative to each other in single-threaded programs, and other threads can observe those moves.
This matters whenever a multi-threaded program's logic depends on things being observed in the same order as the code is written. To write such a program, you must have a detailed understanding of the "memory model" of the language and the runtime. You need to know exactly what guarantees are made about how accesses can move in time. You cannot simply test on your x86 box and hope for the best, because x86 chips have quite conservative optimizations compared to some other chips out there.
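Here is a minimal sketch that makes the memory-model point concrete. It shows safe publication in C# with `System.Threading.Volatile`, and the variable names are illustrative. `Volatile.Write` has release semantics and `Volatile.Read` has acquire semantics. So a reader that sees `ready == true` is guaranteed to also see `payload == 42`. With plain field accesses that guarantee disappears, and the JIT may even hoist the read of `ready` out of the loop. Yet the unsynchronized version will often appear to work on x86, which is why testing there proves little. Dedicated threads are used only to keep the demo simple.

```csharp
using System;
using System.Threading;

int payload = 0;
bool ready = false;
int observed = -1;

// Writer: store the data first, then publish the flag with a release write.
// The payload store cannot be moved after the Volatile.Write of the flag.
var writer = new Thread(() =>
{
    payload = 42;
    Volatile.Write(ref ready, true);
});

// Reader: poll the flag with acquire reads. The payload load cannot be moved
// before a Volatile.Read that returned true, so it must see 42.
var reader = new Thread(() =>
{
    while (!Volatile.Read(ref ready))
        Thread.Yield();
    observed = payload;
});

reader.Start();
writer.Start();
reader.Join(); // Join also makes the reader's write to 'observed' visible here
writer.Join();

Console.WriteLine(observed); // prints 42
```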
That is just a brief overview of a few of the problems you run into when writing your own multi-threaded logic. There are plenty more. So, some advice:
- Do learn about threading.
- Do not try to write your own thread management in production code.
- Use higher-level libraries written by experts to solve threading problems. If you have a bunch of work that needs to be done in the background and want to farm it out to worker threads, use the thread pool rather than writing your own thread-creation logic. If you have a problem that several processors can work on at once, use the Task Parallel Library. If you want to lazily initialize a resource, use the lazy initialization class rather than trying to write lock-free code yourself.
- Avoid shared state.
- If you cannot avoid shared state, share immutable state.
- If you have to share mutable state, prefer locks to lock-free techniques.
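Below is a short sketch of that advice using the standard .NET building blocks mentioned in the list:

- `Task.Run` queues work to the thread pool.
- `Parallel.For` from the Task Parallel Library spreads a CPU-bound loop across cores.
- `Lazy<T>` handles thread-safe lazy initialization.

The loop keeps a private subtotal per worker, so nothing is shared. It merges those subtotals into the one piece of shared mutable state under an ordinary lock. The values and names are illustrative, not a prescribed pattern.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Lazy<T> with the default (thread-safe) mode runs the factory at most once,
// even when many threads read .Value at the same moment.
int factoryCalls = 0;
var config = new Lazy<string>(() =>
{
    Interlocked.Increment(ref factoryCalls);
    return "loaded";
});

// Task.Run hands work to the shared thread pool instead of creating threads by hand.
Task<string>[] readers = Enumerable.Range(0, 8)
    .Select(_ => Task.Run(() => config.Value))
    .ToArray();
Task.WaitAll(readers);

// Parallel.For splits the loop across workers. Each worker accumulates into its own
// local subtotal; only the final merge touches shared state, under a plain lock.
long total = 0;
object totalGate = new object();
Parallel.For(1, 1001,
    () => 0L,                                                  // per-worker initial subtotal
    (i, _, subtotal) => subtotal + i,                          // body uses only local state
    subtotal => { lock (totalGate) { total += subtotal; } }); // merge once per worker

Console.WriteLine($"{readers[0].Result}, factory ran {factoryCalls} time(s), total = {total}");
// prints "loaded, factory ran 1 time(s), total = 500500"
```

None of this code creates a thread, writes a lock-free algorithm, or reasons about reordering by hand. The libraries carry that burden.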