In an attempt to speed up the processing of physics objects in C#, I decided to change the linear update algorithm to a parallel one. I figured the best approach was to use ThreadPool, since it is built to process a queue of jobs.
When I first implemented the parallel algorithm, I queued a work item for each physics object. Keep in mind that a single job completes quite quickly (it updates forces, velocity, and position, checks for collisions against the old state of any surrounding objects to keep it thread safe, etc.). I waited for all the jobs to complete using a single wait handle together with an interlocked integer, which I decremented every time a physics object finished (when it reached zero, I set the wait handle). The wait was required because the next task depended on all objects having been updated.
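For reference, here is a minimal sketch of that first approach. `PhysicsObject`, its `Update` method, and `UpdateAll` are hypothetical stand-ins for my real simulation types; the interlocked countdown and single wait handle are the parts described above:

```csharp
using System.Threading;

// Hypothetical stand-in for the real simulation type.
class PhysicsObject
{
    public void Update()
    {
        // Integrate forces, velocity, position; check collisions
        // against the previous frame's state of neighboring objects.
    }
}

static class PerObjectUpdate
{
    public static void UpdateAll(PhysicsObject[] objects)
    {
        if (objects.Length == 0) return;

        int pending = objects.Length;                  // counted down with Interlocked
        using (var done = new ManualResetEvent(false)) // the single wait handle
        {
            foreach (var obj in objects)
            {
                ThreadPool.QueueUserWorkItem(state =>
                {
                    ((PhysicsObject)state).Update();
                    // The last job to finish signals the wait handle.
                    if (Interlocked.Decrement(ref pending) == 0)
                        done.Set();
                }, obj);
            }
            done.WaitOne(); // block until every object has been updated
        }
    }
}
```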
The first thing I noticed was erratic behavior. On average, the thread pool version seemed a little faster, but it had massive performance spikes (about 10 ms per update, with random jumps up to 40-60 ms). I tried to profile this with ANTS, but I could not figure out why the spikes occurred.
My next approach was to keep using ThreadPool, but to split all the objects into batches instead. At first I used only 8 batches, one per core on my machine. The performance was great: it far exceeded the single-threaded approach and had no spikes (about 6 ms per update).
The only thing I was concerned about was that if one job finished before the others, a core would sit idle. So I increased the number of jobs, first to about 20 and then to 500. As I expected, the update time dropped to 5 ms.
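The batched variant looked roughly like this sketch (`batchCount` is an illustrative parameter, and `PhysicsObject` is the same stand-in as above). Each work item updates a contiguous slice of the array instead of a single object, so the per-job scheduling cost is paid once per batch:

```csharp
using System;
using System.Threading;

static class BatchedUpdate
{
    public static void UpdateAll(PhysicsObject[] objects, int batchCount)
    {
        if (objects.Length == 0) return;

        int pending = batchCount;
        int batchSize = (objects.Length + batchCount - 1) / batchCount; // ceiling division
        using (var done = new ManualResetEvent(false))
        {
            for (int b = 0; b < batchCount; b++)
            {
                int start = b * batchSize;
                int end = Math.Min(start + batchSize, objects.Length);
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    for (int i = start; i < end; i++)
                        objects[i].Update();
                    // Empty trailing batches still decrement the counter.
                    if (Interlocked.Decrement(ref pending) == 0)
                        done.Set();
                });
            }
            done.WaitOne();
        }
    }
}
```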
So my questions are:
- Why did the spikes appear when I queued many small, quick jobs?
- Does anyone know how ThreadPool is implemented, in a way that would help me understand how best to use it?