It is hard to say exactly what is happening there based only on the code you provided.
TPL runs its work on thread pool threads. The pool keeps a minimum number of worker threads (by default equal to the number of logical processors, Environment.ProcessorCount) that it hands out on demand. Once all of them are busy and more work is waiting, it does not create new threads immediately; it injects them gradually, at a rate of roughly one or two per second, for as long as more threads are still needed. So if your loop needed more parallel operations than the pool's minimum, part of the time was spent waiting for new threads to be created. Keep in mind that the threads used by a parallel loop are taken from the same pool, so they reduce the number of threads available to everything else. The pool tries to keep its minimum number of threads available, and if it notices that work items are taking too long (for example, because they block), it creates new threads to compensate, which costs resources. Many parts of the framework (timers, asynchronous I/O callbacks, Task.Run, and so on) also use the thread pool, so other work in your process competes for the same threads. Creating and starting a new thread is relatively expensive.
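To check whether thread injection is a factor, you could look at the pool's configured minimum and, as an experiment, raise it before running the loop. The sketch below uses the real ThreadPool.GetMinThreads and ThreadPool.SetMinThreads APIs; the class and helper names (ThreadPoolDiagnostics, GetMinimums, EnsureMinWorkers) and the 32-iteration sleeping workload are hypothetical, chosen only for illustration. Raising the minimum is a diagnostic step and a workaround for blocking work, not a general fix.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper class for inspecting and adjusting the thread pool minimum.
public static class ThreadPoolDiagnostics
{
    // Reads the current minimum worker/IO thread counts. Up to this minimum the
    // pool creates threads on demand; beyond it, new threads are injected gradually.
    public static (int Worker, int Io) GetMinimums()
    {
        ThreadPool.GetMinThreads(out int worker, out int io);
        return (worker, io);
    }

    // Raises the worker minimum (keeping the IO minimum unchanged) so that a burst
    // of up to `workers` blocking work items does not wait for gradual injection.
    // SetMinThreads returns false if the value is negative or above the pool maximum.
    public static bool EnsureMinWorkers(int workers)
    {
        var (currentWorker, io) = GetMinimums();
        if (workers <= currentWorker) return true; // already high enough
        return ThreadPool.SetMinThreads(workers, io);
    }

    public static void Main()
    {
        var (worker, io) = GetMinimums();
        Console.WriteLine($"Min worker threads: {worker}, min IO threads: {io}");
        Console.WriteLine($"Processors: {Environment.ProcessorCount}");

        // Hypothetical blocking workload: 32 iterations that each sleep 100 ms.
        EnsureMinWorkers(32);
        Parallel.For(0, 32, i => Thread.Sleep(100));
    }
}
```

If the loop gets noticeably faster with a higher minimum, the original slowdown was likely caused by waiting for the pool to inject threads.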
Another possible cause: if the loop ended up running more concurrent work than there are available processors, the OS had to do a lot of context switching. Each context switch costs CPU time (saving and restoring thread state, and losing cache locality), so heavy switching increases the load on the processors without doing any useful work.
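One way to rule out oversubscription is to cap the loop's concurrency at the processor count with ParallelOptions.MaxDegreeOfParallelism (a real TPL option). In the minimal sketch below, the helper RunBounded and its peak-concurrency counter are hypothetical additions used only to show that the cap is respected; the spinning iteration body stands in for your own CPU-bound work.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper that runs a loop with bounded concurrency.
public static class BoundedParallelism
{
    // Runs `body` for each index in [0, count) with at most `maxDegree`
    // iterations executing at once, and returns the peak concurrency observed.
    public static int RunBounded(int count, int maxDegree, Action<int> body)
    {
        int current = 0, peak = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegree };
        Parallel.For(0, count, options, i =>
        {
            int now = Interlocked.Increment(ref current);
            // Lock-free "max" update: retry until peak >= now.
            int observed;
            while (now > (observed = Volatile.Read(ref peak)) &&
                   Interlocked.CompareExchange(ref peak, now, observed) != observed) { }
            try { body(i); }
            finally { Interlocked.Decrement(ref current); }
        });
        return peak;
    }

    public static void Main()
    {
        int cores = Environment.ProcessorCount;
        // Hypothetical CPU-bound iteration body; replace with your own work.
        int peak = RunBounded(1000, cores, i => Thread.SpinWait(1000));
        Console.WriteLine($"Processors: {cores}, peak concurrent iterations: {peak}");
    }
}
```

For CPU-bound work, capping at the processor count keeps roughly one running iteration per core, which limits context switching; for blocking work the trade-off is different.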
If you can share more details, such as the input data, I can give a more specific answer.