PLINQ vs Tasks vs Async vs Producer/Consumer queue? What to use?

I have read C# 5.0 in a Nutshell, and after reading the author's views I am confused about which approach I should take. My requirement: say I have a really long-running (computationally hard) task, for example calculating the SHA1 (or some other) hash of millions of files, or indeed anything else that is computationally heavy and will probably take some time. What should my approach be to developing it (in WinForms, if that matters, using VS 2012 / C# 5.0) so that I can also report progress to the user?

The scenarios I am considering:

  • Create a Task (with TaskCreationOptions.LongRunning) that calculates the hashes and reports progress to the user, either by implementing IProgress<T> / Progress<T> or by letting the task capture the SynchronizationContext and post back to the UI (see the sketch after this list).

  • Create an async method, like:

        async Task CalculateHashesAsync()
        {
            // await tasks that calculate the hash
            await Task.Run(() => CalculateHash());
            // how do I report progress???
        }
  • Use TPL (or PLINQ), as in:

        void CalculateHashes()
        {
            Parallel.For(0, allFiles.Count, i => CalcHash(allFiles[i])); // how do I report progress here?
        }
  • Use a producer/consumer queue.
    (I don't know how to do this.)
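
For clarity, here is a rough sketch of what I mean by the first option (a minimal sketch only: it assumes a WinForms button handler, a progressBar control, an allFiles field and a CalculateHash helper, none of which are shown above; requires using System, System.Collections.Generic, System.Threading.Tasks and System.Windows.Forms):

    // Progress<T> captures the UI SynchronizationContext when it is constructed
    // on the UI thread, so the callback below runs on the UI thread.
    private async void buttonStart_Click(object sender, EventArgs e)
    {
        var progress = new Progress<int>(percent => progressBar.Value = percent);
        await Task.Run(() => CalculateHashes(allFiles, progress));
    }

    private void CalculateHashes(IList<string> files, IProgress<int> progress)
    {
        for (int i = 0; i < files.Count; i++)
        {
            CalculateHash(files[i]);                                  // hypothetical CPU-bound helper
            if (progress != null)
                progress.Report((int)((i + 1) * 100L / files.Count)); // marshalled back to the UI thread
        }
    }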

The author in the book says ...

Running one long-running task on a dedicated thread won't cause trouble. It's when you run several long-running tasks in parallel (especially ones that block) that performance may suffer. And in that case, there are usually better solutions than TaskCreationOptions.LongRunning:

  • If the tasks are I/O-bound, TaskCompletionSource and asynchronous functions let you implement concurrency with callbacks instead of threads.
  • If the tasks are compute-bound, a producer/consumer queue lets you throttle the concurrency for those tasks, avoiding starvation of other threads and processes.
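
If I understand the I/O-bound point correctly, it would mean something along the lines of the following sketch, which reads a file asynchronously while hashing it (this is only my illustration of the idea, assuming .NET 4.5's Stream.ReadAsync):

    using System.IO;
    using System.Security.Cryptography;
    using System.Threading.Tasks;

    static async Task<byte[]> HashFileAsync(string path)
    {
        using (var sha1 = SHA1.Create())
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, 4096, useAsync: true))
        {
            var buffer = new byte[81920];
            int read;
            // The reads are asynchronous, so no thread is blocked while the disk works;
            // the hashing itself is still CPU work done on the resuming thread.
            while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
                sha1.TransformBlock(buffer, 0, read, null, 0);
            sha1.TransformFinalBlock(buffer, 0, 0);
            return sha1.Hash;
        }
    }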

About the producer/consumer queue, the author says ...

A producer/consumer queue is a useful structure both in parallel programming and in general concurrency scenarios, as it gives you precise control over how many worker threads execute at once, which is useful not only for limiting CPU consumption but for other resources as well.

So, should I not use a Task, which means the first option is out? Is the second option the best one? Are there any other options? And if I follow the author's advice and implement a producer/consumer queue, how would I do it? (I don't even have an idea how to start with a producer/consumer queue in my scenario, if that is indeed the best approach!)

I would like to know whether anyone has come across such a scenario and how they implemented it. If not, which approach would be the most efficient and/or the easiest to develop and maintain? (I know the word "performance" is subjective, but let's just consider the most general case where it works, and works well!)

1 answer

a really long-running (computationally hard) task, for example calculating the SHA1 (or some other) hash of millions of files

In this example there are clearly both a CPU-heavy component (hashing) and an I/O component (reading the files). It may not be a representative example, but in my experience even a secure hash is much faster than reading the data from disk.

If you are purely CPU-bound, the best solution is either Parallel or PLINQ. If you are purely I/O-bound, the best solution is async. If you have a more realistic and complex scenario (with both CPU and I/O components), you should either hook up your CPU and I/O parts with producer/consumer queues or use a more complete solution such as TPL Dataflow.
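
To give an idea of what hooking things up with a producer/consumer queue might look like, here is a minimal sketch using BlockingCollection<T> (allFiles and CalculateHash stand in for the question's hypothetical pieces; this is an illustration, not a tuned solution):

    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading.Tasks;

    static void HashWithProducerConsumer(IEnumerable<string> allFiles, int workerCount)
    {
        // Bounded queue: the producer blocks once 100 items are waiting,
        // which keeps memory usage under control.
        using (var queue = new BlockingCollection<string>(boundedCapacity: 100))
        {
            // Consumers: a fixed number of workers, so CPU usage is throttled.
            Task[] consumers = Enumerable.Range(0, workerCount)
                .Select(n => Task.Run(() =>
                {
                    foreach (var file in queue.GetConsumingEnumerable())
                        CalculateHash(file);   // the question's hypothetical CPU-bound helper
                }))
                .ToArray();

            // Producer: enqueue the work, then signal that no more items are coming.
            foreach (var file in allFiles)
                queue.Add(file);
            queue.CompleteAdding();

            Task.WaitAll(consumers);
        }
    }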

TPL Dataflow works well with both parallelism (MaxDegreeOfParallelism) and async, and has a built-in producer/consumer queue between each pair of blocks.

One thing to consider when mixing a large amount of I/O and CPU work is that different situations can have significantly different performance characteristics. To be safe, you'll want to throttle the data going through your queues so that you don't run into memory-usage problems. TPL Dataflow has built-in support for throttling via BoundedCapacity.
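
For example, here is a rough sketch of a two-stage Dataflow pipeline, asynchronous reads feeding parallel hashing, with both options set; the helper name, parallelism degrees and capacities are illustrative values, not tuned recommendations:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Security.Cryptography;
    using System.Threading.Tasks;
    using System.Threading.Tasks.Dataflow;   // TPL Dataflow NuGet package

    static async Task HashAllFilesAsync(IEnumerable<string> allFiles)
    {
        // I/O stage: asynchronous reads, a few in flight at a time, bounded queue.
        var readBlock = new TransformBlock<string, byte[]>(
            async path =>
            {
                using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                                   FileShare.Read, 4096, useAsync: true))
                using (var memory = new MemoryStream())
                {
                    await stream.CopyToAsync(memory);
                    return memory.ToArray();
                }
            },
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4, BoundedCapacity = 16 });

        // CPU stage: hash in parallel; also bounded so the reads can't race too far ahead.
        var hashBlock = new ActionBlock<byte[]>(
            bytes =>
            {
                using (var sha1 = SHA1.Create())
                    sha1.ComputeHash(bytes);   // store or report the result as needed
            },
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = Environment.ProcessorCount,
                BoundedCapacity = 16
            });

        readBlock.LinkTo(hashBlock, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var file in allFiles)
            await readBlock.SendAsync(file);   // SendAsync respects BoundedCapacity (back-pressure)
        readBlock.Complete();

        await hashBlock.Completion;
    }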


Source: https://habr.com/ru/post/1489737/
