What type of queue to use for parallel data processing (C#, .NET 4)

Scenario: Data is received and written to the database with timestamps. I need to process the raw data in the order it was received, based on the timestamp, and write the results back to the database, into another table, again maintaining the timestamp order.

I came up with the following design: I created two queues, one for storing raw data read from the database, the other for storing processed data before it is written back to the database. I have two threads, one filling the input queue from the database and the other reading from the result queue. In between, I create several threads that process data from the input queue and write it to the result queue.

I experimented with SortedList (with manual locking) and BlockingCollection. I used two approaches for parallel processing: Parallel.For (ForEach) and Task.Factory.StartNew.

Each data item can take a variable amount of time to process, depending on several factors. One thread can still be processing the first data point while other threads have already finished three or four data points each, which ruins the timestamp order.

I recently learned about OrderablePartitioner and thought it would solve the problem, but from the MSDN example I can see that it also does not sort the underlying collection. Maybe I need to implement a custom partitioner to order my collection of complex data types? Or is there a better way to approach the problem?
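For reference, an orderable partitioner does not reorder anything itself; it only hands each item to the loop body together with its original position. A minimal sketch of using that position to put results back in input order, assuming the input is already sorted by timestamp and fits in memory (`IndexedProcessing` and `Process` are hypothetical names, not from the question):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class IndexedProcessing
{
    // Hypothetical stand-in for the real, variable-cost processing step.
    static string Process(DateTime stamp)
    {
        return stamp.ToString("o");
    }

    // Assumes stampsInOrder is already sorted by timestamp.
    public static string[] Run(IList<DateTime> stampsInOrder)
    {
        var results = new string[stampsInOrder.Count];

        // Partitioner.Create over an IList returns an OrderablePartitioner
        // whose keys are the positions of the items in the list.
        OrderablePartitioner<DateTime> partitioner =
            Partitioner.Create(stampsInOrder, true);

        // Items may finish in any order, but each result is stored
        // in the slot that matches its position in the input.
        Parallel.ForEach(partitioner, (stamp, loopState, index) =>
        {
            results[index] = Process(stamp);
        });

        return results;
    }
}
```

The results array can then be written to the database front to back once the loop has finished.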

Any suggestions and/or links to articles discussing this kind of problem would be appreciated.

+6
3 answers

Personally, I would at least try to start by using a BlockingCollection<T> for input and a ConcurrentQueue<T> instance for the results.

I would use Parallel LINQ (PLINQ) to process the data. To maintain order during processing, you can use AsOrdered() in the PLINQ statement.
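A minimal sketch of that combination, assuming the raw items are added to the input collection in timestamp order. The `Sample` type, the `OrderedPipeline` class, and the `Process` method are hypothetical placeholders, since the question does not show its data model:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type standing in for one row of raw data.
public class Sample
{
    public DateTime Timestamp;
    public double Value;
}

public static class OrderedPipeline
{
    // Placeholder for the real processing step.
    static Sample Process(Sample raw)
    {
        return new Sample { Timestamp = raw.Timestamp, Value = raw.Value * 2 };
    }

    public static List<Sample> Run(IEnumerable<Sample> rawInTimestampOrder)
    {
        var input = new BlockingCollection<Sample>();   // FIFO by default
        var results = new ConcurrentQueue<Sample>();

        // In a real pipeline a reader thread would do this part.
        foreach (var s in rawInTimestampOrder) input.Add(s);
        input.CompleteAdding();

        // AsOrdered() makes PLINQ yield results in source order,
        // even though items are processed concurrently.
        var processed = input.GetConsumingEnumerable()
                             .AsParallel()
                             .AsOrdered()
                             .Select(s => Process(s));

        // Enumerate with foreach (not ForAll) so the ordering is kept.
        foreach (var p in processed) results.Enqueue(p);

        return results.ToList();
    }
}
```

The ordering guarantee comes at some cost, because PLINQ has to buffer results that finish early until the items ahead of them are done.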

+5

Have you considered PLINQ and AsOrdered()? They could be useful for what you are trying to achieve. http://msdn.microsoft.com/en-us/library/dd460719.aspx

+2

You may have considered these things, but ...

Why not just pass the timestamp along to the database, and then either let the database do the ordering or fix the order in the database after all the processing threads have finished? Do the SQL statements really have to be executed sequentially?

PLINQ is great, but I would try to avoid thread synchronization requirements and instead push more of the ordering work to the database if you can.
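A rough sketch of that idea: process in any order, keep the timestamp on every result, and restore the order only once at the end. Here the sort is done in memory before writing; leaving it to an ORDER BY on the timestamp column when the table is read would follow the same principle. The `Result` type and the `SortAfterProcessing` class are hypothetical, and the `+ 1` stands in for the real processing:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical record type; every result carries its original timestamp.
public class Result
{
    public DateTime Timestamp;
    public double Value;
}

public static class SortAfterProcessing
{
    public static List<Result> Run(IEnumerable<Result> raw)
    {
        // Unordered, thread-safe collection: no ordering is enforced here.
        var bag = new ConcurrentBag<Result>();

        Parallel.ForEach(raw, r =>
        {
            // Placeholder processing; the timestamp is copied through unchanged.
            bag.Add(new Result { Timestamp = r.Timestamp, Value = r.Value + 1 });
        });

        // Order is restored once, after all workers have finished.
        return bag.OrderBy(r => r.Timestamp).ToList();
    }
}
```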

0

Source: https://habr.com/ru/post/885925/

