Intensive file I/O and data processing in C#

I am writing an application that has to process a large text file (comma-separated, containing several types of records; I have neither the power nor the desire to change the storage format). It reads the records (often all of the records in the file, sequentially, but not always), and then the data for each record is passed on for some processing.

Currently, this part of the application is single-threaded (read a record, process it, read the next record, and so on). I think it would be more efficient to read records into a queue on one thread and process them on another thread, either in small blocks or as they become available.

I do not know how to start programming something like this, including which data structure would be needed or how to implement the multithreading correctly. Can anyone give some guidance, or offer other suggestions on how I could improve performance here?

+4
3 answers

You may see a benefit if you can balance the time spent processing records against the time spent reading them; in that case, you could use a producer/consumer setup, for example a synchronized queue with a worker (or several) dequeuing and processing records. I would also be tempted to look at Parallel Extensions; it is fairly easy to write an IEnumerable<T> version of your reading code, after which Parallel.ForEach (or one of the other Parallel methods) should do everything you need; for example:

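    // Requires: using System.Collections.Generic; using System.IO;
    // Lazily yields one Person per line, so the file is never loaded in full.
    static IEnumerable<Person> ReadPeople(string path)
    {
        using (var reader = File.OpenText(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string[] parts = line.Split(',');
                yield return new Person(parts[0], int.Parse(parts[1]));
            }
        }
    }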
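As a rough sketch of how such a lazy reader could be fed to Parallel.ForEach: the class name, the method names, and the "sum the second field" processing below are hypothetical stand-ins, not part of the original answer. Note that Parallel.ForEach does not guarantee processing order, so this only suits work where records are independent.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelRecords
{
    // Lazily yields one line at a time; the file is never held in memory as a whole.
    public static IEnumerable<string> ReadRecords(string path)
    {
        using (var reader = File.OpenText(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                yield return line;
        }
    }

    // Hypothetical per-record work: parse the second field and add it to a total.
    public static long SumSecondField(string path)
    {
        long total = 0;
        Parallel.ForEach(ReadRecords(path), line =>
        {
            string[] parts = line.Split(',');
            // Several worker threads update the total, so the add must be atomic.
            Interlocked.Add(ref total, long.Parse(parts[1]));
        });
        return total;
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, new[] { "Ann,30", "Bob,41", "Cy,29" });
        Console.WriteLine(SumSecondField(path)); // prints 100
        File.Delete(path);
    }
}
```

If the per-record work is very cheap, the overhead of handing lines to worker threads may outweigh the gain, so it is worth measuring against the single-threaded loop.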
+3

Take a look at these tutorials; they contain everything you need. They are Microsoft tutorials that include code samples for a case similar to the one you describe. Your producer fills the queue while the consumer takes records off it.

Creating, starting, and interacting between threads

Synchronizing two threads: a producer and a consumer
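The producer/consumer arrangement can be sketched with BlockingCollection<T> from System.Collections.Concurrent, which is a different route from the hand-rolled synchronization in the linked tutorials. The class name, the capacity parameter, and the upper-casing "processing" step are assumptions made for illustration only.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class ProducerConsumer
{
    // Reads lines on one task and processes them on another.
    // With a single consumer, results come out in file order.
    public static List<string> Run(string path, int capacity)
    {
        var results = new List<string>();
        using (var queue = new BlockingCollection<string>(capacity))
        {
            var producer = Task.Run(() =>
            {
                try
                {
                    foreach (string line in File.ReadLines(path))
                        queue.Add(line); // blocks while the queue is full
                }
                finally
                {
                    queue.CompleteAdding(); // lets the consumer loop finish
                }
            });

            var consumer = Task.Run(() =>
            {
                // Blocks while the queue is empty; ends after CompleteAdding.
                foreach (string line in queue.GetConsumingEnumerable())
                    results.Add(line.ToUpperInvariant()); // hypothetical processing
            });

            Task.WaitAll(producer, consumer);
        }
        return results;
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, new[] { "ann,30", "bob,41" });
        foreach (string record in Run(path, 100))
            Console.WriteLine(record);
        File.Delete(path);
    }
}
```

The bounded capacity keeps a fast reader from filling memory when processing is the slower side. Adding more consumers would need a thread-safe result collection and would give up ordering.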

+1

You could also look at asynchronous I/O. In this style, you start a file operation from the main thread; it then continues in the background and, when it completes, invokes a callback that you specified. In the meantime, you can carry on doing other things (such as processing data). For example, you could start an asynchronous operation to read the next 1000 bytes, process the 1000 bytes you already have, and then wait for the next kilobyte.

Unfortunately, programming asynchronous operations in C# is a bit painful. There is an MSDN sample, but it is not nice at all. This can be solved nicely in F# using asynchronous workflows. I wrote an article that explains the problem and shows how to do a similar thing using C# iterators.

A more promising solution for C# is the Wintellect PowerThreading library, which supports a similar trick using C# iterators. There is a good introductory article by Jeffrey Richter in the MSDN Concurrent Affairs column.
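For comparison, on C# versions that have async/await (a language feature this answer does not use), the overlap described above, where the next chunk is read while the current one is processed, can be sketched without explicit callbacks. The class name, the chunk size parameter, and the newline-counting "processing" are hypothetical.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class OverlappedRead
{
    // Counts newline bytes while the next chunk is being read in the background.
    public static async Task<int> CountLinesAsync(string path, int chunkSize)
    {
        var current = new byte[chunkSize];
        var next = new byte[chunkSize];
        int lines = 0;

        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, 4096, useAsync: true))
        {
            int count = await stream.ReadAsync(current, 0, current.Length);
            while (count > 0)
            {
                // Start the next read before processing the chunk already in hand.
                Task<int> pending = stream.ReadAsync(next, 0, next.Length);

                for (int i = 0; i < count; i++)
                    if (current[i] == (byte)'\n') lines++;

                // Wait for the background read, then swap the two buffers.
                count = await pending;
                var tmp = current; current = next; next = tmp;
            }
        }
        return lines;
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "a,1\nb,2\nc,3\n");
        Console.WriteLine(CountLinesAsync(path, 1000).GetAwaiter().GetResult()); // prints 3
        File.Delete(path);
    }
}
```

Real record parsing would also have to handle records that straddle a chunk boundary, which this sketch sidesteps by only counting bytes.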

+1

Source: https://habr.com/ru/post/1298933/

