Is there a way to atomically read a line from a file in C++?

I am currently working on a project where I have a large text file (15+ GB) and I am trying to run a function on every line of the file. To speed up the task, I create 4 threads and try to get them to read the file at the same time. This is similar to what I have:

    #include <stdio.h>
    #include <string>
    #include <iostream>
    #include <stdlib.h>
    #include <thread>
    #include <fstream>

    using namespace std;

    // Every thread reads one line from the same stream, with no synchronization.
    void simpleFunction(ifstream* wordlist){
        string word;
        getline(*wordlist, word);
        cout << word << endl;
    }

    int main(){
        // const, so that the array size below is a constant expression
        const int max_concurrant_threads = 4;
        ifstream wordlist("filename.txt");
        thread all_threads[max_concurrant_threads];

        for(int i = 0; i < max_concurrant_threads; i++){
            all_threads[i] = thread(simpleFunction, &wordlist);
        }

        for (int i = 0; i < max_concurrant_threads; ++i) {
            all_threads[i].join();
        }
        return 0;
    }

The getline function (as well as "*wordlist >> word") seems to advance the file position and read the value in two separate steps, since I regularly get output like this:

    Item1
    Item2
    Item3
    Item2

So, I was wondering if there is a way to atomically read a line from a file? Loading the whole file into an array first will not work because the file is too large, and I would prefer not to load the file in chunks.

I could not find anything about the atomicity of fstream and getline, sadly. If there is an atomic version of readline, or even an easy way to use locks to achieve what I want, I'm all ears.

Thanks in advance!

1 answer

The correct way to do this is to lock the file, which prevents all other processes from using it while you hold the lock. See Wikipedia: file locking. This is probably too slow for you if you take the lock for every single line. But if you read, for example, 1000 or 10000 lines during each function call, this may be the best way to implement it.
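As a rough sketch of that idea, assuming a Linux or BSD system where flock() is available (it is not part of standard C++), a batch of lines could be read under an exclusive lock like this. The helper name readBatchLocked is made up for illustration. Keep in mind that flock() locks are advisory: they only hold back other processes that also call flock(), and they do not keep threads sharing the same descriptor away from each other.

```cpp
#include <stdio.h>      // fileno, getline (POSIX)
#include <stdlib.h>     // free
#include <sys/file.h>   // flock (Linux/BSD, not ISO C++)
#include <sys/types.h>  // ssize_t
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: reads up to maxLines lines from f while holding an
// exclusive advisory lock on the underlying file descriptor.
std::vector<std::string> readBatchLocked(FILE* f, std::size_t maxLines) {
    std::vector<std::string> lines;
    const int fd = fileno(f);
    if (flock(fd, LOCK_EX) != 0) {
        return lines;  // could not take the lock; report nothing read
    }

    char* buffer = nullptr;  // allocated and grown by POSIX getline
    size_t capacity = 0;
    while (lines.size() < maxLines) {
        const ssize_t length = getline(&buffer, &capacity, f);
        if (length < 0) {
            break;  // end of file or read error
        }
        std::string line(buffer, static_cast<std::size_t>(length));
        if (!line.empty() && line.back() == '\n') {
            line.pop_back();  // drop the trailing newline
        }
        lines.push_back(std::move(line));
    }
    free(buffer);

    flock(fd, LOCK_UN);  // release the lock before processing the batch
    return lines;
}
```

An empty result means the file is exhausted (or the lock could not be taken), so a caller would keep requesting batches until it gets an empty one.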

If no other processes access the file, and it is enough to keep other threads from reading it at the same time, you can use a mutex that you lock while accessing the file.

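    #include <mutex>

    void simpleFunction(ifstream* wordlist){
        // One mutex shared by every thread that calls this function
        static std::mutex io_mutex;
        string word;
        {
            // Only the read is done under the lock
            std::lock_guard<std::mutex> lock(io_mutex);
            getline(*wordlist, word);
        }
        cout << word << endl;
    }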
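To process the whole file rather than one line per thread, each thread can keep pulling lines under the mutex until the stream runs dry. The following is only a sketch: readBatch and processAllLines are names invented here, and the batch size picks up the suggestion above to read many lines per lock acquisition.

```cpp
#include <atomic>
#include <cstddef>
#include <istream>
#include <mutex>
#include <sstream>  // only needed for the std::istringstream usage shown below
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: reads up to maxLines lines while holding ioMutex.
// An empty result means the stream is exhausted.
std::vector<std::string> readBatch(std::istream& in, std::mutex& ioMutex,
                                   std::size_t maxLines) {
    std::vector<std::string> batch;
    std::lock_guard<std::mutex> lock(ioMutex);
    std::string line;
    while (batch.size() < maxLines && std::getline(in, line)) {
        batch.push_back(line);
    }
    return batch;
}

// Hypothetical driver: starts threadCount workers that drain the stream
// and returns the total number of lines they handled.
// batchSize must be at least 1, otherwise the workers stop immediately.
std::size_t processAllLines(std::istream& in, unsigned threadCount,
                            std::size_t batchSize) {
    std::mutex ioMutex;
    std::atomic<std::size_t> handled{0};
    std::vector<std::thread> workers;

    for (unsigned i = 0; i < threadCount; ++i) {
        workers.emplace_back([&] {
            for (;;) {
                std::vector<std::string> batch = readBatch(in, ioMutex, batchSize);
                if (batch.empty()) {
                    break;  // nothing left to read
                }
                for (const std::string& line : batch) {
                    (void)line;  // the per-line work goes here, outside the lock
                    ++handled;
                }
            }
        });
    }
    for (std::thread& worker : workers) {
        worker.join();
    }
    return handled.load();
}
```

With an std::ifstream opened on the word list, processAllLines(wordlist, 4, 1000) would run four workers that each take up to 1000 lines at a time; an std::istringstream works the same way for a quick experiment. A batch size of 1 gives the per-line locking shown above.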

Another way to implement your program might be to create one thread that keeps reading lines into memory, while the other threads request individual lines from the class that stores them. You will need something like this:

    class FileReader {
    public:
        // This runs in its own thread
        void readingLoop() {
            // read lines to storage, unless there are too many lines already
        }

        // This is called by other threads
        std::string getline() {
            std::lock_guard<std::mutex> lock(storageMutex);
            // return line from storage, and delete it
        }

    private:
        std::mutex storageMutex;
        std::deque<std::string> storage;
    };
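One possible way to fill in that skeleton is a bounded queue guarded by a mutex and two condition variables. This is a sketch under my own assumptions rather than a finished implementation: the names BoundedLineQueue, nextLine and consumeAll are invented here, and the capacity of 1024 lines is arbitrary.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <istream>
#include <mutex>
#include <sstream>  // only needed for the std::istringstream usage shown below
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical bounded queue: one reader thread fills it, workers drain it.
class BoundedLineQueue {
public:
    explicit BoundedLineQueue(std::size_t maxLines) : maxLines_(maxLines) {}

    // Runs in its own thread: reads lines until EOF, waiting while full.
    void readingLoop(std::istream& in) {
        std::string line;
        while (std::getline(in, line)) {
            std::unique_lock<std::mutex> lock(mutex_);
            notFull_.wait(lock, [this] { return storage_.size() < maxLines_; });
            storage_.push_back(std::move(line));
            lock.unlock();
            notEmpty_.notify_one();
        }
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;  // tell waiting workers that no more lines will come
        }
        notEmpty_.notify_all();
    }

    // Called by worker threads: returns false once everything was consumed.
    bool nextLine(std::string& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        notEmpty_.wait(lock, [this] { return !storage_.empty() || done_; });
        if (storage_.empty()) {
            return false;
        }
        out = std::move(storage_.front());
        storage_.pop_front();
        lock.unlock();
        notFull_.notify_one();
        return true;
    }

private:
    std::size_t maxLines_;
    std::mutex mutex_;
    std::condition_variable notFull_;
    std::condition_variable notEmpty_;
    std::deque<std::string> storage_;
    bool done_ = false;
};

// Hypothetical driver: one reader plus consumerCount workers.
// Returns the number of lines the workers handled.
std::size_t consumeAll(std::istream& in, unsigned consumerCount) {
    BoundedLineQueue queue(1024);  // arbitrary capacity
    std::atomic<std::size_t> handled{0};

    std::thread reader([&] { queue.readingLoop(in); });
    std::vector<std::thread> consumers;
    for (unsigned i = 0; i < consumerCount; ++i) {
        consumers.emplace_back([&] {
            std::string line;
            while (queue.nextLine(line)) {
                ++handled;  // the per-line work goes here
            }
        });
    }
    reader.join();
    for (std::thread& consumer : consumers) {
        consumer.join();
    }
    return handled.load();
}
```

The capacity limit is what keeps memory use bounded for a 15+ GB file: the reader blocks when the queue is full instead of loading everything, and only the reader thread ever touches the stream, so the stream itself needs no lock.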

Source: https://habr.com/ru/post/1012785/

