Using the seekg() function in text mode

While trying to read a plain text file with ANSI encoding in text mode (Windows), I came across some strange behavior with seekg() and tellg(): whenever I called tellg(), saved its value (as a pos_type), and then sought back to it later, I always ended up further into the stream than where I had left off.

Eventually I did a sanity check; even if I do nothing more than this...

    #include <fstream>
    #include <string>

    int main() {
        std::ifstream dataFile("myfile.txt", std::ifstream::in);
        if (dataFile.is_open() && !dataFile.fail()) {
            while (dataFile.good()) {
                std::string line;
                // Seek to the position tellg() just reported; this should be a no-op.
                dataFile.seekg(dataFile.tellg());
                std::getline(dataFile, line);
            }
        }
    }

...then eventually, further into the file, the lines come back cut in half. Why exactly is this happening?

2 answers

This problem is caused by libstdc++, which determines the current offset by subtracting the amount of data remaining in its buffer from the position returned by lseek64.

The buffer size is set from the return value of read, which for a text-mode file on Windows is the number of bytes placed in the buffer after line-ending conversion (i.e. each 2-byte \r\n has been converted to \n; Windows also seems to append a spurious newline at the end of the file).

lseek64, however (which with mingw results in a call to _lseeki64), returns the current absolute file position. Once the two values are subtracted, you end up with an offset that is off by 1 for each remaining newline in the text file (+1 for the extra newline).
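The arithmetic described above can be illustrated with a small, portable model. This is only a sketch of the subtraction, not libstdc++'s actual code: the function names are made up for illustration, it assumes the whole file was read into the buffer in one go, and it leaves out the extra trailing newline mentioned above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical model: what a text-mode read() would place in the buffer,
// i.e. the raw bytes with every "\r\n" pair collapsed to "\n".
std::string translateCrLf(const std::string& raw) {
    std::string out;
    for (std::size_t i = 0; i < raw.size(); ++i) {
        if (raw[i] == '\r' && i + 1 < raw.size() && raw[i + 1] == '\n') {
            continue;  // drop the CR, keep the LF that follows
        }
        out.push_back(raw[i]);
    }
    return out;
}

// Offset computed as "absolute file position minus characters still buffered",
// assuming (for this sketch) that the whole file was read in one call.
long long modelledTellg(const std::string& raw, std::size_t consumed) {
    const long long filePos = static_cast<long long>(raw.size());
    const long long buffered = static_cast<long long>(translateCrLf(raw).size());
    const long long remaining = buffered - static_cast<long long>(consumed);
    return filePos - remaining;
}

// The raw byte offset that actually corresponds to `consumed` translated characters.
long long trueRawOffset(const std::string& raw, std::size_t consumed) {
    std::size_t i = 0;
    for (std::size_t taken = 0; taken < consumed && i < raw.size(); ++taken) {
        if (raw[i] == '\r' && i + 1 < raw.size() && raw[i + 1] == '\n') {
            ++i;  // a translated '\n' covers two raw bytes
        }
        ++i;
    }
    return static_cast<long long>(i);
}
```

For the 12 raw bytes "ab\r\ncd\r\nef\r\n", after the first line has been consumed (3 translated characters) the model reports offset 6 while the real position is 4: off by one for each of the two newlines still ahead.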

The following code should demonstrate the problem. You can even use a file with a single character and no newline, because of the extra newline inserted by Windows.

    #include <iostream>
    #include <fstream>

    int main() {
        std::ifstream f("myfile.txt");
        for (char c; f.get(c);)
            std::cout << f.tellg() << ' ';
    }

For a file containing the single character a, I get the following output:

 2 3 

The first call to tellg is clearly off by 1. After the second call the file position is correct, because the end of the file has been reached once the extra newline is taken into account.

As an alternative to opening the file in binary mode, you can work around the problem by disabling buffering.

    #include <iostream>
    #include <fstream>

    int main() {
        std::ifstream f;
        // pubsetbuf must be called before the file is opened to take effect.
        f.rdbuf()->pubsetbuf(nullptr, 0);
        f.open("myfile.txt");
        for (char c; f.get(c);)
            std::cout << f.tellg() << ' ';
    }

but this is far from ideal.
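The binary-mode route mentioned above could look roughly like this. It is a minimal sketch, not taken from the original answer: the names IndexedLine, readLinesWithOffsets and readLineAt are made up for illustration, and the code assumes lines end in either \n or \r\n. In binary mode no line-ending translation happens, so the offsets from tellg() are plain byte positions that can be passed back to seekg(); the price is that a trailing \r has to be stripped by hand.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper type: a line together with the byte offset where it starts.
struct IndexedLine {
    std::streamoff offset;
    std::string text;
};

// Reads all lines in binary mode, recording tellg() before each one.
std::vector<IndexedLine> readLinesWithOffsets(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::vector<IndexedLine> lines;
    std::string line;
    while (true) {
        const std::streamoff start = in.tellg();
        if (!std::getline(in, line)) {
            break;  // end of file or read error
        }
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();  // binary mode leaves the CR of "\r\n" in place
        }
        lines.push_back({start, line});
    }
    return lines;
}

// Seeks to a previously recorded offset and reads the line that starts there.
std::string readLineAt(const std::string& path, std::streamoff offset) {
    std::ifstream in(path, std::ios::binary);
    in.seekg(offset, std::ios::beg);
    std::string line;
    std::getline(in, line);
    if (!line.empty() && line.back() == '\r') {
        line.pop_back();
    }
    return line;
}
```

Calling readLineAt with an offset returned by readLinesWithOffsets should give back the same line, because both streams see the same untranslated bytes.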

Hopefully mingw / mingw-w64 or gcc can fix this, but first we need to determine who is responsible for fixing it. I suspect the underlying issue is Microsoft's implementation of lseek, which should return values appropriate to the mode in which the file was opened.


Thanks for that, although this is a very old post. I was stuck on this problem for more than a week. Here is some example code from my site (menu versions 1 and 2). Version 1 uses the solution presented here, if someone wants to see it.

:)

    void customerOrder::deleteOrder(char* argv[]) {
        std::fstream newinFile, newoutFile;
        // Disable buffering before opening, as suggested in the answer above.
        newinFile.rdbuf()->pubsetbuf(nullptr, 0);
        newinFile.open(argv[1], std::ios_base::in);
        if (!(newinFile.is_open())) {
            throw "Could not open file to read customer order. ";
        }
        newoutFile.open("outfile.txt", std::ios_base::out);
        if (!(newoutFile.is_open())) {
            throw "Could not open file to write customer order. ";
        }
        newoutFile.seekp(0, std::ios::beg);
        std::string line;
        int skiplinesCount = 2;
        if (beginOffset != 0) {
            // Write file from zero to beginOffset and from endOffset to EOF if the record to delete is non-zero,
            // or write file from zero to beginOffset if the record to delete is non-zero and the last record.
            newinFile.seekg(0, std::ios::beg);
            // If primaryKey < largestKey, it is a middle record.
            customerOrder order;
            long tempOffset(0);
            int largestKey = order.largestKey(argv);
            if (primaryKey < largestKey) {
                // Stops right before "current..." next record.
                while (tempOffset < beginOffset) {
                    std::getline(newinFile, line);
                    newoutFile << line << std::endl;
                    tempOffset = newinFile.tellg();
                }
                newinFile.seekg(endOffset);
                // Skip two lines between records.
                for (int i = 0; i < skiplinesCount; ++i) {
                    std::getline(newinFile, line);
                }
                while (std::getline(newinFile, line)) {
                    newoutFile << line << std::endl;
                }
            } else if (primaryKey == largestKey) {
                // It is the last record.
                // Write from zero to beginOffset.
                while ((tempOffset < beginOffset) && (std::getline(newinFile, line))) {
                    newoutFile << line << std::endl;
                    tempOffset = newinFile.tellg();
                }
            } else {
                throw "Error in delete key";
            }
        } else {
            // It is the first record.
            // Write file from endOffset to EOF.
            // Works with endOffset - 4 (but why??)
            newinFile.seekg(endOffset);
            // Skip two lines between records.
            for (int i = 0; i < skiplinesCount; ++i) {
                std::getline(newinFile, line);
            }
            while (std::getline(newinFile, line)) {
                newoutFile << line << std::endl;
            }
        }
        newoutFile.close();
        newinFile.close();
    }

beginOffset is a specific point in the file (the beginning of a record) and endOffset is the end of that record; both are calculated with tellg in another function (findFoodOrder). I did not include that function because the post would become very long, but you can find it on my site (link to menu version 1 below):

http://www.buildincode.com
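For comparison, here is a minimal sketch of the same "copy everything except one record" idea done in binary mode instead of with buffering disabled. It is not the code from the site above: copyWithoutRange is a hypothetical helper, and it assumes beginOffset and endOffset are raw byte positions (for example, obtained from tellg() on a stream opened with std::ios::binary). Skipping the separator lines between records is left out.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: copies inPath to outPath, leaving out the bytes in
// the half-open range [beginOffset, endOffset). Both streams are opened in
// binary mode so that offsets are plain byte positions.
bool copyWithoutRange(const std::string& inPath, const std::string& outPath,
                      std::streamoff beginOffset, std::streamoff endOffset) {
    std::ifstream in(inPath, std::ios::binary);
    std::ofstream out(outPath, std::ios::binary);
    if (!in || !out || beginOffset < 0 || endOffset < beginOffset) {
        return false;
    }

    char ch;
    std::streamoff pos = 0;
    // Copy everything before the record.
    while (pos < beginOffset && in.get(ch)) {
        out.put(ch);
        ++pos;
    }
    in.clear();  // reset EOF/fail in case the file was shorter than beginOffset

    // Jump over the record and copy the rest of the file.
    in.seekg(endOffset, std::ios::beg);
    while (in.get(ch)) {
        out.put(ch);
    }
    out.flush();
    return static_cast<bool>(out);
}
```

Because nothing is translated in binary mode, the \r\n pairs are copied through unchanged and no manual adjustment of the offsets should be needed.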


Source: https://habr.com/ru/post/1207370/

