C++ ifstream and umlauts

I have a problem with umlauts (letters ä, ü, ö, ...) and ifstream in C++.

I use curl to download an HTML page and ifstream to read the downloaded file line by line and parse some data out of it. This works fine as long as a line does not contain a string like one of the following:

    te="Olimpija Laibach - Tromsö";
    te="Burghausen - Münster";

My code parses these lines and outputs them as follows:

    Olimpija Laibach vs. Troms?
    Burghausen vs. M?nster

Umlauts written directly from a string literal in the code come out correctly:

 cout << "öäü" << endl; // This works fine 

My code looks something like this:

    ifstream fin("file");
    while(!(fin.eof())) {
        getline(fin, line, '\n');
        int pos = line.find("te=");
        if(pos >= 0) {
            pos = line.find(" - ");
            string team1 = line.substr(4,pos-4);
            string team2 = line.substr(pos+3, line.length()-pos-6);
            cout << team1 << " vs. " << team2 << endl;
        }
    }

Edit: The strange thing is that the same code (only the source and the delimiters are changed) works for a different input file (same procedure: download with curl, read with ifstream). It parses and outputs a line like the following without any problem:

 <span id="...">Fernwärme Vienna</span> 
1 answer

Which locale is imbued in fin? In the code you show, it will be the global locale, which, if you have not reset it, is "C".

If you are somewhere outside the Anglo-Saxon world (and the strings you show suggest that you are), one of the first things you do in main should be

 std::locale::global( std::locale( "" ) ); 

This sets the global locale (and therefore the default locale for any streams opened later) to the locale used in the surrounding environment. (Formally, to an implementation-defined native environment, but in practice, to whatever the user is using.) In the "C" locale, the encoding is almost always ASCII; ASCII does not recognize umlauts, and according to the standard, illegal encodings in input should be replaced with an implementation-defined character (IIRC; it has been a while since I reread this section). On output, of course, you are not supposed to have any unknown characters, so the implementation does not check for them, and they go through.
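As a rough sketch of the idea, a file stream can also be given its locale explicitly before the file is opened. The helper names environment_locale and read_lines are made up for this illustration, and the fallback to the classic "C" locale is an assumption about what you would want when the environment names a locale that is not installed:

```cpp
#include <fstream>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: returns the user's environment locale, or the classic
// "C" locale if the environment names a locale the system cannot construct.
std::locale environment_locale() {
    try {
        return std::locale("");
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

// Hypothetical helper: reads all lines from `path`. The stream is imbued with
// `loc` before the file is opened, so the locale applies from the first read.
std::vector<std::string> read_lines(const std::string& path,
                                    const std::locale& loc) {
    std::ifstream fin;
    fin.imbue(loc);
    fin.open(path);
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(fin, line)) {  // loop ends when no further line can be read
        lines.push_back(line);
    }
    return lines;
}
```

In main you would call std::locale::global(environment_locale()) first and then, for example, read_lines("file", std::locale()); a default-constructed std::locale is a copy of the current global locale.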

Since std::cin, etc. are opened before you have a chance to set the global locale, you will need to imbue them with std::locale( "" ) explicitly.
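A minimal sketch of that step; the function name imbue_standard_streams is invented for this example, and it only touches the narrow streams:

```cpp
#include <iostream>
#include <locale>

// Hypothetical helper: imbues the narrow standard streams with `loc`.
// These streams exist before main() starts, so a later call to
// std::locale::global() does not change the locale they already hold.
void imbue_standard_streams(const std::locale& loc) {
    std::cin.imbue(loc);
    std::cout.imbue(loc);
    std::cerr.imbue(loc);
}
```

You would typically call imbue_standard_streams(std::locale("")) right after setting the global locale. Be aware that the imbued locale also governs number formatting on these streams, so digit grouping or the decimal separator may change.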

If this does not work, you may need to find a specific locale to use.
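One possible way to look for a specific locale is to try a few candidate names in turn. Locale names are platform-specific, so any names you pass in (for example "de_DE.UTF-8" on many POSIX systems, or "German_Germany.1252" on Windows) are assumptions that may not exist on your machine. The function name first_available_locale is likewise made up for this sketch:

```cpp
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: returns the first locale in `names` that the system
// can construct, or the classic "C" locale if none of them is available.
std::locale first_available_locale(const std::vector<std::string>& names) {
    for (const std::string& name : names) {
        try {
            return std::locale(name);
        } catch (const std::runtime_error&) {
            // This name is not available here; try the next candidate.
        }
    }
    return std::locale::classic();
}
```

The result can be passed to std::locale::global or to a stream's imbue. Checking the returned locale's name() tells you which candidate, if any, was actually picked.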


Source: https://habr.com/ru/post/1491386/

