Shift-JIS decoding does not work with std::wifstream in Visual C++ 2013

I am trying to read a text file encoded in Shift-JIS (code page 932) using std::wifstream and std::getline. The following code works in VS2010 but not in VS2013:

    std::wifstream in;
    in.open("data932.txt");
    const std::locale locale(".932");  // locale for code page 932
    in.imbue(locale);
    std::wstring line1, line2;
    std::getline(in, line1);  // first line: ASCII only
    std::getline(in, line2);  // second line: Japanese text
    const bool good = in.good();

The file contains several lines: the first contains only ASCII characters, and the second contains Japanese text. So when this snippet works, line1 should hold the ASCII string, line2 the Japanese text, and good should be true.

When compiled with VS2010, the result is as expected. When compiled with VS2013, line1 holds the ASCII string, but line2 is empty and good is false.

I debugged into the CRT (its source ships with Visual Studio) and found that the internal function _Mbrtowc (in xmbtowc.c) changed between the two versions, including the way it detects the lead byte of a double-byte character. The VS2013 version does not recognize the lead byte, so the byte stream cannot be decoded.
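To illustrate why lead byte detection matters here, the following is a minimal sketch, not the CRT's implementation. It assumes the commonly documented code page 932 lead byte ranges (0x81-0x9F and 0xE0-0xFC); the function names is_cp932_lead_byte, count_cp932_chars and self_check are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumption: code page 932 lead bytes lie in 0x81-0x9F and 0xE0-0xFC.
constexpr bool is_cp932_lead_byte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Counts characters in a CP932 byte string: a lead byte consumes the
// following trail byte, every other byte is a character on its own.
std::size_t count_cp932_chars(const std::string& bytes) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (is_cp932_lead_byte(b) && i + 1 < bytes.size()) {
            ++i;  // skip the trail byte that belongs to this character
        }
        ++count;
    }
    return count;
}

// "A" followed by two double-byte characters (0x93 0xFA and 0x96 0x7B).
// The trail byte 0x7B is also ASCII '{', so a decoder that misses the
// lead byte 0x96 would misread the rest of the line.
void self_check() {
    assert(is_cp932_lead_byte(0x93));
    assert(!is_cp932_lead_byte('A'));
    assert(count_cp932_chars("A\x93\xFA\x96\x7B") == 3);
}
```

A single-byte code page such as 1252 has no lead bytes at all, so a lead byte table built for it cannot mark 0x93 or 0x96 as the start of a two-byte sequence.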

Further debugging led to the place where the _Isleadbyte array of the _Cvtvec object is initialized (the _Getcvt() function in xwctomb.c), and this initialization produces an incorrect result. It appears to always use code page 1252, which is the default code page on my system, rather than 932, which is the one set on the stream. However, I could not determine whether this is by design and I am missing a required step, or whether it really is a CRT bug in VS2013.

Unfortunately, I do not have VS2012, so I could not test that version.

Any ideas on this topic are welcome!

1 answer

I found a workaround: if I explicitly switch the global multibyte code page (MBCP) to 932 while the locale is being constructed, the locale is initialized correctly and the lines are read and decoded as expected.

    const int oldMbcp = _getmbcp();
    _setmbcp(932);
    const std::locale locale("Japanese_Japan.932");
    _setmbcp(oldMbcp);
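Since the std::locale constructor can throw, the workaround above could leave the global code page changed if construction fails. A possible exception-safe variant is sketched below. The names get_mbcp, set_mbcp, ScopedMbcp, make_cp932_locale and check_guard_restores are hypothetical, and the non-Windows branch is only a stand-in so the sketch compiles elsewhere; it has not been verified against VS2013.

```cpp
#include <cassert>
#include <locale>
#ifdef _WIN32
#include <mbctype.h>  // _getmbcp, _setmbcp (Microsoft CRT)
#endif

#ifdef _WIN32
int get_mbcp() { return _getmbcp(); }
int set_mbcp(int cp) { return _setmbcp(cp); }
#else
// Stand-ins for other platforms: they only remember a value.
static int g_fake_mbcp = 0;
int get_mbcp() { return g_fake_mbcp; }
int set_mbcp(int cp) { g_fake_mbcp = cp; return 0; }
#endif

// Switches the global multibyte code page and restores the previous
// one when the guard goes out of scope, even if an exception is thrown.
class ScopedMbcp {
public:
    explicit ScopedMbcp(int cp) : old_(get_mbcp()) { set_mbcp(cp); }
    ~ScopedMbcp() { set_mbcp(old_); }
    ScopedMbcp(const ScopedMbcp&) = delete;
    ScopedMbcp& operator=(const ScopedMbcp&) = delete;
private:
    int old_;
};

// Builds the locale from the answer while code page 932 is active.
// std::locale throws std::runtime_error if the name is not supported.
std::locale make_cp932_locale() {
    ScopedMbcp guard(932);
    return std::locale("Japanese_Japan.932");
}

// Usage check: the code page seen before the guard is back after it.
void check_guard_restores() {
    const int before = get_mbcp();
    { ScopedMbcp guard(932); }
    assert(get_mbcp() == before);
}
```

The stream would then be set up with in.imbue(make_cp932_locale()) before the first read.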

Source: https://habr.com/ru/post/1205729/

