Using Poco XMLWriter with UTF-8 Strings in C++

I'm having trouble wrapping my head around using UTF-8 with Poco::XML::XMLWriter. In the following code example, everything works fine when the input contains only ASCII characters. However, sometimes the string in wordmapIt->first contains a non-ASCII value, such as a char with the value -105 in the middle of the string. When this happens, the XML stream seems to end at the -105 char, even though many more words follow it. I want to preserve every string as it is, so simply removing the char is not the right answer. There should be some kind of encoding I can apply (I think), but which one?

I am clearly missing something conceptually, but for the life of me I cannot find the right way to do this.

Poco::XML::XMLString EDocument::makeXMLString()
{
    std::stringstream xmlstream;
    Poco::UTF8Encoding utf8encoding;
    Poco::XML::XMLWriter writer(xmlstream, 0, "UTF-8", &utf8encoding);

    writer.startDocument();
    std::map<std::string, std::string>::iterator wordmapIt;

    for ( wordmapIt = nodeinfo->wordmap.begin(); wordmapIt != nodeinfo->wordmap.end(); wordmapIt++ )
    {
        writer.startElement("", "", "word");
        writer.characters(Poco::XML::toXMLString(wordmapIt->first));
        writer.endElement("", "", "word");
    }
    writer.endDocument();
    return xmlstream.str();
}

Edit: Here is the working solution, based on the answer below.

#include <map>
#include <sstream>
#include <string>
#include "Poco/TextConverter.h"
#include "Poco/UTF8Encoding.h"
#include "Poco/Windows1252Encoding.h"
#include "Poco/XML/XMLWriter.h"

Poco::XML::XMLString EDocument::makeXMLString()
{
    std::stringstream xmlstream;
    Poco::UTF8Encoding utf8encoding;
    Poco::XML::XMLWriter writer(xmlstream, 0, "UTF-8", &utf8encoding);

    // The input strings are Windows-1252, so convert them to UTF-8 first.
    // (utf8encoding is declared once above and reused here.)
    Poco::Windows1252Encoding windows1252encoding;
    Poco::TextConverter textconverter(windows1252encoding, utf8encoding);

    writer.startDocument();
    std::map<std::string, std::string>::iterator wordmapIt;

    for ( wordmapIt = nodeinfo->wordmap.begin(); wordmapIt != nodeinfo->wordmap.end(); wordmapIt++ )
    {
        std::string strword;
        // convert() appends the converted text to strword.
        textconverter.convert(wordmapIt->first, strword);
        writer.startElement("", "", "word");
        writer.characters(strword);
        writer.endElement("", "", "word");
    }
    writer.endDocument();
    return xmlstream.str();
}

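It looks like your input string is encoded in Windows-1252. The "-105" character is the byte 0x97 seen as a signed char, and in cp1252 that byte is the Unicode character U+2014 (em dash).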
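
To make the link between -105 and 0x97 concrete, here is a minimal sketch using only the standard library. The helper names byteValue and hexDump are hypothetical and exist only for this illustration; they are not part of Poco.

```cpp
#include <cstdio>
#include <string>

// Return the unsigned byte value (0..255) of a char, whether or not
// plain char is signed on this platform.
unsigned int byteValue(char c)
{
    return static_cast<unsigned char>(c);
}

// Format every byte of s as two uppercase hex digits separated by spaces.
// Hypothetical debugging helper for inspecting the raw bytes of a string.
std::string hexDump(const std::string& s)
{
    std::string out;
    char buf[4];
    for (std::string::size_type i = 0; i < s.size(); ++i)
    {
        std::snprintf(buf, sizeof buf, "%02X", byteValue(s[i]));
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```

Dumping wordmapIt->first this way before writing it should show whether the suspicious byte is a lone 0x97 (cp1252) or part of a multi-byte UTF-8 sequence.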

To handle this with Poco, convert the text from cp1252 to UTF-8 using a TextConverter with Windows1252Encoding and UTF8Encoding.
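
For intuition about what such a conversion does, the following is a hand-rolled sketch that does not use Poco at all. It is not how Poco implements TextConverter; the function names appendUtf8 and cp1252ToUtf8 are hypothetical, and mapping the five bytes that cp1252 leaves undefined to U+FFFD is an assumption made for this example.

```cpp
#include <string>

// Append the UTF-8 encoding of a code point below U+10000 to out.
// (All cp1252 characters fall in that range.)
void appendUtf8(std::string& out, unsigned int cp)
{
    if (cp < 0x80)
    {
        out += static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert a Windows-1252 byte string to UTF-8.
// Bytes 0x00-0x7F and 0xA0-0xFF map to the code point with the same value;
// only 0x80-0x9F need a lookup table. Undefined bytes become U+FFFD
// (an assumption of this sketch).
std::string cp1252ToUtf8(const std::string& in)
{
    static const unsigned short high[32] = {
        0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
        0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178
    };
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i)
    {
        unsigned int b = static_cast<unsigned char>(in[i]);
        unsigned int cp = (b >= 0x80 && b <= 0x9F) ? high[b - 0x80] : b;
        appendUtf8(out, cp);
    }
    return out;
}
```

With this mapping, the single byte 0x97 becomes the three-byte UTF-8 sequence E2 80 94, which is what a UTF-8 XML document needs for an em dash.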

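Note that "ANSI" (the Windows default code page) means code page 1252 only on Western European systems, so text from other locales may need a different source encoding.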


Source: https://habr.com/ru/post/1771290/

