Reading UTF-8 XML with MSXML 4.0

I have a problem with classic ASP / VBScript when trying to read a UTF-8 encoded XML file with MSXML. The file is encoded correctly; every other tool I have tried reads it without trouble.

A constructed XML example:

<?xml version="1.0" encoding="UTF-8"?>
<itshop>
    <Product Name="Backup gewünscht" />
</itshop>

If I try to do this in ASP ...

Set fso = Server.CreateObject("Scripting.FileSystemObject")
Set ts = fso.OpenTextFile("input.xml", FOR_READING)
XML = ts.ReadAll
ts.Close
Set ts = nothing
Set fso = Nothing

Set myXML = Server.CreateObject("Msxml2.DOMDocument.4.0")
myXML.loadXML(XML)
Set DocElement = myXML.documentElement
Set ProductNodes = DocElement.selectNodes("//Product")
Response.Write ProductNodes(0).getAttribute("Name")
' ...

... and the name contains special characters (German umlauts, specifically), each of the two UTF-8 bytes of an umlaut is transcoded separately, so I get two meaningless garbage characters. What should be "ü" becomes "Ã¼": that is four bytes in my output, not two (correct UTF-8) or one (ISO-8859-#).

What am I doing wrong? Why does MSXML think that the input is ISO-8859-#, so that it tries to convert it to UTF-8?

Set ts = fso.OpenTextFile("input.xml", FOR_READING, False, True)

The last parameter is the Unicode flag.

OpenTextFile() has the following signature:

object.OpenTextFile(filename[, iomode[, create[, format]]])

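where "format" is defined as:

Optional. One of three Tristate values used to indicate the format of the opened file. If omitted, the file is opened as ASCII.

The Tristate values are: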

TristateUseDefault  -2   Opens the file using the system default.
TristateTrue        -1   Opens the file as Unicode.
TristateFalse        0   Opens the file as ASCII.

And -1 happens to be the numeric value of True.
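The same call can be written with named constants, which makes the intent of the last argument easier to see. This is only a sketch: it assumes the constants are not already defined elsewhere in the page, and it assumes input.xml sits next to the ASP page (the Server.MapPath call is an addition, not part of the original code). Note that "Unicode" for FileSystemObject means UTF-16, so this form suits a UTF-16 file rather than a UTF-8 one.

```vbscript
' Named constants for the OpenTextFile arguments (values from the table above).
Const ForReading = 1
Const TristateTrue = -1   ' "Unicode", which FileSystemObject treats as UTF-16

Dim fso, ts, xmlText
Set fso = Server.CreateObject("Scripting.FileSystemObject")

' Assumption: input.xml is in the same folder as this page.
Set ts = fso.OpenTextFile(Server.MapPath("input.xml"), ForReading, False, TristateTrue)
xmlText = ts.ReadAll
ts.Close

Set ts = Nothing
Set fso = Nothing
```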

Anyway, this is better:

Set myXML = Server.CreateObject("Msxml2.DOMDocument.4.0")
myXML.load("input.xml")

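There is no reason to use a TextStream object to read a file that MSXML can read perfectly well on its own.

The TextStream object also has no notion of the file's actual encoding. The documentation says "Unicode", but there is more than one way to encode Unicode (for FileSystemObject the flag means UTF-16, not UTF-8). The load() method of MSXML can deal with all of them.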
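A slightly fuller sketch of the load() approach, with basic error reporting, might look like the following. The Server.MapPath call, the assumption that input.xml sits next to the page, and the decision to send the response as UTF-8 are additions for illustration; they are not part of the original question.

```vbscript
' Send the response as UTF-8 so the browser decodes "ü" correctly.
Response.CodePage = 65001
Response.Charset = "utf-8"

Dim myXML, productNode
Set myXML = Server.CreateObject("Msxml2.DOMDocument.4.0")
myXML.async = False   ' load synchronously so the result is available right away

' load() reads the file itself and honors the encoding in the XML declaration.
' Assumption: input.xml is in the same folder as this page.
If myXML.load(Server.MapPath("input.xml")) Then
    Set productNode = myXML.selectSingleNode("//Product")
    If Not productNode Is Nothing Then
        Response.Write Server.HTMLEncode(productNode.getAttribute("Name"))
    End If
Else
    ' parseError describes why the document could not be loaded.
    Response.Write "XML error: " & Server.HTMLEncode(myXML.parseError.reason)
End If

Set myXML = Nothing
```

Setting the response code page matters because MSXML hands back the attribute value as a VBScript string; how that string is turned into bytes for the client is decided by ASP, not by the parser.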


Source: https://habr.com/ru/post/1710093/