é is parsed incorrectly

My application reads XML from a URLConnection. The XML encoding is ISO-8859-1, and the content contains the é character. I use the Xerces SAXParser to parse the received XML. However, é is not parsed correctly when the application runs on Linux. Everything works fine on Windows. Could you give me some hints? Many thanks.

+3
5 answers

This is probably a case of a file labeled "ISO-8859-1" that is actually in a different encoding.

This often happens with "ISO-8859-1" and "Windows-1252": they are used as if they were interchangeable, but they are not. (As established in the comments on this answer, both encodings use the same character code for "é", so Windows-1252 is probably not the culprit here.)

You can use a hex editor to find the exact byte value of "é" in your file, and take that value as a hint about which encoding the file is really in. If you control how the file is created, it is also worth checking the code that writes it.
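To know what to look for in the hex editor, it helps to see how "é" is encoded in the candidate encodings. The following is a minimal sketch; the class name `EAcuteBytes` and the helper `hex` are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EAcuteBytes {
    // Returns the bytes of the text in the given charset as space-separated hex.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "é" written as a Unicode escape, so the encoding of this source file does not matter.
        String eAcute = "\u00E9";
        System.out.println("ISO-8859-1:   " + hex(eAcute, StandardCharsets.ISO_8859_1));   // E9
        System.out.println("windows-1252: " + hex(eAcute, Charset.forName("windows-1252"))); // E9
        System.out.println("UTF-8:        " + hex(eAcute, StandardCharsets.UTF_8));        // C3 A9
    }
}
```

A single byte E9 in the file is consistent with ISO-8859-1 (and Windows-1252); the two-byte sequence C3 A9 means the file is really UTF-8.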

+2

I am fairly sure this is related to file.encoding. Try running with -Dfile.encoding=ISO-8859-1 as a VM parameter on Linux.
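To see which default the JVM actually picked up on each machine, something like the following can be run on both Windows and Linux. This is only a sketch; the class name `DefaultCharsetCheck` and the method `describe` are made up for illustration.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {
    // Describes the charset the JVM uses when no charset is passed explicitly.
    static String describe() {
        return "default charset = " + Charset.defaultCharset().name()
                + ", file.encoding = " + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the two machines print different values, any code that calls `getBytes()` or creates a reader without naming a charset will behave differently on them.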

+1

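The problem is in how Java handles encodings. Java strings are UTF-16 internally, so bytes are decoded when text is read and encoded when it is written. When no encoding is given, Java uses the platform default (Windows-1252 on Windows, typically UTF-8 on Linux). A "wrong" conversion goes unnoticed between 8-bit encodings that share the character code (for example, Windows-1252 ↔ ISO-8859-1 for "é"), but it corrupts the character when the encodings differ (Windows-1252 versus UTF-8, for example).

This is the code in question: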

// Parse the input
SAXParser saxParser = factory.newSAXParser();
InputStream is = new ByteArrayInputStream(stringToParse.getBytes());
saxParser.parse( is, handler );

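stringToParse.getBytes() encodes the string with the platform default encoding, which is Windows-1252 on Windows. The XML declares ISO-8859-1, and the two encodings agree on "é", so it happens to work there; on Linux the default is typically UTF-8, so the bytes no longer match the declared encoding. The XML should not be converted to a String at all: pass the raw stream to the SAX parser and let it read the encoding from the XML declaration.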
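A minimal sketch of that approach follows. The class name `ParseDeclaredEncoding`, the method `parseText`, and the sample document are made up for illustration; in the real application the stream would come from the URLConnection rather than from a byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class ParseDeclaredEncoding {
    // Parses raw bytes and returns all character data; the parser decodes the
    // bytes itself, using the encoding named in the XML declaration.
    static String parseText(InputStream in) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
        saxParser.parse(in, handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the server response: ISO-8859-1 bytes with a matching declaration.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>caf\u00E9</name>";
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        String parsed = parseText(new ByteArrayInputStream(bytes));
        System.out.println(parsed.equals("caf\u00E9")); // true on any platform
    }
}
```

If a String really is unavoidable, naming the charset explicitly, as in `getBytes(StandardCharsets.ISO_8859_1)` above, at least removes the dependency on the platform default.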

+1

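If an XML document does not declare its encoding, the parser assumes UTF-8.

To tell the parser which encoding to use for the XML, set it on an InputSource: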

InputSource inputSource = new InputSource(xmlInputStream);
inputSource.setEncoding("ISO-8859-1");
0
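For completeness, here is a sketch of the InputSource approach wired into a full parse. The class name `ParseWithInputSource`, the method `parseWithEncoding`, and the sample document (ISO-8859-1 bytes with no encoding declaration) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ParseWithInputSource {
    // Parses the bytes as XML, telling the parser which encoding to decode them with.
    static String parseWithEncoding(byte[] xmlBytes, String encoding) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        InputSource inputSource = new InputSource(new ByteArrayInputStream(xmlBytes));
        inputSource.setEncoding(encoding);
        SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
        saxParser.parse(inputSource, handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // No encoding declaration, so without setEncoding the parser would assume UTF-8.
        byte[] bytes = "<name>caf\u00E9</name>".getBytes(StandardCharsets.ISO_8859_1);
        String parsed = parseWithEncoding(bytes, "ISO-8859-1");
        System.out.println(parsed.equals("caf\u00E9"));
    }
}
```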

Thanks for helping all of you guys.

0

Source: https://habr.com/ru/post/1699629/

