Why can't my DOM parser read UTF-8?

Question

My DOM parser fails to load an XML file when the file contains UTF-8 characters. I know I have to tell it to read the file as UTF-8, but I don't know where to put that in my code. Here it is:

    File xmlFile = new File(fileName);
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    Document doc = dBuilder.parse(xmlFile);
    doc.getDocumentElement().normalize();

I know there is a setEncoding() method, but I don't know where to put it in my code.

+4

java dom parsing

ivanz May 06 '13 at 13:46

3 answers

Try using a Reader and pass the encoding as a parameter:

    InputStream inputStream = new FileInputStream(fileName);
    documentBuilder.parse(new InputSource(new InputStreamReader(inputStream, "UTF-8")));

+5

Eugene May 06 '13 at 14:35

I used what Eugene did there, and changed it a bit.

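    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    FileInputStream in = new FileInputStream(new File("XML.xml"));
    // Note: the second argument of parse(InputStream, String) is a system ID,
    // not an encoding, so the encoding is set on an InputSource instead.
    InputSource is = new InputSource(in);
    is.setEncoding("UTF-8");
    Document doc = dBuilder.parse(is);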

Although the file will be read as UTF-8, if you print the result to the Eclipse console, the non-ASCII characters will not display correctly unless the Java file is saved as UTF-8. At least, that is what happened to me.
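The console issue is about how the output is encoded, not about how the XML is parsed. A minimal sketch, assuming the console itself is set to display UTF-8 (in Eclipse this is usually controlled by the encoding setting of the run configuration): wrap System.out in a PrintStream with an explicit encoding. The class name Utf8ConsoleExample and the helper utf8Stream are made up for illustration.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8ConsoleExample {

    // Hypothetical helper: wraps any output stream in a PrintStream
    // that encodes characters as UTF-8 and flushes on println.
    public static PrintStream utf8Stream(OutputStream out) throws UnsupportedEncodingException {
        return new PrintStream(out, true, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        PrintStream out = utf8Stream(System.out);
        // The \u00e9 escape keeps this source file independent of its own encoding.
        out.println("h\u00e9llo");
    }
}
```

If the characters still look wrong, the console is probably decoding the bytes with a different charset than the one used to write them.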

-1

john-salib Jul 18 '14 at 0:13

Rajesh Mbm · Accepted Answer · 2014-10-09T13:55:23+0000

Try this. It worked for me:

    InputStream inputStream = new FileInputStream(completeFileName);
    Reader reader = new InputStreamReader(inputStream, "UTF-8");
    InputSource is = new InputSource(reader);
    is.setEncoding("UTF-8");
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    Document doc = dBuilder.parse(is);
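For anyone who wants to run this end to end, here is a self-contained sketch of the same Reader-based approach. It writes a small sample file first so there is something to parse. The class name Utf8DomExample, the method parseUtf8, and the sample content are made up for illustration, and the try-with-resources block is my addition to make sure the stream gets closed.

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class Utf8DomExample {

    // Hypothetical helper: parses the file at the given path,
    // decoding its bytes as UTF-8 through a Reader.
    public static Document parseUtf8(Path path) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (InputStream in = Files.newInputStream(path);
             Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            Document doc = builder.parse(new InputSource(reader));
            doc.getDocumentElement().normalize();
            return doc;
        }
    }

    public static void main(String[] args) throws Exception {
        // Write a sample UTF-8 file; \u00e9 is an e with an acute accent.
        Path sample = Files.createTempFile("sample", ".xml");
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><greeting>h\u00e9llo</greeting>";
        Files.write(sample, xml.getBytes(StandardCharsets.UTF_8));

        Document doc = parseUtf8(sample);
        System.out.println(doc.getDocumentElement().getTextContent());

        Files.delete(sample);
    }
}
```

Because the parser is handed a character stream here, the decoding is decided by the InputStreamReader rather than by the encoding declared in the XML header, so the file's bytes really do have to be UTF-8.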
