Getting XML encoding type in Java

I am parsing XML with DocumentBuilder in Java 1.4.
The first line of the XML is:

 <?xml version="1.0" encoding="GBK"?>

I want to read the XML encoding and reuse it. How can I get "GBK"?
Basically, I will create another XML file in which I want to keep encoding="GBK".
At the moment the encoding is lost and the output defaults to UTF-8.
There are many XML files with different encodings, so I need to read the encoding of each source file and apply it where needed.

Please help.

+6
3 answers

One way that works:

    final XMLStreamReader xmlStreamReader =
            XMLInputFactory.newInstance().createXMLStreamReader(new FileReader(testFile));
    // running on MS Windows, fileEncoding is "CP1251"
    String fileEncoding = xmlStreamReader.getEncoding();
    // the XML declares UTF-8, so encodingFromXMLDeclaration is "UTF-8"
    String encodingFromXMLDeclaration = xmlStreamReader.getCharacterEncodingScheme();
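A minimal sketch of the same idea, assuming Java 6 or later (StAX is not part of Java 1.4). It passes an InputStream instead of a FileReader, so the parser sees the raw bytes rather than characters already decoded with the platform default encoding. The class and method names are hypothetical, chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class DeclaredEncodingDemo {

    /** Returns the encoding named in the XML declaration of the given stream. */
    static String declaredEncoding(InputStream in) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            // Reports what the XML declaration says, not what was auto-detected.
            return reader.getCharacterEncodingScheme();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // The declaration itself is pure ASCII, so these bytes are also valid GBK.
        byte[] xml = "<?xml version=\"1.0\" encoding=\"GBK\"?><root/>"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(declaredEncoding(new ByteArrayInputStream(xml)));
    }
}
```

For a file on disk, the stream would typically be a FileInputStream opened on that file.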
+4

This works for various encodings, taking into account both the BOM (byte order mark) and the XML declaration. It defaults to UTF-8 if neither applies:

    String encoding;
    FileReader reader = null;
    XMLStreamReader xmlStreamReader = null;
    try {
        InputSource is = new InputSource(file.toURI().toASCIIString());
        XMLInputSource xis = new XMLInputSource(is.getPublicId(), is.getSystemId(), null);
        xis.setByteStream(is.getByteStream());
        PropertyManager pm = new PropertyManager(PropertyManager.CONTEXT_READER);
        // Register an error reporter via reflection on the internal property map.
        for (Field field : PropertyManager.class.getDeclaredFields()) {
            if (field.getName().equals("supportedProps")) {
                field.setAccessible(true);
                ((HashMap<String, Object>) field.get(pm)).put(
                        Constants.XERCES_PROPERTY_PREFIX + Constants.ERROR_REPORTER_PROPERTY,
                        new XMLErrorReporter());
                break;
            }
        }
        // Let the internal Xerces entity manager detect the encoding from the leading bytes.
        encoding = new XMLEntityManager(pm).setupCurrentEntity("[xml]".intern(), xis, false, true);
        // Compare strings with equals(), not with != (which compares references).
        if (!"UTF-8".equals(encoding)) {
            return encoding;
        }
        // From @matthias-heinrich's answer:
        reader = new FileReader(file);
        xmlStreamReader = XMLInputFactory.newInstance().createXMLStreamReader(reader);
        encoding = xmlStreamReader.getCharacterEncodingScheme();
        if (encoding == null) {
            encoding = "UTF-8";
        }
    } catch (RuntimeException e) {
        throw e;
    } catch (Exception e) {
        throw new UndeclaredThrowableException(e);
    } finally {
        if (xmlStreamReader != null) {
            try {
                xmlStreamReader.close();
            } catch (XMLStreamException e) {
                // ignored on close
            }
        }
        if (reader != null) {
            try {
                reader.close();
            } catch (IOException e) {
                // ignored on close
            }
        }
    }
    return encoding;

Tested on Java 6 with:

  • UTF-8 XML file with BOM, with XML declaration ✓
  • UTF-8 XML file without BOM, with XML declaration ✓
  • UTF-8 XML file with BOM, without XML declaration ✓
  • UTF-8 XML file without BOM, without XML declaration ✓
  • ISO-8859-1 XML file (without BOM) with XML declaration ✓
  • UTF-16LE XML file with BOM, without XML declaration ✓
  • UTF-16BE XML file with BOM, without XML declaration ✓
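The byte order mark (BOM) check for the UTF-8 and UTF-16 cases listed above can also be illustrated without internal com.sun classes. The following is only a sketch that inspects the first bytes of a stream; it is not what the Xerces entity manager does internally, and the class and method names are hypothetical. It does not handle UTF-32, whose little-endian BOM starts with the same two bytes as UTF-16LE.

```java
import java.io.IOException;
import java.io.InputStream;

class BomSniffer {

    /** Returns the encoding implied by a leading BOM, or null if none is found. */
    static String encodingFromBom(InputStream in) throws IOException {
        byte[] head = new byte[3];
        int n = 0;
        // Read up to three bytes; a single read() call may return fewer.
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        int b0 = n > 0 ? head[0] & 0xFF : -1;
        int b1 = n > 1 ? head[1] & 0xFF : -1;
        int b2 = n > 2 ? head[2] & 0xFF : -1;
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) {
            return "UTF-8";
        }
        if (b0 == 0xFE && b1 == 0xFF) {
            return "UTF-16BE";
        }
        if (b0 == 0xFF && b1 == 0xFE) {
            return "UTF-16LE";
        }
        return null; // no BOM: fall back to the XML declaration or to UTF-8
    }
}
```

The method consumes the bytes it reads, so a caller that wants to parse the same stream afterwards would need to reopen it or wrap it in a PushbackInputStream.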

Standing on the shoulders of these giants:

    import java.io.*;
    import java.lang.reflect.*;
    import java.util.*;
    import javax.xml.stream.*;
    import org.xml.sax.*;
    import com.sun.org.apache.xerces.internal.impl.*;
    import com.sun.org.apache.xerces.internal.xni.parser.*;
+1

If you parse your file with javax.xml.stream.XMLStreamReader, you can call getEncoding().
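Once the encoding is known, the second half of the question (writing a new document with the same encoding) can be sketched with the standard javax.xml.transform API. This assumes Java 6 or later and uses getCharacterEncodingScheme() so that the value comes from the XML declaration. The class and method names are hypothetical, and the fallback to UTF-8 is an assumption of this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

class KeepEncodingDemo {

    /** Parses the given XML bytes and serializes them again in the declared encoding. */
    static byte[] copyWithSameEncoding(byte[] xml) throws Exception {
        // Step 1: read the encoding named in the XML declaration.
        XMLStreamReader sr = XMLInputFactory.newInstance()
                .createXMLStreamReader(new ByteArrayInputStream(xml));
        String encoding;
        try {
            encoding = sr.getCharacterEncodingScheme();
        } finally {
            sr.close();
        }
        if (encoding == null) {
            encoding = "UTF-8"; // assumption: default when nothing is declared
        }

        // Step 2: parse into a DOM tree, as the question does with DocumentBuilder.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));

        // Step 3: serialize with the same encoding instead of the UTF-8 default.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, encoding);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }
}
```

Writing to a byte stream rather than a Writer matters here: with a Writer, the caller's own character encoding would decide the bytes on disk, regardless of what the declaration says.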

0

Source: https://habr.com/ru/post/909264/

