This works for various encodings, taking into account both the BOM (byte order mark) and the XML declaration. It defaults to UTF-8 if neither applies:
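
    // Wrapper method added for completeness; the name getEncoding is illustrative.
    // Relies on JDK-internal Xerces classes (tested on Java 6); may break on other JDKs.
    @SuppressWarnings("unchecked")
    public static String getEncoding(File file)
            throws IOException, IllegalAccessException, XMLStreamException {
        String encoding;
        FileReader reader = null;
        XMLStreamReader xmlStreamReader = null;
        try {
            // Step 1: let Xerces sniff the leading bytes (BOM or byte pattern).
            InputSource is = new InputSource(file.toURI().toASCIIString());
            XMLInputSource xis = new XMLInputSource(is.getPublicId(), is.getSystemId(), null);
            xis.setByteStream(is.getByteStream());
            PropertyManager pm = new PropertyManager(PropertyManager.CONTEXT_READER);
            // XMLEntityManager needs an error reporter, so inject one into
            // PropertyManager's private "supportedProps" map via reflection.
            for (Field field : PropertyManager.class.getDeclaredFields()) {
                if (field.getName().equals("supportedProps")) {
                    field.setAccessible(true);
                    ((HashMap<String, Object>) field.get(pm)).put(
                            Constants.XERCES_PROPERTY_PREFIX + Constants.ERROR_REPORTER_PROPERTY,
                            new XMLErrorReporter());
                    break;
                }
            }
            encoding = new XMLEntityManager(pm).setupCurrentEntity("[xml]", xis, false, true);
            if (!"UTF-8".equals(encoding)) {
                return encoding; // e.g. UTF-16LE or UTF-16BE detected from a BOM
            }
            // Step 2 (the snippet was cut off here; this completion is a reconstruction):
            // "UTF-8" may only be the default, so read the XML declaration with StAX.
            reader = new FileReader(file);
            xmlStreamReader = XMLInputFactory.newInstance().createXMLStreamReader(reader);
            String declared = xmlStreamReader.getCharacterEncodingScheme();
            return declared != null ? declared : encoding;
        } finally {
            if (xmlStreamReader != null) {
                xmlStreamReader.close();
            }
            if (reader != null) {
                reader.close();
            }
        }
    }
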
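If the reflection into `PropertyManager` is a concern, roughly the same idea (BOM first, then the XML declaration, then UTF-8 as the fallback) can be sketched with public APIs only. This is a minimal sketch under stated assumptions, not the code above: the class `XmlEncodingSniffer` and its `detect` method are hypothetical names, only the UTF-8 and UTF-16 BOMs are recognized (no UTF-32, no BOM-less UTF-16 sniffing), and it relies on StAX exposing the declared encoding through `XMLStreamReader.getCharacterEncodingScheme()` at `START_DOCUMENT`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: public-API sketch of "BOM, then XML declaration, else UTF-8".
public class XmlEncodingSniffer {

    // Returns the encoding implied by a byte order mark, or null if there is none.
    // Only the UTF-8 and UTF-16 BOMs are checked (UTF-32 is out of scope here).
    static String encodingFromBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        return null;
    }

    // A BOM wins; otherwise the encoding pseudo-attribute of the XML declaration;
    // otherwise UTF-8, the XML default.
    public static String detect(byte[] data) {
        String fromBom = encodingFromBom(data);
        if (fromBom != null) {
            return fromBom;
        }
        XMLStreamReader reader = null;
        try {
            // The reader starts at START_DOCUMENT, where the declaration has been parsed.
            reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new ByteArrayInputStream(data));
            String declared = reader.getCharacterEncodingScheme();
            return declared != null ? declared : "UTF-8";
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("cannot read XML prolog", e);
        } finally {
            if (reader != null) {
                try { reader.close(); } catch (XMLStreamException ignored) { }
            }
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a/>"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(latin1));
    }
}
```

Working on a `byte[]` keeps the sketch easy to test; for a file, `Files.readAllBytes` (or just the first few kilobytes) would feed it. Unlike the Xerces-based version, it does not guess UTF-16 from BOM-less byte patterns.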
Tested in Java 6 with:

- UTF-8 XML file with BOM, with XML declaration ✔
- UTF-8 XML file without BOM, with XML declaration ✔
- UTF-8 XML file with BOM, without XML declaration ✔
- UTF-8 XML file without BOM, without XML declaration ✔
- ISO-8859-1 XML file (without BOM) with XML declaration ✔
- UTF-16LE XML file with BOM, without XML declaration ✔
- UTF-16BE XML file with BOM, without XML declaration ✔
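The test cases above differ only in their leading bytes. As a hedged illustration of what each case looks like on disk, the hypothetical `XmlEncodingFixtures.fixture` helper below builds the bytes in memory (passing them to `Files.write` would turn them into test files). It is not part of the original answer; the `<root>` element and its `é` content are arbitrary choices.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that builds in-memory versions of the test cases listed above.
public class XmlEncodingFixtures {

    // Optional BOM + optional XML declaration + a small root element, encoded with cs.
    public static byte[] fixture(Charset cs, boolean withBom, boolean withDecl) {
        if (withBom && !cs.name().startsWith("UTF-")) {
            throw new IllegalArgumentException(cs.name() + " has no byte order mark");
        }
        StringBuilder text = new StringBuilder();
        if (withBom) {
            // U+FEFF encodes to EF BB BF (UTF-8), FF FE (UTF-16LE) or FE FF (UTF-16BE).
            text.append('\uFEFF');
        }
        if (withDecl) {
            text.append("<?xml version=\"1.0\" encoding=\"").append(cs.name()).append("\"?>");
        }
        text.append("<root>\u00E9</root>"); // 'é' is encoded differently in each charset
        return text.toString().getBytes(cs);
    }

    public static void main(String[] args) {
        byte[] b = fixture(StandardCharsets.UTF_16LE, true, false);
        System.out.printf("%02X %02X%n", b[0], b[1]);
    }
}
```

The explicitly little- and big-endian charsets matter here: Java's plain `UTF-16` encoder would add its own big-endian BOM, whereas `UTF-16LE` and `UTF-16BE` write only what is in the string.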
Standing on the shoulders of these giants:
Imports used by the code:

    import java.io.*;
    import java.lang.reflect.*;
    import java.util.*;
    import javax.xml.stream.*;
    import org.xml.sax.*;
    import com.sun.org.apache.xerces.internal.impl.*;
    import com.sun.org.apache.xerces.internal.xni.parser.*;