I actually wonder whether 0x12 is valid even in XML 1.1. See this summary of the differences between XML 1.0 and 1.1. In particular:
In addition, XML 1.1 allows you to have control characters in your documents through the use of character references. This concerns the control characters #x1 through #x1F, most of which are forbidden in XML 1.0. This means that your documents can now include the bell character, like this: &#x07;. However, you still cannot have these characters appear directly in your documents; this violates the definition of the MIME type used for XML (text/xml).
Xerces can parse XML 1.1, but it seems to require the character reference &#x12; instead of the literal 0x12 character:
    import scala.xml.XML

    // Literal 0x12 character inside an XML 1.1 document
    val s = "<?xml version='1.1'?><root>\u0012</root>"
    // causes "An invalid XML character (Unicode: 0x12)"
    // XML.loadXML(xml.Source.fromString(s), XML.parser)

    // Same character written as a character reference
    val u = "<?xml version='1.1'?><root>&#x12;</root>"
    val v = XML.loadXML(xml.Source.fromString(u), XML.parser)
    println(v) // works
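If you generate the XML 1.1 text yourself, one way to stay within that rule is to write restricted control characters as character references before the text reaches the parser. The following is only a sketch: `Xml11Escape`, `isRestricted` and `escapeRestricted` are hypothetical names, the ranges are my reading of the RestrictedChar production in the XML 1.1 specification, and the helper is meant for text content only (it does not escape `&` or `<`, and it leaves U+0000 alone, which XML 1.1 does not allow in any form).

```scala
object Xml11Escape {
  // Hypothetical helper: true for characters that XML 1.1 accepts only
  // as character references (RestrictedChar ranges, as I read the spec).
  def isRestricted(c: Char): Boolean =
    (c >= 0x1 && c <= 0x8) || c == 0xB || c == 0xC ||
    (c >= 0xE && c <= 0x1F) ||
    (c >= 0x7F && c <= 0x84) ||
    (c >= 0x86 && c <= 0x9F)

  // Replaces every restricted character with a hexadecimal character
  // reference; all other characters are copied unchanged.
  def escapeRestricted(text: String): String = {
    val sb = new StringBuilder
    text.foreach { c =>
      if (isRestricted(c))
        sb.append("&#x").append(Integer.toHexString(c.toInt).toUpperCase).append(";")
      else
        sb.append(c)
    }
    sb.toString
  }
}
```

For example, `Xml11Escape.escapeRestricted("a\u0012b")` yields `a&#x12;b`, while tabs and line breaks pass through untouched.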
As suggested by lavinio, you can filter out invalid characters. This does not take up too many lines in Scala:
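    import java.io.{FileInputStream, InputStream}
    import scala.xml.XML

    // Wraps the file stream and silently skips every 0x12 byte
    val in = new InputStream {
      val in0 = new FileInputStream("invalid.xml")
      override def read(): Int = in0.read() match {
        case 0x12 => read() // drop the invalid byte and read the next one
        case x    => x
      }
    }
    val x = XML.load(in)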
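The byte-level filter above only drops 0x12, and it relies on an encoding such as UTF-8 where that byte cannot be part of another character. A more general variant, sketched below under my own assumptions, filters at the character level and drops everything outside the XML 1.0 Char production. `XmlFilter` and `XmlCharFilterReader` are hypothetical names, and surrogate code units are passed through without checking that they are correctly paired.

```scala
import java.io.Reader

object XmlFilter {
  // True if the UTF-16 code unit is allowed by the XML 1.0 Char production.
  // Surrogates are kept so that supplementary characters survive.
  def isAllowed(c: Char): Boolean =
    c == '\t' || c == '\n' || c == '\r' ||
    (c >= 0x20 && c <= 0xD7FF) ||
    Character.isSurrogate(c) ||
    (c >= 0xE000 && c <= 0xFFFD)

  // Hypothetical Reader wrapper that drops disallowed characters.
  class XmlCharFilterReader(underlying: Reader) extends Reader {
    override def read(cbuf: Array[Char], off: Int, len: Int): Int = {
      if (len == 0) 0
      else {
        var kept = 0
        var eof = false
        // Keep reading until at least one character survives or input ends.
        while (kept == 0 && !eof) {
          val n = underlying.read(cbuf, off, len)
          if (n == -1) eof = true
          else {
            var i = off
            while (i < off + n) {
              val c = cbuf(i)
              // Compact the surviving characters to the front of the range.
              if (isAllowed(c)) { cbuf(off + kept) = c; kept += 1 }
              i += 1
            }
          }
        }
        if (kept == 0) -1 else kept
      }
    }

    override def close(): Unit = underlying.close()
  }
}
```

Since `XML.load` also accepts a `java.io.Reader`, an instance of this wrapper around a reader opened with the file's actual encoding can be passed to it in place of the `InputStream` used above.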