XMLStreamReader: getting the character offset for XML read from a file

The Location object returned by XMLStreamReader.getLocation() has a method called getCharacterOffset().

Unfortunately, the Javadoc points out that this method is ambiguously named: it can also return a byte offset (and this appears to be true in practice). Unhelpfully, this seems to happen when reading from files, for example:

The Javadoc says:

Return the byte or character offset into the input source this location is pointing to. If the input source is a file or a byte stream then this is the byte offset into that stream, but if the input source is a character media then the offset is the character offset. (emphasis mine)

I really need the character offset, and I'm fairly sure I am being given the byte offset.

The UTF-8-encoded XML is contained in a (partially corrupted, 1 GB) file. [Hence the need for a lower-level API that does not complain about a lack of well-formedness until it really has no choice.]
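If the parser insists on reporting byte offsets, one workaround (not something proposed in this thread) is to convert them afterwards. Because UTF-8 is self-synchronizing, a byte offset can be turned into a Java char offset by scanning the bytes before it: continuation bytes (10xxxxxx) add nothing, the lead byte of a 4-byte sequence adds two chars (a surrogate pair), and every other byte adds one. This is a minimal sketch assuming the bytes before the offset are well-formed UTF-8 and that the offset lands on a character boundary, which may not hold in the corrupted parts of the file; the class and method names are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class Utf8Offsets {

    /**
     * Hypothetical helper: converts a byte offset into UTF-8 input to the equivalent
     * Java char (UTF-16 code unit) offset. Assumes well-formed UTF-8 before byteOffset
     * and that byteOffset falls on a character boundary. Reads, but does not close, `in`.
     */
    static long byteToCharOffset(InputStream in, long byteOffset) throws IOException {
        InputStream buffered = new BufferedInputStream(in);
        long chars = 0;
        for (long i = 0; i < byteOffset; i++) {
            int b = buffered.read();
            if (b < 0) {
                throw new EOFException("byte offset " + byteOffset + " is past the end of the input");
            }
            if ((b & 0xC0) == 0x80) {
                continue;        // continuation byte: part of a character already counted
            } else if ((b & 0xF8) == 0xF0) {
                chars += 2;      // 4-byte lead byte: supplementary code point, a surrogate pair in Java
            } else {
                chars += 1;      // ASCII or 2-/3-byte lead byte: a single UTF-16 code unit
            }
        }
        return chars;
    }

    public static void main(String[] args) throws IOException {
        // "a", "é" (2 bytes), U+1D11E (4 bytes), "b": 8 bytes in total, 5 Java chars
        byte[] utf8 = "a\u00e9\uD834\uDD1Eb".getBytes(StandardCharsets.UTF_8);
        System.out.println(byteToCharOffset(new ByteArrayInputStream(utf8), 7)); // prints 4
    }
}
```

For a 1 GB file, rescanning from the start for every offset gets expensive; if many offsets are needed, sorting them and converting them all in one pass over the file is likely the better design.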

Question

What does the Javadoc mean when it says "...the input source is a character media..."? How can I make it treat my input file as "character media", so that it gives me an accurate (character) offset rather than a byte offset?

Extra blah blah:

[... "head"/"tail" ... Powershell ... UTF-8 ... UTF-16 ...]

Answers

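The Javadoc is talking about the Source you give the parser.

An XMLStreamReader can be created from a Source, from a byte stream, or from a Reader.

If you give it a byte stream (InputStream), you get byte offsets.

If you give it a Reader of chars, you get char offsets.

A StreamSource can wrap either one; a Reader is the "character media".

So, for example: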

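```java
// Decode the file yourself, so the parser is handed a Reader ("character media") rather than bytes
final Source source = new StreamSource(new InputStreamReader(new FileInputStream(new File("my.xml")), StandardCharsets.UTF_8));
final XMLStreamReader xmlReader = XMLInputFactory.newFactory().createXMLStreamReader(source);
```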
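A self-contained version of the snippet above might look like the sketch below; it is not the answerer's exact code, and the class and method names are hypothetical. It records Location.getCharacterOffset() after each START_ELEMENT event. It deliberately decodes with InputStreamReader, whose decoder substitutes U+FFFD for malformed UTF-8 instead of throwing (Files.newBufferedReader would throw MalformedInputException), which may matter for a partially corrupted file; each substitution does, however, shift char counts relative to the raw bytes. Whether the reported offset marks the start or the end of the event depends on the StAX implementation in use.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.stream.StreamSource;

public class CharOffsetFromFile {

    /** Hypothetical helper: Location.getCharacterOffset() after each START_ELEMENT in a UTF-8 file. */
    static List<Integer> startElementOffsets(Path xmlFile) throws IOException, XMLStreamException {
        List<Integer> offsets = new ArrayList<>();
        // InputStreamReader replaces malformed UTF-8 with U+FFFD rather than throwing.
        try (Reader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(xmlFile.toFile()), StandardCharsets.UTF_8))) {
            // Wrapping a Reader means the parser only ever sees chars ("character media").
            XMLStreamReader xml = XMLInputFactory.newFactory()
                    .createXMLStreamReader(new StreamSource(reader));
            try {
                while (xml.hasNext()) {
                    if (xml.next() == XMLStreamConstants.START_ELEMENT) {
                        offsets.add(xml.getLocation().getCharacterOffset());
                    }
                }
            } finally {
                xml.close(); // does not close the underlying Reader; try-with-resources does that
            }
        }
        return offsets;
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("offsets", ".xml");
        Files.writeString(file, "<root>\u00e9\u00e9<a/><b/></root>", StandardCharsets.UTF_8);
        System.out.println(startElementOffsets(file));
        Files.delete(file);
    }
}
```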

Instead of XMLInputFactory.createXMLStreamReader(java.io.InputStream), use XMLInputFactory.createXMLStreamReader(java.io.Reader).
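To check which kind of offset a particular StAX implementation reports, a small diagnostic can parse the same UTF-8 bytes through both XMLInputFactory.createXMLStreamReader(java.io.InputStream) and XMLInputFactory.createXMLStreamReader(java.io.Reader) and print the offsets side by side. This is a sketch with hypothetical class and method names; the actual numbers depend on which XMLInputFactory implementation is on the classpath, so no expected output is given. Multi-byte characters (é, 2 bytes each in UTF-8) are placed before the child elements so that byte and char positions diverge.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class OffsetComparison {

    /** Hypothetical helper: Location.getCharacterOffset() after each START_ELEMENT event. */
    static List<Integer> startElementOffsets(XMLStreamReader xml) throws XMLStreamException {
        List<Integer> result = new ArrayList<>();
        while (xml.hasNext()) {
            if (xml.next() == XMLStreamConstants.START_ELEMENT) {
                result.add(xml.getLocation().getCharacterOffset());
            }
        }
        xml.close();
        return result;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Three 2-byte characters before <a/>, so byte and char positions after them differ by 3.
        byte[] utf8 = "<r>\u00e9\u00e9\u00e9<a/><b/></r>".getBytes(StandardCharsets.UTF_8);
        XMLInputFactory factory = XMLInputFactory.newFactory();

        // createXMLStreamReader(InputStream): the parser decodes the bytes itself.
        List<Integer> viaStream = startElementOffsets(
                factory.createXMLStreamReader(new ByteArrayInputStream(utf8)));
        // createXMLStreamReader(Reader): the bytes are decoded first; the parser sees only chars.
        List<Integer> viaReader = startElementOffsets(
                factory.createXMLStreamReader(
                        new InputStreamReader(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8)));

        System.out.println("InputStream overload: " + viaStream);
        System.out.println("Reader overload:      " + viaReader);
    }
}
```

If the two lists come out identical, the implementation in use may already be counting decoded chars for byte input, and the byte offsets you are seeing could come from somewhere else in the pipeline.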


Source: https://habr.com/ru/post/1753093/

