Java reads a strange character at the beginning of a file, but the character is not in the file

I have a simple XML file on my hard drive. When I open it with Notepad++, this is what I see:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <content> ... more stuff here ... </content> 

But when I read it using FileInputStream, I get:

 ?<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <content>... 

I use JAXB to parse the XML, and it throws "Content is not allowed in prolog" because of this "?" character.

What is this "?" character? Why is it there, and how can I get rid of it?

+4
6 answers

This extra character is a byte order mark (BOM), a special Unicode character that tells the XML parser the byte order (little endian or big endian) of the bytes in the file.

Normally your XML parser should understand it. (If it does not, I would consider that a bug in the XML parser.)

As a workaround, make sure that the program that creates this XML does not write a BOM.
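
If the producing program cannot be changed, the reading side can skip the mark instead. The following is a minimal sketch, assuming the file is UTF-8 and that only the three-byte UTF-8 BOM (EF BB BF) needs to be skipped. The names BomSkipper and skipUtf8Bom are hypothetical and chosen for illustration; they are not part of JAXB or the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not a JDK or JAXB class.
class BomSkipper {

    /** Returns a stream positioned just after a leading UTF-8 BOM, if one is present. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];

        // Read up to three bytes; a single read() may return fewer, so loop.
        int n = 0;
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break; // end of stream
            }
            n += r;
        }

        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;

        // No BOM: push the bytes back so the caller sees the stream unchanged.
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n);
        }
        return pushback;
    }

    public static void main(String[] args) throws IOException {
        // Encoding U+FEFF as UTF-8 produces the bytes EF BB BF.
        byte[] xml = "\uFEFF<content/>".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = skipUtf8Bom(new ByteArrayInputStream(xml))) {
            // readAllBytes() requires Java 9 or later.
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
            // prints "<content/>"
        }
    }
}
```

The returned stream could then be handed to the unmarshaller in place of the raw FileInputStream.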

+7

Check the file encoding. I have seen something similar: the file looked fine in most editors, but it turned out to be encoded as UTF-8 without a BOM (or with one, I can’t remember off the top of my head). Notepad++ should be able to switch between the two.

+2

You can use Notepad++ to see all the characters via the menu View > Show Symbol > Show All Characters. It will show you any extra bytes present at the beginning. They are likely to be a byte order mark. If the extra bytes are indeed a byte order mark, this approach will not help. In that case you will need to download a hex editor, or install Cygwin and follow the instructions in the last paragraph of this answer. Once you can see the file as hexadecimal codes, look at the first few bytes. Do they match one of the codes listed at http://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding

If they are indeed a byte order mark, or if you cannot determine the cause of the error, try this:

From the menu, select Encoding > Encode in UTF-8 without BOM, and then save the file.

(On Linux, you can use command-line tools to check what is at the beginning of the file, for example xxd -g1 filename | head or od -t cx1 filename | head.)
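
The same check can be done from Java. The sketch below prints the first bytes of a file in hex and compares them against the common BOM signatures from the Wikipedia table linked above. The class BomInspector and its methods are hypothetical names used only for illustration, and the list of signatures is limited to UTF-8, UTF-16 and UTF-32.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not a JDK class.
class BomInspector {

    /** Formats the first 'count' bytes as space-separated hex, e.g. "EF BB BF". */
    static String hexPrefix(byte[] data, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(count, data.length); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    /** Names the encoding whose BOM the data starts with, or "none". */
    static String detectBom(byte[] data) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        // UTF-32LE must be tested before UTF-16LE: both start with FF FE.
        if (startsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(data, 0xFF, 0xFE)) return "UTF-16LE";
        if (startsWith(data, 0xFE, 0xFF)) return "UTF-16BE";
        return "none";
    }

    private static boolean startsWith(byte[] data, int... signature) {
        if (data.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Pass a file path as the first argument; otherwise a built-in sample is used.
        byte[] data = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : "\uFEFF<?xml version=\"1.0\"?>".getBytes(StandardCharsets.UTF_8);
        System.out.println(hexPrefix(data, 4));
        System.out.println("BOM: " + detectBom(data));
    }
}
```

Run without arguments, the built-in sample starts with EF BB BF, so the sketch reports a UTF-8 BOM.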

+1

You may have a newline at the start of the file. Remove it.

Choose View > Show Symbol > Show All Characters in Notepad++ to see what is there.

0

This is not a JAXB problem; the problem is how you read the XML. Try using an input stream:

 ...
 Unmarshaller u = jaxbContext.createUnmarshaller();
 XmlDataObject xmlDataObject = (XmlDataObject) u.unmarshal(new FileInputStream("foo.xml"));
 ...

0

Besides FileInputStream, ByteArrayInputStream also worked:

 JAXB.unmarshal(new ByteArrayInputStream(string.getBytes("UTF-8")), Delivery.class); 

=> No more errors are thrown.
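
One caveat with the string route: if the string was itself decoded from BOM-prefixed UTF-8 bytes, Java keeps the mark as a leading U+FEFF character, and getBytes("UTF-8") writes it back out as EF BB BF. A minimal sketch of stripping it first, assuming the mark can only appear at index 0; BomString and stripLeadingBom are hypothetical names, not part of JAXB or the JDK.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not a JDK or JAXB class.
class BomString {

    /** Drops a leading U+FEFF (the decoded form of a BOM), if present. */
    static String stripLeadingBom(String s) {
        return (!s.isEmpty() && s.charAt(0) == '\uFEFF') ? s.substring(1) : s;
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};

        // Java's UTF-8 decoder does not remove the BOM; it becomes U+FEFF.
        String decoded = new String(raw, StandardCharsets.UTF_8);
        System.out.println(decoded.length()); // 5

        String clean = stripLeadingBom(decoded);
        System.out.println(clean); // <a/>
        System.out.println(clean.getBytes(StandardCharsets.UTF_8).length); // 4
    }
}
```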

0

Source: https://habr.com/ru/post/1395042/

