
The element type "META" must be terminated by the matching end-tag "</META>"

Sometimes I get the following error when I try to parse an XML file from Java (on the GAE server):

Parse: org.xml.sax.SAXParseException; lineNumber: 10; columnNumber: 3; The element type "META" must be terminated by the matching end-tag "</META>". 

It doesn't happen every time; sometimes the same file parses fine. The program parses other XML files, and I have no problems with them.

This is the XML file I'm trying to parse: http://www.fulhamchronicle.co.uk/london-chelsea-fc/rss.xml

Any help would be appreciated. Thank you.


Update:

Thanks for the answers. I switched my code to a different parser, and the good news is that this file is now processed correctly. The bad news is that the same error has now moved to another feed, on the same line, even though it is a completely different feed that used to work fine. Can anyone think of why this is happening?

5 answers

It appears to be a live document, that is, one that changes quite often. At the moment it contains no <meta> element.

I can think of two explanations for what is happening:

  • Sometimes the document is generated incorrectly.

  • Sometimes you get an HTML error page instead of the expected document, and the XML parser cannot handle the <meta> in the HTML <head>. This is because a <meta> in (valid) HTML does not need a matching closing </meta> tag. (And, at least in some versions of HTML, a closing tag is not even allowed.)
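A minimal sketch of guarding against the second case before the XML parser is ever called. It assumes you fetch the feed yourself (for example with HttpURLConnection, whose getResponseCode() and getContentType() would supply the two inputs) and that the server labels real feeds with an XML media type such as application/rss+xml or text/xml. FeedGuard and isProbablyFeed are hypothetical names used only for illustration:

```java
import java.util.Locale;

class FeedGuard {

    // Hypothetical pre-check: true only if the HTTP response looks like an XML feed.
    // Assumes the server sends an XML media type for real feeds.
    public static boolean isProbablyFeed(int statusCode, String contentType) {
        // Error responses (4xx/5xx) and redirects are not handed to the XML parser.
        if (statusCode < 200 || statusCode >= 300) {
            return false;
        }
        if (contentType == null) {
            return false; // unknown type: let the caller log it instead of parsing
        }
        // Drop parameters such as "; charset=UTF-8" and compare case-insensitively.
        String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        // text/html (a typical error page) fails this test.
        return mediaType.endsWith("/xml") || mediaType.endsWith("+xml");
    }

    public static void main(String[] args) {
        System.out.println(isProbablyFeed(200, "application/rss+xml; charset=UTF-8"));
        System.out.println(isProbablyFeed(200, "text/html; charset=UTF-8"));
        System.out.println(isProbablyFeed(503, "text/xml"));
    }
}
```

Responses that fail the check can be logged instead of parsed, which would also show how often the server returns an error page rather than the feed.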

To track this down, you will need to capture the exact input that makes the parse fail.
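One way to capture that input, sketched under the assumption that the whole response can be read into memory first (InputStream.readAllBytes() does this on Java 9+): parse from a byte array, and on a SAXParseException log the line number, the column and the text of that line. CapturingParser and lineAt are hypothetical names, and decoding the line as UTF-8 is an assumption, since a feed may declare a different encoding:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class CapturingParser {

    // Parses bytes that were already read in full, so they are still available if parsing fails.
    public static Document parse(byte[] raw)
            throws ParserConfigurationException, IOException, SAXException {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(raw));
        } catch (SAXParseException e) {
            // Report where the parser gave up and what the input looked like there.
            System.err.println("Parse failed at line " + e.getLineNumber()
                    + ", column " + e.getColumnNumber() + ": " + e.getMessage());
            System.err.println("Input line: " + lineAt(raw, e.getLineNumber()));
            throw e;
        }
    }

    // Returns the 1-based line of the input (decoded as UTF-8), or "" if out of range.
    public static String lineAt(byte[] raw, int lineNumber) {
        String[] lines = new String(raw, StandardCharsets.UTF_8).split("\\R", -1);
        return (lineNumber >= 1 && lineNumber <= lines.length) ? lines[lineNumber - 1] : "";
    }

    public static void main(String[] args) throws Exception {
        // A tiny HTML head, like the one an error page would contain.
        byte[] html = "<html>\n<head>\n<META name=\"robots\" content=\"noindex\">\n</head>\n</html>"
                .getBytes(StandardCharsets.UTF_8);
        try {
            parse(html);
        } catch (SAXParseException expected) {
            System.out.println("Captured failure: " + expected.getMessage());
        }
    }
}
```

In a real fetcher you would also keep (or log) the full byte array for a failed request, since the line the parser reports may only make sense together with the lines around it.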


You can try <meta/> instead of <meta>.
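If you control the markup being produced, a quick way to see the difference is to feed both forms to the JDK's built-in DOM parser. This sketch (MetaWellFormedness and isWellFormed are hypothetical names) only reports whether a string is well-formed XML; it cannot change a feed served by someone else:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class MetaWellFormedness {

    // Returns true if the JDK's default DOM parser accepts the string as well-formed XML.
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default handler so fatal errors are not also printed to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            // Configuration, I/O or parse failure: treat as "not well-formed" for this demo.
            return false;
        }
    }

    public static void main(String[] args) {
        // An unclosed <meta>, as allowed in an HTML 4.01 <head>: rejected.
        System.out.println(isWellFormed("<head><meta name=\"a\" content=\"b\"></head>"));
        // The same element written as an empty-element tag: accepted.
        System.out.println(isWellFormed("<head><meta name=\"a\" content=\"b\"/></head>"));
    }
}
```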


What you received is not XML but HTML:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/strict.dtd">

An XML parser will not parse it, because HTML 4.01 allows elements such as <meta> without end tags.

I also see that the file has no actual content and does not look like a valid RSS file. Possibly some server-side error occurred.
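A rough heuristic for spotting this case before the XML parser sees the body, assuming the HTML page starts with a DOCTYPE or an <html> tag as the one above does. HtmlSniffer and looksLikeHtml are hypothetical names, and the 512-byte window is an arbitrary choice:

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class HtmlSniffer {

    // Heuristic: does the start of the body look like an HTML page rather than an XML feed?
    public static boolean looksLikeHtml(byte[] body) {
        // Only the first 512 bytes are inspected.
        int n = Math.min(body.length, 512);
        String head = new String(body, 0, n, StandardCharsets.UTF_8);
        // Drop a UTF-8 byte order mark, then surrounding whitespace.
        if (head.startsWith("\uFEFF")) {
            head = head.substring(1);
        }
        head = head.trim().toLowerCase(Locale.ROOT);
        // Skip an XML declaration; XHTML pages may start with one too.
        if (head.startsWith("<?xml")) {
            int end = head.indexOf("?>");
            head = end >= 0 ? head.substring(end + 2).trim() : "";
        }
        return head.startsWith("<!doctype html") || head.startsWith("<html");
    }

    public static void main(String[] args) {
        byte[] errorPage = ("<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01//EN\">\n"
                + "<html><head><META http-equiv=\"Content-Type\" content=\"text/html\"></head></html>")
                .getBytes(StandardCharsets.UTF_8);
        byte[] feed = "<?xml version=\"1.0\"?>\n<rss version=\"2.0\"><channel/></rss>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(looksLikeHtml(errorPage));
        System.out.println(looksLikeHtml(feed));
    }
}
```

A check like this complements the HTTP status and Content-Type, since some servers return an error page with a 200 status.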


You can use this form of the tag:

 <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> 

Just add a slash (/) before the closing > of every meta tag:

 <meta name=" " content=" " /> 

For example, using

 <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/> 

really works.


Source: https://habr.com/ru/post/944894/

