Best way to parse HTML to XML

Essentially, I have an iPhone app that queries and parses an XML file on my server. Right now I have to update and upload that XML file by hand every morning so that my users get fresh information. I would like to automate this process, which means scraping various websites (NYTimes, iAmBored.com, etc.), writing the relevant information from each site to an XML file, and uploading that file to my server.

Does anyone know how to do this (parsing HTML into an XML file)? Since I'm new to this, I'm not sure which languages it requires or what the best approach is.

Many thanks!

+3
4 answers

You can try converting the HTML to XHTML. XHTML is XML-based, so the result is XML with some additional rules defined in a DTD.

You can also try parsing the HTML directly with an SGML parser, since HTML is SGML-based in the same way that XHTML is XML-based.

Links are provided as inspiration.
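To illustrate why the XHTML route helps, here is a minimal Python sketch (not part of the original answer) that checks whether a page is already well-formed XML. The function name and the two sample strings are made up for the example. Well-formed XHTML goes straight through an ordinary XML parser, while typical "tag soup" HTML is rejected and needs a conversion step first.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """Return True if a plain XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Hypothetical sample pages, used only for illustration.
XHTML_PAGE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><p>Hi<br/>there</p></body></html>"
)
# Same content as loose HTML: <br> and <p> are never closed.
TAG_SOUP_PAGE = "<html><body><p>Hi<br>there</body></html>"

if __name__ == "__main__":
    print(is_well_formed_xml(XHTML_PAGE))
    print(is_well_formed_xml(TAG_SOUP_PAGE))
```

A check like this only tests well-formedness; it does not validate the page against the XHTML DTD.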

+1

If the content you want to scrape is already XHTML, you can use XSLT to transform the original content into whatever you need in the XML that you serve to your users.
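As a rough illustration of that kind of transformation, here is a Python sketch. Python's standard library has no XSLT processor, so this swaps in the limited XPath subset of `xml.etree.ElementTree` to select nodes and builds the output document by hand. The `h2 class="headline"` markup and the `items`/`item`/`title`/`url` element names are assumptions invented for the example, not anything a real site is known to use.

```python
import xml.etree.ElementTree as ET

# Namespace used by XHTML documents; "x" is an arbitrary local prefix.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

# Hypothetical page layout, used only for illustration.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<h2 class="headline"><a href="/a">First story</a></h2>
<h2 class="headline"><a href="/b">Second story</a></h2>
<h2 class="other"><a href="/c">Ad</a></h2>
</body></html>"""


def xhtml_to_feed(xhtml: str) -> str:
    """Pull headline links out of an XHTML page into a small XML document."""
    page = ET.fromstring(xhtml)
    feed = ET.Element("items")
    # Select every <a> directly inside an <h2 class="headline">.
    for link in page.findall(".//x:h2[@class='headline']/x:a", XHTML_NS):
        item = ET.SubElement(feed, "item")
        ET.SubElement(item, "title").text = "".join(link.itertext()).strip()
        ET.SubElement(item, "url").text = link.get("href", "")
    return ET.tostring(feed, encoding="unicode")


if __name__ == "__main__":
    print(xhtml_to_feed(SAMPLE))
```

A real XSLT stylesheet would express the same selection declaratively; the point here is only that once the input is well-formed XML, standard XML tooling can pick it apart.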

XML- , .. XPath , , .

0

, RSS/Atom? , XML, HTML XML. , , RSS-, HTML, , , HTML.

XSLT - , , XML, , XML- .

0

TagSoup - Just Keep On Truckin'

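... a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short.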

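TagSoup is designed for people who have to process this stuff using some semblance of a rational application design.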

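By providing a SAX interface, it allows standard XML tools to be applied to even the worst HTML. TagSoup also includes a command-line processor that reads HTML files and can generate either clean HTML or well-formed XML that is a close approximation to XHTML.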
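The idea of putting a SAX interface in front of messy HTML can be sketched in Python as well. This is not TagSoup and does not use its API; it is a small stand-in that replays events from the standard library's lenient `html.parser` on any `xml.sax` content handler. The class and function names are made up for the example, and the recovery logic is deliberately simplistic: it closes unclosed elements but does not apply HTML's implicit-close rules, so nesting may differ from what a browser would build.

```python
import io
from html.parser import HTMLParser
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

# HTML elements that never take a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class HtmlSaxBridge(HTMLParser):
    """Replay HTML parse events on an xml.sax ContentHandler."""

    def __init__(self, handler):
        super().__init__(convert_charrefs=True)
        self.handler = handler
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. <input disabled>) get an empty value.
        sax_attrs = AttributesImpl({k: v or "" for k, v in attrs})
        self.handler.startElement(tag, sax_attrs)
        if tag in VOID:
            self.handler.endElement(tag)
        else:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray closing tag: ignore it
        # Close anything left open inside this element first.
        while True:
            top = self.open_tags.pop()
            self.handler.endElement(top)
            if top == tag:
                break

    def handle_data(self, data):
        if self.open_tags:  # drop text outside the root element
            self.handler.characters(data)

    def run(self, html):
        self.handler.startDocument()
        self.feed(html)
        self.close()
        while self.open_tags:  # close whatever the page left open
            self.handler.endElement(self.open_tags.pop())
        self.handler.endDocument()


def html_to_xml_via_sax(html: str) -> str:
    """Serialize loose HTML as well-formed XML through a SAX handler."""
    out = io.StringIO()
    HtmlSaxBridge(XMLGenerator(out, encoding="utf-8")).run(html)
    return out.getvalue()
```

Because the bridge only talks to the `ContentHandler` interface, the `XMLGenerator` used here could be replaced by any other SAX consumer, which is the design point the quoted description makes about TagSoup. The sketch assumes the page has a single root element such as `<html>`.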

There is also Taggle, a port of TagSoup to C++.

0

Source: https://habr.com/ru/post/1772355/

