I need to parse invalid HTML files in which elements such as BODY appear multiple times, at arbitrary lines throughout the file. I tried parsing the file as XML, but that failed because it is not well-formed XML either (many elements throughout the file have invalid attributes). HtmlAgilityPack could not read the file either: it only reads the content before the first misplaced element and nothing after it.
Here is a small example of such a file:
<HTML>
<HEAD>
    <TITLE>My title</TITLE>
</HEAD>
<BODY leftmargin=9 topmargin=7 >
<TABLE>
    <TR>
        <TD>Test</TD>
    </TR>
    <TR>
        <TD>Test</TD>
        <TD>Test<TD>
    </TR>
<BODY> <-- This is the point where HtmlAgilityPack is stuck --!>
    <TR>
        <TD>Test</TD>
        <TD>Test</TD>
    </TR>
    <TR>
</BODY>
    <TR>
        <TD><FONT>Test</FONT></TD>
    </TR>
</TABLE>
</BODY>
I am trying to parse the information from this table.
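One way to illustrate the goal is an event-based pass that never builds a document tree, so stray BODY tags cannot derail it: it only reacts to TR and TD tags and ignores everything else. The sketch below is an assumption-laden illustration, not a HtmlAgilityPack solution: it uses Python's standard-library `html.parser` instead, and the names `TableCellCollector`, `extract_rows`, and `SAMPLE` are hypothetical. It also assumes that empty cells and empty rows can be discarded.

```python
from html.parser import HTMLParser

# The example file from the question, reproduced as a string.
SAMPLE = """<HTML> <HEAD> <TITLE>My title</TITLE> </HEAD>
<BODY leftmargin=9 topmargin=7 > <TABLE>
<TR> <TD>Test</TD> </TR>
<TR> <TD>Test</TD> <TD>Test<TD> </TR>
<BODY> <-- This is the point where HtmlAgilityPack is stuck --!>
<TR> <TD>Test</TD> <TD>Test</TD> </TR>
<TR> </BODY>
<TR> <TD><FONT>Test</FONT></TD> </TR>
</TABLE> </BODY>"""


class TableCellCollector(HTMLParser):
    """Collects TD text grouped by TR; every other tag is ignored."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently open, or None
        self._cell = None   # text fragments of the open cell, or None

    def _close_cell(self):
        # Store the open cell if it holds text (assumption: drop empty cells).
        if self._cell is not None:
            text = "".join(self._cell).strip()
            if text and self._row is not None:
                self._row.append(text)
            self._cell = None

    def _close_row(self):
        # Store the open row if it holds cells (assumption: drop empty rows).
        self._close_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lower case.
        if tag == "tr":
            self._close_row()       # a new TR implicitly ends an unclosed one
            self._row = []
        elif tag == "td":
            self._close_cell()      # a new TD implicitly ends an unclosed one
            if self._row is None:
                self._row = []
            self._cell = []
        # Anything else (stray BODY, FONT, ...) is deliberately skipped.

    def handle_endtag(self, tag):
        if tag == "td":
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        # Keep text only while inside a cell.
        if self._cell is not None:
            self._cell.append(data)


def extract_rows(html):
    """Return the table content as a list of rows of cell text."""
    parser = TableCellCollector()
    parser.feed(html)
    parser.close()
    parser._close_row()  # flush a row left open at end of input
    return parser.rows


if __name__ == "__main__":
    for row in extract_rows(SAMPLE):
        print(row)
```

Because the collector keeps no element stack, a misplaced opening or closing BODY has no effect on what is captured; the trade-off is that nested tables or cells whose structure matters would need extra state.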