HXT ignores the HTML DOCTYPE and replaces it with an XML declaration

I am having trouble figuring out why HXT is replacing my DOCTYPE declaration. First, here is my input file:

    <!DOCTYPE html>
    <html>
      <head>
        <title>foo</title>
      </head>
      <body>
        <h1>foo</h1>
      </body>
    </html>

and this is the result that I get:

    <?xml version="1.0" encoding="US-ASCII"?>
    <html>
      <head>
        <title>foo</title>
      </head>
      <body>
        <h1>foo</h1>
      </body>
    </html>

Finally, here is a simplified version of the arrows that I use:

    start (App src dest) =
      runX $
        readDocument [ withValidate no
                     , withSubstDTDEntities no
                     , withParseHTML yes
                     --, withTagSoup
                     ] src
        >>>
        this
        >>>
        writeDocument [ withIndent yes
                      , withSubstDTDEntities no
                      , withOutputHTML
                      --, withOutputEncoding "UTF-8"
                      ] dest

I apologize for the commented-out lines; I have been experimenting with various combinations of options. I just can't get HXT to leave the DTD alone, even with withSubstDTDEntities no, withValidate no, and so on. I do get a warning that HXT is ignoring my DOCTYPE declaration, but that is the only clue I have. Can someone please lend me a hand? Thank you in advance!

1 answer

You have two problems.

First, HXT accepts only one of the following three HTML DOCTYPE declarations:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "DTD/xhtml1-transitional.dtd">
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "DTD/xhtml1-frameset.dtd">

If your input uses one of them, the warning about the ignored DOCTYPE declaration goes away.

Second, add the following option to writeDocument:

 withAddDefaultDTD yes 
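
Putting both points together, the pipeline from the question might look roughly like the sketch below. It assumes the hxt package and its Text.XML.HXT.Core module; the function name convert is hypothetical and stands in for the asker's start, and the exact DOCTYPE that HXT writes is not shown here because it depends on the library version and the input.

```haskell
import Text.XML.HXT.Core

-- | Read an HTML file and write it back out, asking HXT to emit a
-- DOCTYPE declaration. 'convert' is a hypothetical name for illustration.
convert :: String -> String -> IO [XmlTree]
convert src dest =
  runX $
    readDocument
      [ withValidate no          -- do not validate against the DTD
      , withSubstDTDEntities no  -- leave DTD entities unexpanded
      , withParseHTML yes        -- use the lenient HTML parser
      ]
      src
    >>>
    writeDocument
      [ withIndent yes
      , withOutputHTML           -- serialize as HTML
      , withAddDefaultDTD yes    -- the option suggested in the answer
      ]
      dest
```

Calling convert "in.html" "out.html" from main runs the pipeline once. If the warning persists, the input file most likely still carries a DOCTYPE other than the three listed above.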

Source: https://habr.com/ru/post/1206271/

