I am trying to fix a problem in a Haskell program that generates XHTML from UTF-8 text. The program takes lines of this text and should produce valid XHTML, but it does not. I import Text.XHtml.Transitional and use the href and identifier functions to generate URI and id attributes from UTF-8 strings. In the Haskell interpreter, we see:
Prelude Text.XHtml.Transitional> href "äöü"
href="&#228;&#246;&#252;"
This is a valid XHTML URI. However,
Prelude Text.XHtml.Transitional> identifier "äöü"
id="&#228;&#246;&#252;"
is not, according to the specification, which does not allow the '&', '#' and ';' characters in an id. So it looks like the Text.XHtml.Transitional library is buggy. Moreover, I think XHTML itself is at fault, because it does not provide a 1:1 mapping from UTF-8 to attribute values that is identical to the mapping used for URIs.
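To make clear where the '&', '#' and ';' characters come from, here is a minimal sketch of the escaping I believe I am observing: every non-ASCII character is replaced by a decimal numeric character reference. The function escapeNonAscii is a hypothetical helper written only for illustration, using nothing but base. It is my guess at the behaviour, not the library's actual source.

```haskell
import Data.Char (ord)

-- Hypothetical helper, not part of Text.XHtml: replaces every character
-- outside the ASCII range with a decimal numeric character reference
-- and leaves ASCII characters untouched.
escapeNonAscii :: String -> String
escapeNonAscii = concatMap escapeChar
  where
    escapeChar c
      | ord c < 0x80 = [c]
      | otherwise    = "&#" ++ show (ord c) ++ ";"

main :: IO ()
main = do
  -- "\228\246\252" is "äöü" written with character escapes
  putStrLn (escapeNonAscii "\228\246\252")  -- prints &#228;&#246;&#252;
  putStrLn (escapeNonAscii "abc")           -- prints abc
```

If this guess is right, the same escaped string ends up in both the href and the id attribute, which is what the interpreter session suggests.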
Since I'm new to Haskell, I may have made a mistake somewhere. I also know that HTML5 relaxes these attribute restrictions, but that is beside the point here. Is the library buggy? If so, which library should I use instead?
See also Invalid Xhtml Characters?