The HTML 4.01 spec says this about hexadecimal character references:
"Numeric character references specify the code position of a character in the document character set."
So if the document character set encoding is UTF-8, numeric references should specify a Unicode code point.
The HTML5 spec says this about hexadecimal character references:
"The ampersand must be followed by a U+0023 NUMBER SIGN character (#), which must be followed by either a U+0078 LATIN SMALL LETTER X character (x) or a U+0058 LATIN CAPITAL LETTER X character (X), which must then be followed by one or more digits in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), U+0061 LATIN SMALL LETTER A to U+0066 LATIN SMALL LETTER F, and U+0041 LATIN CAPITAL LETTER A to U+0046 LATIN CAPITAL LETTER F, representing a base-sixteen integer that corresponds to a Unicode code point that is allowed according to the definition below. The digits must then be followed by a U+003B SEMICOLON character (;)."
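To make the quoted syntax concrete, here is a rough sketch of it as a regular expression. The names HEX_CHAR_REF and hex_ref_code_point are my own, and this only checks the syntax; it does not check whether the resulting code point is one the spec allows.

```python
import re

# "&#", then "x" or "X", then one or more hex digits, then ";"
HEX_CHAR_REF = re.compile(r"&#[xX]([0-9A-Fa-f]+);")

def hex_ref_code_point(ref):
    """Return the integer a hexadecimal character reference names, or None if the syntax doesn't match."""
    match = HEX_CHAR_REF.fullmatch(ref)
    return int(match.group(1), 16) if match else None

print(hex_ref_code_point("&#x20AC;"))  # 8364, i.e. 0x20AC
print(hex_ref_code_point("&#x80"))     # None: the semicolon is missing
```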
The HTML5 text doesn't mention the document character set at all; it just says that the numeric value identifies a Unicode code point.
But it seems that all modern browsers (I haven't tested older ones) treat &#x80; through &#x9F; as though they were Windows-1252 references.
For example, &#x80; displays as €, but U+0080 isn't the code point for €; U+20AC is. In Unicode, U+0080 is a C1 control character (PAD).
&#x20AC; also (correctly) displays as €.
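The same mapping can be reproduced outside a browser. The sketch below uses Python's standard-library html.unescape as a stand-in for a browser (that substitution is my assumption; it is documented as following the HTML5 rules for character references) and compares its result with decoding the same byte value as Windows-1252. The helper name compare_with_cp1252 is made up for illustration.

```python
import html

def compare_with_cp1252(code):
    """Return (character the reference resolves to, character for the same byte in Windows-1252)."""
    from_reference = html.unescape(f"&#x{code:X};")
    from_cp1252 = bytes([code]).decode("cp1252")
    return from_reference, from_cp1252

# A few values from the 0x80-0x9F range that Windows-1252 assigns to printable characters
for code in (0x80, 0x85, 0x99):
    ref_char, cp_char = compare_with_cp1252(code)
    print(f"&#x{code:X}; -> U+{ord(ref_char):04X}   cp1252 byte 0x{code:X} -> U+{ord(cp_char):04X}")
```

For these three values both columns agree (U+20AC, U+2026, U+2122), which matches what I see in browsers.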
Is this just pragmatic behavior on the part of the browsers, or is there a justification in a spec somewhere that I've missed?
[Note that decimal references have the same behavior. I just used hexadecimal for clarity and consistency.]