XHTML Invalid Characters?

I wrote my own XHTML validator in .NET (checking against the DTD plus some additional rules), and I noticed a mismatch between my validation and the W3C validation.

In my validator, I get the following error when an id contains a colon (say: id="mustang:horse")

(Error) The attribute 'id' has an invalid value according to its data type.

But the W3C validator reports no errors for the same document.

I tried to find a list of invalid characters for an attribute value in XML / XHTML but could not find one.

Thank you for your help.

2 answers

The reason for this difference is that the W3C validator does not appear to process XHTML in a namespace-aware way. Although XHTML documents must be in the XHTML namespace, this behavior is actually reasonable: HTML documents do not use namespaces, and the normative structure of XHTML documents (like that of HTML) is defined by a DTD, and DTDs are not namespace-aware.

As @Alochi already noted:

Values of type ID MUST match the Name production.

This is true when the document is parsed without namespace awareness, but it is not true if the document must also conform to namespaces. The Namespaces in XML specification states that IDs must match the NCName production, which explicitly excludes the colon character. Namespace-aware parsing is the general norm, so using a colon in an id value is not recommended, even though it is allowed when the document is parsed without namespace awareness.

Summary: if namespaces are ignored, the ID value must be a valid Name and may contain a colon; otherwise, it must be a valid NCName and cannot contain a colon.
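To make the Name / NCName distinction concrete, here is a minimal sketch in Python (not .NET, and not how either validator is actually implemented). The character classes are transcribed from the XML 1.0 Name production; NCName is treated as "a Name with no colon", per Namespaces in XML. The helper names is_name and is_ncname are made up for this illustration.

```python
import re

# Name start characters from XML 1.0 (Fifth Edition), WITHOUT the colon,
# so the colon can be added back only for the non-namespace-aware case.
_START_NO_COLON = (
    "A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    "\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    "\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# Extra characters allowed after the first position (NameChar).
_REST = "\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

# Name: colon allowed anywhere (namespaces ignored).
NAME = re.compile(f"[:{_START_NO_COLON}][:{_START_NO_COLON}{_REST}]*")
# NCName: same production with the colon removed (namespace-aware).
NCNAME = re.compile(f"[{_START_NO_COLON}][{_START_NO_COLON}{_REST}]*")


def is_name(value: str) -> bool:
    """True if value matches the XML 1.0 Name production."""
    return NAME.fullmatch(value) is not None


def is_ncname(value: str) -> bool:
    """True if value is a Name that contains no colon."""
    return NCNAME.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("mustang:horse", "mustang-horse", "mustang: horse"):
        print(repr(candidate), is_name(candidate), is_ncname(candidate))
```

With this sketch, "mustang:horse" passes is_name but fails is_ncname, which mirrors the mismatch described in the question. A value containing a space fails both checks.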


There is a list and it allows colons.

The XHTML 1.0 specification says, at http://www.w3.org/TR/xhtml1/#h-4.10

... in XHTML 1.0 the id attribute is defined to be of type ID ...

The XML 1.0 specification says, at http://www.w3.org/TR/2008/REC-xml-20081126/#id

Values of type ID MUST match the Name production.

And the Name production is defined at http://www.w3.org/TR/2008/REC-xml-20081126/#NT-Name

   [4]  NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] |
                          [#xD8-#xF6] | [#xF8-#x2FF] |
                          [#x370-#x37D] | [#x37F-#x1FFF] |
                          [#x200C-#x200D] | [#x2070-#x218F] |
                          [#x2C00-#x2FEF] | [#x3001-#xD7FF] |
                          [#xF900-#xFDCF] | [#xFDF0-#xFFFD] |
                          [#x10000-#xEFFFF]

   [4a] NameChar      ::= NameStartChar | "-" | "." |
                          [0-9] | #xB7 | [#x0300-#x036F] |
                          [#x203F-#x2040]

   [5]  Name          ::= NameStartChar (NameChar)*
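Since the question asked for a list of invalid characters, the productions above can be turned into a small checker that reports which characters of a candidate id fall outside them. This is only a sketch in Python (the asker's validator is in .NET), and the function name invalid_name_chars is invented for the example. The ranges are copied from productions [4] and [4a].

```python
# Code point ranges for NameStartChar, production [4].
NAME_START_RANGES = [
    (0x3A, 0x3A),  # ":"
    (0x41, 0x5A),  # A-Z
    (0x5F, 0x5F),  # "_"
    (0x61, 0x7A),  # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# Additional ranges allowed by NameChar, production [4a].
NAME_EXTRA_RANGES = [
    (0x2D, 0x2D),  # "-"
    (0x2E, 0x2E),  # "."
    (0x30, 0x39),  # 0-9
    (0xB7, 0xB7),  # MIDDLE DOT
    (0x300, 0x36F), (0x203F, 0x2040),
]


def _in_ranges(ch, ranges):
    """True if the code point of ch lies in any (low, high) range."""
    cp = ord(ch)
    return any(low <= cp <= high for low, high in ranges)


def invalid_name_chars(value):
    """Return (index, char) pairs that violate the Name production.

    The first character must be a NameStartChar; later characters may
    also come from the extra NameChar ranges. An empty string yields an
    empty list but is still not a Name, since [5] needs one character.
    """
    bad = []
    for index, ch in enumerate(value):
        allowed = _in_ranges(ch, NAME_START_RANGES) or (
            index > 0 and _in_ranges(ch, NAME_EXTRA_RANGES)
        )
        if not allowed:
            bad.append((index, ch))
    return bad


if __name__ == "__main__":
    print(invalid_name_chars("mustang:horse"))   # []
    print(invalid_name_chars("mustang: horse"))  # [(8, ' ')]
```

Note that the colon is accepted here, because this checks Name, not NCName. A namespace-aware check would drop the (0x3A, 0x3A) entry.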

And just above that formal definition, it also says:

Document authors are encouraged to use names which are meaningful words or combinations of words in natural languages, and to avoid symbolic or white space characters in names. Note that COLON, HYPHEN-MINUS, FULL STOP (period), LOW LINE (underscore), and MIDDLE DOT are explicitly permitted.

(My emphasis)


Source: https://habr.com/ru/post/1382021/

