Should an end tag close all unclosed intervening start tags with omitted end tags?

Am I misreading the HTML 4.01 standard, or is Google Chrome? In HTML 4.01, if I write:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
    <html>
    <head>
    <body>plain <em>+em <strong>+strong </em>-em

Google Chrome renders:

plain +em +strong -em (with "+em" in italics, "+strong" in bold italics, and "-em" in bold)

This apparently contradicts the HTML 4.01 standard, which summarizes the underlying SGML rules: “an end tag closes, back to the matching start tag, all unclosed intervening start tags with omitted end tags”.¹

That is, the end tag </em> should close not only the start tag <em>, but also the still-open start tag <strong>, and the rendering should be:

plain +em +strong -em (with "+em" in italics, "+strong" in bold italics, and "-em" in plain text)
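The reading of the standard behind this expected rendering can be sketched as a toy model. This is only an illustration of that one quoted sentence, not of any real browser or SGML parser; the class and function names are made up for the example, and Python's `html.parser` is used purely as a tokenizer.

```python
from html.parser import HTMLParser


class StackClosingModel(HTMLParser):
    """Toy model: an end tag closes everything opened since its start tag."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of currently open tag names
        self.runs = []           # (text, tags in effect) pairs

    def handle_starttag(self, tag, attrs):
        self.open_elements.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_elements:
            # Pop back to, and including, the matching start tag.
            while self.open_elements.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.runs.append((text, tuple(self.open_elements)))


def stack_closing_runs(markup):
    """Return each text run with the elements that enclose it."""
    model = StackClosingModel()
    model.feed(markup)
    model.close()
    return model.runs


for text, tags in stack_closing_runs("plain <em>+em <strong>+strong </em>-em"):
    print(text, tags)
# Under this model the final run "-em" has no enclosing elements: ('-em', ())
```

Under this reading, </em> takes <strong> down with it, which is why "-em" would come out as plain text.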

A comment noted that leaving tags open remains bad practice, but this is only an academic example. An equally good example would be: <em>+em <strong>+strong </em>-em </strong>. My understanding from the HTML 4.01 standard was that this code would not work as intended because of the overlapping elements: the end tag </em> should implicitly close <strong>. The fact that it did work as intended was unexpected, and that is what prompted my question.

And it turns out that I posed a false dichotomy in the question: neither Google nor I was misreading the HTML 4.01 standard. A private correspondent at w3.org pointed me to Web SGML and HTML 4.0 Explained by Martin Bryan, which explains that “[t]he parsing program will automatically close any currently open embedded element that has been declared as having omissible end-tags when it encounters an end-tag for a higher level element. (If an embedded element whose end-tag cannot be omitted is still open, however, the program will report a coding error.)”² (emphasis added). Bryan's summary of the SGML standard is correct, and HTML 4.01's summary is incorrect.

+4
6 answers

The statement quoted from the HTML 4.01 specification is very obscure, or simply wrong by any account. HTML 4.01 has specific rules for end tag omission, and these rules depend on the element. For example, the end tag of a p element may be omitted, while the end tag of an em element may never be omitted. The statement in the specification is probably trying to say that an end tag implicitly closes any inner elements that have not yet been closed, to the extent that end tag omission is allowed.

No browser has ever implemented HTML 4.01 (or any earlier HTML specification) as defined, with the SGML features that are formally part of it. Everything the HTML specifications say about SGML should be treated as theoretical until proven otherwise.

HTML5 does not change the rules of the game in this respect, except that it writes down error handling rules. For simple issues like this one, the rules simply make traditional browser behavior the norm. Browsers are tag-oriented and treat tags more or less as formatting commands: <em> means "italicize", </em> means "stop italicizing", and so on. But HTML5 also takes steps to define error handling precisely, so that even with this kind of tag soup, it is well defined which document tree will be built in the DOM.
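The "tags as formatting commands" view can be sketched in a few lines. This is a deliberately naive model, assuming only that a start tag switches a format on and an end tag switches that one format off; it is not the HTML5 parsing algorithm, which reaches the same visible result for this input by rebuilding the tree. The names below are invented for the example.

```python
from html.parser import HTMLParser


class FormattingCommandModel(HTMLParser):
    """Toy model: <x> switches format x on, </x> switches only x off."""

    def __init__(self):
        super().__init__()
        self.active = []  # formats currently switched on, in order
        self.runs = []    # (text, formats in effect) pairs

    def handle_starttag(self, tag, attrs):
        self.active.append(tag)

    def handle_endtag(self, tag):
        if tag in self.active:
            self.active.remove(tag)  # every other format stays on

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.runs.append((text, tuple(self.active)))


def formatting_runs(markup):
    """Return each text run with the formats active when it was seen."""
    model = FormattingCommandModel()
    model.feed(markup)
    model.close()
    return model.runs


for text, formats in formatting_runs("plain <em>+em <strong>+strong </em>-em"):
    print(text, formats)
# The final run is ('-em', ('strong',)): still bold, no longer italic.
```

Here </em> leaves strong switched on, so "-em" stays bold, which matches what Chrome displays.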

+4

Some tags may be omitted (for example, the end tag for <p>, or the start and end tags for <body>), and some may not (for example, the end tag for <strong>). It is the former that the section of the specification you quote refers to. You can tell which is which from the markers in the DTD, where a dash means the tag is required and an O means it may be omitted:

    <!ELEMENT P - O (%inline;)* -- paragraph -->
              ^ A P element
                ^ requires a start tag
                  ^ has an optional end tag
                    ^ contains zero or more inline things
                                ^ Comment: is a paragraph
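Reading those two markers mechanically might look like the sketch below. It assumes a simple declaration with a single element name, as in the P example; the real HTML 4.01 DTD also uses name groups and parameter entities, which this regular expression does not resolve. The function name and the simplified EM declaration are made up for illustration.

```python
import re

# Matches declarations of the form: <!ELEMENT NAME s e content ...>
# where s and e are "-" (tag required) or "O" (tag may be omitted).
ELEMENT_DECL = re.compile(
    r"<!ELEMENT\s+(\S+)\s+([-O])\s+([-O])\s+", re.IGNORECASE
)


def tag_omission(declaration):
    """Report whether the start and end tags of an element may be omitted."""
    match = ELEMENT_DECL.match(declaration.strip())
    if match is None:
        raise ValueError("not a simple <!ELEMENT ...> declaration")
    name, start, end = match.groups()
    return {
        "element": name,
        "start_tag_optional": start.upper() == "O",
        "end_tag_optional": end.upper() == "O",
    }


print(tag_omission("<!ELEMENT P - O (%inline;)* -- paragraph -->"))
# Hypothetical simplified declaration, written out for illustration only:
print(tag_omission("<!ELEMENT EM - - (%inline;)*>"))
```

For the P declaration this reports a required start tag and an optional end tag; for the simplified EM declaration, both tags are required.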

What you have is not an HTML document with an omitted tag, but an invalid pseudo-HTML document on which browsers attempt error recovery.

The HTML 4 specification does not describe how to perform error recovery; that is left to browsers.

+6

All modern browsers use the HTML5 parser (even for HTML 4.01 content), so the HTML5 parsing rules apply. For more information, see "Parsing HTML documents" in the HTML5 specification.

HTML structure

  • HTML
    • HEAD
      • #text " " ()
    • BODY
      • #text "plain" ()
      • EM
        • #text "+em" (italics)
        • STRONG
          • #text "+strong" (bold / italics)
      • STRONG
        • #text "-em" (bold)
+1

The specification states that:

Some HTML element types allow authors to omit end tags (e.g., the P and LI element types).

So this:

Please consult the SGML standard for information about rules governing elements (e.g., they must be properly nested, an end tag closes, back to the matching start tag, all unclosed intervening start tags with omitted end tags (section 7.5.1), etc.).

applies only to elements whose end tags may be omitted.

If you look at P, you will see:

Start tag: required; end tag: optional

So when you use this:

    <DIV>
    <P>This is the paragraph.
    </DIV>

The P element will be automatically closed.

But if you look at the EM spec, you will see:

Start tag: required; end tag: required

So this auto-close rule does not apply, because the HTML is invalid.

Curiously, all browsers show the same behavior with this invalid HTML.
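The distinction between the two cases can be sketched as a small checker. It assumes a hand-written table of end-tag rules (the real data lives in the DTD), ignores implied start tags and end-of-document handling, and uses invented names throughout; it illustrates the rule described here rather than implementing a validator.

```python
from html.parser import HTMLParser

# Assumption: a tiny hand-written table covering only the elements used here.
END_TAG_OPTIONAL = {
    "p": True, "li": True,
    "div": False, "em": False, "strong": False,
}


class OmissionChecker(HTMLParser):
    """Closes omissible elements silently; reports the others as errors."""

    def __init__(self):
        super().__init__()
        self.open_elements = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        self.open_elements.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_elements:
            self.errors.append(f"stray end tag </{tag}>")
            return
        while True:
            closed = self.open_elements.pop()
            if closed == tag:
                break
            # An intervening element is being closed implicitly.
            if not END_TAG_OPTIONAL.get(closed, False):
                self.errors.append(
                    f"<{closed}> requires an end tag but was still open at </{tag}>"
                )


def check(markup):
    """Return the list of end-tag omission errors found in the markup."""
    checker = OmissionChecker()
    checker.feed(markup)
    checker.close()
    return checker.errors


print(check("<DIV><P>This is the paragraph.</DIV>"))
print(check("plain <em>+em <strong>+strong </em>-em"))
```

The DIV/P example produces no errors because P's end tag is optional, while the em/strong example reports that <strong> was still open at </em>.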

+1

If you run your HTML through http://validator.w3.org/check, it will flag it as invalid.

If your HTML is invalid, all bets are off, and different browsers may render it in different ways.

0

If you look at the DOM in Chrome (right-click and choose Inspect Element), you can infer that, since your tags do not match, it uses an algorithm to decide where you went wrong. Technically, it closes the strong tag in the right place. However, it then decides that you probably meant both pieces of text to be bold, so it puts the final "-em" in a completely new, extra strong element, while keeping "+strong" in its own strong element. It seems to me that the Chrome team decided it is statistically likely that you want both pieces to be bold.

0

Source: https://habr.com/ru/post/1389751/