Looking at the Firefox source and reading the relevant documentation gives some initial insight.
After the document is fetched, it is passed to the parser (see /mozilla/parser/html/), which processes the document and builds the content tree. The core of the parser is written in Java (/mozilla/parser/html/javasrc/) and translated into C++ at build time, so brace yourself if you want to read the rest of the source.
Here is an excerpt from the startTag function in the parser source (/mozilla/parser/html/javasrc/TreeBuilder.java):
    1579  if (errorHandler != null) {
    1580      // ID uniqueness
    1581      @IdType String id = attributes.getId();
    1582      if (id != null) {
    1583          LocatorImpl oldLoc = idLocations.get(id);
    1584          if (oldLoc != null) {
    1585              err("Duplicate ID \u201C" + id + "\u201D.");
    1586              errorHandler.warning(new SAXParseException(
    1587                      "The first occurrence of ID \u201C" + id
    1588                      + "\u201D was here.", oldLoc));
    1589          } else {
    1590              idLocations.put(id, new LocatorImpl(tokenizer));
    1591          }
    1592      }
    1593  }
Note line 1590, and bear in mind that earlier in the same file we have:
    459  private final Map<String, LocatorImpl> idLocations = new HashMap<String, LocatorImpl>();
So element IDs are stored in a simple hash map, keyed by the ID string. Finding out how classes are handled is left as an exercise for the reader.
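To make the pattern in the excerpt concrete, here is a minimal standalone sketch of the same duplicate-ID check. It is not Mozilla's code: a plain line number stands in for LocatorImpl, and warnings are collected in a list instead of being passed to err(...) and an ErrorHandler. The class name IdTracker is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the duplicate-ID check from TreeBuilder.startTag.
 * Assumption: an int line number replaces LocatorImpl, and warnings are
 * collected in a list rather than sent to an ErrorHandler.
 */
class IdTracker {
    // Maps each ID to the line where it first appeared.
    private final Map<String, Integer> idLocations = new HashMap<>();
    // Collected warnings (stand-in for err(...) / errorHandler.warning(...)).
    private final List<String> warnings = new ArrayList<>();

    /** Records an ID seen on the given line; reports a duplicate if already seen. */
    void startTag(String id, int line) {
        if (id == null) {
            return; // element has no id attribute
        }
        Integer firstLine = idLocations.get(id);
        if (firstLine != null) {
            warnings.add("Duplicate ID \u201C" + id + "\u201D on line " + line
                    + "; first occurrence was on line " + firstLine + ".");
        } else {
            idLocations.put(id, line); // only the first occurrence is stored
        }
    }

    List<String> getWarnings() {
        return warnings;
    }

    public static void main(String[] args) {
        IdTracker tracker = new IdTracker();
        tracker.startTag("header", 3);
        tracker.startTag(null, 4);
        tracker.startTag("header", 9);
        for (String w : tracker.getWarnings()) {
            System.out.println(w);
        }
    }
}
```

As in the original, only the first occurrence of an ID goes into the map, so every later duplicate can point back to where the ID was first seen.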
Various DOM methods, such as document.getElementById(...), are connected to this hash map through glue code and several layers of object hierarchies; see "How does the DOM representative website work?" at ask.mozilla.org.
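The benefit of an ID-keyed map is that getElementById becomes a lookup instead of a tree walk. The following is a simplified sketch of that idea, not Gecko's actual glue code. Element, SimpleDocument and appendElement are hypothetical names, and the sketch assumes elements are appended in tree order and never removed or re-IDed. A real browser must handle those cases.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical minimal element: only a tag name and an optional id. */
class Element {
    final String tagName;
    final String id;

    Element(String tagName, String id) {
        this.tagName = tagName;
        this.id = id;
    }
}

/**
 * Sketch of a document whose getElementById is a hash-map lookup.
 * Assumption: elements are appended in tree order and never removed,
 * so the first element registered for an id is the one returned.
 */
class SimpleDocument {
    private final Map<String, Element> idMap = new HashMap<>();

    void appendElement(Element e) {
        if (e.id != null && !e.id.isEmpty()) {
            // putIfAbsent keeps the first element in tree order for duplicate ids.
            idMap.putIfAbsent(e.id, e);
        }
    }

    /** Average O(1) lookup instead of walking the whole tree. */
    Element getElementById(String id) {
        return idMap.get(id);
    }

    public static void main(String[] args) {
        SimpleDocument doc = new SimpleDocument();
        doc.appendElement(new Element("div", "main"));
        doc.appendElement(new Element("p", "main")); // duplicate id, not stored
        doc.appendElement(new Element("span", null));
        System.out.println(doc.getElementById("main").tagName); // prints "div"
        System.out.println(doc.getElementById("missing"));      // prints "null"
    }
}
```

Returning the first matching element for a duplicated ID mirrors what the DOM requires of getElementById. The same first-occurrence rule appears in the parser excerpt above.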