How does querySelector work under the hood?

Everyone knows what DOM selector methods such as document.getElementById(...) and document.querySelector(...) do, and how to use them with classes, attributes, IDs, etc.

But I could not find out how they work under the hood (I can find performance comparisons, but I'm interested in the theory). I know that the HTML page is loaded and parsed by the browser, and that the DOM tree is created. But how does each of these selectors traverse the DOM tree to find the elements?

I reviewed the spec for the parsing algorithm and read a really good explanation of how browsers work. Although it gives excellent explanations of HTML and CSS processing and the rendering flow, it does not explain how each of these selectors walks the tree to find elements.

I assume that to find something like .black or span, the browser needs to traverse the entire tree, but to find #id it can consult some additional data structure and thereby be much faster. Please do not post assumptions; I am looking for specific knowledge backed by the specification or by the implementation in some browser.
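The two strategies this paragraph contrasts can be sketched roughly as follows. This is a hypothetical toy model (the Element class, firstByClass and buildIdIndex are invented for illustration), not code from any browser: a class lookup walks the tree in document order and may visit every node, while an ID lookup can be answered from a hash map built in advance.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of the two lookup strategies; not browser code.
public class SelectorSketch {
    // Minimal element node: tag name, optional id, class set and children.
    static final class Element {
        final String tag;
        final String id;
        final Set<String> classes;
        final List<Element> children = new ArrayList<>();

        Element(String tag, String id, Set<String> classes) {
            this.tag = tag;
            this.id = id;
            this.classes = classes;
        }

        // Appends a child and returns this element, so trees can be built inline.
        Element add(Element child) {
            children.add(child);
            return this;
        }
    }

    // Depth-first, pre-order walk (document order): returns the first element
    // carrying the class, visiting every node in the worst case (O(n)).
    static Element firstByClass(Element root, String cls) {
        Deque<Element> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Element e = stack.pop();
            if (e.classes.contains(cls)) {
                return e;
            }
            // Push children in reverse so the leftmost child is visited first.
            for (int i = e.children.size() - 1; i >= 0; i--) {
                stack.push(e.children.get(i));
            }
        }
        return null;
    }

    // Builds an id -> element index in one walk; afterwards each lookup is an
    // average O(1) hash probe. putIfAbsent keeps the first element in document order.
    static Map<String, Element> buildIdIndex(Element root) {
        Map<String, Element> index = new HashMap<>();
        Deque<Element> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Element e = stack.pop();
            if (e.id != null) {
                index.putIfAbsent(e.id, e);
            }
            for (int i = e.children.size() - 1; i >= 0; i--) {
                stack.push(e.children.get(i));
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Element span = new Element("span", "b", Set.of("black"));
        Element root = new Element("html", null, Set.of())
                .add(new Element("div", "a", Set.of()).add(span));
        System.out.println(firstByClass(root, "black").tag); // span
        System.out.println(buildIdIndex(root).get("a").tag);  // div
    }
}
```

The index costs one full walk to build, so it only pays off if it is kept and reused across many lookups.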

1 answer

Reading the Firefox source and the relevant documentation gives some initial insight.
After the document is fetched, it is passed to the parser (see /mozilla/parser/html/), which chews through the document and generates a content tree. The core of the parser is written in Java (/mozilla/parser/html/javasrc/) and translated into C++ at build time, so be prepared for a good time when you want to read the rest of the source.

Look at the source of the parser (/mozilla/parser/html/javasrc/TreeBuilder.java), in particular this excerpt from the startTag function:

    1579  if (errorHandler != null) {
    1580      // ID uniqueness
    1581      @IdType String id = attributes.getId();
    1582      if (id != null) {
    1583          LocatorImpl oldLoc = idLocations.get(id);
    1584          if (oldLoc != null) {
    1585              err("Duplicate ID \u201C" + id + "\u201D.");
    1586              errorHandler.warning(new SAXParseException(
    1587                      "The first occurrence of ID \u201C" + id
    1588                      + "\u201D was here.", oldLoc));
    1589          } else {
    1590              idLocations.put(id, new LocatorImpl(tokenizer));
    1591          }
    1592      }
    1593  }

Note line 1590, and bear in mind that earlier in the same file we have:

    459  private final Map<String, LocatorImpl> idLocations = new HashMap<String, LocatorImpl>();

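We see that ID values are stored in a simple hash map. (Strictly speaking, this particular map is only filled when an errorHandler is present, i.e. for duplicate-ID reporting, so it shows the hash-map approach rather than the exact table the DOM queries.) Finding out how classes are handled is left as an exercise for the reader.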
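A self-contained version of the duplicate-ID check from the excerpt might look like the sketch below. It keeps the same id → first-location HashMap, but the Location record and the warnings list are hypothetical stand-ins for LocatorImpl and the SAX errorHandler, so treat it as an illustration of the idea rather than the parser's real code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified re-creation of the ID-uniqueness check; not Firefox code.
public class IdUniquenessCheck {
    // Stand-in for LocatorImpl: where in the source a start tag began.
    record Location(int line, int column) {}

    // Same shape as idLocations in the excerpt: id -> location of first occurrence.
    private final Map<String, Location> idLocations = new HashMap<>();

    // Stand-in for errorHandler.warning(...): collected messages.
    final List<String> warnings = new ArrayList<>();

    // Called for every start tag; id is null when the tag has no id attribute.
    void startTag(String id, Location here) {
        if (id == null) {
            return;
        }
        Location oldLoc = idLocations.get(id);
        if (oldLoc != null) {
            // Seen before: report it and keep the original first location.
            warnings.add("Duplicate ID \u201C" + id + "\u201D; first occurrence at line "
                    + oldLoc.line() + ", column " + oldLoc.column() + ".");
        } else {
            // First occurrence: a single hash insert, no tree walk needed.
            idLocations.put(id, here);
        }
    }

    public static void main(String[] args) {
        IdUniquenessCheck check = new IdUniquenessCheck();
        check.startTag("main", new Location(3, 1));
        check.startTag(null, new Location(4, 1));
        check.startTag("main", new Location(9, 5));
        check.warnings.forEach(System.out::println);
    }
}
```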

Various DOM methods, such as document.getElementById(...), are wired to hash tables of this kind through glue code and many layers of object hierarchies; see "How does the DOM representative website work?" on ask.mozilla.org.
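To illustrate how a DOM method can sit on top of such a map, here is a hypothetical sketch (IdMapDocument, elementAttached and elementDetached are invented names, not Gecko APIs) in which the document keeps an id → elements table up to date as elements are attached and detached, so getElementById becomes a hash lookup. It assumes elements are attached in document order; a real engine must also work out which same-ID element comes first in tree order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a document-level ID table; not Gecko's actual design.
public class IdMapDocument {
    // Minimal element: only the id matters here.
    static final class Element {
        final String id;

        Element(String id) {
            this.id = id;
        }
    }

    // id -> elements carrying that id, in the order they were attached.
    private final Map<String, List<Element>> idMap = new HashMap<>();

    // Hook the tree-building code would call when an element enters the document.
    void elementAttached(Element e) {
        if (e.id != null && !e.id.isEmpty()) {
            idMap.computeIfAbsent(e.id, k -> new ArrayList<>()).add(e);
        }
    }

    // Hook called when an element leaves the document.
    void elementDetached(Element e) {
        if (e.id == null) {
            return;
        }
        List<Element> list = idMap.get(e.id);
        if (list != null) {
            list.remove(e); // identity comparison, since equals() is not overridden
            if (list.isEmpty()) {
                idMap.remove(e.id);
            }
        }
    }

    // A hash lookup instead of a tree walk; null when no element has this id.
    Element getElementById(String id) {
        List<Element> list = idMap.get(id);
        return list == null ? null : list.get(0);
    }

    public static void main(String[] args) {
        IdMapDocument doc = new IdMapDocument();
        Element first = new Element("x");
        Element second = new Element("x");
        doc.elementAttached(first);
        doc.elementAttached(second);
        System.out.println(doc.getElementById("x") == first);  // true
        doc.elementDetached(first);
        System.out.println(doc.getElementById("x") == second); // true
    }
}
```

Keeping a list per ID, rather than a single element, is what lets removing one duplicate fall back to the next one without re-walking the tree.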


Source: https://habr.com/ru/post/974951/

