I'm using JavaScript and want to walk the HTML tree, collecting all the text as it appears to the user. However, I am losing whitespace information.
Let's say I have two documents:
<html>XXX<p>YY   YY</p></html>
<html>XXX<p>YY&nbsp;&nbsp;&nbsp;YY</p></html>
The first renders with one space between the Ys; the second renders with three. However, if I traverse the tree and, for each #text node, use:
text = node.nodeValue;
then the text for both nodes will have 3 spaces, and I no longer know which one has the "real" nbsp spaces. I can use node.innerHTML on the p element, which does show the nbsp entities, but I don't think I can use innerHTML to get only the XXX text (without some kind of text subtraction).
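To make the problem concrete, here is a minimal sketch of how the two cases might still be told apart. It relies on the HTML parser decoding the entity &nbsp; to the character U+00A0, which prints like a regular space (U+0020) but is a different code point. The helper names markNbsp and collapseWhitespace are hypothetical, and the collapsing rule only approximates the default CSS behaviour (white-space: normal); it ignores pre, nowrap and whitespace at element boundaries.

```javascript
// Turn non-breaking spaces (U+00A0) back into a visible marker so they
// can be told apart from regular spaces when the text is printed.
function markNbsp(text) {
  return text.replace(/\u00A0/g, "&nbsp;");
}

// Rough approximation of `white-space: normal`: runs of space, tab and
// line breaks collapse to one regular space; U+00A0 is left untouched.
function collapseWhitespace(text) {
  return text.replace(/[ \t\n\r]+/g, " ");
}

// Assumed nodeValue strings for the two <p> text nodes in the example.
const plain = "YY   YY";               // three regular spaces
const nbsp = "YY\u00A0\u00A0\u00A0YY"; // three non-breaking spaces

console.log(markNbsp(collapseWhitespace(plain))); // YY YY
console.log(markNbsp(collapseWhitespace(nbsp)));  // YY&nbsp;&nbsp;&nbsp;YY
```

If this holds, the information is not actually lost in nodeValue; it is just invisible when the string is printed.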
I could just get the innerHTML of the whole document and parse it myself. However, I also need the computed style of each element that I use:
window.getComputedStyle(theElement).getPropertyValue("text-align");
So I have to visit each node anyway. In addition, innerHTML shows the source as it is, whereas walking the nodes gives me the tree after HTML errors have been "fixed" by adding end tags, etc. This is a good thing and something that I would like to preserve.
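The traversal I have in mind looks roughly like the sketch below. The function name collectText and the getStyle callback are hypothetical; the walk only relies on the standard nodeType, nodeValue and childNodes properties, so the style lookup can be swapped out.

```javascript
const TEXT_NODE = 3;    // same value as Node.TEXT_NODE
const ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE

// Walk the parsed tree depth-first and pair every text node with a style
// value looked up on its nearest enclosing element.
// `getStyle` is a caller-supplied (hypothetical) callback: element -> value.
function collectText(root, getStyle) {
  const out = [];
  (function visit(node, parent) {
    if (node.nodeType === TEXT_NODE) {
      out.push({
        text: node.nodeValue, // &nbsp; arrives here as U+00A0
        style: parent ? getStyle(parent) : null,
      });
      return;
    }
    const nextParent = node.nodeType === ELEMENT_NODE ? node : parent;
    for (const child of Array.from(node.childNodes || [])) {
      visit(child, nextParent);
    }
  })(root, null);
  return out;
}

// Intended use in a browser (left commented out so the sketch also runs
// outside one):
// const items = collectText(document.body, (el) =>
//   window.getComputedStyle(el).getPropertyValue("text-align"));
```

Because this walks the parsed tree rather than the source string, it sees the document after the parser has added any missing end tags.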