JavaScript DOM: get text node content without losing spacing information

I use JavaScript and want to walk the HTML tree, collecting all the text as it appears to the user. However, I am losing spacing information.

Let's say I have two documents:

 <html>XXX<p>YY   YY</p></html>
 <html>XXX<p>YY&nbsp;&nbsp;&nbsp;YY</p></html>

The first renders with 1 space between the Ys, because the browser collapses the run of ordinary spaces. The second renders with 3 spaces. However, if I traverse the tree and for each #text node use:

 text = node.nodeValue; 

then the text of both nodes will contain 3 spaces, and I no longer know which one has the "real" &nbsp; spaces. I could use innerHTML on the p element, where the &nbsp; entities do show up, but I don't think I can use innerHTML to get only the XXX text (without the text of the child elements).

I could just get the innerHTML of the whole document and parse it. However, I also need the computed style of each element I am going to use:

 window.getComputedStyle(theElement).getPropertyValue("text-align"); 

So I have to visit each node anyway. In addition, innerHTML shows the source as it is, whereas walking the nodes gives me the tree after the browser has "fixed" HTML errors by adding end tags, etc. That is a good thing and something I would like to preserve.

1 answer

What if you check the character code of each character in nodeValue (for example with charCodeAt)? A regular space is 32, and &nbsp; is 160.
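A minimal sketch of that idea, not a definitive implementation. The names countSpaces, collapseOrdinaryWhitespace and collectText are made up for illustration. It assumes the default white-space: normal when approximating what the user sees, and it reads the computed style from the text node's parent element, since text nodes have none of their own.

```javascript
// Character codes for the two kinds of space discussed above.
const SPACE = 32;  // U+0020, ordinary space
const NBSP = 160;  // U+00A0, what &nbsp; becomes in nodeValue

// Count ordinary spaces and non-breaking spaces in a string.
function countSpaces(text) {
  let regular = 0;
  let nbsp = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code === SPACE) regular++;
    else if (code === NBSP) nbsp++;
  }
  return { regular, nbsp };
}

// Rough approximation of rendering under `white-space: normal`
// (an assumption): runs of ordinary whitespace collapse to one space,
// non-breaking spaces are left alone. The character class is spelled
// out because `\s` in a JavaScript regex also matches U+00A0.
function collapseOrdinaryWhitespace(text) {
  return text.replace(/[ \t\n\r\f]+/g, " ");
}

// Hypothetical helper, browser-only: visit every text node under `root`
// and pair its text with a computed style from its parent element.
function collectText(root) {
  const walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT);
  const result = [];
  let node;
  while ((node = walker.nextNode())) {
    const parent = node.parentElement;
    if (!parent) continue; // e.g. text directly inside a DocumentFragment
    const style = window.getComputedStyle(parent);
    result.push({
      raw: node.nodeValue,
      visible: collapseOrdinaryWhitespace(node.nodeValue),
      spaces: countSpaces(node.nodeValue),
      textAlign: style.getPropertyValue("text-align"),
    });
  }
  return result;
}
```

In a browser, collectText(document.body) would return one entry per text node. Because the walk goes over the parsed DOM rather than innerHTML, the tree is the one the browser has already repaired, and the XXX text arrives as its own entry, separate from the text inside p.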


Source: https://habr.com/ru/post/910250/

