JavaScript regex: match text that is NOT part of an HTML tag

I would like a regex that can be executed in node.js (so no jQuery DOM processing, etc., because the tags can be nested differently) and that matches all text that is NOT an HTML tag or part of one, each piece in a separate group.

E.g., I would like to match "5", "ELT.", "SPR", "", "pio", "Unterricht", "", "&nbsp;" and "pio" from this line:

<tr class='list even'> <td class="list" align="center" style="background-color: #FFFFFF" > <span style="color: #010101">5</span> </td> <td class="list" align="center" style="background-color: #FFFFFF" > <b><span style="color: #010101">ELT.</span></b> </td> <td class="list" align="center" style="background-color: #FFFFFF" > <b><span style="color: #010101">SPR</span></b> </td> <td class="list" style="background-color: #FFFFFF" >&nbsp;</td> <td class="list" align="center" style="background-color: #FFFFFF" > <strike><span style="color: #010101">pio</span></strike> </td> <td class="list" align="center" style="background-color: #FFFFFF" > <span style="color: #010101">Unterricht</span> </td> <td class="list" style="background-color: #FFFFFF" >&nbsp;</td> <td class="list" style="background-color: #FFFFFF" >&nbsp;</td> <td class="list" align="center" style="background-color: #FFFFFF" > <b><span style="color: #010101">pio</span></b> </td> </tr> 

I can guarantee that there will be no ">" inside the tags.

The solution I found was (?<=^|>)[^><]+?(?=<|$), but this does not work in node.js (possibly because of the lookarounds: it fails with "Invalid group").

Any suggestions? (And yes, I really think regex is the right way, because the HTML can be nested in other ways, while the content always has the same order as the table.)

+6
2 answers

Try 'yourhtml'.replace(/(<[^>]*>)/g, ' '):

  '<tr class="list even"> <td class="list" align="center" style="background-color: #FFFFFF" > <span style="color: #010101">5</span> </td> <td class="list" align="center" style="background-color: #FFFFFF" > <b><span style="color: #010101">ELT.</span></b> </td> <td class="list" align="center" style="background-color: #FFFFFF" > <b><span style="color: #010101">SPR</span></b> </td> <td class="list" style="background-color: #FFFFFF" >&nbsp;</td> <td class="list" align="center" style="background-color: #FFFFFF" > <strike><span style="color: #010101">pio</span></strike> </td> <td class="list" align="center" style="background-color: #FFFFFF" > <span style="color: #010101">Unterricht</span> </td> <td class="list" style="background-color: #FFFFFF" >&nbsp;</td> <td class="list" style="background-color: #FFFFFF" >&nbsp;</td> <td class="list" align="center" style="background-color: #FFFFFF" > <b><span style="color: #010101">pio</span></b> </td> </tr> '.replace(/(<[^>]*>)/g, ' ')

This gives you the text you want to match, separated by spaces, which you can then split on whitespace.
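
For illustration, the replace-then-split idea could be wrapped up roughly like this. It is only a sketch: the function name textBetweenTags and the shortened sample row are made up for the example, and it assumes, as the question states, that no ">" occurs inside a tag.

```javascript
// Hypothetical helper (name not from the original answer): replace every tag
// with a space, then split the remaining text on whitespace.
function textBetweenTags(html) {
  return html
    .replace(/<[^>]*>/g, ' ') // swap each tag for a single space
    .trim()                   // drop leading and trailing whitespace
    .split(/\s+/)             // split on runs of whitespace
    .filter(Boolean);         // an empty input would otherwise yield ['']
}

// Shortened sample row, assumed for the example.
const row = '<tr class="list even"> <td class="list" > <span style="color: #010101">5</span> </td> ' +
            '<td class="list" > <b><span style="color: #010101">ELT.</span></b> </td> ' +
            '<td class="list" >&nbsp;</td> </tr>';

console.log(textBetweenTags(row)); // [ '5', 'ELT.', '&nbsp;' ]
```

Note that splitting on whitespace also breaks up any cell text that itself contains spaces, so this only fits data like the row above, where every value is a single word.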

+3

Perhaps you can split directly using the tags themselves:

 html.split(/<.*?>/) 

Then you must remove the empty strings from the result.
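
A minimal sketch of the split plus the cleanup step, assuming the HTML is on a single line (the "." in /<.*?>/ does not match newlines). The name splitOnTags and the sample string are invented for the example.

```javascript
// Hypothetical helper (name not from the original answer): split on the tags
// themselves, then discard pieces that are empty or whitespace-only.
function splitOnTags(html) {
  return html
    .split(/<.*?>/)                    // every tag acts as a separator
    .map((piece) => piece.trim())      // strip the spaces left between tags
    .filter((piece) => piece !== '');  // remove the empty strings
}

// Shortened sample, assumed for the example.
const cells = '<td class="list" > <span style="color: #010101">Unterricht</span> </td> ' +
              '<td class="list" >&nbsp;</td> ' +
              '<td class="list" > <b><span style="color: #010101">pio</span></b> </td>';

console.log(splitOnTags(cells)); // [ 'Unterricht', '&nbsp;', 'pio' ]
```

Because the pieces are trimmed rather than split on whitespace, text that contains spaces stays in one piece with this variant.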

+2

Source: https://habr.com/ru/post/897978/

