RegExp: get only the text content of a tag (without inner tags)

I have a line of HTML code:

<h2 class="some-class"> <a href="#link" class="link" id="first-link" <span class="bold">link</span> </a> NEED TO GET THIS </h2> 

I need to get only the h2's own text content. I created this regex:

 (?<=>)(.*)(?=<\/h2>) 

But it only works if the h2 has no inner tags. Otherwise, I get the following:

  <a href="#link" class="link" id="first-link" <span class="bold">link</span> </a> NEED TO GET THIS 
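
The behaviour can be reproduced in a few lines of JavaScript. This is only a sketch: it assumes an engine with lookbehind support (for example a current Node.js or Chrome), and the variable names are illustrative. The greedy .* starts right after the first ">" and runs up to the closing tag, so it takes the inner markup with it:

```javascript
// The sample line from above, unchanged (including its malformed <a> tag).
const html = '<h2 class="some-class"> <a href="#link" class="link" id="first-link" <span class="bold">link</span> </a> NEED TO GET THIS </h2>';

// The regex from the question: lookbehind for ">", greedy capture, lookahead for "</h2>".
const greedy = /(?<=>)(.*)(?=<\/h2>)/;

const greedyMatch = html.match(greedy);

// Prints everything between the opening <h2 ...> and </h2>, inner tags included.
console.log(greedyMatch[1]);
```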
3 answers

Regex is not suitable for parsing HTML, but if your HTML is invalid, or you want to use a regex anyway:

 (?!>)([^><]+)(?=<\/h2>) 


  • It matches the last run of text before the closing </h2> tag (if there is one).

  • To avoid empty matches, * was changed to + .

  • This regex is very limited and only suits narrow cases like the one in the question.
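
A minimal sketch of how this regex behaves in JavaScript, including one case where it finds nothing. The variable names and the second sample string are assumptions made for illustration; they are not from the question:

```javascript
// The line from the question, unchanged.
const heading = '<h2 class="some-class"> <a href="#link" class="link" id="first-link" <span class="bold">link</span> </a> NEED TO GET THIS </h2>';

// A run of characters containing neither ">" nor "<", directly followed by "</h2>".
const lastText = /(?!>)([^><]+)(?=<\/h2>)/;

const found = heading.match(lastText);
const text = found ? found[1].trim() : null;
console.log(text); // NEED TO GET THIS

// Limitation: text that comes before an inner tag is never matched,
// because only the run that touches "</h2>" qualifies.
const other = '<h2>Title <em>x</em></h2>';
const otherMatch = other.match(lastText);
console.log(otherMatch); // null
```

The capture keeps its surrounding spaces, so trim() is applied afterwards.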


Never use a regular expression to parse HTML. See these well-known answers:

Using regular expressions to parse HTML: why not?

RegEx match open tags except XHTML self-contained tags


Instead, create a temporary element, set the string as its HTML, and collect the content of the h2's direct text nodes only.

 var str = `<h2 class="some-class"> <a href="#link" class="link" id="first-link" <span class="bold">link</span> </a> NEED TO GET THIS </h2>`;

 // generate a temporary DOM element
 var temp = document.createElement('div');

 // set content
 temp.innerHTML = str;

 // get the h2 element
 var h2 = temp.querySelector('h2');

 console.log(
   // get all child nodes and convert them into an array
   // for older browsers use [].slice.call(h2...)
   Array.from(h2.childNodes)
     // iterate over the nodes
     .map(function(e) {
       // if it is a text node, return its content; otherwise return an empty string
       return e.nodeType === 3 ? e.textContent.trim() : '';
     })
     // join the string array
     .join('')
   // you can use the reduce method instead of map:
   // .reduce(function(s, e) { return s + (e.nodeType === 3 ? e.textContent.trim() : ''); }, '')
 );

Link:

Fastest way to convert JavaScript NodeList to an array?
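
The same filtering can be wrapped in a small reusable function. This is a sketch, not part of the original answer: the name ownText is hypothetical, and it differs slightly in that it drops empty pieces and joins the rest with a single space, which keeps words apart when several text nodes are separated by inner tags.

```javascript
// Hypothetical helper: returns the text of an element's direct text nodes only.
function ownText(element) {
  const TEXT_NODE = 3; // same value as Node.TEXT_NODE in browsers

  return Array.from(element.childNodes)
    // keep direct text nodes, skip child elements and comments
    .filter(function (node) { return node.nodeType === TEXT_NODE; })
    // strip surrounding whitespace from each piece
    .map(function (node) { return node.textContent.trim(); })
    // drop pieces that were whitespace only
    .filter(Boolean)
    .join(' ');
}

// In a browser, for example: ownText(document.querySelector('h2'))
```

The function only reads childNodes, nodeType and textContent, so it works on any DOM element.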



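 var h2 = document.querySelector('h2');
 var h2_clone = h2.cloneNode(true);

 // children is a live collection, so copy it to an array before removing;
 // otherwise every second child element would be skipped
 for (let el of Array.from(h2_clone.children)) {
   el.remove();
 }

 alert(h2_clone.innerText);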

Source: https://habr.com/ru/post/1265027/

