RegExp. Get only the text content of the tag (without internal tags)
I have a line of HTML code:

    <h2 class="some-class"> <a href="#link" class="link" id="first-link" <span class="bold">link</span> </a> NEED TO GET THIS </h2>

I need to get only the h2 text content. I wrote this regex:

    (?<=>)(.*)(?=<\/h2>)

But it is only useful if the h2 has no internal tags. Otherwise, I get the following:

    <a href="#link" class="link" id="first-link" <span class="bold">link</span> </a> NEED TO GET THIS

Regex is not suitable for parsing HTML, but if your HTML is invalid, or you want to use a regex anyway:

    (?!>)([^><]+)(?=<\/h2>)

This gets the last run of text before the closing tag </h2> (if it exists). To avoid null results, * was changed to +. This regex is very limited and only suitable for narrow cases like the one in the question.
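
To make the difference between the two patterns concrete, here is a minimal sketch that runs both against the sample line from the question. The variable names (greedy, lastText, mixed) and the second input string are assumptions added for illustration. It also assumes an engine with lookbehind support, such as a recent Node.js or a current browser.

```javascript
// Sample input from the question, kept verbatim.
const html = '<h2 class="some-class"> <a href="#link" class="link" id="first-link" <span class="bold">link</span> </a> NEED TO GET THIS </h2>';

// Pattern from the question: starts right after the first ">" and,
// because .* is greedy, runs all the way to "</h2>".
const greedy = /(?<=>)(.*)(?=<\/h2>)/;

// Pattern from the answer: only characters that are neither "<" nor ">",
// located directly before "</h2>".
const lastText = /(?!>)([^><]+)(?=<\/h2>)/;

const greedyMatch = html.match(greedy)[0];
const lastTextMatch = html.match(lastText)[0].trim();

console.log(greedyMatch);   // still contains the inner tags
console.log(lastTextMatch); // only the trailing text

// Limitation (hypothetical input): text in front of an inner tag is lost.
const mixed = '<h2>BEFORE <a href="#x">link</a> AFTER</h2>';
const mixedMatch = mixed.match(lastText)[0].trim();
console.log(mixedMatch);
```

The last case shows why this only fits narrow inputs: the pattern picks up the text immediately before </h2> and nothing else.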
Never use a regular expression to parse HTML; see these well-known answers:
Using regular expressions to parse HTML: why not?
RegEx match open tags except XHTML self-contained tags
Instead, create a temporary element, set the string as its HTML, and build the result from the h2's direct child text nodes only, skipping nested elements.

    var str = `<h2 class="some-class"> <a href="#link" class="link" id="first-link" <span class="bold">link</span> </a> NEED TO GET THIS </h2>`;
    // generate a temporary DOM element
    var temp = document.createElement('div');
    // set content
    temp.innerHTML = str;
    // get the h2 element
    var h2 = temp.querySelector('h2');

    console.log(
      // get all child nodes and convert into array
      // for older browser use [].slice.call(h2...)
      Array.from(h2.childNodes)
      // iterate over elements
      .map(function(e) {
        // if text node then return the content, else return
        // empty string
        return e.nodeType === 3 ? e.textContent.trim() : '';
      })
      // join the string array
      .join('')
      // you can use reduce method instead of map
      // .reduce(function(s, e) { return s + (e.nodeType === 3 ? e.textContent.trim() : ''); }, '')
    )
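
The same filtering step could be wrapped in a small reusable function. The sketch below is an assumption, not part of the original answer: the name directText is hypothetical, and unlike the snippet above it joins the pieces with a single space so that text on both sides of an inner tag does not run together. It relies only on childNodes, nodeType and textContent, so a plain object stands in for the parsed element here and the example runs without a DOM.

```javascript
// Hypothetical helper: concatenates the text of an element's direct child
// text nodes and ignores nested elements.
function directText(element) {
  const TEXT_NODE = 3; // same value as Node.TEXT_NODE in browsers
  return Array.from(element.childNodes)
    .filter(function (node) { return node.nodeType === TEXT_NODE; })
    .map(function (node) { return node.textContent.trim(); })
    // drop whitespace-only text nodes
    .filter(function (text) { return text !== ''; })
    // assumption: a space separator instead of the '' used above
    .join(' ');
}

// Plain-object stand-in for the parsed <h2> from the question.
const fakeH2 = {
  childNodes: [
    { nodeType: 3, textContent: ' ' },
    { nodeType: 1, textContent: ' link ' },
    { nodeType: 3, textContent: ' NEED TO GET THIS ' },
  ],
};

const result = directText(fakeH2);
console.log(result);
```

In a browser, the call would be directText(temp.querySelector('h2')), using the temp element from the snippet above.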