JavaScript regex: wrapping words and spaces in tags

I am trying to achieve the following: wrap every word in a <w> tag and every run of whitespace (there may be several characters in a row) in an <s> tag. The source text may contain HTML tags, and these must be left intact.

This is <b>very bold</b> word. 

should be converted to:

 <w>This</w><s> </s><w>is</w><s> </s><b><w>very</w><s> </s><w>bold</w></b><s> </s><w>word</w> 

What is the correct regex to achieve this?

2 answers

You can do this with two chained replacements, one for words and one for whitespace:

 s.replace(/([^\s<>]+)(?:(?=\s)|$)/g, '<w>$1</w>').replace(/(\s+)/g, '<s>$1</s>') 



EDIT

For more complex inputs (such as the one in your comment below), use:

 s.replace(/([^\s<>]+)(?![^<>]*>)(?:(?=[<\s])|$)/g, '<w>$1</w>').replace(/(\s+)(?![^<>]*>)/g, '<s>$1</s>'); 

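To make the two-step replacement easier to try out, here is a minimal sketch that wraps the two regexes from the EDIT above in a function. The function name `wrapWordsAndSpaces` is made up for illustration and is not part of the original answer.

```javascript
// Hypothetical helper name; the two regexes are taken unchanged from the answer.
function wrapWordsAndSpaces(html) {
  return html
    // Step 1: wrap runs of non-space, non-bracket characters, but only when
    // no '>' follows before the next bracket (i.e. we are not inside a tag).
    .replace(/([^\s<>]+)(?![^<>]*>)(?:(?=[<\s])|$)/g, '<w>$1</w>')
    // Step 2: wrap whitespace runs, again skipping whitespace inside a tag.
    .replace(/(\s+)(?![^<>]*>)/g, '<s>$1</s>');
}

var out = wrapWordsAndSpaces('This is <b>very bold</b> word.');
console.log(out);
// <w>This</w><s> </s><w>is</w><s> </s><b><w>very</w><s> </s><w>bold</w></b><s> </s><w>word.</w>
```

Note that the trailing period stays attached to the last word. The lookahead also treats any literal `>` in the text as the end of a tag, so for an input such as 'a > b' the leading "a" is left unwrapped. Treat this as a sketch for simple, well-formed markup.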


Regular expressions are not suitable for every task. If your string can contain arbitrary HTML, it is impossible to handle all cases with regular expressions: HTML is a context-free language, and regular expressions can only describe regular languages, which are a strict subset of the context-free ones. So before you start piling up loops and extra code for this, let me suggest the following:

If you are in a browser environment or have access to a DOM library, you can put the string into a temporary DOM element, work on its text nodes, and then read the string back.

Here is an example using a library called Linguigi, which I wrote a month ago and have just updated:

    var element = document.createElement('div');
    element.innerHTML = 'This is <b>very bold</b> word.';

    var ling = new Linguigi(element);

    ling.eachWord(true, function(text) {
        return '<w>' + text + '</w>';
    });

    ling.eachToken(/ +/g, true, function(text) {
        return '<s>' + text + '</s>';
    });

    alert(element.innerHTML);

Example: http://prinzhorn.github.com/Linguigi/ (click "Stackoverflow 12758422")
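If you would rather not pull in a library, the same DOM-based idea can be sketched with standard DOM APIs only. This is not how Linguigi works internally. The helper names `tokenize` and `wrapTextNodes` are made up for illustration, and the sketch assumes that "word" simply means any run of non-whitespace characters.

```javascript
// Split a plain-text string into word ('w') and whitespace ('s') tokens.
function tokenize(text) {
  // \s+ matches a whitespace run, \S+ a run of anything else.
  return (text.match(/\s+|\S+/g) || []).map(function (chunk) {
    return { type: /^\s/.test(chunk) ? 's' : 'w', text: chunk };
  });
}

// Replace every text node under `root` with <w>/<s> elements.
// Needs a DOM (a browser or a DOM library); the markup itself is never
// touched by a regex, only the text inside text nodes.
function wrapTextNodes(root) {
  var doc = root.ownerDocument;
  var walker = doc.createTreeWalker(root, 4 /* NodeFilter.SHOW_TEXT */);
  var textNodes = [];
  // Collect first, then modify, so the walker is not disturbed by changes.
  while (walker.nextNode()) textNodes.push(walker.currentNode);

  textNodes.forEach(function (node) {
    var fragment = doc.createDocumentFragment();
    tokenize(node.nodeValue).forEach(function (token) {
      var el = doc.createElement(token.type);
      el.textContent = token.text;
      fragment.appendChild(el);
    });
    node.parentNode.replaceChild(fragment, node);
  });
  return root;
}

// Browser-only usage; skipped where no DOM is available.
if (typeof document !== 'undefined') {
  var element = document.createElement('div');
  element.innerHTML = 'This is <b>very bold</b> word.';
  console.log(wrapTextNodes(element).innerHTML);
}
```

Because the browser's parser has already separated tags from text, attributes, comments, and stray `>` characters need no special handling here. Punctuation stays attached to the neighboring word, so a stricter definition of "word" would need a different pattern in `tokenize`.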


Source: https://habr.com/ru/post/1438142/

