Regular expressions are not suitable for every task. If your string can contain arbitrary HTML, regular expressions cannot handle every case: HTML's nested structure needs at least a context-free grammar, and regular expressions cover only the regular languages, a strict subset of the context-free ones. Before you start messing around with loops and loads of code to solve this, let me suggest the following:
If you are in a browser environment or have access to a DOM library, you can put the string into a temporary DOM element, work on its text nodes, and then read the string back out.
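To make the idea concrete without any library, here is a minimal sketch using only standard DOM calls (`createTreeWalker`, `createDocumentFragment`, `replaceChild`). The helper names `tokenize` and `wrapTextNodes` are made up for this illustration, and treating every whitespace-delimited run as a "word" is an assumption; it is not how Linguigi works internally.

```javascript
// Hypothetical helper: split text into word and whitespace tokens.
// Assumption: a "word" is any run of non-whitespace characters.
function tokenize(text) {
  return text
    .split(/(\s+)/) // the capturing group keeps the whitespace runs
    .filter(function (part) { return part !== ''; })
    .map(function (part) {
      return { tag: /^\s+$/.test(part) ? 's' : 'w', text: part };
    });
}

// Hypothetical helper: replace every text node under `root` with
// <w>/<s> elements, leaving the existing markup (e.g. <b>) untouched.
function wrapTextNodes(root) {
  var doc = root.ownerDocument;
  var walker = doc.createTreeWalker(root, NodeFilter.SHOW_TEXT);
  var textNodes = [];

  // Collect first, because replacing nodes while walking is fragile.
  while (walker.nextNode()) {
    textNodes.push(walker.currentNode);
  }

  textNodes.forEach(function (node) {
    var fragment = doc.createDocumentFragment();
    tokenize(node.nodeValue).forEach(function (token) {
      var el = doc.createElement(token.tag);
      el.textContent = token.text; // set as text, so nothing needs escaping
      fragment.appendChild(el);
    });
    node.parentNode.replaceChild(fragment, node);
  });
}

// Browser-only demo; skipped where no DOM is available.
if (typeof document !== 'undefined') {
  var container = document.createElement('div');
  container.innerHTML = 'This is <b>very bold</b> word.';
  wrapTextNodes(container);
  console.log(container.innerHTML);
}
```

Because only text nodes are touched, the HTML parser does the hard part of working out where tags begin and end, which is exactly what a regular expression cannot do reliably.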
Here is an example using Linguigi, a library I wrote a month ago and have just updated:
    var element = document.createElement('div');
    element.innerHTML = 'This is <b>very bold</b> word.';

    var ling = new Linguigi(element);

    ling.eachWord(true, function(text) {
        return '<w>' + text + '</w>';
    });

    ling.eachToken(/ +/g, true, function(text) {
        return '<s>' + text + '</s>';
    });

    alert(element.innerHTML);
Demo: http://prinzhorn.github.com/Linguigi/ (click "Stackoverflow 12758422")