JavaScript regex for tokenizing a query

Hi, I came across a problem related to regular expressions which I cannot solve.

I need to tokenize a query (split it into parts). For example, given:

These are the separate query elements "These are compound composite terms" 

Ultimately, I need to have an array of 7 tokens:

 1) These 2) are 3) the 4) separate 5) query 6) elements 7) These are compound composite terms 

The seventh token consists of a few words because it was inside double quotes.

My question is: is it possible to tokenize the input string as described above using a single regex?

Edit

I was curious whether the same result could be achieved with RegExp.exec or similar code instead of split, so I did some investigating and then asked another question here. Based on an answer to that question, you can use the following regular expression:

 (?:")(?:\w+\W*)+(?:")|\w+ 

It can be used as a one-liner:

 var tokens = query.match(/(?:")(?:\w+\W*)+(?:")|\w+/g); 
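
One thing worth checking before relying on this pattern: `\W*` also matches a double quote, so when the query contains more than one quoted phrase, the greedy repetition can run from the first opening quote to the last closing quote. The sketch below shows that behaviour. The variant named `quotedOrWord` is an assumption that is not part of the question; it uses `"[^"]*"` so that a quoted token stops at the next quote.

```javascript
// Pattern from the question above.
const original = /(?:")(?:\w+\W*)+(?:")|\w+/g;

// Hypothetical variant (an assumption, not from the question):
// a quoted token ends at the next double quote.
const quotedOrWord = /"[^"]*"|\w+/g;

const oneQuoted = 'These are "compound terms"';
const twoQuoted = 'a "b c" d "e f"';

console.log(oneQuoted.match(original));
// → ['These', 'are', '"compound terms"']

console.log(twoQuoted.match(original));
// → ['a', '"b c" d "e f"']  (both phrases merged into one token)

console.log(twoQuoted.match(quotedOrWord));
// → ['a', '"b c"', 'd', '"e f"']
```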

Hope this will be helpful ...

+6
4 answers

You can use this regex:

    var s = 'These are the separate query elements "These are compound composite term"';
    var arr = s.split(/(?=(?:(?:[^"]*"){2})*[^"]*$)\s+/g);
    //=> ["These", "are", "the", "separate", "query", "elements", "\"These are compound composite term\""]

This regex splits on whitespace only when the whitespace is outside double quotes; the lookahead makes sure that an even number of quotes follows it.
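
To see the lookahead at work on more than one quoted phrase, here is a minimal sketch that wraps the split in a helper and then strips the surrounding quotes. The helper name `splitOutsideQuotes` and the sample string are illustrative assumptions, and the approach assumes the quotes in the input are balanced.

```javascript
// Hypothetical helper: split on whitespace that is followed by an even
// number of double quotes, i.e. whitespace outside any quoted phrase.
function splitOutsideQuotes(input) {
  return input.split(/(?=(?:(?:[^"]*"){2})*[^"]*$)\s+/);
}

const parts = splitOutsideQuotes('one "two three" four "five six"');
console.log(parts);
// → ['one', '"two three"', 'four', '"five six"']

// Optionally drop the quotes around each compound token.
const unquoted = parts.map((t) => t.replace(/^"(.*)"$/, '$1'));
console.log(unquoted);
// → ['one', 'two three', 'four', 'five six']
```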

+5

You can use a simpler approach: split the string on whitespace while capturing the substrings inside double quotes, then remove the empty array elements with a clean function:

    // Remove undefined and empty-string elements in place.
    Array.prototype.clean = function() {
      for (var i = 0; i < this.length; i++) {
        if (this[i] == undefined || this[i] == '') {
          this.splice(i, 1);
          i--;
        }
      }
      return this;
    };

    var re = /"(.*?)"|\s/g;
    var str = 'These are the separate query elements "These are compound composite term"';
    var arr = str.split(re);
    alert(arr.clean());
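
If you would rather not extend `Array.prototype`, the same cleanup can be sketched with `filter`. Because the pattern has a capturing group, `split` inserts the captured text (or `undefined` when the group did not take part in the match) into the result, so the quoted phrase comes back without its quotes. The function name `tokenize` is an illustrative assumption.

```javascript
// Hypothetical helper: same regex as above, cleanup done with filter().
function tokenize(query) {
  // split() keeps the captured group, so the quoted phrase survives
  // (without its quotes); filter(Boolean) drops undefined and ''.
  return query.split(/"(.*?)"|\s/).filter(Boolean);
}

console.log(tokenize('These are the separate query elements "These are compound composite term"'));
// → ['These', 'are', 'the', 'separate', 'query', 'elements',
//    'These are compound composite term']
```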
+2

You can match either everything between one quote and the next (".*?") or any run of non-whitespace characters (\S+):

    var re = /".*?"|\S+/g,
        str = 'These are the separate query elements "These are compound composite term"',
        m,
        arr = [];

    // Collect each match: a quoted phrase or a run of non-whitespace.
    while ((m = re.exec(str))) {
      arr.push(m[0]);
    }
    alert(arr.join('\n'));
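
As a rough sketch of the same idea, the loop can be replaced with `matchAll`, and capturing groups can return the phrase without its quotes. Unlike `\w+`, `\S+` keeps punctuation such as hyphens inside a token. The helper name `tokenizeQuery` and the sample input are assumptions made for illustration.

```javascript
// Hypothetical helper: a quoted phrase (group 1) or a run of
// non-whitespace characters (group 2).
function tokenizeQuery(query) {
  const re = /"(.*?)"|(\S+)/g;
  // For each match, take the quoted content if present, else the plain token.
  return Array.from(query.matchAll(re), (m) => (m[1] !== undefined ? m[1] : m[2]));
}

console.log(tokenizeQuery('state-of-the-art "deep learning" models'));
// → ['state-of-the-art', 'deep learning', 'models']
```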
+2

    \s(?=(?:[^"]*"[^"]*")*[^"]*$)

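You can split on this pattern: it matches a whitespace character only when an even number of double quotes follows it, so whitespace inside a quoted phrase is left alone. See the demo.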

https://www.regex101.com/r/fJ6cR4/20

+1

Source: https://habr.com/ru/post/987507/

