Match everything except quoted strings

I want to match everything except quoted strings.

I can match all quoted strings with this: /(("([^"\\]|\\.)*")|('([^'\\]|\\.)*'))/ so I tried to match everything except quoted strings with this: /[^(("([^"\\]|\\.)*")|('([^'\\]|\\.)*'))]/ , but that does not work.

I would like to use only a regex, because I want to run a replace with it and still get the quoted text back afterwards.

    string.replace(regex, function(a, b, c) {
        // return after a lot of operations
    });

A quoted string for me looks like this: "bad line" or like this: 'cool line'

So if I enter:

 he\'re is "watever o\"k" efre 'dder\'4rdr'? 

It should output the following matches:

 ["he\'re is ", " efre ", "?"] 

And then I want to replace them.

I know that my question is very complicated, but it is not impossible! Nothing is impossible.

thanks

+1
3 answers

EDIT: Rewritten to cover more edge cases.

It can be done, but it is a bit complicated.

 result = subject.match(/(?:(?=(?:(?:\\.|"(?:\\.|[^"\\])*"|[^\\'"])*'(?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*')*(?:\\.|"(?:\\.|[^"\\])*"|[^\\'])*$)(?=(?:(?:\\.|'(?:\\.|[^'\\])*'|[^\\'"])*"(?:\\.|'(?:\\.|[^'"\\])*'|[^\\"])*")*(?:\\.|'(?:\\.|[^'\\])*'|[^\\"])*$)(?:\\.|[^\\'"]))+/g); 

will return

    , he said. 
    , she replied. 
    , he reminded her. 
    , 

from this string (line breaks and quotes added for clarity):

 "Hello", he said. "What up, \"doc\"?", she replied. 'I need a 12" crash cymbal', he reminded her. "2\" by 4 inches", 'Back\"\'slashes \\ are OK!' 

Explanation (of sorts; this is a bit hard to read):

Breaking up the regex:

    (?:
     (?=      # Assert even number of (relevant) single quotes, looking ahead:
      (?:
       (?:\\.|"(?:\\.|[^"\\])*"|[^\\'"])*
       '
       (?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*
       '
      )*
      (?:\\.|"(?:\\.|[^"\\])*"|[^\\'])*
      $
     )
     (?=      # Assert even number of (relevant) double quotes, looking ahead:
      (?:
       (?:\\.|'(?:\\.|[^'\\])*'|[^\\'"])*
       "
       (?:\\.|'(?:\\.|[^'"\\])*'|[^\\"])*
       "
      )*
      (?:\\.|'(?:\\.|[^'\\])*'|[^\\"])*
      $
     )
     (?:\\.|[^\\'"]) # Match text between quoted sections
    )+

First, you can see that there are two similar parts. Both of these lookahead assertions check that there is an even number of single/double quotes in the string ahead, disregarding escaped quotes and quotes of the opposite kind. I will illustrate it with the single-quote part:

    (?=                   # Assert that the following can be matched:
     (?:                  # Match this group:
      (?:                 #  Match either:
       \\.                #  an escaped character
      |                   #  or
       "(?:\\.|[^"\\])*"  #  a double-quoted string
      |                   #  or
       [^\\'"]            #  any character except backslashes or quotes
      )*                  # any number of times.
      '                   # Then match a single quote
      (?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*'   # Repeat once to ensure even number,
                          # (but don't allow single quotes within nested double-quoted strings)
     )*                   # Repeat any number of times including zero
     (?:\\.|"(?:\\.|[^"\\])*"|[^\\'])*      # Then match the same until...
     $                    # ... end of string.
    )                     # End of lookahead assertion.

The double-quote part works the same way.

Then, at each position in the string where both of these assertions succeed, the next part of the regular expression actually tries to match something:

    (?:      # Match either
     \\.     # an escaped character
    |        # or
     [^\\'"] # any character except backslash, single or double quote
    )        # End of non-capturing group

The whole thing is repeated one or more times, as many times as possible. The /g modifier ensures that we get all matches in the string.
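As a quick check, the expression can be run against the input from the question. This is only a sketch: the variable names `re`, `input` and `result` are mine, and the string literal carries the extra backslashes that JavaScript source code needs.

```javascript
// The regex from this answer, copied unchanged.
var re = /(?:(?=(?:(?:\\.|"(?:\\.|[^"\\])*"|[^\\'"])*'(?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*')*(?:\\.|"(?:\\.|[^"\\])*"|[^\\'])*$)(?=(?:(?:\\.|'(?:\\.|[^'\\])*'|[^\\'"])*"(?:\\.|'(?:\\.|[^'"\\])*'|[^\\"])*")*(?:\\.|'(?:\\.|[^'\\])*'|[^\\"])*$)(?:\\.|[^\\'"]))+/g;

// The question's input: he\'re is "watever o\"k" efre 'dder\'4rdr'?
var input = "he\\'re is \"watever o\\\"k\" efre 'dder\\'4rdr'?";

// With the /g flag, match() returns every match as an array of strings.
var result = input.match(re);

// Logs the three unquoted pieces: he\'re is | efre |?
console.log(result.join("|"));
```

Note that `match()` returns `null` rather than an empty array when nothing matches, so real code should guard against that.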

See it in action on RegExr.

+9

Here is a tested function that does the trick:

    function getArrayOfNonQuotedSubstrings(text) {
        /*  Regex with three global alternatives to section the string:
              ('[^'\\]*(?:\\[\S\s][^'\\]*)*')     # $1: Single quoted string.
            | ("[^"\\]*(?:\\[\S\s][^"\\]*)*")     # $2: Double quoted string.
            | ([^'"\\]*(?:\\[\S\s][^'"\\]*)*)     # $3: Un-quoted string.
        */
        var re = /('[^'\\]*(?:\\[\S\s][^'\\]*)*')|("[^"\\]*(?:\\[\S\s][^"\\]*)*")|([^'"\\]*(?:\\[\S\s][^'"\\]*)*)/g;
        var a = [];                 // Empty array to receive the goods;
        text = text.replace(re,     // "Walk" the text chunk-by-chunk.
            function(m0, m1, m2, m3) {
                if (m3) a.push(m3); // Push non-quoted stuff into array.
                return m0;          // Return this chunk unchanged.
            });
        return a;
    }

This solution uses the String.replace() method with a replacement callback function to "walk" the string section by section. The regular expression has three global alternatives, one for each kind of section: $1: single-quoted, $2: double-quoted and $3: unquoted substrings. Each unquoted chunk is pushed onto the returned array. It correctly handles all escaped characters, including escaped quotes, both inside and outside quoted strings. Single-quoted substrings may contain any number of double quotes and vice versa. Illegal orphan quotes are dropped and serve to split an unquoted section into two chunks. Note that this solution does not require any lookaround and needs only one pass. It also implements Friedl's "Unrolling-the-Loop" efficiency technique and is quite efficient.
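Since the question ultimately asks to replace the unquoted text while keeping the quoted strings, the same walk can return a modified chunk instead of only collecting it. A minimal sketch, assuming the regex above is reused unchanged; `replaceOutsideQuotes` and its `fn` parameter are hypothetical names, not part of the answer's code:

```javascript
// Hypothetical helper: applies fn to every unquoted chunk and
// leaves single- and double-quoted strings untouched.
function replaceOutsideQuotes(text, fn) {
    // Same three-alternative regex as in the function above.
    var re = /('[^'\\]*(?:\\[\S\s][^'\\]*)*')|("[^"\\]*(?:\\[\S\s][^"\\]*)*")|([^'"\\]*(?:\\[\S\s][^'"\\]*)*)/g;
    return text.replace(re, function (m0, m1, m2, m3) {
        // m3 is only set (and non-empty) for unquoted chunks.
        return m3 ? fn(m3) : m0;
    });
}

// The question's input: he\'re is "watever o\"k" efre 'dder\'4rdr'?
var s = "he\\'re is \"watever o\\\"k\" efre 'dder\\'4rdr'?";
var out = replaceOutsideQuotes(s, function (chunk) {
    return chunk.toUpperCase();
});

// Logs: HE\'RE IS "watever o\"k" EFRE 'dder\'4rdr'?
console.log(out);
```

Upper-casing is just a stand-in for the "lot of operations" in the question's callback; any function from string to string works here.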

Optional: Below is the code to test the function with the original test string:

    // The original test string (with necessary escapes):
    var s = "he\\'re is \"watever o\\\"k\" efre 'dder\\'4rdr'?";
    alert(s); // Show the test string without the extra backslashes.
    console.log(getArrayOfNonQuotedSubstrings(s).toString());
+1

You cannot invert a regular expression. What you tried was to make a character class out of it and negate that, but for that you would at least have to escape all the closing brackets as "\]".
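To see why wrapping a pattern in [^...] does not negate it: a character class always matches exactly one character, and parentheses, quotes or | inside it are just literal members of the set. A small sketch with a made-up class (the names `cls` and `checks` are only for illustration):

```javascript
// Looks like "not the group (ab)", but it is really
// "one character that is none of: ( a b )".
var cls = /^[^(ab)]$/;

var checks = {
    c: cls.test("c"),        // true: "c" is not in the set
    paren: cls.test("("),    // false: "(" is a literal member of the set
    a: cls.test("a"),        // false: "a" is a member of the set
    twoChars: cls.test("cd") // false: a class matches a single character only
};

console.log(checks);
```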

EDIT: I would start with

 /(^|" |' ).+?($| "| ')/ 

This matches anything between the start of the string or the end of a quoted string (very simplistic: quotation mark plus space) and the end of the string or the start of a quoted string (space plus quotation mark). Of course, this does not handle any escape sequences, or quoted strings that do not follow the pattern / ['"].*['"] / . See the answers above for more detailed expressions :-)

-5

Source: https://habr.com/ru/post/1237494/

