R - Substituting Text Using Regular Expressions

@Greg Snow was kind enough to introduce me to pattern matching using regular expressions. I used his advice to do the following:

    sql <- "SELECT a, b, (q + r) AS c, (s + t) AS d FROM tbl WHERE x=y"
    sql <- gsub("^.*SELECT *(.*?) +FROM.*$", "\\1", sql)
    # "a, b, (q + r) AS c, (s + t) AS d"

I was curious and tried to extend this logic to remove everything after a comma up to and including "AS":

    sql <- gsub(" \\(.*AS", "\\1", sql)
    # "a, b, d"

I wanted it to return "a, b, c, d". However, I see what happens: the pattern matches across the whole line, starting right after the comma following "b" and ending at the second AS, not the first.

My question is, how can I match a pattern multiple times on the same line? I know that I am doing something wrong with the syntax.

1 answer

You are already matching multiple times: that is what gsub does, whereas sub replaces only the first match.
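As a small illustration of that difference, here is a sketch using a made-up string (the variable names and the input are hypothetical, not from the question):

```r
# Hypothetical input, chosen only to show sub() versus gsub()
x <- "a-b-c"

# sub() replaces only the first match
first_only <- sub("-", "+", x)

# gsub() replaces every match
all_matches <- gsub("-", "+", x)

print(first_only)   # "a+b-c"
print(all_matches)  # "a+b+c"
```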

The problem is twofold. First, your regular expression is greedy. This is the default behavior, and it means that something like .* will match as much as possible rather than as little as possible. You can make it non-greedy by writing .*? instead, so that it matches " (q + r) AS" and " (s + t) AS" separately instead of everything from the first parenthesis to the last AS. Then, since you are already using gsub, the replacement is applied to every match.
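One way to see what each variant actually matches is to extract the matches rather than replace them. This is only a sketch: it uses regmatches and gregexpr from base R, and the variable names greedy and lazy are mine, not from the question.

```r
# The string as it looks after the first gsub() call in the question
sql <- "a, b, (q + r) AS c, (s + t) AS d"

# Greedy: .* runs on to the last "AS" in the string
greedy <- regmatches(sql, gregexpr(" \\(.*AS", sql))[[1]]

# Non-greedy: .*? stops at the first "AS" it can reach
lazy <- regmatches(sql, gregexpr(" \\(.*?AS", sql))[[1]]

print(greedy)  # one match:   " (q + r) AS c, (s + t) AS"
print(lazy)    # two matches: " (q + r) AS"  " (s + t) AS"
```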

The second point is not really a problem, just superfluous. The replacement argument is "\\1", which means "replace with capture group number one". But the pattern has no capture group one! Just use an empty string instead.
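For contrast, here is a sketch of what "\\1" does when the pattern does contain a capture group. The pattern and the square brackets in the replacement are made up for illustration and assume single-letter aliases, as in the question's string.

```r
sql <- "a, b, (q + r) AS c, (s + t) AS d"

# With a capture group around the alias, "\\1" re-inserts the captured letter;
# the brackets just make the re-inserted text easy to spot
kept <- gsub("\\(.*?\\) AS ([a-z])", "[\\1]", sql)
print(kept)  # "a, b, [c], [d]"

# Without any capture group, "\\1" expands to nothing,
# so it behaves like an empty replacement string
s <- "x AS y"
same <- identical(gsub("AS", "\\1", s), gsub("AS", "", s))
print(same)  # TRUE
```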

This should give you:

    sql <- gsub(" \\(.*?AS", "", sql)
    # "a, b, c, d"

Source: https://habr.com/ru/post/1498061/

