Interpolating a regular expression into another regular expression

In the following code, k2 is minimally different from k1. That is, k2 is meant to be exactly the same, except that it is defined using interpolation. (At least, I expected it to be exactly the same; judging by the result of p k2, it obviously is not.)

 v  = /[aeiouAEIOUäöüÄÖÜ]/                   # vowels
 k1 = /[[ßb-zB-Z]&&[^[aeiouAEIOUäöüÄÖÜ]]]/   # consonants defined without interpolation
 k2 = /[[ßb-zB-Z]&&[^#{v}]]/                 # consonants defined the same way, but with interpolation

But, as shown below, using gsub with k1 works, while using it with k2 does not, and I don't understand why.

 all_chars = "äöüÄÖÜß"<<('a'..'z').to_a.join<<('A'..'Z').to_a.join
 p all_chars                  # "äöüÄÖÜßabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
 p all_chars.gsub( k1 , '_' ) # "äöüÄÖÜ_a___e___i_____o_____u_____A___E___I_____O_____U_____"
 p all_chars.gsub( k2 , '_' ) # "äöüÄÖÜ_abcdefghijklm_o_____u__x__ABCDEFGHIJKLMNOPQRSTUVWXYZ"
 p k1                         # /[[ßb-zB-Z]&&[^[aeiouAEIOUäöüÄÖÜ]]]/
 p k2                         # /[[ßb-zB-Z]&&[^(?-mix:[aeiouAEIOUäöüÄÖÜ])]]/

Why is this not working? What is (?-mix:...)? Is there any way to make this work the way I expected?

+1
4 answers

I do things like:

 keywords = %w[foo bar]
 regex = /\b(?:#{ Regexp.union(keywords).source })\b/i
 # => /\b(?:foo|bar)\b/i

This is useful when you want to check a single string for several substrings at once.

Interpolating a regular expression into a string will not necessarily do what you want. By default, Ruby converts the pattern using to_s, which I don't want, because I don't want the full string representation of the pattern, flags and all. Using source returns what I want:

 regex = Regexp.union(keywords)
 regex         # => /foo|bar/
 regex.inspect # => "/foo|bar/"
 regex.to_s    # => "(?-mix:foo|bar)"
 regex.source  # => "foo|bar"
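
Applied to the regexes from the question, the same idea might look like the sketch below. The name k2_fixed is made up for this illustration and does not appear in the original post; the sketch assumes the inner regex carries no flags that need to be preserved.

```ruby
v  = /[aeiouAEIOUäöüÄÖÜ]/
k1 = /[[ßb-zB-Z]&&[^[aeiouAEIOUäöüÄÖÜ]]]/

# Interpolate only the pattern text of v, without the (?-mix:...) wrapper.
k2_fixed = /[[ßb-zB-Z]&&[^#{v.source}]]/

all_chars = "äöüÄÖÜß" + ('a'..'z').to_a.join + ('A'..'Z').to_a.join

puts k2_fixed.source == k1.source
puts all_chars.gsub(k2_fixed, '_') == all_chars.gsub(k1, '_')
```

Both comparisons should print true, since the interpolated pattern text is identical to the hand-written one.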
+4

Use a string to hold these characters and interpolate it into regular expressions as needed. Ruby tries to cover some bases with (?-mix:), but it does not expect the regular expression to end up inside a character set of another regular expression.

Background Information

Here's what actually happens:

In many cases, interpolating a regular expression into another regular expression makes sense. For example:

 a = /abc/       # /abc/
 b = /#{a}#{a}/  # /(?-mix:abc)(?-mix:abc)/
 'hhhhabcabchthth'.gsub(/abcabc/, '_') # "hhhh_hthth"
 'hhhhabcabchthth'.gsub(b, '_')        # "hhhh_hthth"

It works as expected. The whole (?-mix:...) wrapper is a way to encapsulate the rules for a, just in case b has different flags. a is case sensitive because that is the default. But if b were set to be case insensitive, the only way for a to keep matching what it matched before is to make sure it stays case sensitive, using -i. Everything inside (?-i:...) after the colon is matched case sensitively. This is made clearer by the following:

 e = /a/i  # e is made case insensitive with the /i flag
 /#{e}/    # /(?i-mx:a)/

You can see that when e is interpolated into something, you now get (?i-mx:...). Now i is to the left of the -, which means that case insensitivity is turned on rather than off (temporarily), so that e matches as it normally would.

In addition, to avoid disturbing the numbering of capture groups, the wrapper starts with (?: so that it forms a non-capturing group. All of this is a rough attempt to make the variables a and e behave the way you expect when you insert them into a larger regular expression.
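
The flag behaviour described above can be checked with a small sketch. The names b_insensitive and f are invented for this illustration; the regexes a and e are the ones from this answer.

```ruby
a = /abc/   # case sensitive (the default)
e = /a/i    # case insensitive

# a is embedded in a case-insensitive regex, but keeps its own rules.
b_insensitive = /#{a}x/i
puts b_insensitive.source            # (?-mix:abc)x
puts b_insensitive.match?('abcX')    # true:  only the trailing x ignores case
puts b_insensitive.match?('ABCx')    # false: the abc part is still case sensitive

# e is embedded in a case-sensitive regex, but stays case insensitive.
f = /#{e}b/
puts f.source                        # (?i-mx:a)b
puts f.match?('Ab')                  # true
puts f.match?('AB')                  # false: the b outside the group is case sensitive
```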

Unfortunately, if you put the interpolated regex inside a character set, that is, inside [], this strategy fails completely. The text (?-mix:...) is now interpreted in a completely different way: inside the negated set, ?-m is read as a character range, so it excludes everything between "?" and "m" (inclusive). That means, for example, that the letter "c" is no longer in your character set, so "c" is not replaced by an underscore, as you can see in your example. The same thing happens with the letter "x" (from "mix"): it is not replaced by an underscore either, because it sits inside the negated set and is therefore excluded from the match.

Ruby does not bother to parse the regular expression to work out that you are interpolating your regex into a character set. Even if it did, it would still have to parse the variable v to find out that it is also a character set, and that what you really want is to take the characters from the set in v and put them in with all the other characters.

My advice: since aeiouAEIOUäöüÄÖÜ is just a bunch of characters, store it in a string and interpolate that into whatever character set you need in a regular expression. And be careful about interpolating a regex into a regex in the future. Avoid it unless you are really sure what it is going to do.
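
To make the misparse concrete, here is a small probe of k2 from the question against a few sample characters. The choice of sample characters is mine, made only for this illustration.

```ruby
v  = /[aeiouAEIOUäöüÄÖÜ]/
k2 = /[[ßb-zB-Z]&&[^#{v}]]/

# The wrapper text ends up inside the negated set as ordinary set members.
puts k2.source   # [[ßb-zB-Z]&&[^(?-mix:[aeiouAEIOUäöüÄÖÜ])]]

# c and B fall inside the ?-m range, x is listed literally (from "mix"),
# so all three are excluded; n and ß are unaffected and still match.
results = %w[c x n B ß].map { |ch| [ch, k2.match?(ch)] }
results.each { |ch, matched| puts "#{ch}: #{matched}" }
```

This agrees with the gsub output in the question, where only ß and the consonants after m (other than x) were replaced.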
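
A minimal sketch of that advice, assuming the characters are kept in a plain string; the variable names vowels and consonants are made up for this illustration.

```ruby
# Keep the raw characters in a string, not in a Regexp.
vowels = 'aeiouAEIOUäöüÄÖÜ'

v          = /[#{vowels}]/                  # vowels
consonants = /[[ßb-zB-Z]&&[^#{vowels}]]/    # consonants, built by interpolation

all_chars = "äöüÄÖÜß" + ('a'..'z').to_a.join + ('A'..'Z').to_a.join

puts all_chars.gsub(consonants, '_')
# "äöüÄÖÜ_a___e___i_____o_____u_____A___E___I_____O_____U_____"
```

The result is the same string that k1 produced in the question, because a string is interpolated as-is, without any (?-mix:...) wrapper.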

0

The answer I use:

If you want to interpolate some_regex into another regex, use some_regex.inspect[1...-1] inside #{}.

For example, going back to my original example, defining the consonants this way using interpolation works.

 v = /[aeiouAEIOUäöüÄÖÜ]/ # vowels
 k3 = /[[ßb-zB-Z]&&[^#{v.inspect[1...-1]}]]/ # consonants

(I do not know whether there is a built-in way to do the same thing as .inspect[1...-1] for regular expressions.

I was surprised that .to_s does not work for this on regular expressions.

I am still not sure what the "(?-mix:some_regex)" is for.)
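
One caveat worth checking before relying on this trick: it assumes the regex has no flags. The sketch below compares it with Regexp#source, which is a standard method on Regexp; the names plain and flagged are made up for this illustration.

```ruby
plain   = /[aeiou]/
flagged = /[aeiou]/i

# Without flags, stripping the first and last character of inspect
# gives the same text as source.
puts plain.inspect[1...-1] == plain.source   # true

# With a flag, inspect ends in "/i", so only the "i" is stripped
# and a stray slash is left behind.
puts flagged.inspect[1...-1]                 # [aeiou]/
puts flagged.source                          # [aeiou]
```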

-2

Your statement that k2 is "exactly the same, except that it is determined by interpolation" is incorrect.

When you interpolate something that is not a string, such as the regex v, it is converted to a string with to_s.

 v = /[aeiouAEIOUäöüÄÖÜ]/
 v.to_s # => "(?-mix:[aeiouAEIOUäöüÄÖÜ])"

This is what gets interpolated into k2, which results in a different regular expression from k1. If you want k2 to be the same as k1, you need to interpolate a string:

 v = "[aeiouAEIOUäöüÄÖÜ]" 
-3

Source: https://habr.com/ru/post/1265983/

