How to embed regular expressions in other regular expressions in Ruby

I have a string:

'A Foo' 

and I want to find "Foo" in it.

I have a regex:

 /foo/ 

that I embed in another, case-insensitive regular expression, so I can build the pattern in steps:

 foo_regex = /foo/
 pattern = /A #{ foo_regex }/i

But it will not match correctly:

 'A Foo' =~ pattern # => nil 

If I paste the text directly into the pattern, it works:

 'A Foo' =~ /A foo/i # => 0 

What's wrong?

2 answers

At first glance, it seems that embedding a pattern inside another pattern should just work, but that rests on a bad assumption about how patterns work in Ruby: that they are just strings. Using:

 foo_regex = /foo/ 

creates a regexp object:

 /foo/.class # => Regexp 

As such, it knows the optional flags used to create it:

 ( /foo/ ).options    # => 0
 ( /foo/i ).options   # => 1
 ( /foo/x ).options   # => 2
 ( /foo/ix ).options  # => 3
 ( /foo/m ).options   # => 4
 ( /foo/im ).options  # => 5
 ( /foo/mx ).options  # => 6
 ( /foo/imx ).options # => 7

or, if you prefer binary:

 '%04b' % ( /foo/ ).options    # => "0000"
 '%04b' % ( /foo/i ).options   # => "0001"
 '%04b' % ( /foo/x ).options   # => "0010"
 '%04b' % ( /foo/xi ).options  # => "0011"
 '%04b' % ( /foo/m ).options   # => "0100"
 '%04b' % ( /foo/mi ).options  # => "0101"
 '%04b' % ( /foo/mx ).options  # => "0110"
 '%04b' % ( /foo/mxi ).options # => "0111"

and it remembers them whenever the Regexp is used, whether as a standalone pattern or embedded in another one.
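The numbers above are bit flags, and Ruby exposes them as the constants Regexp::IGNORECASE, Regexp::EXTENDED and Regexp::MULTILINE. As a rough sketch, they can be decoded back into flag letters. The flags_of helper and the FLAG_BITS table are hypothetical names made up for this example, not part of Ruby:

```ruby
# Map each option bit to the flag letter that sets it.
FLAG_BITS = {
  Regexp::IGNORECASE => 'i', # bit value 1
  Regexp::EXTENDED   => 'x', # bit value 2
  Regexp::MULTILINE  => 'm'  # bit value 4
}.freeze

# Hypothetical helper: returns the flag letters stored in a Regexp.
def flags_of(regexp)
  FLAG_BITS.select { |bit, _letter| (regexp.options & bit) != 0 }.values.join
end

flags_of(/foo/)   # => ""
flags_of(/foo/i)  # => "i"
flags_of(/foo/mx) # => "xm"
```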

This can be seen in action if we look at what the pattern looks like after embedding:

 /#{ /foo/ }/  # => /(?-mix:foo)/
 /#{ /foo/i }/ # => /(?i-mx:foo)/

?-mix: and ?i-mx: are how those options are represented in an embedded pattern.

According to the Regexp documentation for Options :

i, m, and x can also be applied at the subexpression level with the (?on-off) construct, which enables the options listed in "on" and disables the options listed in "off" for the expression enclosed by the parentheses.
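To see the construct on its own, here is a small sketch written by hand rather than produced by interpolation. The variable names are only for illustration:

```ruby
# (?i:...) turns case-insensitivity on for the group only.
inner_insensitive = /A (?i:foo)/
'A Foo' =~ inner_insensitive # => 0
'a Foo' =~ inner_insensitive # => nil, the leading "A" is still case-sensitive

# (?-i:...) turns it off inside a pattern that is otherwise case-insensitive.
inner_sensitive = /A (?-i:foo)/i
'a foo' =~ inner_sensitive   # => 0
'A Foo' =~ inner_sensitive   # => nil, "Foo" must be lowercase here
```

The second pattern is the same situation as in the question: an outer /i that cannot reach into the group.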

So the Regexp remembers those options, even inside the outer pattern, which causes the overall pattern to fail to match:

 pattern = /A #{ foo_regex }/i # => /A (?-mix:foo)/i
 'A Foo' =~ pattern # => nil

You could make sure that every subexpression uses the same flags as its surrounding pattern, but that can quickly become confusing or messy:

 foo_regex = /foo/i
 pattern = /A #{ foo_regex }/i # => /A (?i-mx:foo)/i
 'A Foo' =~ pattern # => 0

Instead, there is the source method, which returns the text of a pattern without its options:

 /#{ /foo/.source }/  # => /foo/
 /#{ /foo/i.source }/ # => /foo/

The problem of an embedded pattern remembering its options also appears when using other Regexp methods, such as union:

 /#{ Regexp.union(%w[a b]) }/ # => /(?-mix:a|b)/

and again source can help:

 /#{ Regexp.union(%w[a b]).source }/ # => /a|b/

Knowing all this:

 foo_regex = /foo/
 pattern = /#{ foo_regex.source }/i # => /foo/i
 'A Foo' =~ pattern # => 2
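The same idea scales to building a larger pattern in steps. The following is only a sketch: article and noun are hypothetical building blocks, and each one is wrapped in a non-capturing group so that its alternation does not leak into the rest of the pattern:

```ruby
# Hypothetical building blocks; the names are for illustration only.
article = /an?|the/
noun    = /foo|bar/

# Interpolate only the source text, so the outer /i applies to every part.
pattern = /\A(?:#{article.source}) (?:#{noun.source})\z/i

pattern.match?('A Foo')   # => true
pattern.match?('THE bar') # => true
pattern.match?('A Baz')   # => false
```

Keep in mind that source discards the flags of the part, so a sub-pattern that was written to rely on x or m may behave differently once it is embedded this way.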

"what's wrong?"

Your assumption about how a Regexp is interpolated is incorrect.

Interpolation through #{...} is done by calling to_s on the interpolated object:

 require 'date'
 d = Date.new(2017, 9, 8) #=> #<Date: 2017-09-08 ((2458005j,0s,0n),+0s,2299161j)>
 d.to_s                   #=> "2017-09-08"
 "today is #{d}!"         #=> "today is 2017-09-08!"

not only for string literals, but also for regular expression literals:

 /today is #{d}!/ #=> /today is 2017-09-08!/ 

In your example, the object to be interpolated is a Regexp:

 foo_regex = /foo/ 

And Regexp#to_s returns:

[...] the regular expression and its options using the (?opts:source) notation.

 foo_regex.to_s #=> "(?-mix:foo)" 

Therefore:

 /A #{foo_regex}/i #=> /A (?-mix:foo)/i 

As well as:

 "A #{foo_regex}" #=> "A (?-mix:foo)" 

In other words: because of the way Regexp#to_s is implemented, you can interpolate patterns without losing their flags. This is a feature, not a bug.
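Here is a sketch of a case where that feature is exactly what you want: a sub-pattern that must ignore case inside an outer pattern that must not. The keyword variable and the sample strings are made up for the example:

```ruby
keyword = /select/i           # hypothetical sub-pattern that ignores case
pattern = /#{keyword} Users/  # the outer pattern stays case-sensitive

pattern.source                 # => "(?i-mx:select) Users"
pattern.match?('SELECT Users') # => true
pattern.match?('select users') # => false, " Users" is still case-sensitive
```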

If Regexp#to_s returned only the source (without the options), it would work as you expect:

 def foo_regex.to_s
   source
 end
 /A #{foo_regex}/i #=> /A foo/i

The above code is for demonstration purposes only; don't do this.


Source: https://habr.com/ru/post/1265981/

