Ruby regex: scan vs =~

Question

The Ruby documentation (1.9.3) seems to imply that scan is equivalent to =~, except that

scan returns all matches, while =~ returns only the first occurrence, and
scan returns the matched data, while =~ returns an index.

However, in the following example, the two methods seem to return different results for the same string and expression. Why is this?

1.9.3p0 :002 > str = "Perl and Python - the two languages"
 => "Perl and Python - the two languages"
1.9.3p0 :008 > exp = /P(erl|ython)/
 => /P(erl|ython)/
1.9.3p0 :009 > str =~ exp
 => 0
1.9.3p0 :010 > str.scan exp
 => [["erl"], ["ython"]]

If the index of the first match is 0, shouldn't scan return "Perl" and "Python" instead of "erl" and "ython"?

thanks

+6

ruby regex

Anand Apr 24 '12 at 3:37

1 answer

sepp2k · Accepted Answer · 2012-04-24T03:42:04+0000

When given a regular expression without capturing groups, scan returns an array of strings, where each string is one match of the regular expression. If you use scan(/P(?:erl|ython)/) (which is equivalent to your regular expression except that the group is non-capturing), you will get ["Perl", "Python"], as you expected.

However, when given a regular expression with capture groups, scan returns an array of arrays, where each sub-array contains the captures of one match. So if you have, for example, the regex (\w*):(\w*), you will get an array of arrays in which each sub-array contains two strings: the part before the colon and the part after the colon. In your example, each sub-array contains one string: the part matched by (erl|ython).
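
The two cases can be compared side by side in a short sketch. It reuses the string and pattern from the question; the sample string "a:b c:d" and the variable names are made up for illustration and are not from the original post.

```ruby
str = "Perl and Python - the two languages"

# =~ returns the index of the first match (or nil if nothing matches).
first_index = str =~ /P(erl|ython)/          # 0

# With a capturing group, scan returns one sub-array of captures per match.
captures = str.scan(/P(erl|ython)/)          # [["erl"], ["ython"]]

# With a non-capturing group (?:...), scan returns the full matched strings.
full_matches = str.scan(/P(?:erl|ython)/)    # ["Perl", "Python"]

# Two capturing groups give two-element sub-arrays.
# "a:b c:d" is a hypothetical input chosen for this example.
pairs = "a:b c:d".scan(/(\w*):(\w*)/)        # [["a", "b"], ["c", "d"]]
```

So the index 0 from =~ and the "erl" from scan do not contradict each other: both come from the same match starting at position 0, but scan reports only the captured part once a capture group is present.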
