Is .parse anchored, or is :sigspace applied first, in a Perl 6 rule?

I have two questions. Is the behavior that I'm showing right, and if so, is it documented somewhere?

I was playing with the grammar TOP method. Declared as a rule, it implies beginning-of-string and end-of-string anchors along with :sigspace:

 grammar Number {
     rule TOP { \d+ }
 }

 my @strings = '137', '137 ', ' 137 ';

 for @strings -> $string {
     my $result = Number.parse( $string );
     given $result {
         when Match { put "<$string> worked!" }
         when Any   { put "<$string> failed!" }
     }
 }

With no whitespace, or with trailing whitespace only, the string parses. With leading whitespace, it fails:

 <137> worked!
 <137 > worked!
 < 137 > failed!

I figure this means that rule applies :sigspace first and the anchors afterward:

 grammar Foo { regex TOP { ^ :sigspace \d+ $ } } 

I expected the rule to allow leading whitespace, which is what would happen if you switched the order:

 grammar Foo { regex TOP { :sigspace ^ \d+ $ } } 

I could add an explicit beginning-of-string anchor ( ^ ) to the rule:

 grammar Number { rule TOP { ^ \d+ } } 

Now everything works:

 <137> worked!
 <137 > worked!
 < 137 > worked!

I have no reason to think it should be one way or the other. The Grammars docs say two things happen, but they do not say in which order these effects apply:

Note that if you're parsing with the .parse method, token TOP is automatically anchored

and

When rule instead of token is used, any whitespace after an atom is turned into a non-capturing call to ws.


I think the answer is that the rule is not actually anchored in the pattern sense. It is just the way .parse works: the cursor has to start at position 0 and end at the last position in the string. That is something outside of the pattern.

2 answers

There are not two regex effects going on. The rule applies :sigspace. After that, the grammar is defined. When you call .parse, it starts at the beginning of the string and goes to the end (or fails). That anchoring is not part of the grammar. It is part of how .parse applies the grammar.
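A minimal sketch of that distinction, reusing the Number grammar from the question and comparing .parse with .subparse (which, as I understand it, anchors at the start but does not require reaching the end). The input strings are made up for illustration:

```raku
grammar Number {
    rule TOP { \d+ }
}

# .parse starts at position 0 and must also reach the end of the string.
say ?Number.parse('137 ');        # True
say ?Number.parse('137 abc');     # False: TOP matches "137 ", but "abc" is left over

# .subparse starts at position 0 too, but may stop before the end.
say ?Number.subparse('137 abc');  # True
say ?Number.subparse(' 137');     # False: \d+ cannot match at position 0
```

The grammar is identical in every call; only the method that applies it changes what counts as success.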

My main issue was the way some things are worded in the docs. They are not technically wrong, but they tend to assume knowledge that the reader might not have. In this case, the off-hand comment about anchoring TOP is not as special as it seems. Any rule passed to .parse is anchored in the same way. There is no special behavior for that rule name, other than it being the default value for :rule in a call to .parse.
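To illustrate that the name TOP carries no special anchoring, here is a small sketch with a hypothetical grammar Numbers that has two rules with the same body. The rule name integer is invented for the example:

```raku
grammar Numbers {
    rule TOP     { \d+ }
    rule integer { \d+ }   # hypothetical second rule with the same body
}

for '137', '137 ', ' 137 ' -> $string {
    # :rule<...> selects the start rule; TOP is only the default.
    my $top     = ?Numbers.parse($string);
    my $integer = ?Numbers.parse($string, :rule<integer>);
    put "<$string> TOP: $top, integer: $integer";
}
```

Both columns should agree for every string, since .parse anchors whichever rule it is asked to start from.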


The behavior is intended and is the culmination of these language features:

  • Sigspace ignores spaces before the first atom.

    From the design docs 1 (S05: Regexes and Rules, line 348, emphasis added):

    The new :s (:sigspace) modifier causes certain whitespace sequences to be considered "significant"; they are replaced by a whitespace matching rule, <.ws>. Only whitespace sequences immediately following a matching construct (atom, quantified atom, or assertion) are eligible. Initial whitespace is ignored at the front of any regex, to make it easy to write rules that can participate in longest-token-matching alternations. Trailing space inside the regex delimiters is significant.

    It means:

      rule TOP { \d+ }
                    ^-------- <.ws> automatically inserted

      rule TOP { ^ \d+ $ }
                  ^---^-^---- <.ws> automatically inserted
    
  • Regexes are first-class compiled code with lexical scoping.

    A regex/rule is not a string that can have characters concatenated to it later to change its behavior. It is a self-contained routine, which is parsed and has its behavior nailed down at compile time.

    Regex modifiers like :sigspace, including those implicitly added by the rule keyword, apply only to their lexical scope, i.e. to the fragment of source code in which they appear at compile time. S05, line 629 1:

    The :i, :m, :r, :s, :dba, :Perl5, and Unicode-level modifiers can be placed inside the regex (and are lexically scoped)

  • The anchoring of rule TOP is done by the .parse routine at run time.

    S05, line 4423 1 :

    The .parse and .parsefile methods anchor to the beginning and end of the text, and fail if the end of text is not reached. (The TOP rule can check against $ itself if it wishes to produce its own error message.)

    I.e. the anchoring to the start of the string is not intrinsic to rule TOP, and does not affect how the lexical scope of TOP is parsed and compiled. It is done by the .parse method call.

    It has to be this way, because the same grammar can be used with different start rules instead of TOP, using .parse(..., rule => ...).

So when you write

 rule TOP { \d+ } 

it is compiled as

 regex TOP { :r \d+ <.ws> } 

And when you .parse that grammar, it effectively invokes the regex code ^ <TOP> $ , with the anchors being part not of TOP's lexical scope but of the scope that merely calls the routine TOP. The combined behavior is as if the rule TOP had been written as:

 regex TOP { ^ [:r :s \d+ ] $ }
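As a rough check of the claimed expansion, the sketch below compares the original rule with the hand-expanded regex form on the three strings from the question. The grammar names AsRule and AsRegex are made up for the example:

```raku
grammar AsRule  { rule  TOP { \d+ } }
grammar AsRegex { regex TOP { :r \d+ <.ws> } }   # the expansion written out by hand

for '137', '137 ', ' 137 ' -> $string {
    my $rule  = ?AsRule.parse($string);
    my $regex = ?AsRegex.parse($string);
    put "<$string> rule: $rule, regex: $regex";
}
```

If the expansion is right, both grammars accept '137' and '137 ' and reject ' 137 ', because neither form has a <.ws> call in front of \d+.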

1) The design docs should not be taken as gospel for what is or is not part of the Perl 6 language, but S05 is pretty accurate in this regard, except that it mentions some features that are planned but not yet implemented. Anyone who wants to truly understand the intricacies of Perl 6 regexes/grammars is, IMO, well served by reading the full S05 from top to bottom at least once.


Source: https://habr.com/ru/post/1275472/

