When is white space really important in Perl 6 grammars?

Can someone clarify when white space is significant in rules in Perl 6 grammars? I am learning some of it by trial and error, but I cannot find the actual rules in the documentation.

Example 1:

rule number { <pm> \d '.'? \d*[ <pm> \d* ]? }
rule pm { [ '+' || '-' ]? }

This matches the number 2.68156e+154 and does not care about most of the spaces present in rule number. However, if I add a space after \d*, it fails (i.e. <pm> \d '.'? \d* [ <pm> \d* ]? does not match).

Example 2: If I am trying to find literals in the middle of a word, it is important that there are no spaces around them. I.e., when searching the entry Double_t Delta_phi_R_1_9_pTproj_13_dat_cent_fx3001[52] = {

grammar TOP { ^ .*? <word-to-find> .* ? }
rule word-to-find { \w*?fx\w* }

This finds the word. However, if the definition of the word-to-find rule is changed to fx, or to \w* fx\w*, or to \w*fx \w*, then it will not match.

In addition, the definition '[52]' will match, but the definition 'fx[52]' will not.

Thanks for any insight. A pointer to the right place in the documentation would help a lot!

2 answers

Can someone clarify when white space is significant in rules in Perl 6 grammars?

When :sigspace is in effect.

What follows is a little more detail. If you or anyone else reading this wants more information, please let me know in the comments and I will expand further.

First, before I give the doc link, let me eliminate one possible source of confusion, namely the meaning of the words rule and regex in the context of Perl 6.

The word rule can be used in a generic sense ("a regex, string-matching, or general parsing tool in Perl 6") or as a keyword ( rule ). Similarly, the word regex can be used to mean the same thing as the generic sense of rule, or as a keyword ( regex ).

With that preamble out of the way, here is a link to the :sigspace section of the doc.

Note that the rule keyword implicitly inserts :sigspace, that it takes effect immediately after the first atom in the declared rule, and that the effect is lexical. See @smls's answer to another SO question, especially the first two bullet points, for a detailed discussion of these two important details.
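As a minimal sketch of what :sigspace changes, the snippet below matches the same pattern with and without the :s modifier. The strings and variable names are made up for illustration; the behaviour relies on the default ws, which cannot match between two word characters.

```raku
# Without :s, white space inside the pattern is ignored,
# so `foo \d+` is the same as `foo\d+`.
my Bool $plain  = ?('foo42'  ~~ / foo \d+ /);

# With :s, the space after `foo` becomes <.ws>. The default ws
# refuses to match in the middle of a word, so 'foo42' fails ...
my Bool $joined = ?('foo42'  ~~ / :s foo \d+ /);

# ... while a string with actual white space at that point matches.
my Bool $spaced = ?('foo 42' ~~ / :s foo \d+ /);

say "$plain $joined $spaced";   # True False True
```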

You may also find my answer to another SO question about white space / tokenizing helpful.

Hth.


White space in a rule is turned into <.ws> (i.e., a non-capturing call to the ws token), except:

  • At the start of the rule, before the first atom
  • At the start of a [ (group) or ( (positional capture)
  • After || , | and &
  • After a variable declaration ( :my $x = 'foo'; )
  • After a code block
  • After the % operator that introduces a separator
  • After the ~ goal-matching operator
  • After an internal modifier (for example :i )
  • Inside a construct like $<var> = x

Or, perhaps easier to remember: it will be inserted after any construct that can match some characters, and after any zero-width assertion.
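The placement rules above can be sketched with three small grammars, loosely modeled on the number rule from the question. The grammar names (Tight, Loose, Explicit) and the sign token are invented for this illustration, and Explicit is my own hand expansion of Tight following the list above, not output from the compiler.

```raku
grammar Tight {
    # No space between \d+ and the group, so no <.ws> is inserted there.
    rule  TOP  { \d+[ e <sign> \d+ ]? }
    token sign { '+' | '-' }
}

grammar Loose {
    # The space after \d+ becomes <.ws>, which cannot match
    # between the word characters '2' and 'e'.
    rule  TOP  { \d+ [ e <sign> \d+ ]? }
    token sign { '+' | '-' }
}

grammar Explicit {
    # Tight written as a token with the <.ws> calls spelled out.
    # The space right after `[` produces no call.
    token TOP  { \d+[ e <.ws> <sign> <.ws> \d+ <.ws> ]? <.ws> }
    token sign { '+' | '-' }
}

say so Tight.parse('2e+5');      # True
say so Loose.parse('2e+5');      # False
say so Loose.parse('2 e+5');     # True
say so Explicit.parse('2e+5');   # True
```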

An important design goal of these rules is to never insert <.ws> somewhere that would interfere with longest token matching. For example, consider rule foo:sym<ba> { [ bar | baz ] } , which is equivalent to token foo:sym<ba> { [ bar <.ws> | baz <.ws> ] <.ws> } . The default implementation of ws is not declarative (owing to its use of <!ww> ), which means it would break longest token matching: at the protoregex level if it were inserted at the start of the rule, and at the alternation level if it were inserted at the start of the group or after | .

Note that these rules only apply in rule , not in token and regex . They can be turned on at any point with :s and turned off with :!s in any of them, however ( rule really just means "pretend there is a :s at the start").
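A small sketch of switching significant white space on and off, assuming two hypothetical grammars (Spaced and Strict are names made up for this example):

```raku
grammar Spaced {
    # A token, but :s turns significant white space on for what follows.
    token TOP   { :s <key> '=' <value> }
    token key   { \w+ }
    token value { \w+ }
}

grammar Strict {
    # A rule, but :!s turns significant white space off again.
    rule  TOP   { :!s <key> '=' <value> }
    token key   { \w+ }
    token value { \w+ }
}

say so Spaced.parse('a = b');   # True
say so Spaced.parse('a=b');     # True
say so Strict.parse('a = b');   # False
say so Strict.parse('a=b');     # True
```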

Finally, the ws rule (by default, token ws { <!ww> \s* } ) can be overridden in a grammar to define what white space means in the language being parsed.
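As a rough illustration of overriding ws, the sketch below derives a grammar that treats only horizontal white space as a separator. The grammar names (Words, SameLine) are hypothetical and chosen just for this example.

```raku
grammar Words {
    # The space between the two <word> calls becomes <.ws>.
    rule  TOP  { <word> <word> }
    token word { \w+ }
}

grammar SameLine is Words {
    # Override ws: only horizontal white space separates words,
    # so a newline is no longer skipped by the inherited rule.
    token ws { <!ww> \h* }
}

say so Words.parse("foo\nbar");      # True
say so SameLine.parse("foo\nbar");   # False
say so SameLine.parse("foo bar");    # True
```

Because the rule calls ws through the grammar it is run on, the subclass changes how TOP behaves without redefining TOP itself.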


Source: https://habr.com/ru/post/1275471/

