EBNF / parboiled: how to convert regexp to PEG?

This question is specific to the parboiled parser framework, but it also applies to BNF / PEG in general.

Say I have a pretty simple regex

^\\s*([A-Za-z_][A-Za-z_0-9]*)\\s*=\\s*(\\S+)\\s*$

which in pseudo-EBNF is roughly:

<line>               ::= <ws>? <identifier> <ws>? '=' <ws>? <nonwhitespace> <ws>?
<ws>                 ::= (' ' | '\t' | {other whitespace characters})+
<identifier>         ::= <identifier-head> <identifier-tail>
<identifier-head>    ::= <letter> | '_'    
<identifier-tail>    ::= (<letter> | <digit> | '_')*
<letter>             ::= ('A'..'Z') | ('a'..'z')
<digit>              ::= '0'..'9'
<nonwhitespace>      ::= ___________

How would you define nonwhitespace (one or more characters that are not whitespace) in EBNF?

For those familiar with the Java parboiled library, how would you implement a rule that matches nonwhitespace?

+3
2 answers

You are stuck with the conventions of your lexer generator for specifying character ranges and operations on character ranges.

Many lexer generators accept hexadecimal values (something like 0x) to represent characters, so you can write either of these:

 '0'..'9'
 0x30..0x39

for digits.

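For nonwhitespace, you first need to know which character set you are using. For 7-bit ASCII, nonwhitespace is:

0x21..0x7E

For ISO8859-1:

( 0x21..0x7E | 0x80..0xFF )

You also have to decide whether any of the codes at 0x80 and above count as whitespace, and how to treat the control characters 0x0..0x1F. Is tab (0x9) whitespace? What about CR (0xD) and LF (0xA)? ETB?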
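As a rough illustration of the two ranges above, here is a minimal Java sketch. The class and method names are hypothetical, and it takes the ranges literally: control characters, space, and 0x7F are all excluded, while everything from 0x80 to 0xFF (including 0xA0) is treated as nonwhitespace. Adjust it to whatever you decide for those codes.

```java
final class NonWhitespaceRanges {
    // 7-bit ASCII: the range 0x21..0x7E.
    static boolean isAsciiNonWhitespace(int c) {
        return c >= 0x21 && c <= 0x7E;
    }

    // ISO8859-1: ( 0x21..0x7E | 0x80..0xFF ), taken literally.
    static boolean isLatin1NonWhitespace(int c) {
        return (c >= 0x21 && c <= 0x7E) || (c >= 0x80 && c <= 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(isAsciiNonWhitespace('='));   // true
        System.out.println(isAsciiNonWhitespace('\t'));  // false
        System.out.println(isLatin1NonWhitespace(0xE9)); // true
    }
}
```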

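Unicode is harder, because the sets of characters involved are much larger. The DMS Software Reengineering Toolkit has to support lexers for ASCII, ISO8859-z for various z, and Unicode. Rather than spelling out a complicated "nonwhitespace" range, DMS lets you write a subtraction of character sets:

 <UniCodeLegalCharacters>-<UniCodeWhiteSpace>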
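The set-subtraction idea can be sketched in plain Java with `java.util.BitSet`. This is not how DMS implements it; the class name, the helper methods, and the choice of tab, LF, CR, and space as the whitespace set are all assumptions made for illustration.

```java
import java.util.BitSet;

final class CharSetDifference {
    // Builds the set of code points lo..hi (inclusive).
    static BitSet range(int lo, int hi) {
        BitSet s = new BitSet();
        s.set(lo, hi + 1); // BitSet.set's upper bound is exclusive
        return s;
    }

    // Returns legal - whitespace without modifying either argument.
    static BitSet difference(BitSet legal, BitSet whitespace) {
        BitSet result = (BitSet) legal.clone();
        result.andNot(whitespace);
        return result;
    }

    public static void main(String[] args) {
        BitSet legal = range(0x00, 0x7F);  // assumed: all of 7-bit ASCII is legal
        BitSet whitespace = new BitSet();  // assumed whitespace set
        whitespace.set(0x09);              // tab
        whitespace.set(0x0A);              // LF
        whitespace.set(0x0D);              // CR
        whitespace.set(0x20);              // space

        BitSet nonWhitespace = difference(legal, whitespace);
        System.out.println(nonWhitespace.get('A')); // true
        System.out.println(nonWhitespace.get(' ')); // false
    }
}
```

Note that with this definition the remaining control characters stay in the nonwhitespace set, which may or may not be what you want.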

+5

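In EBNF you can define nonwhitespace with an exception, like this:

nonwhitespace ::= anycharacter - whitespace

You would still have to define anycharacter, which depends on the character set in use.

In Parboiled you can do this with TestNot and ANY. Assuming you already have a WhiteSpace() rule, a nonwhitespace character is: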

Sequence( TestNot(WhiteSpace()) , ANY )
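To show what "not whitespace, then any character" does when applied to the whole line from the question, here is a hand-written PEG-style matcher in plain Java. It does not use the Parboiled API; the class and method names are hypothetical, and `Character.isWhitespace` stands in for the whitespace rule, which is an assumption (it is close to, but not identical with, the regex `\s`).

```java
final class LineMatcher {
    private final String input;
    private int pos;

    private LineMatcher(String input) { this.input = input; }

    // Returns {identifier, value} if the whole line matches, otherwise null.
    static String[] parse(String line) {
        LineMatcher m = new LineMatcher(line);
        m.skipWhitespace();
        int idStart = m.pos;
        if (!m.identifier()) return null;
        String id = line.substring(idStart, m.pos);
        m.skipWhitespace();
        if (!m.literal('=')) return null;
        m.skipWhitespace();
        int valueStart = m.pos;
        if (!m.nonWhitespace()) return null;
        String value = line.substring(valueStart, m.pos);
        m.skipWhitespace();
        return m.pos == line.length() ? new String[] { id, value } : null;
    }

    // <ws>? : consume zero or more whitespace characters.
    private void skipWhitespace() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) pos++;
    }

    private boolean literal(char c) {
        if (pos < input.length() && input.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    // <identifier> : [A-Za-z_][A-Za-z_0-9]*
    private boolean identifier() {
        if (pos >= input.length() || !isHead(input.charAt(pos))) return false;
        pos++;
        while (pos < input.length() && (isHead(input.charAt(pos)) || isDigit(input.charAt(pos)))) pos++;
        return true;
    }

    private static boolean isHead(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
    }

    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    // PEG: !whitespace ANY (one character that is not whitespace).
    private boolean nonWhitespaceChar() {
        if (pos >= input.length()) return false;                      // ANY fails at end of input
        if (Character.isWhitespace(input.charAt(pos))) return false;  // not-predicate fails
        pos++;
        return true;
    }

    // <nonwhitespace> : one or more nonwhitespace characters.
    private boolean nonWhitespace() {
        if (!nonWhitespaceChar()) return false;
        while (nonWhitespaceChar()) { /* keep consuming */ }
        return true;
    }

    public static void main(String[] args) {
        String[] r = parse("  foo_1 =  bar  ");
        System.out.println(r[0] + " -> " + r[1]); // foo_1 -> bar
    }
}
```

The single-character rule only matches one character, so the "one or more" from the question needs a repetition around it. In Parboiled that would presumably mean wrapping the Sequence in OneOrMore.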
+2

Source: https://habr.com/ru/post/1796110/

