EBNF / parboiled: how to convert regexp to PEG?

Question

This question is specific to the parboiled parser framework, and also applies to BNF / PEG in general.

Say I have a pretty simple regex

^\\s*([A-Za-z_][A-Za-z_0-9]*)\\s*=\\s*(\\S+)\\s*$

which in pseudo-EBNF is

<line>               ::= <ws>? <identifier> <ws>? '=' <ws>? <nonwhitespace> <ws>?
<ws>                 ::= (' ' | '\t' | {other whitespace characters})+
<identifier>         ::= <identifier-head> <identifier-tail>
<identifier-head>    ::= <letter> | '_'    
<identifier-tail>    ::= (<letter> | <digit> | '_')*
<letter>             ::= ('A'..'Z') | ('a'..'z')
<digit>              ::= '0'..'9'
<nonwhitespace>      ::= ___________

How would you define nonwhitespace (one or more characters that are not whitespace) in EBNF?

For those familiar with the Java parboiled library, how would you implement a rule that matches nonwhitespace?
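As a baseline for checking any EBNF or PEG translation, the regex from the question can be exercised directly with `java.util.regex`. This is only a minimal sketch: the class name `AssignmentRegex` and the helper `parse` are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AssignmentRegex {
    // The regex from the question, written as a Java string literal.
    static final Pattern LINE =
        Pattern.compile("^\\s*([A-Za-z_][A-Za-z_0-9]*)\\s*=\\s*(\\S+)\\s*$");

    // Returns {identifier, value}, or null when the line does not match.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] kv = parse("  foo_1 = bar42  ");
        System.out.println(kv[0] + " -> " + kv[1]); // prints "foo_1 -> bar42"
    }
}
```

Any grammar-based version should accept and reject the same lines as this helper, which makes it a convenient oracle for tests.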

java parsing ebnf parboiled

Jason s Mar 03 '11 at 18:06

2 answers

Ira Baxter · Answer 1 · 2011-03-03T19:14:23+0000

You are bound by whatever conventions your lexer generator uses for specifying character ranges and operations on character ranges.

Many lexer generators accept hexadecimal values (something like 0x...) to represent characters, so you can write:

 '0'..'9'
 0x30..0x39

for digits.

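For nonwhitespace, you need to know which character set you are working with. For 7-bit ASCII, nonwhitespace is the range of printing characters:

0x21..0x7E

For ISO8859-1:

( 0x21..0x7E | 0x80..0xFF )

You then have to decide whether the codes from 0x80 upward really count as nonwhitespace (what about the non-breaking space at 0xA0?), and how to treat the control characters 0x0..0x1F. Is tab (0x9) whitespace? What about CR (0xD) and LF (0xA)? What about ETB?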
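The two ranges above translate directly into character predicates. The following is a sketch in plain Java rather than in any lexer generator's notation; the class and method names are hypothetical, and treating all of 0x80..0xFF as nonwhitespace is exactly the policy choice the answer says you have to make yourself.

```java
class NonWhitespaceRanges {
    // 7-bit ASCII: the printing characters 0x21..0x7E.
    static boolean isAsciiNonWhitespace(char c) {
        return c >= 0x21 && c <= 0x7E;
    }

    // ISO8859-1 variant: 0x21..0x7E or 0x80..0xFF.
    // Assumption: everything from 0x80 up counts as nonwhitespace,
    // including the non-breaking space at 0xA0.
    static boolean isLatin1NonWhitespace(char c) {
        return isAsciiNonWhitespace(c) || (c >= 0x80 && c <= 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(isAsciiNonWhitespace('A'));   // true
        System.out.println(isAsciiNonWhitespace('\t'));  // false
        System.out.println(isLatin1NonWhitespace('\u00E9')); // true
    }
}
```

Under these definitions tab, CR, LF and the other control characters in 0x0..0x1F, as well as DEL (0x7F), fall outside the nonwhitespace set.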

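Unicode is harder because the character set is so large that an enumerated definition turns into a long list of ranges. The DMS Software Reengineering Toolkit supports ASCII, ISO8859-z for various z, and Unicode. Rather than building the set up "additively" from ranges, DMS lets you subtract one character set from another, so you can write:

 <UniCodeLegalCharacters>-<UniCodeWhiteSpace>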
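The same subtractive idea can be sketched in plain Java using the JDK's character tables instead of DMS. This is an approximation, not DMS's definition: it assumes "legal character" means `Character.isDefined` and "whitespace" means `Character.isWhitespace`, and the class name is hypothetical.

```java
class UnicodeNonWhitespace {
    // "Legal characters minus whitespace": a code point qualifies when it is
    // assigned in the JDK's Unicode tables and is not Java whitespace.
    static boolean isNonWhitespace(int codePoint) {
        return Character.isDefined(codePoint) && !Character.isWhitespace(codePoint);
    }

    public static void main(String[] args) {
        System.out.println(isNonWhitespace('A'));    // true
        System.out.println(isNonWhitespace(' '));    // false
        System.out.println(isNonWhitespace(0x3000)); // false (ideographic space)
    }
}
```

Note that `Character.isWhitespace` deliberately excludes the non-breaking space U+00A0, so this sketch treats it as nonwhitespace. If that is not what you want, `Character.isSpaceChar` is the stricter alternative to subtract.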

ChrisBlom · Answer 2 · 2013-10-04T09:51:58+0000

In EBNF, nonwhitespace can be defined with the exception (set difference) operator, like this:

nonwhitespace ::= anycharacter - whitespace

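This assumes that anycharacter and whitespace are themselves defined, for example by listing the characters or ranges of your character set.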

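In parboiled you can do the same with TestNot and ANY: assuming you already have a whitespace rule, here WhiteSpace(), a nonwhitespace character is matched by: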

Sequence( TestNot(WhiteSpace()) , ANY )
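The rule above matches a single character, whereas the question's `\S+` needs one or more, so it would presumably be wrapped in a repetition such as parboiled's `OneOrMore`. To show what the predicate does without depending on the library, here is a hand-written PEG-style matcher in plain Java. It is a sketch, not parboiled code: the class `LinePeg` and its methods are hypothetical, and whitespace is assumed to mean `Character.isWhitespace`, which is slightly broader than the regex `\s`.

```java
class LinePeg {
    private final String in;
    private int pos;

    LinePeg(String in) { this.in = in; }

    // ANY: consume one character if input remains.
    private boolean any() {
        if (pos < in.length()) { pos++; return true; }
        return false;
    }

    // WhiteSpace: consume a single whitespace character.
    private boolean whiteSpace() {
        if (pos < in.length() && Character.isWhitespace(in.charAt(pos))) { pos++; return true; }
        return false;
    }

    // TestNot(WhiteSpace()): succeeds iff WhiteSpace fails; never consumes input.
    private boolean notWhiteSpace() {
        int save = pos;
        boolean matched = whiteSpace();
        pos = save;
        return !matched;
    }

    // Sequence(TestNot(WhiteSpace()), ANY): one nonwhitespace character.
    private boolean nonWhitespaceChar() {
        int save = pos;
        if (notWhiteSpace() && any()) return true;
        pos = save;
        return false;
    }

    // One or more nonwhitespace characters, i.e. the regex \S+.
    private boolean nonWhitespace() {
        if (!nonWhitespaceChar()) return false;
        while (nonWhitespaceChar()) { /* keep consuming */ }
        return true;
    }

    // <ws>? : zero or more whitespace characters.
    private void optWs() {
        while (whiteSpace()) { /* keep consuming */ }
    }

    private static boolean isLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // identifier <- (letter / '_') (letter / digit / '_')*
    private boolean identifier() {
        if (pos >= in.length()) return false;
        char c = in.charAt(pos);
        if (!isLetter(c) && c != '_') return false;
        pos++;
        while (pos < in.length()) {
            c = in.charAt(pos);
            if (!isLetter(c) && !(c >= '0' && c <= '9') && c != '_') break;
            pos++;
        }
        return true;
    }

    // line <- ws? identifier ws? '=' ws? nonwhitespace ws? end-of-input
    // Returns {identifier, value}, or null when the line does not match.
    static String[] parse(String s) {
        LinePeg p = new LinePeg(s);
        p.optWs();
        int idStart = p.pos;
        if (!p.identifier()) return null;
        String id = s.substring(idStart, p.pos);
        p.optWs();
        if (p.pos >= s.length() || s.charAt(p.pos) != '=') return null;
        p.pos++;
        p.optWs();
        int valueStart = p.pos;
        if (!p.nonWhitespace()) return null;
        String value = s.substring(valueStart, p.pos);
        p.optWs();
        return p.pos == s.length() ? new String[] { id, value } : null;
    }
}
```

For example, `LinePeg.parse("  foo_1 = bar42  ")` yields the pair `foo_1` and `bar42`, while `LinePeg.parse("a = b c")` returns null because input remains after the value. The key point is that the negative lookahead restores the input position, so ANY is what actually consumes the character.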
