In Context Free Grammar, how to define a pair of tags like <abc> data </ ​​abc>?

Expected Language:

<hat>Nike</hat><car>Toyota</car>... 

The difficulty I ran into is how to specify that in the pair of tags the start tag and the end tag have the same name.

tag is a combination of [a-zA-Z] with a length of less than 10.

 <tag>data</tag> 
+4
source share
2 answers

TL; DR; Neither BNF nor EBNF can express this CFG in a reasonable way.

Consider using EBNF and explicitly - through EBNF comments or out of context - either:

  • Set a restrictive rule when creating a pair of or;
  • Imagine a way to represent a finite set of "template" non-terminals.

Depending on use, a puff may be required that can alter these CFG changes / limitations.


(This old prelude is for non-CFG, as the question was first written.)

As far as I know, <x>..</x> for arbitrary and unknown x not a CFL, because Context Free Grammar is a limited finite set of both terminals and non-terminals. However, by definition above, x not guaranteed in this set.

However, given the small discretion, informal restrictions can be added to denote EBNF . Of course, they are out of EBNF syntax.

 Pair = "<" Tag^1 ">" Content "</" Tag^2 ">" (* Where Tag^1 equals Tag^2 *) Tag = .. (* If a finite set, this could still be converted to formal EBNF by rewriting the above Pair as all possible alterations as shown in the next section. Only small values of "finite" are reasonable to express. *) Content = .. 

Specifications, such as ECMAScript , include several such limitations that may lie outside of CFG and, therefore, outside EBNF.

However, if this language is CFL, then it can be represented by CFG, for example:

 Pair = HatTagPair | CarTagPair | .. (* All possible non-terminal Pairs *) HatTagPair = "<hat>" Content "</hat>" CarTagPair = .. (* And so on. While it technically possible to have non-terminals A, B .. AA, AB .. and so on, this quickly becomes very impractical in EBNF. *) 

Neither BNF nor EBNF have a “shorthand” way to formally represent such a repetition, and I would say that “tag is a combination of [a-zA-Z] with a length of less than 10,” is not a reasonable finite set of terminals, although it is finite and therefore located in the CFG area ..

There may be other forms of metasyntax CFG that can be used to describe such a language formally, but not in simple BNF / EBNF.

+1
source

Here's the EBNF for XML: http://www.w3.org/TR/REC-xml/#sec-starttags , which defines an element, starting with STags and ending with ETags, quite easily:

 element ::= EmptyElemTag | STag content ETag 

But as for keeping them the same, I think you will need quite a bit of a thinking look defined in the lexing strategy. The related SO post bnf / ebnf for the xml schema suggests that it is better to abandon the CFG goal and use a more basic approach in the code. All that said, I don’t know how many XML parsers really go there.

0
source

Source: https://habr.com/ru/post/1433918/


All Articles