A grammar is used to describe all possible strings in a language. It is also useful for specifying how a parser should parse the language.
In this grammar, they seem to be using their own version of EBNF, where a non-terminal is any lowercase word and a terminal is either all uppercase or surrounded by quotes. For example, NEWLINE is a terminal, arith_expr is a non-terminal, and "if" is also a terminal. Any non-terminal can be replaced by anything to the right of the colon of its production rule. For example, if you look at the first rule:
single_input: NEWLINE | simple_stmt | compound_stmt NEWLINE
We can replace single_input with one of: a NEWLINE, a simple_stmt, or a compound_stmt followed by a NEWLINE. Suppose we replaced it with "compound_stmt NEWLINE"; we would then look up the production rule for compound_stmt:
compound_stmt: if_stmt | while_stmt | for_stmt | try_stmt | with_stmt | funcdef | classdef | decorated
and choose which of these alternatives we want to use, substituting it for "compound_stmt" (keeping the NEWLINE in its place).
Suppose we wanted to generate this valid Python program:
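The substitution step described above can be sketched in a few lines of Python. This is only an illustration: the RULES table is a hand-copied, heavily trimmed version of the two rules quoted here, and the substitute helper is a name I made up, not anything from the Python sources.

```python
# Hypothetical rule table: each non-terminal maps to its list of alternatives.
# Only the two rules quoted above are included.
RULES = {
    "single_input": [["NEWLINE"], ["simple_stmt"], ["compound_stmt", "NEWLINE"]],
    "compound_stmt": [
        ["if_stmt"], ["while_stmt"], ["for_stmt"], ["try_stmt"],
        ["with_stmt"], ["funcdef"], ["classdef"], ["decorated"],
    ],
}


def substitute(form, symbol, alternative):
    """Replace the leftmost `symbol` in `form` with one of its alternatives."""
    if alternative not in RULES[symbol]:
        raise ValueError(f"{alternative} is not an alternative of {symbol}")
    i = form.index(symbol)
    return form[:i] + alternative + form[i + 1:]


step0 = ["single_input"]
step1 = substitute(step0, "single_input", ["compound_stmt", "NEWLINE"])
step2 = substitute(step1, "compound_stmt", ["if_stmt"])

for step in (step0, step1, step2):
    print(" ".join(step))
```

Each call performs exactly one replacement, and the trailing NEWLINE is carried along untouched, which is the "keeping NEWLINE" part of the explanation.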
if 5 < 2 + 3 or not 1 == 5:
    raise
We could use the following derivation:
- single_input
- compound_stmt NEWLINE
- if_stmt NEWLINE
- 'if' test ':' suite NEWLINE
- 'if' or_test ':' NEWLINE INDENT stmt DEDENT NEWLINE
- 'if' and_test 'or' and_test ':' NEWLINE INDENT simple_stmt DEDENT NEWLINE
- 'if' not_test 'or' not_test ':' NEWLINE INDENT small_stmt DEDENT NEWLINE
- 'if' comparison 'or' 'not' not_test ':' NEWLINE INDENT flow_stmt DEDENT NEWLINE
- 'if' expr comp_op expr 'or' 'not' comparison ':' NEWLINE INDENT raise_stmt DEDENT NEWLINE
- 'if' arith_expr '<' arith_expr 'or' 'not' arith_expr comp_op arith_expr ':' NEWLINE INDENT 'raise' DEDENT NEWLINE
- 'if' term '<' term '+' term 'or' 'not' arith_expr '==' arith_expr ':' NEWLINE INDENT 'raise' DEDENT NEWLINE
- 'if' NUMBER '<' NUMBER '+' NUMBER 'or' 'not' NUMBER '==' NUMBER ':' NEWLINE INDENT 'raise' DEDENT NEWLINE
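To see where terminals such as NUMBER, NEWLINE, INDENT and DEDENT come from, you can run the program through the standard tokenize module. This is a rough sketch for comparison only: the tokenizer reports keywords like if and raise as NAME tokens and every operator as OP, so the output will not match the last derivation line symbol for symbol.

```python
import io
import tokenize

# The example program, written with an indented body.
source = "if 5 < 2 + 3 or not 1 == 5:\n    raise\n"

# generate_tokens expects a readline callable that returns str lines.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# Map each numeric token type to its name (NAME, NUMBER, OP, NEWLINE, ...).
names = [tokenize.tok_name[tok.type] for tok in tokens]

for name, tok in zip(names, tokens):
    print(f"{name:10} {tok.string!r}")
```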
A few notes here. Firstly, we must start with one of the non-terminals that is designated as a starting non-terminal. On that page, they list them as single_input, file_input, or eval_input. Secondly, a derivation is finished once all the symbols are terminals (hence the name). Thirdly, it is more common to do one substitution per line; for the sake of brevity I did all possible substitutions at once and started skipping steps near the end.
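As far as I can tell, the three start symbols line up with the "single", "exec" and "eval" modes accepted by compile() and ast.parse(); treat that mapping as my assumption rather than something the grammar page states. A small sketch showing that the chosen starting point changes what counts as valid input (parses is a hypothetical helper name):

```python
import ast

# One parse per mode; each returns a different kind of root node.
interactive = ast.parse("x = 1\n", mode="single")
module = ast.parse("x = 1\ny = 2\n", mode="exec")
expression = ast.parse("2 + 3", mode="eval")

kinds = [type(node).__name__ for node in (interactive, module, expression)]
print(kinds)


def parses(source, mode):
    """Return True if `source` is accepted when parsing starts in `mode`."""
    try:
        ast.parse(source, mode=mode)
        return True
    except SyntaxError:
        return False


# An assignment is a statement, so it is rejected where an expression is required.
print(parses("x = 1", "eval"), parses("2 + 3", "exec"))
```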
Given a string in the language, how do we find its derivation? This is the job of a parser. A parser reverse-engineers a production sequence, first to verify that the string is indeed valid, and furthermore to show how it can be derived from the grammar. It is worth noting that many grammars can describe a single language. However, for a given string, its derivation will of course be different for each grammar. So technically we write a parser for a grammar, not for a language. Some grammars are easier to parse; some are easier to read and understand. This one belongs to the former.
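You can watch Python's own parser do this job through the ast module. Note that it hands back an abstract syntax tree, which is a condensed result of parsing rather than the full derivation written out above, so take this as a sketch of "verify and recover structure" and nothing more.

```python
import ast

valid = "if 5 < 2 + 3 or not 1 == 5:\n    raise\n"
invalid = "if 5 < : raise\n"

# A valid string yields a tree whose shape mirrors how it was parsed.
tree = ast.parse(valid)
if_node = tree.body[0]
print(ast.dump(if_node))

# The condition is an `or` of a comparison and a negated comparison.
condition = if_node.test
left, right = condition.values

# A string that cannot be derived from the grammar is rejected.
try:
    ast.parse(invalid)
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)
```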
The grammar also does not specify the entire language, only what it looks like. A grammar says nothing about semantics.
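A quick way to convince yourself of this: a bare raise is accepted by the parser, yet running it at top level fails because there is no active exception to re-raise. The snippet below is a minimal sketch of that gap between syntax and meaning.

```python
import ast

source = "raise\n"

# Parsing succeeds: a bare `raise` matches the grammar.
tree = ast.parse(source)
parsed_ok = isinstance(tree.body[0], ast.Raise)

# Running it fails: whether the statement makes sense is decided at run time.
try:
    exec(compile(tree, "<example>", "exec"))
    error_name = None
except RuntimeError as exc:
    error_name = type(exc).__name__

print(parsed_ok, error_name)
```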
If you're interested in learning more about parsing and grammars, I recommend Grune, Jacobs - Parsing Techniques. It is free and good for self-study.