An Interesting Blog The Difference Between Top-Down Parsing and Bottom-Up Parsing
Given a formal grammar and a string created by that grammar, parsing computes the production process for that string.
In the case of context-free grammars, the production process takes the form of a parsing tree. Before we begin, we always know two things about the parsing tree: the root node, which is the source character from which the string was originally derived, and leaf nodes, which are all characters of the string in order. What we do not know is the location of the nodes and branches between them.
For example, if the string is acddf, we already know this:
S
/ | \
???
| | | | | acddf
Grammar example for use in this article.
S β xyz | aBC B β c | cd C β eg | df
Parsing the bottom
This approach is not like solving a puzzle. We start at the bottom of the parse tree with individual characters. Then we use the rules to connect the characters along with the larger tokens as we go. At the end of the line, everything had to be combined into one big S, and S was the only thing left with us. If not, you need to step back and try to combine tokens in different ways.
In parsing from the bottom up, we usually support the stack, which is a list of characters and tokens that we have seen so far. At each step, we move the new character onto the stack, and then reduce it as much as possible by combining the characters into larger tokens. Example
The string is acddf. Steps
Ξ΅ can't be reduced a can't be reduced ac can be reduced, as follows: reduce ac to aB aB can't be reduced aBd can't be reduced aBdd can't be reduced aBddf can be reduced, as follows: reduce aBddf to aBdC aBdC can't be reduced End of string. Stack is aBdC, not S. Failure! Must backtrack. aBddf can't be reduced ac can't be reduced acd can be reduced, as follows: reduce acd to aB aB can't be reduced aBd can't be reduced aBdf can be reduced, as follows: reduce aBdf to aBC aBC can be reduced, as follows: reduce aBC to S End of string. Stack is S. Success!
Parsing trees
| a
| | ac
In | | ac
In | | | acd
In | | | | acdd
In | | | | | acddf
BC | | | | \ acddf
| | ac
| | | acd
B
| / | acd
B
| / | | acdd
B
| / | | | acddf
BC
| / | | \ acddf
S
/ | \ / | | / BC | / | | \ acddf
Example 2
If all combinations fail, the string cannot be parsed.
The string is acdg. Steps
Ξ΅ can't be reduced a can't be reduced ac can be reduced, as follows: reduce ac to aB aB can't be reduced aBd can't be reduced aBdg can't be reduced End of string. Stack is aBdg, not S. Failure! Must backtrack. ac can't be reduced acd can be reduced, as follows: reduce acd to aB aB can't be reduced aBg can't be reduced End of string. stack is aBg, not S. Failure! Must backtrack. acd can't be reduced acdg can't be reduced End of string. Stack is is acdg, not S. No backtracking is possible. Failure!
Parsing trees
| a
| | ac
In | | ac
In | | | acd
In | | | | acdg
| | ac
| | | acd
B
| / | acd
B
| / | | acdg
| | | acd
| | | | acdg
Parsing down
For this approach, we assume that the string matches S and considers the internal logical consequences of this assumption. For example, the fact that the string matches S logically means that either (1) the string matches xyz, or (2) the string matches aBC. If we know that (1) is false, then (2) must be true. But (2) has its further logical consequences. They need to be studied as far as necessary in order to prove the basic statement. Example
The string is acddf. Steps
Assertion 1: acddf matches S Assertion 2: acddf matches xyz: Assertion is false. Try another. Assertion 2: acddf matches aBC ie cddf matches BC: Assertion 3: cddf matches cC ie ddf matches C: Assertion 4: ddf matches eg: False. Assertion 4: ddf matches df: False. Assertion 3 is false. Try another. Assertion 3: cddf matches cdC ie df matches C: Assertion 4: df matches eg: False. Assertion 4: df matches df: Assertion 4 is true. Assertion 3 is true. Assertion 2 is true. Assertion 1 is true. Success!
Parsing trees
S | S
/ | \ a BC | |
S
/ | \ a BC | | from
S
/ | \ a BC / | | cd
S
/ | \ a BC / | | \ cddf
Example 2
If after each inference we cannot prove the main hypothesis ("The string corresponds to S"), then the string cannot be analyzed.
The string is acdg. Steps
Assertion 1: acdg matches S: Assertion 2: acdg matches xyz: False. Assertion 2: acdg matches aBC ie cdg matches BC: Assertion 3: cdg matches cC ie dg matches C: Assertion 4: dg matches eg: False. Assertion 4: dg matches df: False. False. Assertion 3: cdg matches cdC ie g matches C: Assertion 4: g matches eg: False. Assertion 4: g matches df: False. False. False. Assertion 1 is false. Failure!
Parsing trees
S | S
/ | \ a BC | |
S
/ | \ a BC | | from
S
/ | \ a BC / | | cd
Why left recursion is a problem for parsers from top to bottom
If our rules were left recursive, for example, something like this:
S β Sb
Then notice how our algorithm behaves: Steps
Assertion 1: acddf matches S: Assertion 2: acddf matches Sb: Assertion 3: acddf matches Sbb: Assertion 4: acddf matches Sbbb: ...and so on forever
Parsing trees
S |
S | \ S b |
S | \ S b | \ S b |
S | \ S b | \ S b | \ S b |
...