I think your problem is that your regular expressions for t_TABLE and t_COLUMN also match your reserved words (SELECT and FROM). In other words, SELECT a FROM b; tokenizes to something like COLUMN COLUMN COLUMN COLUMN END (or some other ambiguous tokenization), and this does not correspond to any of your productions, so you get a syntax error.
As a quick sanity check, change these regular expressions so that they match exactly what you type, as follows:
    t_TABLE = r'b'
    t_COLUMN = r'a'
You will see that the statement SELECT a FROM b; now parses, because the regular expressions 'a' and 'b' do not match your reserved words.
There is another problem: the regular expressions for TABLE and COLUMN also overlap each other, so the lexer cannot match these tokens without ambiguity.
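To make the overlap concrete, here is a small sketch using only the standard `re` module rather than PLY itself. The identifier pattern is an assumption (your actual t_TABLE and t_COLUMN expressions may differ), and `matching_rules` is a hypothetical helper written just for this illustration. It reports which rules accept a given piece of text:

```python
import re

# Assumed pattern: a typical identifier regex, standing in for the
# question's real t_TABLE / t_COLUMN expressions.
IDENT = r'[a-zA-Z_][a-zA-Z0-9_]*'

RULES = {
    "TABLE": re.compile(IDENT),
    "COLUMN": re.compile(IDENT),
}

def matching_rules(text):
    """Return the names of all rules whose regex matches the whole text."""
    return [name for name, rx in RULES.items() if rx.fullmatch(text)]

for word in ["SELECT", "FROM", "a", "b"]:
    # Every word is accepted by both rules, so the lexer has no way to
    # decide between TABLE and COLUMN, or to recognize a keyword.
    print(word, matching_rules(word))
```

If both rules accept every word, including the keywords, the lexer has nothing to base its choice on.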
There is a short but relevant section of the PLY documentation on this. I'm not sure of the best way to explain it, but the key point is that tokenization happens first, so the lexer cannot use the context from your production rules to work out whether it has hit a TABLE token or a COLUMN token. You need to generalize them into a single ID token, and then sort out which is which during parsing.
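As a rough sketch of that idea, again with the standard `re` module instead of PLY so it runs on its own: a single ID pattern matches every word, a dictionary lookup promotes reserved words to their own token types, and the parser decides by position which ID is the column and which is the table. The names `RESERVED`, `tokenize` and `parse_select` are hypothetical, and the grammar is assumed to be just SELECT column FROM table; . In PLY, the documented equivalent is a `t_ID` function that sets `t.type = reserved.get(t.value, 'ID')`.

```python
import re

# Reserved words map to their own token types (case-sensitive in this sketch).
RESERVED = {"SELECT": "SELECT", "FROM": "FROM"}

TOKEN_RE = re.compile(r'''
    (?P<ID>[a-zA-Z_][a-zA-Z0-9_]*)
  | (?P<END>;)
  | (?P<WS>\s+)
''', re.VERBOSE)

def tokenize(text):
    """Split text into (type, value) pairs, with one generic ID token."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"illegal character {text[pos]!r} at position {pos}")
        kind, value = m.lastgroup, m.group()
        pos = m.end()
        if kind == "WS":
            continue  # skip whitespace
        if kind == "ID":
            # Promote reserved words; everything else stays a plain ID.
            kind = RESERVED.get(value, "ID")
        tokens.append((kind, value))
    return tokens

def parse_select(tokens):
    """Assumed grammar: SELECT ID FROM ID END. Position gives the meaning."""
    kinds = [kind for kind, _ in tokens]
    if kinds != ["SELECT", "ID", "FROM", "ID", "END"]:
        raise SyntaxError(f"unexpected token sequence: {kinds}")
    return {"column": tokens[1][1], "table": tokens[3][1]}

print(parse_select(tokenize("SELECT a FROM b;")))
```

The lexer no longer has to guess: the ID after SELECT is a column and the ID after FROM is a table because the grammar says so.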
If I had a bit more energy, I would work through your code and post an actual solution, but since you said this is an exercise, perhaps you will be happy with a pointer in the right direction.