I am experimenting with lex and yacc and have run into a strange problem. It is easiest to show my code before describing the problem in detail. This is my lexer:
%{
#include <stdlib.h>
#include <string.h>
#include "y.tab.h"
void yyerror(char *);
%}
%%
[a-zA-Z]+ {
    yylval.strV = yytext;
    return ID;
}
[0-9]+ {
    yylval.intV = atoi(yytext);
    return INTEGER;
}
[\n] { return *yytext; }
[ \t] ;
. yyerror("invalid character");
%%
int yywrap(void) {
    return 1;
}
This is my parser:
%{
#include <stdio.h>
int yydebug=1;
void prompt();
void yyerror(char *);
int yylex(void);
%}
%union {
    int intV;
    char *strV;
}
%token INTEGER ID
%%
program: program statement EOF { prompt(); }
       | program EOF { prompt(); }
       | { prompt(); }
       ;
args:
    | args ID { printf(":%s ", $<strV>2); }
    ;
statement: ID args { printf("%s", $<strV>1); }
         | INTEGER { printf("%d", $<intV>1); }
         ;
EOF: '\n'
%%
void yyerror(char *s) {
    fprintf(stderr, "%s\n", s);
}
void prompt() {
    printf("> ");
}
int main(void) {
    yyparse();
    return 0;
}
It is a very simple language consisting of nothing more than identifiers and integers, plus a basic REPL. Notice that the parser prints each argument with a leading colon. The intention is that, combined with the first alternative of the statement rule, an interaction with the REPL looks something like this:
> aaa aa a
:aa :a aaa>
However, the actual interaction is as follows:
> aaa aa a
:aa :a aaa aa aa
>
Why does the ID token in the following rule
statement: ID args { printf("%s", $<strV>1); }
         | INTEGER { printf("%d", $<intV>1); }
         ;
have the whole input line, including the newline, as its semantic value? How can I rework my grammar so that I get the interaction I intended?