Unintentional Concatenation in a Bison / Yacc Grammar

Question


I am experimenting with lex and yacc and have run into a strange problem. It is easiest to show my code first and then describe the problem in detail. This is my lexer:

%{
#include <stdlib.h>
#include <string.h>
#include "y.tab.h"
void yyerror(char *);
%}

%%

[a-zA-Z]+ {
  yylval.strV = yytext;
  return ID;
}

[0-9]+      {
  yylval.intV = atoi(yytext);
  return INTEGER;
}

[\n] { return *yytext; }

[ \t]        ;

. yyerror("invalid character");

%%

int yywrap(void) {
  return 1;
}

This is my parser:

%{
#include <stdio.h>

int yydebug=1;
void prompt();
void yyerror(char *);
int yylex(void);
%}

%union {
  int intV;
  char *strV;
}

%token INTEGER ID

%%

program: program statement EOF { prompt(); }
       | program EOF { prompt(); }
       | { prompt(); }
       ;

args: /* empty */
    | args ID { printf(":%s ", $<strV>2); }
    ;

statement: ID args { printf("%s", $<strV>1); }
         | INTEGER { printf("%d", $<intV>1); }
;

EOF: '\n'

%%

void yyerror(char *s) {
  fprintf(stderr, "%s\n", s);
}

void prompt() {
  printf("> ");
}

int main(void) {
  yyparse();
  return 0;
}

It is a very simple language consisting of nothing more than identifiers (strings of letters) and integers, plus a basic REPL. Note that the parser prints each of the args with a leading colon. Combined with the action of the first statement rule, the intended interaction with the REPL looks something like this:

> aaa aa a
:aa :a aaa>

However, the interaction is as follows:

> aaa aa a
:aa :a aaa aa aa
>

Why does the ID token in the following rule

statement: ID args { printf("%s", $<strV>1); }
         | INTEGER { printf("%d", $<intV>1); }
;

hold the whole input line, including the newline? How can I redesign my grammar to get the interaction I intended?


c lex yacc grammar

troutwine 23 . '10 5:48


Running bison -v on the parser produces parser.output, whose nonterminal summary shows where args and statement are used:

Nonterminals, with rules where they appear

$accept (6)
    on left: 0
program (7)
    on left: 1 2 3, on right: 0 1 2
statement (8)
    on left: 4 5, on right: 1
args (9)
    on left: 6 7, on right: 4 7
EOF (10)
    on left: 8, on right: 1 2

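Nothing in this summary looks wrong. As an aside, since the EOF nonterminal actually matches a newline, EOL would be a clearer name for it.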


msw 23 . '10 6:18

Jonathan Leffler · Accepted Answer · 2010-04-26T13:52:40+0000

To see what is happening, add a mid-rule action to the statement rule that prints the ID as soon as it has been read:

statement: ID { printf("<%s> ", $<strV>1); } args { printf("%s", $<strV>1); }
         | INTEGER { printf("%d", $<intV>1); }
;

With that change, the same input gives:

> aaa aa a
<aaa> :aa :a aaa aa a
>

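The mid-rule action shows that $<strV>1 is still aaa immediately after the ID is read, but by the time the final action runs, the lexer has also scanned aa, a and the newline lookahead. The line yylval.strV = yytext; stores a pointer into the lexer's input buffer, not a copy, so the string it points to now runs to the end of the line, newline included. The grammar is fine; the fix belongs in the lexer: copy the token with yylval.strV = strdup(yytext); and free the string once the parser has finished with it.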

