Freeing the string allocated in strdup() from flex/bison

Question


I have flex code that copies the lexeme string using strdup().

    %{
    #include "json.tab.h"
    #define YY_DECL extern "C" int yylex()
    %}
    %option noyywrap
    %%
    [ \t\n]+        ;
    \"[a-zA-Z]+\"   { yylval.sval = strdup(yytext); return STRING; }
    [0-9]+          { yylval.ival = atoi(yytext); return NUMBER; }
    .               { return yytext[0]; }
    ;
    %%

strdup() allocates memory, copies the input string into it, and returns a pointer to the copy (see "strdup() - what does it do in C?"), so I think I need to free it when I no longer need it.

Following the post "When is %destructor invoked in BISON?", I added %destructor { free($$); printf("free"); } STRING to the yacc file.

However, I do not see free() being called, even when yylval.sval is assigned a new string returned from strdup().

What could be wrong? How do I free the allocated string in flex/bison?

Added

I am thinking about using a statically allocated sval, as follows:

    %union {
        int ival;
        char sval[100]; // char* sval;
    }

The flex code then becomes (without any code checking that yytext is shorter than 100 bytes):

    \"[a-zA-Z]+\" {
        //yylval.sval = strdup(yytext);
        memset(yylval.sval, 0, 100);
        strcpy(yylval.sval, yytext);
        return STRING;
    }

I am not sure whether this is an approach people actually use.

Added2

For my application, simple interning is fine.

    extern char buffer[]; // [100];
    %}
    %option noyywrap
    %%
    \"[a-zA-Z]+\" {
        //yylval.sval = strdup(yytext);
        memset(buffer, 0, 100);
        strcpy(buffer, yytext);
        yylval.sval = buffer;
        return STRING;
    }
    ...
    char buffer[100];

For the yacc code:

    %union {
        int ival;
        char *sval;
    }


c flex-lexer bison yacc strdup

prosseek Jun 28 '15 at 20:14


1 answer

rici · Accepted Answer · 2015-06-28T22:28:51+0000

As you say, you need to free the string "when I no longer need it." It is as simple (or as difficult) as that.

C does not have a garbage collector, so C programmers are responsible for knowing when allocated memory is no longer needed. The language does not try to figure this out, and (mostly) neither does bison.

If you have a reduction rule that involves one or more semantic values containing pointers to allocated memory, that rule might do any of several things. It might pass the semantic values on into the new semantic value, typically by copying only the pointer. It might copy the semantic value and then free the original. It might add the semantic value to a parse-wide data structure, such as a symbol table.

In all of these cases, the programmer must know whether the allocated memory is still required, and must free the allocation if it is not.

However, there are a few cases in which bison discards a semantic value without ever presenting it to a reduction action. Most of these are error conditions. If, as part of error recovery, bison decides to discard a token, that token's semantic value might leak memory. It is precisely for this case that bison has the %destructor declaration. The %destructor code is invoked if (and only if) bison discards a token as a result of error recovery or post-error cleanup. All other cases are your responsibility.
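As a rough sketch of how that split of responsibilities might look in the grammar file: the rule names `values` and `value` below are hypothetical (the question does not show its grammar), and the actions only print the token. The point is that the action frees the string once it is done with it, while the %destructor covers only the tokens bison discards on its own.

```yacc
%{
#include <stdio.h>
#include <stdlib.h>
int yylex(void);
void yyerror(const char *msg);
%}

%union {
    int   ival;
    char *sval;
}

%token <sval> STRING
%token <ival> NUMBER

/* Runs only when bison itself discards a STRING, e.g. during error recovery. */
%destructor { free($$); } STRING

%%

values : /* empty */
       | values value
       ;

value  : STRING  { printf("string: %s\n", $1); free($1); /* no longer needed: free it here */ }
       | NUMBER  { printf("number: %d\n", $1); }
       ;
```

With a grammar like this, a parse without errors never runs the %destructor, which would explain why the printf("free") in the question produces no output.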

Attempting to dodge this responsibility by making the stack slots enormous (for example, by including a char[100] in the semantic value union) is both unsafe and inefficient. It is unsafe because you need to be constantly aware that the fixed-size buffer may overflow, which means that parsing a syntactically correct program might overwrite arbitrary memory. It is inefficient because you end up making the stack several orders of magnitude larger than necessary, and also because you are constantly copying stack slots (at least twice for every reduction rule, even those that use the default action).
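To get a feel for the size difference, here is a small sketch that compares the two union layouts from the question. The union names and the helper `print_slot_sizes` are made up for illustration; the real slot type is whatever bison generates from %union.

```c
#include <stdio.h>

/* Hypothetical names; these mirror the two %union layouts from the question. */
union fixed_value   { int ival; char sval[100]; };
union pointer_value { int ival; char *sval; };

/* Print how many bytes each parser stack slot would occupy. */
void print_slot_sizes(void) {
    printf("char[100] slot: %zu bytes\n", sizeof(union fixed_value));
    printf("char*     slot: %zu bytes\n", sizeof(union pointer_value));
}
```

Every push onto the parser stack, and every copy of a slot, moves the full union, so the fixed-buffer layout pays for at least 100 bytes per slot even when the token is a number.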

Figuring out the lifetime of a semantic value is only complicated if you intend to share memory. That is not usually useful for string literals (as in your example), but it can be quite useful for variable names; most names occur more than once in a program, so there is always a temptation to use the same character string for every occurrence.

I usually solve the identifier problem by "interning" the string in the lexer. The lexer maintains a parse-wide name table (say, a simple set implemented with a hash table) and, for every identifier it encounters, it adds the identifier to the name table and passes a pointer to the unique name entry as the semantic value. At some point after the end of the parse, the entire name table can be freed, freeing all of the identifiers.

For string literals and other probably-unique strings, you can either use the name table anyway, or you can avoid ever having two copies of a pointer to the same character string. Using the name table has the advantage of reducing the amount of work you need to do for memory management, but at the cost of possibly keeping unnecessary strings around for longer than necessary. It depends a lot on the nature of the parsing result: if it is an AST, then you probably need to keep the character strings for as long as the AST exists, but if you are doing direct execution or one-pass code generation, you might not need the string literals in the long run.
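A minimal sketch of such a name table, assuming a fixed-size chained hash table; the names `intern`, `intern_free_all`, and `NAME_BUCKETS` are invented for this example and are not part of flex or bison.

```c
#include <stdlib.h>
#include <string.h>

#define NAME_BUCKETS 257  /* hypothetical fixed bucket count */

struct name_entry {
    struct name_entry *next;  /* next entry in the same bucket */
    char *text;               /* the single stored copy of the name */
};

static struct name_entry *name_table[NAME_BUCKETS];

/* Simple multiplicative string hash; any reasonable hash would do. */
static unsigned long hash_name(const char *s) {
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Return the unique stored copy of s, adding it on first sight.
   Returns NULL if memory allocation fails. */
const char *intern(const char *s) {
    unsigned long b = hash_name(s) % NAME_BUCKETS;
    for (struct name_entry *e = name_table[b]; e; e = e->next)
        if (strcmp(e->text, s) == 0)
            return e->text;

    struct name_entry *e = malloc(sizeof *e);
    if (!e)
        return NULL;
    size_t n = strlen(s) + 1;
    e->text = malloc(n);
    if (!e->text) {
        free(e);
        return NULL;
    }
    memcpy(e->text, s, n);
    e->next = name_table[b];
    name_table[b] = e;
    return e->text;
}

/* Free every interned name; call once the parse result is no longer needed. */
void intern_free_all(void) {
    for (size_t b = 0; b < NAME_BUCKETS; b++) {
        struct name_entry *e = name_table[b];
        while (e) {
            struct name_entry *next = e->next;
            free(e->text);
            free(e);
            e = next;
        }
        name_table[b] = NULL;
    }
}
```

In the lexer, the action would then assign `intern(yytext)` to the semantic value instead of `strdup(yytext)` (with the union member declared as `const char *`), and no reduction action or %destructor would free it; a single `intern_free_all()` after parsing releases everything.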
