That's really not the way to solve this particular problem.
The usual way to do this is to write separate pattern rules to recognize keywords and variable names. (Plus a pattern rule to ignore whitespace.) That means the tokenizer will return two tokens for the input int var3. Recognizing that the two tokens form a valid declaration is the responsibility of the parser, which repeatedly calls the tokenizer in order to parse the token stream.
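To make that division of labour concrete, here is a minimal sketch in plain C. It is hand-written rather than generated by flex so that it stays self-contained, and the names token, next_token and is_declaration are hypothetical, chosen only for illustration. The tokenizer returns one token per call and skips whitespace; the "parser" is reduced to a single check that the token stream is a type followed by an identifier.

```c
#include <ctype.h>
#include <string.h>

/* Token kinds returned by the (hypothetical) tokenizer. */
enum token { TOK_END, TOK_TYPE, TOK_ID, TOK_ERROR };

/* Scan one token starting at *p and advance *p past it.
   Leading whitespace is skipped, as an "ignore whitespace" rule would do. */
enum token next_token(const char **p)
{
    static const char *const types[] = { "int", "float", "char", "String" };
    const char *s = *p;
    size_t len = 0;

    while (isspace((unsigned char)*s))
        s++;
    if (*s == '\0') {
        *p = s;
        return TOK_END;
    }
    /* An identifier or keyword must start with a letter or underscore. */
    if (!isalpha((unsigned char)*s) && *s != '_') {
        *p = s + 1;
        return TOK_ERROR;
    }
    while (isalnum((unsigned char)s[len]) || s[len] == '_')
        len++;
    *p = s + len;

    /* A word is a type keyword only if the whole word matches. */
    for (size_t i = 0; i < sizeof types / sizeof types[0]; i++)
        if (strlen(types[i]) == len && strncmp(s, types[i], len) == 0)
            return TOK_TYPE;
    return TOK_ID;
}

/* Parser side: a declaration is exactly TYPE followed by ID. */
int is_declaration(const char *input)
{
    const char *p = input;
    return next_token(&p) == TOK_TYPE
        && next_token(&p) == TOK_ID
        && next_token(&p) == TOK_END;
}
```

With this sketch, is_declaration("int var3") returns 1 and is_declaration("int float") returns 0. The tokenizer never needs to know what a declaration is; it only classifies single words.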
However, if you really want to recognize two words as a single token, that is certainly possible. (F)lex does not allow negative lookahead in regular expressions, but you can use its pattern-matching precedence to capture erroneous tokens: when two patterns match the same amount of text, the rule that appears first in the file wins.
For example, you can do something like this:
    dataType   int|float|char|String
    id         [[:alpha:]_][[:alnum:]_]*
    %%
    {dataType}[[:space:]]+{dataType}  { puts("Error: two types"); }
    {dataType}[[:space:]]+{id}        { puts("Valid declaration"); }
      /* ... more rules ... */
The above uses Posix character classes instead of writing out the possible characters. See man isalpha for a list of Posix character classes; the character class component [:xxxxx:] contains exactly the characters accepted by the standard library function isxxxxx. I fixed the pattern so that it allows more than one whitespace character between the dataType and the id, and simplified the pattern for ids.