Looking at your grammar, I have 7 remarks:
1
k=? sets the lookahead for parser rules, and since yours is a lexer grammar, remove it.
2
Although not wrong, BOOLEAN_KEYWORD : 'b' 'o' 'o' 'l' 'e' 'a' 'n'; is rather verbose. Write BOOLEAN_KEYWORD : 'boolean'; instead.
3
The protected keyword was renamed to fragment in ANTLR 3. But you are doing odd things with it. Take the following rules:
fragment INT : ('0'..'9')+ ;
fragment FLOAT : INT '.' INT ;
FLOAT_OR_INT
  : ( INT '.' ) => FLOAT { $setType(FLOAT); }
  | INT { $setType(INT); }
  ;
You create two fragment rules, and then have FLOAT_OR_INT use a predicate to check whether it "sees" an INT followed by a '.', in which case it changes the token type to FLOAT. The following does the same thing and is much more readable:
FLOAT : DIGIT+ '.' DIGIT+ ;
INT : DIGIT+ ;
fragment DIGIT : '0'..'9' ;
4
In ANTLR 3, .* is non-greedy by default, so change:
'#' ( options {greedy=false;} : . )* '#'
into
'#' .* '#'
or even better:
'#' ~'#'+ '#'
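As a rough analogy only (this uses Python's re module, not ANTLR, and the sample string is made up), the difference between a greedy wildcard, a non-greedy wildcard and a negated character class can be sketched like this:

```python
import re

# Hypothetical sample input: two '#'-delimited comments on one line.
text = "#first# x #second#"

# Greedy wildcard: runs on to the LAST '#', swallowing everything in between.
greedy = re.findall(r"#.*#", text)

# Non-greedy wildcard: stops at the first closing '#'.
lazy = re.findall(r"#.*?#", text)

# Negated class: can never cross a '#', so no greediness option is needed.
negated = re.findall(r"#[^#]+#", text)

print(greedy)   # one match covering the whole string
print(lazy)     # two separate comments
print(negated)  # the same two comments
```

The negated form states the intent directly ("anything except '#'"), which is why it is the preferable way to write the comment rule.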
5
The rule:
OPENPAR_OR_OUTPUT_OPERATOR
  : '(' { $setType(OPEN_PAR); }
  | '(' '(' { $setType(OUTPUT_OPERATOR); }
  ;
should simply be:
OUTPUT_OPERATOR : '((' ;
OPEN_PAR : '(' ;
6
ANTLR's lexer tries to match as many characters as possible. Whenever two rules match the same number of characters, the rule defined first "wins". That is why you must define all your *_KEYWORD rules before the ID rule.
7
Finally, you don't need to check for "in" or "out" inside the ID rule (and then change the type of the token). Whenever the lexer "sees" input such as "inside", it always creates a single ID token, and not an INPUT_KEYWORD followed by an ID, since the lexer matches as much as possible (see #6).
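To make the "longest match, first rule wins on a tie" behaviour concrete, here is a minimal Python sketch. It is a simplified model of what is described above, not how ANTLR is implemented internally; the rule list is a cut-down, hypothetical subset of the grammar and tokenize is a helper name made up for this example.

```python
import re

# Rules in definition order, loosely mirroring part of the grammar.
# Names and patterns are illustrative Python regexes, not ANTLR syntax.
RULES = [
    ("K_INPUT", r"in"),
    ("K_OUTPUT", r"out"),
    ("FLOAT", r"[0-9]+\.[0-9]+"),
    ("INT", r"[0-9]+"),
    ("ID", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OUTPUT_OPERATOR", r"\(\("),
    ("OPEN_PAR", r"\("),
    ("WS", r"[ \t\r\n]+"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]


def tokenize(text):
    """Longest match wins; on a tie, the rule listed first wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in COMPILED:
            m = regex.match(text, pos)
            # Strict '>' keeps the earlier rule when lengths are equal.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WS":  # whitespace is skipped, as in the grammar
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("in inside ((x 3.14 42"))
```

With this input, "in" becomes K_INPUT (a tie with ID, so the earlier rule wins), "inside" becomes a single ID, "((" becomes OUTPUT_OPERATOR rather than two OPEN_PAR tokens, and "3.14" becomes FLOAT rather than INT. Moving ID above the keyword rules in RULES would turn "in" into an ID as well.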
It seems you are trying to learn ANTLR by trial and error, or from outdated documentation. That is not a good way to learn ANTLR. Try to get hold of Parr's The Definitive ANTLR Reference and learn it properly.
Good luck!
EDIT
Well, if you can't get it to work, here is the working version of your grammar:
lexer grammar OurCompiler; // A bit of an odd name for a lexer...

K_INT : 'int';
K_FLOAT : 'float';
K_CHAR : 'char';
K_STRING : 'string';
K_BOOLEAN : 'boolean';
K_INPUT : 'in';
K_OUTPUT : 'out';
K_IF : 'if';
K_FOR : 'for';
K_SWITCH : 'switch';
K_CASE : 'case';
K_BREAK : 'break';
K_DEFAULT : 'default';
K_WHILE : 'while';
K_ELSE : 'else';
K_ELSEIF : 'elseif';
K_AND : 'and';
K_OR : 'or';
K_NOT : 'not';
K_CONSTANT : 'constant';

BOOLEAN : 'true' | 'false';
FLOAT : DIGIT+ '.' DIGIT+;
INT : DIGIT+;
STRING : '"' ( ESC_SEQ | ~('\\'|'"') )* '"';
CHAR : '\'' ( ESC_SEQ | ~('\''|'\\') ) '\'';

ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;

INPUT_OPERATOR : '))';
OUTPUT_OPERATOR : '((';
OPEN_PAR : '(';
CLOSE_PAR : ')';
LOWER : '<';
LOWER_EQUAL : '<=';
UPPER : '>';
UPPER_EQUAL : '>=';
ASSIGN : '=';
EQUAL : '==';
NOT : '!';
NOT_EQUAL : '!=';
ADD : '+';
ADD_TO_PREVIOUS : '+=';
INCREMENT : '++';
MINUS : '-';
MINUS_FROM_PREVIOUS : '-=';
DECREMENT : '--';
MULTIPLY : '*';
MULTIPLY_TO_PREVIOUS : '*=';
DIVIDE : '/';
DIVIDE_FROM_PREVIOUS : '/=';
MODE : '%';
OPEN_BRAKET : '[';
CLOSE_BRAKET : ']';
OPEN_BRACE : '{';
CLOSE_BRACE : '}';
COLON : ':';
SEMICOLON : ';';
COMMA : ',';

SINGLE_LINE_COMMENT : '##' ~('\r' | '\n')* {skip();};
MULTIPLE_LINE_COMMENT : '#' ~'#'+ '#' {skip();};
WS : ( ' ' | '\t' | '\r' | '\n') {skip();};

fragment ESC_SEQ : '\\' ('b' | 't' | 'n' | 'f' | 'r' | '\"' | '\'' | '\\');
fragment DIGIT : '0'..'9';