Recognizing Extended Characters with JavaCC

Question


I am creating a grammar using JavaCC and have run into a small problem. I want the resulting compiler to recognize any valid character in the extended ASCII set. After looking at the JavaCC examples (primarily the JavaCC grammar itself), I set up the following token to recognize my characters:

< CHARACTER:

  (   (~["'"," ","\\","\n","\r"])
    | ("\\"
        ( ["n","t","b","r","f","\\","'","\""]
        | ["0"-"7"] ( ["0"-"7"] )?
        | ["0"-"3"] ["0"-"7"] ["0"-"7"]
        )
      )
  )

>

If I understand correctly, it should match the octal representation of every ASCII character, from 0 to 377 (which covers all 256 characters in the extended ASCII set). This works as expected for all keyboard characters (a-z, 0-9, ?, ., /, etc.) and even for most special characters (©, ¬, ®). However, whenever I try to parse a trademark (™) symbol, my parser throws an "End of file" exception, indicating that it cannot recognize the symbol. Is there an obvious way to change my character definition so that it accepts the trademark symbol?


ascii extended-ascii javacc

RGordon1982 Apr 20 '09 at 13:13


2 answers

RGordon1982 · Answer 1 · 2009-04-20T17:06:19+0000

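It turns out the issue is Unicode versus ASCII: ™ is a Unicode character, not an ASCII one. I changed the token to the following, which takes Unicode escapes (e.g. U+00FF):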

< CHARACTER:(   (~["'"," ","\\","\n","\r"])
| ("\\"
    ( ["n","t","b","r","f","\\","'","\""]
    | ["u","U"]["+"]["0"-"9","a"-"f","A"-"F"]["0"-"9","a"-"f","A"-"F"]["0"-"9","a"-"f","A"-"F"]["0"-"9","a"-"f","A"-"F"]
    )
  ) )>
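To see why the octal range 0-377 in the question cannot reach ™, here is a small Java check. It is only a sketch: the class and method names are made up for illustration and are not part of JavaCC. It tests whether each symbol's code point is at most U+00FF, the largest value a three-digit octal escape can express.

```java
final class CodePointCheck {
    // True when every code point in s is at most U+00FF (octal 377),
    // the upper bound of the octal escapes in the original token.
    static boolean fitsInOctalEscape(String s) {
        return s.codePoints().allMatch(cp -> cp <= 0xFF);
    }

    public static void main(String[] args) {
        // Copyright, not sign, registered sign, trademark sign.
        String[] symbols = {"\u00A9", "\u00AC", "\u00AE", "\u2122"};
        for (String s : symbols) {
            System.out.printf("U+%04X fits in 0-377 octal: %b%n",
                    s.codePointAt(0), fitsInOctalEscape(s));
        }
    }
}
```

©, ¬ and ® sit at U+00A9, U+00AC and U+00AE, so they fall inside the range, while ™ is U+2122 and does not.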

fernacolo · Answer 2 · 2011-12-07T14:10:19+0000

Whatever the encoding of the source (CP1252, ISO-8859-1), a Java String holds Unicode characters. Enable the UNICODE_INPUT option in the grammar:

options {
  UNICODE_INPUT=true;
}
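As a rough illustration of why the encoding matters, the Java sketch below decodes the single byte 0x99 under the two encodings mentioned above. The class and method names are hypothetical, and it assumes the JDK in use ships the windows-1252 charset, which is common but not guaranteed by the Java specification.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DecodeDemo {
    // Decodes one byte with the given charset and returns the resulting code point.
    static int decodeByte(int b, Charset cs) {
        return new String(new byte[] {(byte) b}, cs).codePointAt(0);
    }

    public static void main(String[] args) {
        // Assumption: windows-1252 (CP1252) is available in this JDK.
        Charset cp1252 = Charset.forName("windows-1252");
        System.out.printf("CP1252:     U+%04X%n", decodeByte(0x99, cp1252));
        System.out.printf("ISO-8859-1: U+%04X%n",
                decodeByte(0x99, StandardCharsets.ISO_8859_1));
    }
}
```

Under CP1252 the byte becomes U+2122 (™), which is above U+00FF, while under ISO-8859-1 it becomes the control character U+0099. Either way, what the lexer receives is a Unicode character, not the original byte.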

See the JavaCC grammar documentation: http://javacc.java.net/doc/javaccgrm.html
