Recognizing Extended Characters with JavaCC

I am writing a grammar with JavaCC and have run into a small problem. I want the resulting parser to recognize any valid character in the extended ASCII set. After looking at some of the sample JavaCC grammars (primarily the JavaCC grammar itself), I defined the following token to recognize my characters:

< CHARACTER:
  (   (~["'"," ","\\","\n","\r"])
    | ("\\"
        ( ["n","t","b","r","f","\\","'","\""]
        | ["0"-"7"] ( ["0"-"7"] )?
        | ["0"-"3"] ["0"-"7"] ["0"-"7"]
        )
      )
  )
>

If I understand correctly, this should match the octal representation of every ASCII character, from 0 to 377 (which covers all 256 characters in the extended ASCII set). It works as expected for all keyboard characters (a-z, 0-9, ?, ., /, etc.) and even for most special characters (©, ¬, ®). However, whenever I try to parse a trademark symbol (™), my parser throws an "End of file" exception, indicating that it cannot recognize the symbol. Is there an obvious way to improve my character definition so that it accepts the trademark symbol?

2 answers

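The problem appears to be that ™ is a Unicode character, not an ASCII one: the octal escapes only reach Unicode U+00FF, while ™ is U+2122. Amending the token to accept Unicode escapes handles it: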

< CHARACTER:
  (   (~["'"," ","\\","\n","\r"])
    | ("\\"
        ( ["n","t","b","r","f","\\","'","\""]
        | ["u","U"] ["+"] ["0"-"9","a"-"f","A"-"F"] ["0"-"9","a"-"f","A"-"F"] ["0"-"9","a"-"f","A"-"F"] ["0"-"9","a"-"f","A"-"F"]
        )
      )
  )
>
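
To make the escape form concrete, here is a minimal Java sketch of how the text matched by such a token could be turned into the character it names. The class EscapeSketch and the helper decodeUnicodeEscape are hypothetical names introduced for illustration, not part of JavaCC, and the sketch assumes the escape is always a backslash, u or U, a plus sign, and exactly four hex digits, as in the token above.

```java
public class EscapeSketch {
    // Hypothetical helper (not a JavaCC API): converts the matched text of an
    // escape (backslash, 'u' or 'U', '+', four hex digits) into its character.
    public static char decodeUnicodeEscape(String image) {
        // Skip the three-character prefix and parse the remaining hex digits.
        return (char) Integer.parseInt(image.substring(3), 16);
    }

    public static void main(String[] args) {
        // The Java literal below is the seven characters: backslash u + 2 1 2 2
        char tm = decodeUnicodeEscape("\\u+2122");
        System.out.println((int) tm);   // prints 8482, the code point of the trademark sign
        System.out.println(tm > 0377);  // prints true: octal 377 is only 255
    }
}
```

The second line of output shows why the octal alternatives in the original token cannot express this symbol: their largest value is 255.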

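If the input uses a single-byte encoding (for example CP1252 or ISO-8859-1), decode it into a String with that encoding before handing it to the parser. Also set the UNICODE_INPUT option: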

options {
  UNICODE_INPUT=true;
}
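
The decoding step might look roughly like the following Java sketch. It assumes the input arrives as CP1252 bytes and that the JVM provides the windows-1252 charset; DecodeSketch, decode, and MyParser are hypothetical names used only for illustration.

```java
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.Charset;

public class DecodeSketch {
    // Decodes raw bytes into a String using the named charset.
    public static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        byte[] cp1252 = { (byte) 0x99 };  // the trademark sign as encoded in CP1252
        String text = decode(cp1252, "windows-1252");
        System.out.println(Integer.toHexString(text.charAt(0)));  // prints 2122

        // The decoded text can be wrapped in a Reader and handed to the
        // generated parser (hypothetically named MyParser):
        Reader reader = new StringReader(text);
        // new MyParser(reader).Start();  // assumption: depends on the grammar
    }
}
```

Decoding the same byte as ISO-8859-1 would give U+0099 instead, a control character, so the charset has to match how the file was actually written.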

See the JavaCC grammar documentation: http://javacc.java.net/doc/javaccgrm.html


Source: https://habr.com/ru/post/1706725/

