Interesting question. Section 3.3 of the JLS says:
UnicodeEscape:
    \ UnicodeMarker HexDigit HexDigit HexDigit HexDigit
UnicodeMarker:
    u
    UnicodeMarker u
which translates to \\u+\p{XDigit}{4}
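As a quick check of that regular expression, here is a minimal sketch. The class name UnicodeEscapeRegex and the helper isUnicodeEscape are made up for illustration; only java.util.regex is used. The last line of main also shows that the compiler itself accepts an escape with more than one u.

```java
import java.util.regex.Pattern;

final class UnicodeEscapeRegex {
    // The regex from the answer: one backslash, one or more 'u', four hex digits.
    // Each backslash is doubled for the regex and doubled again for the string literal.
    static final Pattern UNICODE_ESCAPE = Pattern.compile("\\\\u+\\p{XDigit}{4}");

    // Hypothetical helper: true if the whole text is a single Unicode escape.
    static boolean isUnicodeEscape(String text) {
        return UNICODE_ESCAPE.matcher(text).matches();
    }

    public static void main(String[] args) {
        // The doubled backslash keeps the compiler from translating the samples itself.
        System.out.println(isUnicodeEscape("\\u0020"));   // true
        System.out.println(isUnicodeEscape("\\uuu00e4")); // true
        System.out.println(isUnicodeEscape("\\u00zz"));   // false, not hex digits
        System.out.println(isUnicodeEscape("\\0020"));    // false, no u at all

        // A real escape with two u's in source code is the same as the letter A.
        System.out.println("\uu0041".equals("A"));        // true
    }
}
```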
and
If an eligible \ is followed by u, or more than one u, and the last u is not followed by four hexadecimal digits, then a compile-time error occurs.
So you are right: a backslash can be followed by one or more u characters. The JLS goes on to give the reason:
The Java programming language specifies a standard way of transforming a program written in Unicode into ASCII that changes a program into a form that can be processed by ASCII-based tools. The transformation involves converting any Unicode escapes in the source text of the program to ASCII by adding an extra u - for example, \uxxxx becomes \uuxxxx - while simultaneously converting non-ASCII characters in the source text to Unicode escapes containing a single u each.
This transformed version is equally acceptable to a Java compiler and represents the exact same program. The exact Unicode source can later be restored from this ASCII form by converting each escape sequence where multiple u's are present to a sequence of Unicode characters with one fewer u, while simultaneously converting each escape sequence with a single u to the corresponding single Unicode character.
So this input
\u0020ä
becomes
\uu0020\u00e4
Here the uu says: "this was a Unicode escape sequence to begin with", while the single u in the second escape says: "an automatic tool converted a non-ASCII character into a Unicode escape."
This information is useful when you want to convert back from ASCII to Unicode: it lets you restore the original source exactly, keeping the escapes the author wrote and turning only the tool-generated ones back into characters.
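The round trip can be sketched roughly as follows. This is not the tool the JLS has in mind, just an illustration: the class UnicodeAsciiRoundTrip and its methods toAscii and fromAscii are hypothetical names, the input is assumed to be well formed (every single-u escape has four hex digits), and a backslash only counts as the start of an escape when it ends an odd-length run of backslashes.

```java
final class UnicodeAsciiRoundTrip {

    // Unicode source to ASCII: existing escapes gain one extra 'u',
    // non-ASCII characters become escapes with a single 'u'.
    static String toAscii(String source) {
        StringBuilder out = new StringBuilder();
        int backslashes = 0; // length of the backslash run just before position i
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '\\') {
                out.append(c);
                backslashes++;
                continue;
            }
            if (c == 'u' && backslashes % 2 == 1) {
                out.append("uu");                              // escape written by the author
            } else if (c > 127) {
                out.append(String.format("\\u%04x", (int) c)); // escape generated by the tool
            } else {
                out.append(c);
            }
            backslashes = 0;
        }
        return out.toString();
    }

    // ASCII back to Unicode: several u's lose one, a single u is decoded.
    static String fromAscii(String ascii) {
        StringBuilder out = new StringBuilder();
        int backslashes = 0;
        int i = 0;
        while (i < ascii.length()) {
            char c = ascii.charAt(i);
            if (c == '\\') {
                out.append(c);
                backslashes++;
                i++;
                continue;
            }
            if (c == 'u' && backslashes % 2 == 1) {
                int end = i;
                while (end < ascii.length() && ascii.charAt(end) == 'u') {
                    end++;
                }
                if (end - i > 1) {
                    out.append(ascii, i + 1, end);    // keep the escape, drop one u
                    i = end;
                } else {
                    String hex = ascii.substring(end, end + 4);
                    out.setLength(out.length() - 1);  // remove the backslash already copied
                    out.append((char) Integer.parseInt(hex, 16));
                    i = end + 4;
                }
            } else {
                out.append(c);
                i++;
            }
            backslashes = 0;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Six literal characters (backslash, u, 0020) followed by a real a-umlaut.
        String original = "\\u0020\u00e4";
        String ascii = toAscii(original);
        System.out.println(ascii);
        System.out.println(fromAscii(ascii).equals(original)); // true
    }
}
```

Running main on the example above produces the ASCII form with uu for the original escape and a single u for the converted character, and converting back yields the original text again.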
Aaron Digulla, Feb 03 '14 at 8:54