I am referring to the XML 1.1 specification.
Take a look at the definition of NameStartChar:
NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
If I interpret this correctly, the last range (#x10000-#xEFFFF) lies outside what a single UTF-16 code unit, such as a Java char, can hold. So these must be UTF-32 (full code point) values, right? That would mean I need to check char pairs for this range instead of single chars, right?
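To make the concern concrete, here is a minimal sketch of what a code-point based check might look like. It assumes that String.codePointAt and Character.toChars are the right standard tools for this. The class name XmlNameStartCheck and its methods are hypothetical, made up for illustration only. The ranges are copied from the production above.

```java
// Hypothetical helper for illustration; not part of any library.
final class XmlNameStartCheck {

    // Inclusive {low, high} code point ranges from the NameStartChar production.
    private static final int[][] RANGES = {
        {':', ':'}, {'A', 'Z'}, {'_', '_'}, {'a', 'z'},
        {0xC0, 0xD6}, {0xD8, 0xF6}, {0xF8, 0x2FF},
        {0x370, 0x37D}, {0x37F, 0x1FFF}, {0x200C, 0x200D},
        {0x2070, 0x218F}, {0x2C00, 0x2FEF}, {0x3001, 0xD7FF},
        {0xF900, 0xFDCF}, {0xFDF0, 0xFFFD}, {0x10000, 0xEFFFF}
    };

    // Works on int code points, so values above 0xFFFF are handled uniformly.
    static boolean isNameStartChar(int codePoint) {
        for (int[] range : RANGES) {
            if (codePoint >= range[0] && codePoint <= range[1]) {
                return true;
            }
        }
        return false;
    }

    // codePointAt combines a valid surrogate pair into one code point;
    // an unpaired surrogate is returned as-is and matches no range.
    static boolean startsWithNameStartChar(String s) {
        return !s.isEmpty() && isNameStartChar(s.codePointAt(0));
    }

    public static void main(String[] args) {
        // U+10000 does not fit in one char, so it is stored as two chars.
        String supplementary = new String(Character.toChars(0x10000));
        System.out.println(supplementary.length());                  // 2
        System.out.println(startsWithNameStartChar(supplementary));  // true
        System.out.println(startsWithNameStartChar("1abc"));         // false
    }
}
```

The check itself never touches individual surrogates; it only compares int code points, which is why the last range can be written exactly as it appears in the specification.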
My questions:
- How can I check for such character ranges using standard Java methods?
- How can I define such ranges in JavaCC?
- JavaCC complains about \u10000 and \uEFFFF
Thanks!
NOTE: Don't worry, I'm not trying to write my own XML parser.
EDIT: I am writing a parser that checks whether text entered in different text formats (not XML) matches valid XML names.