Difference between \p{Alpha} and \p{L} in Java

As I understand it, \p{L} matches all letters in Unicode, and \p{Alpha} does much the same but only for Latin (ASCII) letters. In my work I have both a Latin "A" and a Cyrillic "А", and \p{Alpha} in our old Java code does not match the Cyrillic letters as letters. When I test it, \p{L} solves the problem for me. Can you give me some advice on this situation and on which one I should use in Java code? This page, http://www.regular-expressions.info/posixbrackets.html, uses \p{Alpha} for Java code.

1 answer

In fact, \p{Alpha} is a POSIX character class. By default it matches only ASCII letters, and it matches non-ASCII letters only when combined with the UNICODE_CHARACTER_CLASS flag (or the inline (?U) modifier). \p{L}, on the other hand, always matches any Unicode letter, with or without that flag. Note that you can also write \p{L} as \pL or \p{IsL}.
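To make the effect of the flag concrete, here is a minimal sketch that compiles the same \p{Alpha}+ pattern three ways: with no flags, with Pattern.UNICODE_CHARACTER_CLASS, and with the inline (?U) modifier. The class name AlphaFlagDemo and its helper are made up for illustration; the Cyrillic letter is written as a Unicode escape so the result does not depend on the source file encoding.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class AlphaFlagDemo {
    // Cyrillic capital letter A (U+0410), escaped to avoid encoding issues.
    static final String CYRILLIC_A = "\u0410";

    // Default mode: \p{Alpha} is the ASCII-only POSIX class.
    static final Pattern ASCII_ALPHA = Pattern.compile("\\p{Alpha}+");

    // Flag passed to compile(): POSIX classes follow Unicode properties.
    static final Pattern UNICODE_ALPHA =
            Pattern.compile("\\p{Alpha}+", Pattern.UNICODE_CHARACTER_CLASS);

    // Same flag, enabled inline with (?U).
    static final Pattern INLINE_FLAG = Pattern.compile("(?U)\\p{Alpha}+");

    // True if the whole input matches the pattern.
    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(ASCII_ALPHA, CYRILLIC_A));   // false
        System.out.println(matches(UNICODE_ALPHA, CYRILLIC_A)); // true
        System.out.println(matches(INLINE_FLAG, CYRILLIC_A));   // true
    }
}
```

Passing the flag to Pattern.compile is handy when the pattern string comes from elsewhere; (?U) is convenient with String.matches, which takes no flags argument.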

More reference details:

Both \p{L} and \p{IsL} denote the Unicode category of letters.
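As a quick check of that equivalence, the following sketch runs the three spellings against the same inputs. The class and method names are hypothetical, chosen only for this example.

```java
// Hypothetical demo class showing three spellings of the same category.
class LetterCategoryDemo {
    // Braced form.
    static boolean braced(String s) {
        return s.matches("\\p{L}+");
    }

    // Short form: braces may be dropped for a single-letter category name.
    static boolean shortForm(String s) {
        return s.matches("\\pL+");
    }

    // Form with the optional "Is" prefix.
    static boolean isPrefixed(String s) {
        return s.matches("\\p{IsL}+");
    }

    public static void main(String[] args) {
        // Three Cyrillic letters (U+0410, U+0431, U+0432), escaped for portability.
        String cyrillic = "\u0410\u0431\u0432";
        System.out.println(braced(cyrillic));     // true
        System.out.println(shortForm(cyrillic));  // true
        System.out.println(isPrefixed(cyrillic)); // true
        System.out.println(braced("123"));        // false: digits are not letters
    }
}
```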

POSIX character classes (US-ASCII only)
\p{Lower} A lowercase alphabetic character: [a-z]
\p{Upper} An uppercase alphabetic character: [A-Z]
\p{Alpha} An alphabetic character: [\p{Lower}\p{Upper}]
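The "US-ASCII only" restriction in the table can be observed directly. This is a small sketch with no flags set; the class and method names are invented for the example, and the non-ASCII inputs are written as Unicode escapes.

```java
// Hypothetical demo class: POSIX classes in default (ASCII-only) mode.
class PosixAsciiDemo {
    static boolean isLower(String s) {
        return s.matches("\\p{Lower}+"); // behaves like [a-z]+
    }

    static boolean isUpper(String s) {
        return s.matches("\\p{Upper}+"); // behaves like [A-Z]+
    }

    static boolean isAlpha(String s) {
        return s.matches("\\p{Alpha}+"); // behaves like [a-zA-Z]+
    }

    public static void main(String[] args) {
        System.out.println(isLower("abc"));    // true
        System.out.println(isUpper("ABC"));    // true
        System.out.println(isAlpha("aBc"));    // true
        // Latin small e with acute (U+00E9) is lowercase, but not ASCII.
        System.out.println(isLower("\u00e9")); // false
        // Cyrillic capital Ya (U+042F) is uppercase, but not ASCII.
        System.out.println(isUpper("\u042f")); // false
        // Cyrillic small zhe (U+0436) is a letter, but not ASCII.
        System.out.println(isAlpha("\u0436")); // false
    }
}
```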

See the following demo:

String l = "Abc"; // Latin letters
String c = "Абв"; // Cyrillic letters
System.out.println(l.matches("\\p{Alpha}+"));     // => true
System.out.println(c.matches("\\p{Alpha}+"));     // => false (ASCII only by default)
System.out.println(c.matches("(?U)\\p{Alpha}+")); // => true  (UNICODE_CHARACTER_CLASS enabled)
System.out.println(l.matches("\\p{L}+"));         // => true
System.out.println(c.matches("\\p{L}+"));         // => true

Source: https://habr.com/ru/post/1621859/

