Why does my regex pattern distinguish between letters when it shouldn't?

I'm writing a simple regex to validate usernames, for practice. I'm sure there may be other problems with this pattern, but I would like someone to explain this seemingly strange behavior I'm getting.

    import java.io.*;
    import java.util.*;
    import java.text.*;
    import java.math.*;
    import java.util.regex.*;

    public class userRegex {
        public static void main(String[] args) {
            Scanner in = new Scanner(System.in);
            int testCases = Integer.parseInt(in.nextLine());
            while (testCases > 0) {
                String username = in.nextLine();
                String pattern = "([[:alpha:]])[a-zA-Z_]{7,29}";
                Pattern r = Pattern.compile(pattern);
                Matcher m = r.matcher(username);
                if (m.find()) {
                    System.out.println("Valid");
                } else {
                    System.out.println("Invalid");
                }
                testCases--;
            }
        }
    }

When I enter:

    2
    dfhidbuffon
    dfdidbuffon

the program should print:

    Valid
    Valid

but instead it prints:

    Valid
    Invalid

Why does it treat the third letter, "h" in one username and "d" in the other, differently?

Edit: I applied the suggestions from @Draco18s and @ruakh, but I still get the same strange behavior.

+5
3 answers

[:alpha:] doesn't mean what you think it means; rather, it simply means "any of the characters :, a, h, l, p". So dfhidbuffon contains a match for your pattern (namely h followed by idbuffon), while dfdidbuffon does not. (Note that matcher.find() looks for a match anywhere in the string; if you want to match the entire string exactly, use matcher.matches() instead, or change your pattern to use anchors such as ^ and $.)
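A small sketch of the explanation above, assuming only the standard java.util.regex API. It checks that both "[:alpha:]" and the edited "[[:alpha:]]" behave as plain sets of literal characters, and prints what find() actually locates with the question's pattern. The class name AlphaLiteralDemo and the helper firstMatch are hypothetical, introduced here only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlphaLiteralDemo {
    // The pattern from the question, compiled once.
    static final Pattern QUESTION_PATTERN = Pattern.compile("([[:alpha:]])[a-zA-Z_]{7,29}");

    // Hypothetical helper: returns the first substring find() locates, or null if none.
    static String firstMatch(String username) {
        Matcher m = QUESTION_PATTERN.matcher(username);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // In java.util.regex, "[:alpha:]" is an ordinary character class holding the
        // literal characters ':', 'a', 'l', 'p' and 'h'. Writing "[[:alpha:]]" just
        // nests that same class inside another one, so the set does not change.
        for (String cls : new String[] {"[:alpha:]", "[[:alpha:]]"}) {
            Pattern p = Pattern.compile(cls);
            for (String s : new String[] {"h", ":", "d"}) {
                System.out.println(cls + " matches \"" + s + "\": " + p.matcher(s).matches());
            }
        }
        // "dfhidbuffon" yields the substring starting at the 'h' (index 2);
        // "dfdidbuffon" contains none of the class's characters, so find() fails.
        for (String username : new String[] {"dfhidbuffon", "dfdidbuffon"}) {
            System.out.println(username + " -> " + firstMatch(username));
        }
    }
}
```

Running it should show that "h" and ":" match either spelling while "d" matches neither, which is why only the first username was reported as valid.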

You may be thinking of the notation found in many regular expression implementations, where [:alpha:] means "any alphabetic character"; but, firstly, Java's Pattern class does not support this notation (hat tip to ajb), and secondly, even in those implementations [:alpha:] has to appear inside a bracket expression, as in [[:alpha:]]. The Java equivalent is \p{Alpha} or [A-Za-z] if you only want to match ASCII letters, and \p{IsAlphabetic} if you want to match any Unicode letter.
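To see the difference between the Java equivalents listed above, here is a sketch that tests each one against a few single characters, assuming default Pattern flags (no UNICODE_CHARACTER_CLASS, under which \p{Alpha} would also accept non-ASCII letters). The class name AlphaEquivalents and the helper fullMatch are hypothetical.

```java
import java.util.regex.Pattern;

public class AlphaEquivalents {
    // Hypothetical helper: true if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String[] regexes = {"\\p{Alpha}", "[A-Za-z]", "\\p{IsAlphabetic}"};
        // The third input is 'e' with an acute accent (U+00E9): a Unicode letter,
        // but not an ASCII one, so only \p{IsAlphabetic} should accept it.
        String[] inputs = {"a", "Z", "\u00e9", "_", "7"};
        for (String regex : regexes) {
            for (String input : inputs) {
                System.out.println(regex + " vs \"" + input + "\": " + fullMatch(regex, input));
            }
        }
    }
}
```

With default flags, \p{Alpha} and [A-Za-z] agree on every input here, while \p{IsAlphabetic} additionally accepts the accented letter; none of the three accepts the underscore or the digit.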

+7

[:alpha:] is shorthand for the POSIX alphabetic character class.

According to the Java 7 Pattern docs, POSIX character classes are supported using the \p{Alpha} format, not the :alpha: format; the latter is not mentioned anywhere in that reference.

It works as expected for me when the pattern uses the supported format for a POSIX character class:

    String pattern = "(\\p{Alpha})[a-zA-Z_]{7,29}";
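A sketch of the question's program with this pattern swapped in. As an assumption beyond this answer, it also follows the first answer's advice and uses matches() instead of find(), so the whole line must be a username of 8 to 30 characters. The class name UserRegexFixed and the helper isValid are hypothetical.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

public class UserRegexFixed {
    // The pattern from this answer: one POSIX (ASCII) letter,
    // then 7 to 29 ASCII letters or underscores.
    static final Pattern USERNAME = Pattern.compile("(\\p{Alpha})[a-zA-Z_]{7,29}");

    // matches() (rather than find()) requires the entire input to fit the pattern,
    // so a leading digit or a name longer than 30 characters is rejected.
    static boolean isValid(String username) {
        return USERNAME.matcher(username).matches();
    }

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        int testCases = Integer.parseInt(in.nextLine().trim());
        while (testCases-- > 0) {
            System.out.println(isValid(in.nextLine()) ? "Valid" : "Invalid");
        }
    }
}
```

Fed the question's input (2, dfhidbuffon, dfdidbuffon on separate lines), this prints Valid twice. With find() instead, an input like 1dfhidbuffon would also be reported as valid, because find() only needs a matching substring.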
+1

According to Regexpal.com, "([:alpha:])" matches "any of the characters ':', 'a', 'h', 'l', 'p'". "dfdidbuffon" contains none of these characters, so it fails (the [a-zA-Z_] part is never reached).

You probably intended "[a-zA-Z](\\w){7,28}" as a Java string, or /[a-zA-Z](\w){7,28}/ in literal regex notation.

This matches one ASCII letter followed by 7 to 28 word characters (letters, digits, and underscores).

If you don't want to allow digits, use "[a-zA-Z]([a-zA-Z_]){7,28}".
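A quick comparison of the two patterns suggested above. The answer doesn't say whether to use find() or matches(); this sketch assumes matches(), so each pattern must cover the whole username. The class name WordCharUsernames and the sample inputs are hypothetical.

```java
import java.util.regex.Pattern;

public class WordCharUsernames {
    // Digits allowed after the first letter: by default \w is [a-zA-Z_0-9] in Java.
    static final Pattern WITH_DIGITS = Pattern.compile("[a-zA-Z](\\w){7,28}");
    // Only letters and underscores after the first letter.
    static final Pattern NO_DIGITS = Pattern.compile("[a-zA-Z]([a-zA-Z_]){7,28}");

    public static void main(String[] args) {
        for (String name : new String[] {"dfdidbuffon", "user_name1", "1username"}) {
            System.out.println(name
                    + " withDigits=" + WITH_DIGITS.matcher(name).matches()
                    + " noDigits=" + NO_DIGITS.matcher(name).matches());
        }
    }
}
```

Both patterns accept dfdidbuffon and reject 1username (the first character must be a letter); only the \w version accepts user_name1, because of the trailing digit.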

0

Source: https://habr.com/ru/post/1260056/

