Regex to match only words without digits or special characters

I need a regex that, applied to the following line:

"test test3 t3st test: word%5 test! testing t[st" 

matches only words made up of the letters a-z:

Must match: test testing

Must not match: test3 t3st test: word%5 test! t[st

I tried ([A-Za-z])\w+, but it still matches the "word" part of word%5, which should not match.
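To see why the attempted pattern falls short, here is a small sketch that prints every match of ([A-Za-z])\w+ on the sample line. The class name AttemptDemo and the helper findAll are hypothetical, used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AttemptDemo {
    // Hypothetical helper: collects every substring the regex finds in the input.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "test test3 t3st test: word%5 test! testing t[st";
        // The attempted pattern: one ASCII letter followed by one or more word chars.
        // \w also accepts digits and '_', and nothing ties the match to whitespace,
        // so letter runs inside "test:", "word%5" and "t[st" are found as well.
        System.out.println(findAll("([A-Za-z])\\w+", s));
        // prints [test, test3, t3st, test, word, test, testing, st]
    }
}
```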

2 answers

You can use

 String patt = "(?<!\\S)\\p{Alpha}+(?!\\S)"; 

See the regex demo.

It matches one or more letters that are surrounded by whitespace or located at the beginning or end of a line. An equivalent pattern is (?<!\S)[a-zA-Z]+(?!\S), and (?<!\S)\p{L}+(?!\S) works if you also want to match all Unicode letters.
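The difference between the ASCII and Unicode variants shows up only on non-ASCII input. The sketch below runs three variants on a hypothetical line containing two accented words ("café", "naïve", written with \u escapes to avoid source-encoding issues). The class name UnicodeLetters and the helper findAll are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeLetters {
    // Hypothetical helper: collects every substring the compiled pattern finds.
    static List<String> findAll(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical input: "café naïve test3 plain"
        String s = "caf\u00e9 na\u00efve test3 plain";
        // Without flags, \p{Alpha} is ASCII-only, so accented words are rejected.
        Pattern asciiAlpha = Pattern.compile("(?<!\\S)\\p{Alpha}+(?!\\S)");
        // With UNICODE_CHARACTER_CLASS, \p{Alpha} follows the Unicode Alphabetic property.
        Pattern unicodeAlpha = Pattern.compile("(?<!\\S)\\p{Alpha}+(?!\\S)",
                Pattern.UNICODE_CHARACTER_CLASS);
        // \p{L} matches any Unicode letter without needing a flag.
        Pattern anyLetter = Pattern.compile("(?<!\\S)\\p{L}+(?!\\S)");

        System.out.println(findAll(asciiAlpha, s));   // [plain]
        System.out.println(findAll(unicodeAlpha, s)); // [café, naïve, plain]
        System.out.println(findAll(anyLetter, s));    // [café, naïve, plain]
    }
}
```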

More details

  • (?<!\\S) - a negative lookbehind that fails if there is a non-whitespace char immediately to the left of the current location
  • \\p{Alpha}+ - 1 or more ASCII letters (same as [a-zA-Z]+, but if you compile the pattern with the Pattern.UNICODE_CHARACTER_CLASS flag, \p{Alpha} will match Unicode letters)
  • (?!\\S) - a negative lookahead that fails if there is a non-whitespace char immediately to the right of the current location.
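The whitespace lookarounds are what reject tokens like "test:" and "t[st". A word boundary \b would not be enough, because it only needs a transition between a word char and a non-word char. This sketch compares the two on the sample line; the class name BoundaryCompare and the helper findAll are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryCompare {
    // Hypothetical helper: collects every substring the regex finds in the input.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "test test3 t3st test: word%5 test! testing t[st";
        // \b accepts any letter/punctuation edge, so "test:", "word%5",
        // "test!" and "t[st" still produce matches.
        System.out.println(findAll("\\b[a-zA-Z]+\\b", s));
        // prints [test, test, word, test, testing, t, st]

        // The lookarounds demand whitespace (or the string edge) on both sides.
        System.out.println(findAll("(?<!\\S)[a-zA-Z]+(?!\\S)", s));
        // prints [test, testing]
    }
}
```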

Java demo:

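 import java.util.regex.Matcher;
 import java.util.regex.Pattern;
 // ...
 String s = "test test3 t3st test: word%5 test! testing t[st";
 Pattern pattern = Pattern.compile("(?<!\\S)\\p{Alpha}+(?!\\S)");
 Matcher matcher = pattern.matcher(s);
 // Print every whitespace-delimited run of ASCII letters
 while (matcher.find()) {
     System.out.println(matcher.group(0));
 }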

Output: test and testing.
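If lookarounds feel opaque, a different technique (not one proposed in this thread) gives the same result on this input: split the line on whitespace and keep only tokens that consist entirely of ASCII letters. The class name SplitFilter and method asciiWords are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitFilter {
    // Hypothetical alternative: tokenize on whitespace, then keep tokens
    // made entirely of ASCII letters.
    static List<String> asciiWords(String input) {
        List<String> out = new ArrayList<>();
        for (String token : input.trim().split("\\s+")) {
            // String.matches anchors to the whole token, so "test3" and "test:" are rejected.
            if (token.matches("[a-zA-Z]+")) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(asciiWords("test test3 t3st test: word%5 test! testing t[st"));
        // prints [test, testing]
    }
}
```

This trades one regex pass for a split plus a per-token check; for a single line like this either approach is fine.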

Try this:

 Pattern tokenPattern = Pattern.compile("[\\p{L}]+"); 

[\\p{L}]+ matches a run of one or more consecutive Unicode letters.
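Note that this pattern has no whitespace guards, so on the question's sample line it also extracts letter runs from inside mixed tokens such as test3 or t[st. A quick check, with the class name LetterRuns and helper findAll as hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LetterRuns {
    // Hypothetical helper: collects every substring the regex finds in the input.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "test test3 t3st test: word%5 test! testing t[st";
        // Every maximal run of letters is returned, regardless of neighbouring chars.
        System.out.println(findAll("[\\p{L}]+", s));
        // prints [test, test, t, st, test, word, test, testing, t, st]
    }
}
```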

Source: https://habr.com/ru/post/1270065/


All Articles