A regular expression that extracts words from a string

I want to extract all words from a Java string.

A word can be written in any European language and contains no spaces, only alphabetic characters.

It may also contain hyphens.

+3
3 answers

If you are not tied to regular expressions, also take a look at BreakIterator, in particular its getWordInstance() method.
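As a rough illustration of the BreakIterator approach, here is a minimal sketch. The class name `BreakIteratorWords` and the helper `words` are hypothetical, and the rule "keep only segments that contain at least one letter" is an assumption about what counts as a word. How hyphenated words are split depends on the iterator's rules, so check that against your own input.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class BreakIteratorWords {
    // Hypothetical helper: collects the segments between word boundaries
    // that contain at least one letter.
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            // The iterator also reports spaces and punctuation as segments,
            // so filter those out (assumption: a word has at least one letter).
            if (segment.codePoints().anyMatch(Character::isLetter)) {
                result.add(segment);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // \u00fc is "ü" and \u00df is "ß"; escapes avoid source-encoding issues.
        System.out.println(words("Hello, world! Gr\u00fc\u00df Gott", Locale.ROOT));
    }
}
```

Unlike a hand-written character class, this leaves the decision about where words begin and end to the JDK's locale-aware rules.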

+3

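You can use `(?<!\S)\S+(?!\S)`, that is, a run of non-whitespace characters that is neither preceded nor followed by a non-whitespace character.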

  • Instead of `\S`, you can substitute your own definition of a word character
    • (for example, `[A-Za-z-]`, etc.)

For example, with `[a-z-]` as the alphabet:

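    // Requires java.util.regex.Pattern and java.util.regex.Matcher.
    String text = "--xx128736f-afasdf2137asdf-12387-kjs-23xx--";
    // "alpha" is a placeholder; replace() substitutes the real character class.
    Pattern p = Pattern.compile(
        "(?<!alpha)alpha+(?!alpha)".replace("alpha", "[a-z-]")
    );
    Matcher m = p.matcher(text);
    // Print every maximal run of lowercase letters and hyphens.
    while (m.find()) {
        System.out.println(m.group());
    }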

Output:

--xx
f-afasdf
asdf-
-kjs-
xx--


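For words in other European languages, the `[a-z-]` alphabet is not enough: the character class would have to use Unicode classes, etc.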
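A minimal sketch of a Unicode-aware variant of the same pattern follows. It assumes that a word character is any Unicode letter (`\p{L}`) or an ASCII hyphen; that definition, the class name `UnicodeWords`, and the helper `words` are illustrative choices, not something the answer specifies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeWords {
    // Assumption: a word character is any Unicode letter or an ASCII hyphen.
    private static final String ALPHA = "[\\p{L}\\-]";

    // Same lookaround template as above, with the placeholder substituted.
    private static final Pattern WORD =
        Pattern.compile("(?<!alpha)alpha+(?!alpha)".replace("alpha", ALPHA));

    // Hypothetical helper: returns every maximal run of word characters.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // \u00fc is "ü", \u00df is "ß", \u00e9 is "é".
        System.out.println(words("Gr\u00fc\u00df-Gott 123 caf\u00e9, na\u00efve"));
    }
}
```

As with the `[a-z-]` version, a run made only of hyphens still counts as a match, and letters written with separate combining marks are split, since combining marks belong to `\p{M}` rather than `\p{L}`.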

+2

This will match one word:

`([^\s]+)`
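To get every word rather than just one, the pattern can be applied repeatedly with `Matcher.find()`. The sketch below uses a hypothetical class `NonSpaceTokens` with a helper `tokens`. Note that `[^\s]` is equivalent to `\S`, so the tokens keep digits and any attached punctuation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonSpaceTokens {
    private static final Pattern TOKEN = Pattern.compile("([^\\s]+)");

    // Hypothetical helper: returns each whitespace-delimited token.
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            result.add(m.group(1)); // group 1 is the parenthesized run
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("well-known fact, no. 42"));
    }
}
```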
0

Source: https://habr.com/ru/post/1752237/

