Why does regex \w*(\s+|$) find 2 matches for "foo" (Java)?

Given the regex \w*(\s+|$) and the input "foo", I would expect Java's Matcher.find() to be true only once: \w* will consume foo, and the $ in (\s+|$) should consume the end of the line. I cannot understand why a second find() is also true, returning an empty string.

Code example:

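import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Wrapper class added so the snippet compiles; the name is arbitrary.
public class Main {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\w*(\\s+|$)");
        Matcher m = p.matcher("foo");

        // Print every match, quoted so that empty matches are visible.
        while (m.find()) {
            System.out.println("'" + m.group() + "'");
        }
    }
}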

Expected (by me) output:

'foo'

Actual output:

'foo'
''

UPDATE

My example regular expression should have been just \w*$ to simplify the discussion; it shows the same behavior.

So it looks like this comes down to how zero-length matches are handled. I found the method Matcher.hitEnd(), which tells you that the last match reached the end of the input, so you know that you do not need another Matcher.find():

while (!m.hitEnd() && m.find()) {
    System.out.println("'" + m.group() + "'");
}

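The !m.hitEnd() check has to come before m.find(); otherwise the match that reaches the end of the input (here foo) would be skipped.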
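The loop from the update can be wrapped in a small helper to see what it returns. This is only a sketch: the class and method names (HitEndDemo, matchesUntilEnd) are made up for illustration. Note also that hitEnd() reports that the engine reached the end of the input during the last match attempt, so with other patterns the loop may stop earlier than intended.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HitEndDemo {
    // Collects matches, but stops as soon as a match attempt has
    // reached the end of the input (hypothetical helper name).
    static List<String> matchesUntilEnd(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> result = new ArrayList<>();
        // hitEnd() is checked first, so find() is not called again
        // after the match that touched the end of the input.
        while (!m.hitEnd() && m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(matchesUntilEnd("\\w*(\\s+|$)", "foo"));
        System.out.println(matchesUntilEnd("\\w*(\\s+|$)", "foo bar"));
    }
}
```

For "foo" the trailing empty match is no longer reported, and for "foo bar" both words are still returned.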


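First, \w* matches foo, and then $ matches at the end of the string.

Read more about zero-length matches at http://www.regular-expressions.info.

The key point from that site:

If a regex can find a zero-length match at some position in the string, it will. \d* matches zero or more digits. If the string contains no digits, this regex finds a zero-length match at every position. It finds 4 matches in abc: one before each of the three letters, and one at the end of the string.

In foo, once the last o has been consumed, the engine is at the end of the string, where \w* can match the empty string and $ still holds.

The next find() resumes where the previous match ended, so this second, empty match is reported as well.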
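The zero-length behaviour discussed here can be observed directly by printing match positions instead of match text. The following is a minimal sketch; ZeroLengthDemo and spans are illustrative names, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ZeroLengthDemo {
    // Returns each match as "start-end" so that empty matches,
    // where start equals end, remain visible.
    static List<String> spans(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> spans = new ArrayList<>();
        while (m.find()) {
            spans.add(m.start() + "-" + m.end());
        }
        return spans;
    }

    public static void main(String[] args) {
        // No digits in "abc": an empty match at every position.
        System.out.println(spans("\\d*", "abc"));
        // "foo" itself, then an empty match at the end of the input.
        System.out.println(spans("\\w*$", "foo"));
    }
}
```

A span such as 3-3 is the empty match at the end of foo that the question asks about.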

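To avoid the empty match, you can use \w*\s+|\w+$, which matches:

  • zero or more word characters followed by 1 or more whitespace characters, or
  • 1 or more word characters at the end of the string.

With the | alternation, an empty match at the end of the string is no longer possible, because the second branch requires at least one word character. \w* in the first branch may still match nothing, but that branch always consumes at least one whitespace character.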

For example, given the input

He said: "It's done"

the matches are:

"He "
" "       the space after the :
"s "      the match after the '

So, in your case, use + instead of *, i.e. \w+(\s+|$)
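The \w*\s+|\w+$ alternative and its sample input from this answer can be tried out with a short program. This is a sketch only; AlternationDemo and findAll are names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationDemo {
    // Returns every match of regex in input, in order of appearance.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String regex = "\\w*\\s+|\\w+$";
        // Quote each match so that whitespace is visible.
        for (String match : findAll(regex, "He said: \"It's done\"")) {
            System.out.println("'" + match + "'");
        }
        // The original input from the question.
        System.out.println(findAll(regex, "foo"));
    }
}
```

For the input foo, this pattern reports a single match and no trailing empty string.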


The expression \\w* also matches the empty string, because of the Kleene star. Use this instead:

\\w+
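Switching from * to + changes more than the trailing empty match, which a side-by-side run makes visible. The sketch below uses made-up names (PlusVersusStarDemo, findAll) and an extra input with a leading space that is not from the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlusVersusStarDemo {
    // Returns every match of regex in input, in order of appearance.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String star = "\\w*(\\s+|$)";
        String plus = "\\w+(\\s+|$)";
        // " foo" is an illustrative input with a leading space.
        for (String input : new String[] {"foo", " foo"}) {
            System.out.println("star: " + findAll(star, input));
            System.out.println("plus: " + findAll(plus, input));
        }
    }
}
```

With +, every match must contain at least one word character, so the empty match at the end disappears, but whitespace that is not preceded by a word is no longer matched either. Whether that matters depends on what the matches are used for.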

Edit

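According to the Matcher documentation, find() starts at the beginning of the region or, if the previous call was successful and the matcher has not since been reset, at the first character not matched by the previous match. The first call consumes foo, so the next one starts at the end of the input, where the empty string still matches.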


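There are 2 matches: foo, and the empty string at the position right after foo (foo here->).

After the first match, the engine has not yet tried the position at the end of the string (EOS), and the pattern can match there without consuming anything. Only after that empty match is there nowhere left to try.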

You will get the same with \w* on the input foo, i.e. 2 matches.


Source: https://habr.com/ru/post/1681406/

