Given the regex \w*(\s+|$) and the input "foo", I would expect Java's Matcher.find() to be true only once: \w* will consume foo, and the $ in (\s+|$) should consume the end of the line. I cannot understand why a second find() would also be true, returning an empty match.
Code example:
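import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\w*(\\s+|$)");
        Matcher m = p.matcher("foo");
        // Print every match in quotes so that empty matches are visible.
        while (m.find()) {
            System.out.println("'" + m.group() + "'");
        }
    }
}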
Expected (by me) output:
'foo'
Actual output:
'foo'
''
UPDATE
My regular expression example should have been just \w*$ to simplify the discussion; it gives the same behavior.
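To make the second match visible, here is a minimal sketch that prints the start and end offsets of each match for the simplified pattern. The class name ZeroLengthMatchDemo and the helper describeMatches are hypothetical names used only for this illustration. It suggests that the second find() resumes at index 3, where \w* can match nothing and $ still matches at the end of the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the helper are illustrative only.
final class ZeroLengthMatchDemo {

    // Collects every match as "[start,end) 'text'" so that empty matches show up.
    static List<String> describeMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add("[" + m.start() + "," + m.end() + ") '" + m.group() + "'");
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints:
        // [0,3) 'foo'
        // [3,3) ''
        describeMatches("\\w*$", "foo").forEach(System.out::println);
    }
}
```

The original pattern \w*(\s+|$) gives the same two offsets for the input "foo".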
So it looks like zero-length matches are being reported as matches. I found the method Matcher.hitEnd(), which tells you that the last match operation reached the end of the input, so you know that you do not need another Matcher.find():
while (!m.hitEnd() && m.find()) {
    System.out.println("'" + m.group() + "'");
}
The !m.hitEnd() check has to come before m.find(); otherwise the last match is skipped.
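The effect of the ordering can be sketched by running both loop conditions side by side. This is only an illustration under assumptions: the class HitEndGuardDemo and its two helper methods are hypothetical names, and the input "foo bar" is chosen here for the demonstration. The guard also depends on the pattern; a pattern that scans to the end of the input while backtracking may set hitEnd() before the final match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing the two possible orderings of the guard.
final class HitEndGuardDemo {

    // Guard first: stop once the previous match operation reached the end of input.
    static List<String> guardBeforeFind(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (!m.hitEnd() && m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Guard second: the match that reaches the end of input is found but discarded.
    static List<String> guardAfterFind(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find() && !m.hitEnd()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String regex = "\\w*(\\s+|$)";
        System.out.println(guardBeforeFind(regex, "foo bar")); // prints [foo , bar]
        System.out.println(guardAfterFind(regex, "foo bar"));  // prints [foo ]
    }
}
```

With the guard first, both words are reported and the trailing empty match is suppressed; with the guard second, the loop exits as soon as find() returns the match that touches the end of the input, so "bar" is lost.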