
Understanding Something in Regular Expressions

If I use a delimiter for the Scanner:

Scanner scanString = new Scanner(line).useDelimiter("<.*>"); 

I want to know why this does not keep the text in

 <a href="https://post.craigslist.org/c/snj?lang=en">post to classifieds</a> 

but does keep it in the line

 <option value="ccc">community 

While

 Scanner scanString = new Scanner(line).useDelimiter("<.*?>"); 

will work for both.
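A minimal sketch that reproduces the behaviour described above, assuming each of the two lines from the question is fed to its own Scanner. The class name DelimiterDemo and the helper tokens are hypothetical, made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DelimiterDemo {
    // Collects every token the Scanner produces when the given regex is the delimiter.
    static List<String> tokens(String line, String delimiter) {
        List<String> result = new ArrayList<>();
        try (Scanner scanString = new Scanner(line).useDelimiter(delimiter)) {
            while (scanString.hasNext()) {
                result.add(scanString.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String link = "<a href=\"https://post.craigslist.org/c/snj?lang=en\">post to classifieds</a>";
        String option = "<option value=\"ccc\">community";

        // Greedy delimiter: for the link line, the whole line is one delimiter match.
        System.out.println(tokens(link, "<.*>"));
        // Reluctant delimiter: each tag is a separate delimiter match.
        System.out.println(tokens(link, "<.*?>"));

        // The option line contains only one '>', so both patterns match the same tag.
        System.out.println(tokens(option, "<.*>"));
        System.out.println(tokens(option, "<.*?>"));
    }
}
```

Running it prints `[]`, `[post to classifieds]`, `[community]`, `[community]`: only the greedy delimiter on the link line leaves no text behind.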

As I understand it, "<.*>" should exclude a substring starting with "<", followed by any character zero or more times, until it reaches ">". So shouldn't it then start over until it reaches another "<"?

1 answer

This is because the second expression uses a reluctant (as opposed to greedy) quantifier, which means it does not first try to match the entire rest of the string and then back off from there, as the first one does.

The expression "<.*>" tries to advance as far as possible into your input string, so `.*` runs all the way to the end of the line. From there it backtracks to the last ">", finds that it has a match, and stops. The reluctant version "<.*?>" does not do this: it matches up to the first ">" and stops, so the next match starts at the following "<".
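To see what each pattern actually matches (rather than what the Scanner leaves behind), one can run both patterns through `java.util.regex.Matcher.find()` on the link line from the question. A small sketch; the names QuantifierDemo and matches are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns every substring the pattern matches, scanning the input left to right.
    static List<String> matches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String link = "<a href=\"https://post.craigslist.org/c/snj?lang=en\">post to classifieds</a>";

        // Greedy: .* runs to the end of the line, then backtracks only as far as
        // the LAST '>', so the whole line becomes a single match.
        System.out.println(matches("<.*>", link));

        // Reluctant: .*? grows one character at a time and stops at the FIRST '>',
        // so each tag is matched separately and the text between them is not matched.
        System.out.println(matches("<.*?>", link));
    }
}
```

The greedy pattern yields one match covering the entire line, while the reluctant pattern yields two matches, `<a href="https://post.craigslist.org/c/snj?lang=en">` and `</a>`.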

This article provides an excellent read on quantifiers.


Source: https://habr.com/ru/post/1395872/

