Understanding Something in Regular Expressions
If I use a delimiter for the string:
Scanner scanString = new Scanner(line).useDelimiter("<.*>");
I want to know why this does not return the text in
<a href="https://post.craigslist.org/c/snj?lang=en">post to classifieds</a>
but does return it for a line such as
<option value="ccc">community
while
Scanner scanString = new Scanner(line).useDelimiter("<.*?>");
works for both.
As I understand it, "<.*>" should match text that starts with "<", followed by any character 0 or more times, until it reaches ">". So shouldn't matching then start over when it reaches the next "<"?
This is because the second expression uses a reluctant (as opposed to greedy) quantifier, which means that it does not try to match the entire string and backtrack from there, as the first does.
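The expression "<.*>" tries to consume as much of your input string as possible, so ".*" first runs to the end of the line. From there it backtracks until it finds a ">", which is the last one in the line, and stops. For the <a> line, the delimiter therefore covers everything from the first "<" to the final ">", and no text is left over as a token. The <option> line contains only one ">", so the greedy match ends there and "community" survives. The reluctant version "<.*?>" does not do this: it matches up to the first ">" and stops.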
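The difference can be checked with a small, self-contained sketch. The class name DelimiterDemo and the helper tokens are made up for illustration; only Scanner, useDelimiter, and the two sample lines come from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Illustrative names; not part of the original question's code.
class DelimiterDemo {
    // Collects every token the Scanner yields for the given delimiter regex.
    static List<String> tokens(String line, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(line).useDelimiter(delimiterRegex)) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String anchor = "<a href=\"https://post.craigslist.org/c/snj?lang=en\">post to classifieds</a>";
        String option = "<option value=\"ccc\">community";

        // Greedy: the delimiter swallows the whole <a ...>...</a> line, so no token is left.
        System.out.println(tokens(anchor, "<.*>"));   // []
        // Reluctant: each tag is its own delimiter, so the text between them is a token.
        System.out.println(tokens(anchor, "<.*?>"));  // [post to classifieds]
        // Only one ">" in this line, so greedy and reluctant agree.
        System.out.println(tokens(option, "<.*>"));   // [community]
        System.out.println(tokens(option, "<.*?>"));  // [community]
    }
}
```

The greedy pattern only appears to work on the <option> line because that line happens to contain a single ">".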
This article provides an excellent read on quantifiers.