I need to parse some HTML to extract a set of values. The HTML is not always well formed and I have no control over it (so Scanner does not seem to be an option).
It is a shopping basket containing n rows, each of which has a quantity drop-down. I want to get the total number of products in the basket.
Given this HTML, I would like to match the values 2 and 5:
...
<select attr="other stuff" name="quantity">
<option value="1" />
<option value="2" selected="selected" />
</select>
....
<select name="quantity" attr="other stuff">
<option selected="selected" value="5" />
<option value="6" />
</select>
I have made a few poor attempts, but given the number of variables (for example, the order of the "value" and "selected" attributes), most of my solutions either don't work or are very slow.
The last Java code I ended up with is the following:
Pattern pattern = Pattern.compile("select(.*?)name=\"quantity\"([.|\\n|\\r]*?)option(.*?)value=\"(/d)\" selected=\"selected\"", Pattern.DOTALL);
Matcher matcher = pattern.matcher(html);
if (matcher.find()) {
....
}
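For what it is worth, one direction that might cope with the attribute-order problem is to split the work into smaller patterns instead of one large expression: first isolate each `<select>` whose opening tag has `name="quantity"`, then look at each `<option>` tag on its own and check for `selected="selected"` and `value="..."` separately, so their order no longer matters. The sketch below is only an illustration under assumptions: the class and method names (`BasketQuantity`, `totalQuantity`) are made up, attributes are assumed to be lowercase and double-quoted, and values are assumed to be non-negative integers. Note also that `(/d)` in the pattern above matches a literal slash followed by `d`, not a digit; the sketch uses `\\d+`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a definitive implementation.
class BasketQuantity {
    // Captures the body of each <select> whose opening tag has name="quantity".
    private static final Pattern SELECT = Pattern.compile(
            "<select\\b[^>]*\\bname=\"quantity\"[^>]*>(.*?)</select>",
            Pattern.DOTALL);
    // Matches one <option ...> tag at a time.
    private static final Pattern OPTION = Pattern.compile("<option\\b[^>]*>");
    // Extracts the numeric value attribute from a single tag.
    private static final Pattern VALUE = Pattern.compile("\\bvalue=\"(\\d+)\"");

    // Sums the selected quantity of every quantity drop-down in the HTML.
    static int totalQuantity(String html) {
        int total = 0;
        Matcher select = SELECT.matcher(html);
        while (select.find()) {
            Matcher option = OPTION.matcher(select.group(1));
            while (option.find()) {
                String tag = option.group();
                if (!tag.contains("selected=\"selected\"")) {
                    continue; // not the selected option
                }
                Matcher value = VALUE.matcher(tag);
                if (value.find()) {
                    total += Integer.parseInt(value.group(1));
                }
                break; // assume at most one selected option per drop-down
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String html =
                "<select attr=\"other stuff\" name=\"quantity\">\n"
              + "<option value=\"1\" />\n"
              + "<option value=\"2\" selected=\"selected\" />\n"
              + "</select>\n"
              + "<select name=\"quantity\" attr=\"other stuff\">\n"
              + "<option selected=\"selected\" value=\"5\" />\n"
              + "<option value=\"6\" />\n"
              + "</select>";
        System.out.println(totalQuantity(html)); // prints 7
    }
}
```

Because each pattern only scans inside a single tag (`[^>]*`), there is far less backtracking than with several `.*?` groups under `DOTALL`, which may be why the single large expression is slow.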