I am learning F# and have started playing with both sequences and match expressions.
I am writing a web scraper that scans HTML similar to the following and takes the last URL in the parent <span> with the class paging.
    <html>
      <body>
        <span class="paging">
          <a href="http://google.com">Link to Google</a>
          <a href="http://TheLinkIWant.com">The Link I want</a>
        </span>
      </body>
    </html>
My attempt to get the last URL is as follows:
    type AnHtmlPage = FSharp.Data.HtmlProvider<"http://somesite.com">

    let findMaxPageNumber (page: AnHtmlPage) =
        page.Html.Descendants()
        |> Seq.filter (fun n -> n.HasClass("paging"))
        |> Seq.collect (fun n -> n.Descendants() |> Seq.filter (fun m -> m.HasName("a")))
        |> Seq.last
        |> fun n -> n.AttributeValue("href")
However, I run into problems when the class I'm looking for is missing from the page. In particular, I get an ArgumentException with the message: "Additional information: The input sequence was empty."
My first thought was to create another function that matches empty sequences and returns an empty string if the paging class is not found on the page.
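The failure can be reproduced without the HTML provider. The snippet below is a minimal sketch, assuming only FSharp.Core; the names `noLinks` and `result` are made up for illustration and stand in for the filtered anchors when the paging span is missing.

```fsharp
// Stand-in for the sequence of anchors when no "paging" span is on the page.
let noLinks : seq<string> = Seq.empty

// Seq.last raises System.ArgumentException when the input sequence is empty.
let result =
    try
        Seq.last noLinks
    with
    | :? System.ArgumentException as ex -> "caught: " + ex.GetType().Name

printfn "%s" result
```

So the exception comes from Seq.last itself, not from the filtering or collecting steps before it.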
    let findUrlOrReturnEmptyString (span: seq<HtmlNode>) =
        match span with
        | Seq.empty -> String.Empty
        | span ->
            span
            |> Seq.collect (fun (n: HtmlNode) -> n.Descendants() |> Seq.filter (fun m -> m.HasName("a")))
            |> Seq.last
            |> fun n -> n.AttributeValue("href")

    let findMaxPageNumber (page: AnHtmlPage) =
        page.Html.Descendants()
        |> Seq.filter (fun n -> n.HasClass("paging"))
        |> findUrlOrReturnEmptyString
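My problem now is that Seq.empty cannot be used in a pattern. Lists can be matched against [], so my question is: how can I match an empty sequence in a similar way?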
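For illustration, here are two directions that could express the "empty sequence" case. This is a sketch under assumptions, not a definitive answer: it works on plain strings instead of HtmlNode values so that it runs without FSharp.Data, and the helper names `lastOrEmptyGuard` and `lastOrEmptyTry` are hypothetical.

```fsharp
// Option 1: a guard clause. Sequences have no literal pattern like [],
// but any pattern can carry a `when` condition.
let lastOrEmptyGuard (hrefs: seq<string>) =
    match hrefs with
    | s when Seq.isEmpty s -> System.String.Empty
    | s -> Seq.last s

// Option 2: skip the match entirely. Seq.tryLast returns None for an
// empty sequence instead of throwing.
let lastOrEmptyTry (hrefs: seq<string>) =
    defaultArg (Seq.tryLast hrefs) System.String.Empty

let links = Seq.ofList [ "http://google.com"; "http://TheLinkIWant.com" ]
printfn "%s" (lastOrEmptyGuard links)
printfn "%s" (lastOrEmptyTry links)
```

Note that checking only whether the span sequence is empty would not cover a paging span that contains no <a> elements. Applying the check (or Seq.tryLast) to the collected anchors would cover both cases.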