HTML parsing with OCaml

I am looking for a library for parsing HTML files in OCaml, basically an equivalent of Jsoup or Beautiful Soup. The main requirement is to be able to query the DOM using CSS selectors. Something along the lines of:

page.fetch("http://www.url.com")
page.find("#tag")
1 answer

I recently needed something similar, so when I saw this question and read the recommendations in the comments, I wrote the library "Lambda Soup" over the weekend for fun.

You will want to use a library like ocurl or Cohttp to fetch the actual HTML. After that, you can do

 html |> parse $ "#tag" 

to do what is asked in the question. For other features and the full signature, see the documentation. You might also want to look at the documentation postprocessor or the tests for a reasonably thorough demonstration of usage and features, including the CSS support and extensions.
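
As a slightly fuller illustration of the one-liner above, here is a minimal sketch, assuming the lambdasoup package is installed and that the page has already been fetched into a string. The sample markup and the names html, tag_text, has_missing and items are made up for this example; only the Soup functions come from the library.

```ocaml
(* Requires the lambdasoup package, e.g. `opam install lambdasoup`. *)
open Soup

(* Hypothetical stand-in for a page fetched with ocurl or Cohttp. *)
let html =
  {|<html><body>
      <p id="tag">Hello</p>
      <ul><li class="item">one</li><li class="item">two</li></ul>
    </body></html>|}

let soup = parse html

(* `$` returns the first element matching the selector and raises
   if nothing matches. R.leaf_text extracts its text content. *)
let tag_text = soup $ "#tag" |> R.leaf_text

(* `$?` returns an option instead of raising. *)
let has_missing =
  match soup $? "#no-such-id" with
  | Some _ -> true
  | None -> false

(* `$$` selects every matching element. *)
let items = soup $$ "li.item" |> to_list |> List.map R.leaf_text

let () =
  print_endline tag_text;
  List.iter print_endline items;
  Printf.printf "missing element found: %b\n" has_missing
```

Using `$?` rather than `$` is probably the safer choice when the selector may not match anything on the page, since `$` raises in that case.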

Lambda Soup originally used the Ocamlnet HTML parser suggested in the comments; it now uses Markup.ml. Otherwise, it has no dependencies, other than OUnit if you want to run the tests. I am glad to hear any feedback, including about changes to the interface (it is at an early stage) or about adding an HTTP downloader to the library (which seems undesirable, since it would drastically change the scope of the library as it is now, but I am glad to hear arguments).

The license is BSD.


Source: https://habr.com/ru/post/1235070/

