As a training exercise, I am writing a web scraper in Common Lisp. (Rough) plan:
I have just discovered that the website I am scraping doesn't always serve valid XHTML. This means that step 3 (analyse pages with xmls) doesn't work. And I just don't want to be the guy who parses HTML with regular expressions :-)
So, can anyone recommend a Common Lisp package for parsing invalid XHTML? I'm looking for something similar to HTML Agility Pack for .NET ...
The closure-html project (available in Quicklisp) will recover from malformed HTML and produce something you can work with. I use closure-html together with CXML to handle arbitrary web pages, and it works well. http://common-lisp.net/project/closure/closure-html/
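As a minimal sketch of the approach above: closure-html exposes `chtml:parse`, which takes a source (here a string) and a builder; with the LHTML builder the result is plain nested lists you can walk with ordinary list operations. The input string here is just an illustrative fragment of broken markup.

```lisp
;; Load closure-html via Quicklisp.
(ql:quickload :closure-html)

;; Parse deliberately malformed HTML -- the parser recovers instead of
;; signalling an error, inserting the missing structure for you.
(chtml:parse "<p>broken <b>html" (chtml:make-lhtml-builder))
;; The result is an LHTML tree along the lines of
;; (:HTML NIL (:HEAD NIL) (:BODY NIL (:P NIL "broken " (:B NIL "html"))))
```

To interoperate with CXML, you can pass a DOM builder (e.g. from cxml-dom) instead of the LHTML builder and get a DOM document back.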
For future visitors: today we also have Plump: https://shinmera.imtqy.com/plump
Plump parses HTML/XML, whether well-formed or not, into a DOM that you can traverse and modify.
On top of it, lquery (a jQuery-like interface) and CLSS (a CSS selector library) make querying the DOM convenient.
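A short sketch of Plump with CLSS, assuming both are loaded from Quicklisp; the markup string is an invented example of invalid HTML:

```lisp
;; Load Plump (lenient HTML/XML parser) and CLSS (CSS selectors).
(ql:quickload '(:plump :clss))

;; Plump happily parses markup with unclosed tags into a DOM tree.
(defvar *doc* (plump:parse "<p>Unclosed <b>tags are <i>fine</p>"))

;; CLSS selects nodes by CSS selector, returning a vector of matches.
(clss:select "b" *doc*)

;; Extract the concatenated text content of the document.
(plump:text *doc*)
```

For real scraping you would typically fetch the page with an HTTP client such as Drakma or Dexador and feed the response string to `plump:parse`.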
See also the web scraping section of the Common Lisp Cookbook: https://lispcookbook.imtqy.com/cl-cookbook/web-scraping.html
and the Common Lisp wiki: http://www.cliki.net/Web
Duncan, so far I have successfully used Clozure Common Lisp under Ubuntu Linux and Windows (7 and XP), so if you are looking for an implementation that will run anywhere, you could try that one.
Source: https://habr.com/ru/post/1783530/