I am curious about how crawling a website is done, and in particular I would like to write a script to perform a task on the Hype Machine site. I'm a fourth-year software development bachelor's student, but we don't do any kind of web programming, so my understanding of JavaScript, RESTful APIs, and all things web is pretty limited; we mainly focus on theory and client-side applications. Any help or pointers would be highly appreciated.
The first thing to check is whether the site already offers some kind of structured data, or whether you need to parse the HTML yourself. There seems to be an RSS feed of the latest songs. If that is what you are looking for, it would be a good place to start.
You can use a scripting language to download the feed and parse it. I use Python, but you can pick a different scripting language if you prefer. The Python docs cover both how to load a URL and how to parse XML.
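As a rough illustration of the "download the feed and parse it" step, here is a minimal sketch using only the Python standard library. It assumes the feed is plain RSS 2.0 with `<item>` elements holding `<title>` and `<link>`; the function names and the sample feed are made up for the example and do not come from the Hype Machine site.

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch_feed(url, timeout=10):
    """Download the feed at `url` and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def parse_rss_items(xml_data):
    """Return a list of {'title', 'link'} dicts, one per RSS 2.0 <item>."""
    root = ET.fromstring(xml_data)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
        })
    return items


# Made-up sample feed (an assumption for illustration) so the parser
# can be tried without network access.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Latest songs</title>
    <item><title>Song A</title><link>http://example.com/a</link></item>
    <item><title>Song B</title><link>http://example.com/b</link></item>
  </channel>
</rss>"""

for entry in parse_rss_items(SAMPLE_FEED):
    print(entry["title"], entry["link"])
```

Against a real feed you would call `parse_rss_items(fetch_feed(feed_url))`, where `feed_url` is whatever address the site publishes for its feed.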
Beyond the RSS feed, here are some books on writing a scraping script in general:
"Webbots, Spiders, and Screen Scrapers" (PHP/CURL) http://www.amazon.com/Webbots-Spiders-Screen-Scrapers-Developing/dp/1593271204
"HTTP Programming Recipes for C# Bots" http://www.amazon.com/HTTP-Programming-Recipes-C-Bots/dp/0977320677
"HTTP Programming Recipes for Java Bots" http://www.amazon.com/HTTP-Programming-Recipes-Java-Bots/dp/0977320669
For crawling the web at large, the way google does, see nutch from Apache.org (nutch.apache.org) and http://ww.hounder.org.
For tools that work with the page DOM, see mozenda.com.
There are also services such as spinn3r.com.
Python has a feedparser module, available at feedparser.org, which handles RSS and Atom in their various versions. There is no reason to reinvent the wheel.
Source: https://habr.com/ru/post/1757504/