Selenium vs Jsoup

I am writing a small scraper with Selenium (so using it is not an issue). When I need to read something from an element (e.g. get its src attribute), should I use Selenium's built-in selection mechanism or Jsoup (which is much simpler)? So the question is: does using Jsoup add significant overhead? Should I use Selenium as much as possible? Thanks

1 answer

If you have already parsed the DOM with Jsoup, I would recommend using Jsoup. It is much faster than Selenium because it does not need to worry about a "live" DOM. Selenium must always check whether its element handles are still valid before performing any operation on them.
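For the src case from the question, a minimal Jsoup sketch might look like the following. The class name, method name, and sample HTML are made up for illustration; only Jsoup.parse, select, and attr are actual Jsoup calls. In Selenium the rough equivalent would go through driver.findElement(By.cssSelector(...)) followed by getAttribute("src"), with every call travelling to the browser.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class SrcExtractor {

    // Returns the src attribute of every <img> element that has one.
    // The HTML is parsed once into an in-memory tree; there is no live
    // browser behind it, so no handle can go stale.
    static List<String> imageSources(String html) {
        Document doc = Jsoup.parse(html);
        List<String> sources = new ArrayList<>();
        for (Element img : doc.select("img[src]")) {
            sources.add(img.attr("src"));
        }
        return sources;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup, not taken from any real site.
        String html = "<html><body>"
                + "<img src=\"a.png\">"
                + "<img src=\"b.png\">"
                + "<img alt=\"no source\">"
                + "</body></html>";
        System.out.println(imageSources(html)); // prints [a.png, b.png]
    }
}
```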

If you can, avoid Selenium altogether, as its overhead is really noticeable when you do serious scraping. However, Selenium shines when your content is generated dynamically by JavaScript on the client. Jsoup cannot handle this at all, as it does not execute JavaScript.
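To see what "does not execute JavaScript" means in practice, here is a small sketch. The element id "out" and the inline script are invented for the example. Jsoup keeps the script element in the tree as inert data, so the div the script would fill in a browser stays empty.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class NoScriptExecution {

    // Parses the HTML and returns the text of the element with id "out".
    static String textOfOut(String html) {
        Document doc = Jsoup.parse(html);
        return doc.getElementById("out").text();
    }

    public static void main(String[] args) {
        // In a browser the script would put text into the div.
        // Jsoup only parses the markup, so the script never runs.
        String html = "<div id=\"out\"></div>"
                + "<script>"
                + "document.getElementById('out').textContent = 'filled by script';"
                + "</script>";
        System.out.println("[" + textOfOut(html) + "]"); // prints []
    }
}
```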

Addition in reply to a comment

Short answer: it depends!

Longer: if the website you are scraping is generated by JavaScript and does not change after it is created, it is fine to access it with Selenium, especially if the DOM is complex and would take a long time to read into Jsoup (although Jsoup is pretty fast). However, Jsoup will build the DOM in memory again, so if your DOM is huge, it will occupy memory not only in Selenium but also in Jsoup. This may or may not be a problem in your case, but it is worth keeping in mind.

From my personal experience, I kill the Selenium process as soon as possible after obtaining the final HTML and parse it again in Jsoup because, as you say, scraping with Jsoup is much simpler than the corresponding Selenium selector constructs. This works especially well if you are sure that any changes to the DOM after its initial creation are irrelevant to your scraper.
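The hand-off described above could be sketched roughly as follows. This is an illustration under assumptions, not the answerer's actual code: ChromeDriver is just one possible driver, the URL is a placeholder, the class and method names are made up, and any waiting for scripts to finish (which a real page may need before getPageSource is called) is left out.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

import java.util.ArrayList;
import java.util.List;

public class RenderThenParse {

    // Loads the page in a real browser so client-side JavaScript runs,
    // grabs the resulting HTML, and shuts the browser down right away.
    static String renderedHtml(String url) {
        WebDriver driver = new ChromeDriver(); // assumption: Chrome is installed
        try {
            driver.get(url);
            return driver.getPageSource();
        } finally {
            driver.quit(); // ends the browser session even if loading fails
        }
    }

    // Re-parses the snapshot with Jsoup and resolves each img src against
    // the base URL. From here on no browser is involved.
    static List<String> absoluteImageSources(String html, String baseUrl) {
        Document doc = Jsoup.parse(html, baseUrl);
        List<String> sources = new ArrayList<>();
        for (Element img : doc.select("img[src]")) {
            sources.add(img.absUrl("src"));
        }
        return sources;
    }

    public static void main(String[] args) {
        String url = "https://example.com/"; // placeholder URL
        String html = renderedHtml(url);
        absoluteImageSources(html, url).forEach(System.out::println);
    }
}
```

Note that the snapshot exists twice for a moment, once in the browser and once as a Jsoup tree, which is the memory cost mentioned earlier. Quitting the driver before parsing keeps that overlap short.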


Source: https://habr.com/ru/post/1237043/

