Okay, so I'm in a bit of a pickle. I'm running into problems with JSoup because some pages need JavaScript to finish loading. I've dealt with this in the past (by analyzing the raw JavaScript by hand), and it's very tedious. Recently I tried to write a program to log in to a site, but the login requires a token from a form element. That form element is only rendered when JavaScript runs, so JSoup never sees it at all, not even for extraction. So I decided to look at Selenium.
The first question is: is Selenium the library I should be looking into? The reason I'm so inclined to stick with HttpClient is that some of these sites are very high-traffic and don't load completely, BUT I don't need the pages to load all the way. I just need enough of the page to grab the input token. Once I have what I need, I'd prefer to talk to the web server with raw JSON/POST requests rather than have Selenium automate a click/wait/type sequence.
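To illustrate what I mean by raw POSTs, here's a rough sketch using Java 11's built-in java.net.http.HttpClient. The URL, the JSON field names, and the token value are all placeholders I made up, not the real site's API:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;

public class PostSketch {
    // Build the POST I'd send once I have the token.
    // The endpoint and body shape are hypothetical stand-ins.
    public static HttpRequest buildPurchaseRequest(String token) {
        String json = "{\"itemId\":123,\"token\":\"" + token + "\"}";
        return HttpRequest.newBuilder()
                .uri(URI.create("https://example.com/api/purchase"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = buildPurchaseRequest("abc123");
        System.out.println(req.method() + " " + req.uri());
        // Actually firing it would be:
        // HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString());
    }
}
```

This skips all browser overhead, which is why I'd rather not drive the whole flow through Selenium.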
Basically, I only need Selenium to load 1 of the 4 pages, just to get the request token. The rest of my program will send POST requests with HttpClient.
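So the plan, sketched out: have Selenium render the page, then pull the token out of the page source myself. The hidden-input name `__token` and the sample HTML below are made up; in the real thing the string would come from Selenium's `driver.getPageSource()` after the JavaScript has run:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenExtractDemo {
    // Pull the hidden-input value out of the JS-rendered page source.
    // "__token" is a placeholder for whatever field name the site uses.
    public static String extractToken(String renderedHtml) {
        Pattern p = Pattern.compile("name=\"__token\"\\s+value=\"([^\"]+)\"");
        Matcher m = p.matcher(renderedHtml);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Stand-in for driver.getPageSource() once the JS has finished.
        String rendered =
            "<form><input type=\"hidden\" name=\"__token\" value=\"abc123\"></form>";
        System.out.println(extractToken(rendered)); // prints abc123
        // From here the token goes straight into the HttpClient POSTs.
    }
}
```

(In practice I could also feed the rendered source back into JSoup and select the input element instead of using a regex; the point is only that the browser is used once, for the token.)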
Or should I just let Selenium do all the work? My goal is speed: I need to log in and buy the item as quickly as possible.
Edit: Actually, maybe I can go with HtmlUnit, since it's very lightweight. I only need to scrape that one piece of information, and I don't want to have to start the Selenium standalone server. Is that the better approach?