Scraping a website that requires a login?

Can this be done, and if so, how? I want to scrape data from xbox.com, but the pages I need only appear after a successful login.

+4
5 answers

You will need to go through the login transaction yourself by sending POST data with your cURL requests. However, scraping data from behind a login is a bad idea: the site chose not to make this information public for a reason, and you may be infringing copyright by doing so.
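
As a rough illustration of sending the login as POST data with cURL, here is a minimal sketch. The function name `login_and_fetch`, the URLs and the form field names are all hypothetical; a real site will have its own field names and may require extra hidden fields or tokens.

```php
<?php
// Sketch only: the URLs and field names passed in are placeholders,
// not taken from any real site's login form.

/**
 * Log in with a POST request, then fetch a page with the same session cookies.
 */
function login_and_fetch(string $loginUrl, array $credentials, string $targetUrl): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow the redirect many logins issue
        CURLOPT_COOKIEFILE     => '',   // empty string: keep cookies in memory on this handle
    ]);

    // Step 1: submit the login form as POST data.
    curl_setopt($ch, CURLOPT_URL, $loginUrl);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($credentials));
    if (curl_exec($ch) === false) {
        throw new RuntimeException('Login request failed: ' . curl_error($ch));
    }

    // Step 2: request the protected page on the same handle,
    // so the cookies set during login are sent automatically.
    curl_setopt($ch, CURLOPT_URL, $targetUrl);
    curl_setopt($ch, CURLOPT_HTTPGET, true); // switch back from POST to GET
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('Page request failed: ' . curl_error($ch));
    }

    return $html;
}
```

A call might look like `login_and_fetch('https://example.com/login', ['username' => 'me', 'password' => 'secret'], 'https://example.com/account')`, where every value is a placeholder. Reusing one handle is what carries the session from the login to the second request.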

+2

Most login forms will set a cookie, so you should use an HTTP class such as Zend_Http, which can store cookies for subsequent requests. This is supposedly as simple as:

    // Zend Framework 1, assumed to be on the include path
    require_once 'Zend/Http/Client.php';

    $client = new Zend_Http_Client();
    $client->setCookieJar(); // this is the crucial part for "logging in"

    // make login request
    $client->setUri("http://xbox.com/login");
    $client->setParameterPost("login", "hackz0r");
    $result = $client->request('POST');

    // go scraping ...

+8

In theory this can be done if you have a web client that supports cookies. It looks like PHP's HTTP_Request2 from PEAR can send cookies if you supply the cookie information as part of the request. All you have to do is:

  • Send login request
  • Extract cookie data from HTTP response headers to the above request
  • Set cookie data for subsequent requests
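
The second and third steps can be sketched in plain PHP, independent of any HTTP library. The helper names `extract_cookies` and `build_cookie_header` and the sample header values are made up for illustration; the sketch also ignores cookie attributes such as Path and Expires, which a real client would have to respect.

```php
<?php
// Sketch only: helper names are illustrative and not part of HTTP_Request2
// or any other library.

/**
 * Collect name => value pairs from the Set-Cookie lines of a response.
 * Attributes such as Path, Expires or HttpOnly are ignored.
 */
function extract_cookies(array $headerLines): array
{
    $cookies = [];
    foreach ($headerLines as $line) {
        // Header names are case-insensitive.
        if (stripos($line, 'Set-Cookie:') !== 0) {
            continue;
        }
        // Keep only the "name=value" part before the first ";".
        $pair = trim(explode(';', substr($line, strlen('Set-Cookie:')), 2)[0]);
        if (strpos($pair, '=') === false) {
            continue;
        }
        [$name, $value] = explode('=', $pair, 2);
        $cookies[trim($name)] = trim($value);
    }
    return $cookies;
}

/**
 * Build the Cookie header to send with subsequent requests.
 */
function build_cookie_header(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return 'Cookie: ' . implode('; ', $pairs);
}

// Example response headers from a login request (made-up values).
$headers = [
    'HTTP/1.1 302 Found',
    'Set-Cookie: session=abc123; Path=/; HttpOnly',
    'set-cookie: lang=en; Path=/',
    'Location: /dashboard',
];

// prints "Cookie: session=abc123; lang=en"
echo build_cookie_header(extract_cookies($headers)), "\n";
```

In practice a cookie-aware client does this bookkeeping for you; the sketch only shows what happens underneath.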

Please note that many sites use anti-scraping techniques of varying sophistication, which can make this more difficult. It may also be illegal, unethical, or in breach of the site's user agreement.

+2

There are several ways to log in automatically, some more complicated than others. xbox.com probably uses the Windows Live API, so you will need to study its documentation.

0

The PHP PGBrowser library can do this quite easily. The following demo snippet is taken from a companion blog. I believe it will not work with the Xbox website, because Microsoft now uses SSO, but it still applies to other sites with login forms.

    require 'pgbrowser.php';

    $b = new PGBrowser();
    $b->useCache = true;
    $page = $b->get('http://yoursite.com/login'); // Retrieve login web page
    $form = $page->forms(1); // Retrieve form

    // Note the form field names have to be specified
    $form->set('username', "your_username_or_email");
    $form->set('password', "your_password");
    $page = $form->submit(); // Submit form

    echo $page->html; // This shows the web page normally displayed after successful login, eg dashboard

0

Source: https://habr.com/ru/post/1339731/

