How to download a file from a website that requires login information using Python?

Question


I am trying to download some data from a website using Python. If you just copy and paste the URL into a browser, the page shows nothing until you log in. I have a username and password, but how do I supply them from Python?

My current code is:

import urllib, urllib2, cookielib

username = 'my_user_name'  # placeholder credentials
password = 'my_pwd'

link = 'http://www.google.com'  # just for instance; urllib2 needs the scheme
cj = cookielib.CookieJar()  # keeps cookies between requests
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'username': username, 'j_password': password})

# Passing login_data as the second argument makes this a POST request
resp = opener.open(link, login_data)
print resp.read()

No error is raised, but resp.read() returns mostly CSS, plus messages such as "you need to log in before reading the news here."
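Because the server answers with its login prompt instead of an error, a small check can flag that case in code before you try to parse the data. This is a heuristic sketch: the marker strings are assumptions based on the output shown in this question, and `looks_logged_out` is a hypothetical helper name.

```python
# Assumed markers, taken from the output in this question; adjust them to
# whatever the real login page contains.
LOGGED_OUT_MARKERS = (
    "you need to log in",
    "client_authorise.asp?action=login",
)


def looks_logged_out(html):
    """Heuristic: True if the page still shows the login prompt or login form."""
    lowered = html.lower()
    return any(marker in lowered for marker in LOGGED_OUT_MARKERS)
```

For example, `looks_logged_out(resp_text)` returning True right after your login request is a strong hint that the credentials were not sent to the right URL or under the right field names.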

So, how can I get the page that is after login?

I just noticed that the site's login form has three fields:

Company: 

Username: 

Password:

I have all three values, but how do I put all three into the login data?
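All three values can go into the same dictionary that gets URL-encoded into the POST body. A minimal sketch in Python 3 (`urllib.parse.urlencode`; Python 2's `urllib.urlencode` behaves the same way). The field names `companyName`, `username` and `password` are assumptions borrowed from the POST data in the second answer; the real names are the `name` attributes of the `<input>` elements in the site's login form.

```python
from urllib.parse import urlencode

# Hypothetical field names and values: use the `name` attributes of the
# site's real <input> elements and your own credentials.
login_fields = {
    "companyName": "some_company",
    "username": "some_user",
    "password": "some_password",
}

# One application/x-www-form-urlencoded body carrying all three fields,
# as bytes, which is what a POST request body needs.
login_data = urlencode(login_fields).encode("ascii")
print(login_data)
```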

If I run it without logging in:

import urllib2, cookielib

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

resp = opener.open(dd)  # dd is the URL of the page with the data
print resp.read()

Here is the output:

<DIV id=header>
<DIV id=strapline><!-- login_display -->
<P><FONT color=#000000>All third party users of this website and/or data produced by the Baltic do so at their own risk. The Baltic owes no duty of care or any other obligation to any party other than the contractual obligations which it owes to its direct contractual partners. </FONT></P><IMG src="images/top-strap.gif"> <!-- template [strapline]--></DIV><!-- end strapline -->
<DIV id=memberNav>
<FORM class=members id=form1 name=form1 action=client_login/client_authorise.asp?action=login method=post onsubmits="return check()">
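The `<FORM>` tag in that output shows where the login has to be sent: its `action` attribute (`client_login/client_authorise.asp?action=login`), not the page URL itself. A minimal sketch with the standard library's `html.parser` that pulls out a form's action URL, method and input names. Since the output above is cut off at the `<FORM>` tag, the base URL and the `<INPUT>` elements in the sample are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormFinder(HTMLParser):
    """Record each <form>'s action/method and the default values of its <input>s."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # tag and attribute names arrive lower-cased
        if tag == "form":
            self.forms.append({
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "fields": {},
            })
        elif tag == "input" and self.forms and attrs.get("name"):
            # Hidden inputs keep their preset value; empty text fields map to "".
            self.forms[-1]["fields"][attrs["name"]] = attrs.get("value") or ""


# Hypothetical base URL and inputs; only the <FORM> attributes come from the output above.
base_url = "http://www.example.com/"
page = ('<FORM class=members id=form1 name=form1 '
        'action=client_login/client_authorise.asp?action=login method=post>'
        '<INPUT type=text name=companyName><INPUT type=text name=username>'
        '<INPUT type=password name=password><INPUT type=hidden name=status value="">'
        '</FORM>')

finder = FormFinder()
finder.feed(page)
finder.close()
form = finder.forms[0]
login_url = urljoin(base_url, form["action"])  # relative action resolved against the page URL
print(login_url)  # http://www.example.com/client_login/client_authorise.asp?action=login
print(form["fields"])
```

Hidden inputs matter: if the form carries one, the server may reject a POST that omits it, which is why the sketch keeps their preset values.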


python html login web website urllib2

lsheng 02 . '14 5:55


pythondjango · Answer 1 · 2014-04-02T06:13:59+0000

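You could use Scrapy, which can log in by filling in and submitting the page's login form with FormRequest.from_response(). For example: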

import scrapy


class LoginSpider(scrapy.Spider):
    name = 'example.com'
    start_urls = ['http://www.example.com/users/login.php']

    def parse(self, response):
        # Fill in the login form found in the response and submit it
        return [scrapy.FormRequest.from_response(
            response,
            formdata={'username': 'john', 'password': 'secret'},
            callback=self.after_login)]

    def after_login(self, response):
        # check that login succeeded before going on
        if b"authentication failed" in response.body:
            self.logger.error("Login failed")
            return
        # from here on, requests reuse the logged-in session cookies

user2629998 · Answer 2 · 2014-04-02T06:24:35+0000

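I would use Python-Requests instead. A Session object keeps the cookies it receives, so you log in once and every later request on that session is sent as the logged-in user.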

from requests import Session

s = Session() # this session will hold the cookies

# here we first login and get our session cookie
s.post("http://.../client_login/client_authorise.asp?action=login",
       data={"companyName": "some_company", "password": "some_password",
             "username": "some_user", "status": ""})

# now we're logged in and can request any page
resp = s.get("http://.../").text

print(resp)
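The same two-step flow (log in, then fetch with the same cookies) can be sketched with only the Python 3 standard library, where the question's `urllib2` and `cookielib` became `urllib.request` and `http.cookiejar`. The helper names below are hypothetical, and the form field names you pass in are assumptions you must match to the site's form. Since the question is about downloading a file, the last step streams the response to disk.

```python
import http.cookiejar
import shutil
import urllib.parse
import urllib.request


def make_session_opener():
    """Opener that stores cookies from responses and sends them back, like requests.Session."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(action_url, fields):
    """POST request whose body is the URL-encoded login form fields."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(action_url, data=body, method="POST")


def download(opener, url, path):
    """Stream the response body to a local file instead of holding it in memory."""
    with opener.open(url) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)


def login_and_download(action_url, fields, file_url, path):
    opener, _jar = make_session_opener()
    # Step 1: submit the form to its action URL; the session cookie lands in the jar.
    with opener.open(build_login_request(action_url, fields)) as resp:
        resp.read()
    # Step 2: the same opener sends that cookie back, so this request is logged in.
    download(opener, file_url, path)
```

Usage would look like `login_and_download("http://.../client_login/client_authorise.asp?action=login", {"companyName": "some_company", "username": "some_user", "password": "some_password", "status": ""}, "http://.../", "data.html")`, with the elided host filled in. Reusing one opener for both requests is what plays the role of the requests Session.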
