Scraping data from a page that requires a login

I am new to Python and web scraping, and I am trying to write a very simple script that retrieves data from a web page that can only be accessed after logging in. I have looked through a few examples, but none of them solved my problem. This is what I have so far:

from bs4 import BeautifulSoup
import urllib, urllib2, cookielib

username = 'name'
password = 'pass'

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'username' : username, 'password' : password})
opener.open('WebpageWithLoginForm')
resp = opener.open('WebpageIWantToAccess')
soup = BeautifulSoup(resp, 'html.parser')
print soup.prettify()

Right now, when I print the page, it just prints the contents as if I were not logged in. I think the problem is in the way I handle cookies, but I'm really not sure, because I don't quite understand what the cookie processor and these libraries are doing. Thanks!
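For reference, here is a minimal sketch of what the cookie processor does, written for Python 3's standard library (`urllib.request` and `http.cookiejar` are the Python 3 names for `urllib2` and `cookielib`). The important difference from the script above is that the credentials are actually sent as the body of a POST to the login URL; in that script `login_data` is built but never passed to `opener.open`. The field names `username` and `password` are assumptions and must match the `name` attributes of the real form's inputs; `login_url` and `page_url` are placeholders.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_opener_with_cookies():
    """Return (opener, jar). Set-Cookie headers from every response are
    stored in jar, and matching cookies are sent back on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, username, password):
    """Build a POST request carrying the login form fields.

    Supplying `data` is what makes urllib send a POST instead of a GET.
    The field names are assumptions: use the form's real input names.
    """
    body = urllib.parse.urlencode({"username": username, "password": password})
    return urllib.request.Request(login_url, data=body.encode("ascii"))


def fetch_protected_page(login_url, page_url, username, password):
    opener, jar = build_opener_with_cookies()
    # POST the credentials; any session cookie in the response goes into jar.
    opener.open(build_login_request(login_url, username, password)).close()
    # The same opener sends that cookie back, so this request is authenticated
    # (provided the login itself succeeded).
    with opener.open(page_url) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

The opener and its cookie jar must be reused for both requests; creating a fresh opener for the second request would start with an empty jar and the server would treat you as a new, anonymous visitor.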

Update: my current code, now using requests:

import requests

EMAIL = 'usr'
PASSWORD = 'pass'

URL = 'https://connect.lehigh.edu/app/login'

def main():
    # Start a session so cookies persist between requests.
    # (config={'verbose': ...} was only accepted by old, pre-1.0 versions of
    # requests; current requests.Session() takes no arguments.)
    session = requests.Session()
    # This is the form data that the page sends when logging in
    login_data = {
        'username': EMAIL,
        'password': PASSWORD,
        'LOGIN': 'login',
    }

    # Authenticate
    r = session.post(URL, data=login_data)

    # Try accessing a page that requires you to be logged in
    r = session.get('https://lewisweb.cc.lehigh.edu/PROD/bwskfshd.P_CrseSchdDetl')
    # Print what the server returned so you can check whether you are logged in
    print(r.text)

if __name__ == '__main__':
    main()
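A successful status code from the login POST does not prove the login worked; the server may simply return the login form again. A common cause is that the form contains extra fields (hidden CSRF tokens, one-time tickets, a named submit button) or submits to a different URL than the page it is served from. The standard-library-only sketch below reads the first `<form>` on a page and collects its `action` and input fields so they can be sent back with the credentials filled in. `LoginFormParser` and `extract_login_form` are hypothetical helper names, and the parser is a simplification: it ignores `<select>`/`<textarea>` elements and the checked state of checkboxes.

```python
from html.parser import HTMLParser


class LoginFormParser(HTMLParser):
    """Collect the action and the name/value pairs of <input> elements
    inside the first <form> on the page."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._in_form and not self._done:
            self._in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Hidden tokens keep their server-provided value; empty inputs
            # (e.g. username/password) default to "" and are filled in later.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True  # only the first form is collected


def extract_login_form(html):
    """Return (action, fields) for the first form in `html`."""
    parser = LoginFormParser()
    parser.feed(html)
    parser.close()
    return parser.action, parser.fields
```

With requests, one would fetch the login page with `session.get(login_url)`, call `extract_login_form(r.text)`, overwrite the username and password entries using whatever field names the form actually uses, resolve the target with `urllib.parse.urljoin(r.url, action)`, and `session.post(...)` the resulting dict on the same session before requesting the protected page. This will not help if the login is driven by JavaScript; in that case a browser automation tool is the usual fallback.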

Source: https://habr.com/ru/post/1649820/

