I'm trying to write some scrapers on GAE (for the Infinite Campus student information portal, fyi). This service requires you to log in to the site. I had code that worked using mechanize in regular Python. When I found out that I couldn't use mechanize on Google App Engine, I ended up using urllib2 + ClientForm. I couldn't get it to log in to the server, so after several hours of fiddling with cookie handling, I ran the same code in a regular Python interpreter, and it worked. I then checked the log file and saw a lot of messages about stripping the Host header from my request... I found the source file on Google Code: the Host header is in an "untrusted" list and is removed from all requests made by user code.
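For reference, here is a minimal sketch of the kind of login request involved, written with Python 3's `urllib.request` and `http.cookiejar` (the successors of urllib2 and cookielib) instead of ClientForm. The login URL, host name, and form field names are hypothetical placeholders, not Infinite Campus's real ones. The request is only built, not sent, so its headers can be inspected.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_login_request(login_url, host, fields):
    """Build (but do not send) a POST login request with an explicit Host header.

    login_url, host and fields are hypothetical placeholders for illustration.
    """
    # URL-encode the form fields into a POST body (urllib wants bytes).
    body = urllib.parse.urlencode(fields).encode("ascii")
    req = urllib.request.Request(login_url, data=body, method="POST")
    # The portal uses the Host header to pick the school system; this is the
    # header that, per the logs, App Engine strips from user requests.
    req.add_header("Host", host)
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


def make_cookie_opener():
    """Return an opener (and its jar) that keeps session cookies across requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


req = build_login_request(
    "https://portal.example.com/campus/verify.jsp",  # hypothetical URL
    "district.example.com",                          # hypothetical host
    {"username": "student", "password": "secret"},  # hypothetical fields
)
opener, jar = make_cookie_opener()
print(req.get_method(), req.get_header("Host"))
```

In a regular interpreter, `opener.open(req)` sends the custom Host header as given, since urllib only fills in Host from the URL when none is set explicitly.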
Apparently, GAE strips the Host header, which IC requires to determine which school system you are logging into, so it looked as if I simply couldn't log in.
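To make the described behavior concrete, here is a simplified, hypothetical illustration: headers whose names appear in an "untrusted" set are silently dropped (case-insensitively) before the request goes out. This is not App Engine's actual source; the function name is made up, and only `host` in the set comes from the question.

```python
def strip_untrusted_headers(headers, untrusted=frozenset({"host"})):
    """Return a copy of `headers` without any whose name is in `untrusted`.

    Hypothetical stand-in for the App Engine behavior described above; only
    "host" is taken from the question, the rest of GAE's list is not modeled.
    """
    kept = {}
    for name, value in headers.items():
        # Header names are case-insensitive, so compare lowercased names.
        if name.lower() in untrusted:
            continue  # dropped silently instead of raising an error
        kept[name] = value
    return kept


outgoing = {"Host": "district.example.com", "Cookie": "JSESSIONID=abc"}
filtered = strip_untrusted_headers(outgoing)
print(filtered)
```

With `Host` gone, the only host information left in the outgoing request is whatever is derived from the URL itself.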
How do I get around this problem? I can't specify anything else in my spoofed request to the target site that would do the same job. And why would setting the Host header be a "security hole" in the first place?