I am trying to parse a webpage with a script similar to the following:

```
from bs4 import BeautifulSoup as bs
import re
import time
import random
----------------------
import socks
import socket
```
But I get this error:
```
Traceback (most recent call last):
  File "soupParse.py", line 159, in <module>
    all_r = main()
  File "soupParse.py", line 35, in main
    page = urllib2.urlopen(req)
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
```
Here is the header function:
```
# create random request header
def request_header():
```
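The body is cut off here; the idea is just to return a dict with a randomly chosen User-Agent, roughly like this sketch (the agent strings below are placeholders, the real list in my script is longer):

```
import random

# placeholder agent strings -- the actual list in my script is longer
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0',
]

# create random request header
def request_header():
    return {'User-Agent': random.choice(USER_AGENTS)}
```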
I am not very experienced with this topic, so it is difficult for me to understand the problem. Please help. Thanks.
UPDATE
I was able to determine that this error only occurs with urllib2. If I use Requests, for example, there is no error. I am not saying this is the answer, since I do not know why the problem happens. If anyone knows, I would be glad to hear it.
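For reference, this is roughly how I call each library (the URL is a placeholder; both calls send the same random header from request_header()):

```
import urllib2
import requests

url = 'http://example.com/page'   # placeholder URL
headers = request_header()

# urllib2 -- this is the call that raises HTTP Error 403
req = urllib2.Request(url, headers=headers)
page = urllib2.urlopen(req)

# Requests -- the same page downloads without an error
resp = requests.get(url, headers=headers)
print resp.status_code
```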
Good luck and happy parsing!