I am trying to write a scraper in Python to get information from a page, such as the names of the offers that appear on this page: https://www.justdial.com/Panipat/Saree-Retailers/nct-10420585
This is the code I am using right now:
    import bs4
    import requests

    def extract_source(url):
        # Download the page and return its HTML as text
        source = requests.get(url).text
        return source

    def extract_data(source):
        # Parse the HTML and print every <title> tag
        soup = bs4.BeautifulSoup(source)
        names = soup.findAll('title')
        for i in names:
            print(i)

    extract_data(extract_source('https://www.justdial.com/Panipat/Saree-Retailers/nct-10420585'))
But when I execute this code, it gives me an error:
    <title>Access Denied</title>
What can I do to solve this problem?
As mentioned in the comments, you need to specify a valid user agent and pass it in the request headers:
    def extract_source(url):
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0'}
        source = requests.get(url, headers=headers).text
        return source
Alternatively, with a shorter user agent string:

    def extract_source(url):
        headers = {"User-Agent": "Mozilla/5.0"}
        source = requests.get(url, headers=headers).text
        return source

Output:

    <title>Saree Retailers in Panipat - Best Deals online - Justdial</title>
Add a User-Agent header to your request: some sites do not respond to requests that lack a User-Agent. For example:
    import bs4
    import requests

    def extract_source(url):
        # Send a User-Agent header so the site does not reject the request
        agent = {"User-Agent": "Mozilla/5.0"}
        source = requests.get(url, headers=agent).text
        return source

    def extract_data(source):
        # Parse with lxml and print every <title> tag
        soup = bs4.BeautifulSoup(source, 'lxml')
        names = soup.findAll('title')
        for i in names:
            print(i)

    extract_data(extract_source('https://www.justdial.com/Panipat/Saree-Retailers/nct-10420585'))
The 'lxml' argument tells BeautifulSoup explicitly which parser to use (the lxml package must be installed).
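If requests is not available, the same idea can be sketched with only the standard library. This is a minimal sketch and not part of the answers above: `build_request` and `fetch_html` are hypothetical helper names, the timeout value is arbitrary, and whether the site accepts a particular User-Agent string is an assumption that may change at any time.

```python
import urllib.request

# Short user agent string, as used in the answers above
DEFAULT_USER_AGENT = "Mozilla/5.0"

def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Return a Request object that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

def fetch_html(url, user_agent=DEFAULT_USER_AGENT, timeout=10):
    """Download the page and decode it as text (this performs a network call)."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")

# Offline check: inspect the header without sending anything.
# urllib stores header names capitalized as "User-agent".
req = build_request("https://www.justdial.com/Panipat/Saree-Retailers/nct-10420585")
print(req.get_header("User-agent"))
```

The string returned by `fetch_html` can be passed to `extract_data` in place of the result of `extract_source`.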
Source: https://habr.com/ru/post/1668528/