Get number from span tag using Python requests and Beautiful Soup

I am new to Python and HTML. I am trying to get the number of comments from a page using requests and BeautifulSoup.

In this example, I am trying to get the number 226. Here is the code that I see when I check the page in Chrome:

<a title="Go to the comments page" class="article__comments-counts" href="http://www.theglobeandmail.com/opinion/will-kevin-oleary-be-stopped/article33519766/comments/"> <span class="civil-comment-count" data-site-id="globeandmail" data-id="33519766" data-language="en"> 226 </span> Comments </a> 

When I request the page text from the URL, I can find this markup, but there is no content between the span tags, no 226. Here is my code:

    import requests, bs4

    url = 'http://www.theglobeandmail.com/opinion/will-kevin-oleary-be-stopped/article33519766/'
    r = requests.get(url)  # fetch the raw HTML; no JavaScript is executed
    soup = bs4.BeautifulSoup(r.text, 'html.parser')
    span = soup.find('span', class_='civil-comment-count')

It returns the same span as above, but without the 226:

 <span class="civil-comment-count" data-id="33519766" data-language="en" data-site-id="globeandmail"> </span> 

I do not understand why the value does not appear. Thank you in advance for any help.

2 answers

The page, and the comment count in particular, relies on JavaScript to load and render. But you do not need Selenium for this; you can make the API request yourself:

    import requests

    with requests.Session() as session:
        session.headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36"}

        # visit main page
        base_url = 'http://www.theglobeandmail.com/opinion/will-kevin-oleary-be-stopped/article33519766/'
        session.get(base_url)

        # get the comments count
        url = "https://api-civilcomments.global.ssl.fastly.net/api/v1/topics/multiple_comments_count.json"
        params = {"publication_slug": "globeandmail", "reference_language": "en", "reference_ids": "33519766"}
        r = session.get(url, params=params)
        print(r.json())

This prints:

 {'comment_counts': {'33519766': 226}} 
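
To get the bare number rather than the whole dictionary, you can index into the parsed JSON. The following is a minimal sketch that assumes the response keeps the shape shown above; `comment_count` is a hypothetical helper name, not part of requests or of the API.

```python
def comment_count(payload, article_id):
    """Return the comment count for article_id, or None if it is missing.

    Assumes `payload` has the shape printed above:
    {'comment_counts': {'<article id as a string>': <count>}}
    """
    counts = payload.get("comment_counts", {})
    value = counts.get(str(article_id))  # keys are strings in the output above
    return None if value is None else int(value)


# Sample payload copied from the printed output above
payload = {'comment_counts': {'33519766': 226}}
print(comment_count(payload, 33519766))  # 226
```

In the answer's code you would pass `r.json()` as `payload`. Returning `None` for a missing ID avoids a `KeyError` if the API omits an article.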

This page uses JavaScript to fetch the comment count. Here is what the page looks like when JavaScript is disabled: [image: the page with JavaScript disabled]

You can find the real URL that returns the number in the Chrome developer tools: [image: the request in Chrome developer tools]

Then you can simulate the request using @alecxe's code.
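
The values in the API request appear to match the `data-*` attributes on the span in the question (`data-site-id`, `data-language`, `data-id`). Assuming that correspondence holds, which is inferred from the matching values and not from any documentation, the query parameters could be built from the span instead of hard-coded. `build_count_params` is a hypothetical helper name.

```python
def build_count_params(span_attrs):
    """Map the span's data-* attributes to the query parameters used above.

    Assumption: data-site-id -> publication_slug, data-language ->
    reference_language, data-id -> reference_ids. This mapping is inferred
    from the values in the question and the answer.
    """
    return {
        "publication_slug": span_attrs["data-site-id"],
        "reference_language": span_attrs["data-language"],
        "reference_ids": span_attrs["data-id"],
    }


# With BeautifulSoup, `span.attrs` is a dict like this one
# (values copied from the span in the question).
attrs = {
    "class": ["civil-comment-count"],
    "data-site-id": "globeandmail",
    "data-id": "33519766",
    "data-language": "en",
}
params = build_count_params(attrs)
print(params)
```

The resulting dictionary can be passed as `params=` to `session.get` in place of the literal one.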


Source: https://habr.com/ru/post/1013866/

