Use BeautifulSoup to get the value after a specific tag.

I'm having a hard time getting BeautifulSoup to extract some data for me. What is the best way to get the date (the actual figure, 2008) from this code sample? This is my first time using BeautifulSoup. I figured out how to extract the URLs from the page, but I can't narrow it down to select only the word Date and then return only the date in the dd tag that follows it. Is what I'm asking even possible?

    <div class='dl_item_container clearfix detail_date'>
        <dt>Date</dt>
        <dd> 2008 </dd>
    </div>
1 answer

Find the dt tag by its text, then take its next dd sibling:

    soup.find('div', class_='detail_date').find('dt', string='Date').find_next_sibling('dd').text
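The one-liner above matches the dt only when its string is exactly Date, so markup like <dt> Date </dt> will not match, and it raises AttributeError if any find() call returns None. A minimal sketch of a more forgiving variant follows, assuming the same div/dt/dd layout; the helper name value_after_label and its container_class parameter are hypothetical, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup


def value_after_label(html, label, container_class="detail_date"):
    """Hypothetical helper: return the stripped text of the <dd> that
    follows the <dt> whose text equals `label`, or None if the container,
    the <dt>, or the <dd> is missing (instead of raising AttributeError)."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", class_=container_class)
    if container is None:
        return None
    for dt in container.find_all("dt"):
        # Compare stripped text so "<dt> Date </dt>" still matches "Date".
        if dt.get_text(strip=True) == label:
            # find_next_sibling("dd") skips the whitespace strings between tags.
            dd = dt.find_next_sibling("dd")
            return dd.get_text(strip=True) if dd is not None else None
    return None


html = "<div class='dl_item_container clearfix detail_date'> <dt> Date </dt> <dd> 2008 </dd> </div>"
print(value_after_label(html, "Date"))    # 2008
print(value_after_label(html, "Author"))  # None
```

Looping over find_all("dt") and comparing stripped text trades a little brevity for tolerance of stray whitespace in the label.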

Full code:

    from bs4 import BeautifulSoup

    data = """
    <div class='dl_item_container clearfix detail_date'> <dt>Date</dt> <dd> 2008 </dd> </div>
    """

    # Name the parser explicitly to avoid bs4's "no parser specified" warning
    soup = BeautifulSoup(data, 'html.parser')
    date_field = soup.find('div', class_='detail_date').find('dt', string='Date')
    print(date_field.find_next_sibling('dd').text.strip())

This prints 2008.
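If a page has several label/value pairs laid out the same way, one option is to collect them all into a dict and then convert the year to a number, since the question asks for the figures only. This is a sketch under assumptions: the second div with the Author label is hypothetical sample data added for illustration, and the int conversion only applies when the value is purely digits.

```python
from bs4 import BeautifulSoup

# The "detail_author" block is hypothetical sample data for illustration.
html = """
<div class='dl_item_container clearfix detail_date'> <dt>Date</dt> <dd> 2008 </dd> </div>
<div class='dl_item_container clearfix detail_author'> <dt>Author</dt> <dd> Jane Doe </dd> </div>
"""

soup = BeautifulSoup(html, "html.parser")

# Map every <dt> label to the stripped text of the <dd> that follows it.
fields = {}
for dt in soup.find_all("dt"):
    dd = dt.find_next_sibling("dd")
    if dd is not None:
        fields[dt.get_text(strip=True)] = dd.get_text(strip=True)

print(fields)  # {'Date': '2008', 'Author': 'Jane Doe'}

# Convert the year to an int only when the value is purely numeric.
year = int(fields["Date"]) if fields.get("Date", "").isdigit() else None
print(year)  # 2008
```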


Source: https://habr.com/ru/post/1202297/

