Use BeautifulSoup to get the value after a specific tag.

Question

I'm having a hard time getting BeautifulSoup to scrape some data for me. What is the best way to get the date (the actual value, 2008) from this code sample? This is my first time using BeautifulSoup. I figured out how to scrape the URLs from the page, but I can't narrow it down to select only the word Date and then return only the numeric date (inside the dd tags). Is what I'm asking even possible?

<div class='dl_item_container clearfix detail_date'>
  <dt>Date</dt>
  <dd> 2008 </dd>
</div>

python html-parsing web-scraping beautifulsoup

knames Sep 11 '14 at 3:06

1 answer

alecxe · Accepted Answer · 2014-09-11T03:11:43+0000

Find the dt tag by its text, then take its next dd sibling:

 soup.find('div', class_='detail_date').find('dt', string='Date').find_next_sibling('dd').text
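If you prefer CSS selectors, a similar lookup can be sketched with `select_one()` and the adjacent-sibling combinator `dt + dd`. This is an alternative, not the accepted answer's method, and it assumes the `detail_date` class is enough to identify the right block: unlike the `find(..., string='Date')` version, it does not check that the label text is "Date". The answer's full `find()`-based code follows.

```python
from bs4 import BeautifulSoup

html = """
<div class='dl_item_container clearfix detail_date'>
  <dt>Date</dt>
  <dd> 2008 </dd>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# "dt + dd" matches the <dd> element that immediately follows a <dt>,
# restricted here to the div carrying the "detail_date" class.
# Note: this does NOT verify that the <dt> text is "Date".
dd = soup.select_one("div.detail_date dt + dd")

# select_one() returns None when nothing matches, so guard before reading text
year = dd.get_text(strip=True) if dd is not None else None
print(year)
```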

Full code:

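 from bs4 import BeautifulSoup

 data = """
 <div class='dl_item_container clearfix detail_date'>
   <dt>Date</dt>
   <dd> 2008 </dd>
 </div>
 """

 # Name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
 soup = BeautifulSoup(data, "html.parser")
 # Find the <dt> whose text is exactly "Date" inside the detail_date div
 date_field = soup.find('div', class_='detail_date').find('dt', string='Date')
 # The value is in the next <dd> sibling; strip the surrounding whitespace
 print(date_field.find_next_sibling('dd').text.strip())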

Prints 2008.
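When a block holds several label/value pairs (for example Date, Author, Publisher), the same `find_next_sibling('dd')` idea can be extended to collect every `dt` label into a dictionary. The helper below, `dt_dd_pairs`, is a hypothetical name used only for illustration; it assumes each `dt` is followed by its own `dd`. If a `dt` has no `dd` directly after it, `find_next_sibling` would pick up a later `dd`, so irregular markup may need a stricter check.

```python
from bs4 import BeautifulSoup

html = """
<div class='dl_item_container clearfix detail_date'>
  <dt>Date</dt>
  <dd> 2008 </dd>
</div>
"""


def dt_dd_pairs(container):
    """Hypothetical helper: map each <dt> label to the text of the next <dd> sibling."""
    pairs = {}
    for dt in container.find_all("dt"):
        dd = dt.find_next_sibling("dd")
        # Skip labels that have no <dd> after them
        if dd is not None:
            pairs[dt.get_text(strip=True)] = dd.get_text(strip=True)
    return pairs


soup = BeautifulSoup(html, "html.parser")
container = soup.find("div", class_="detail_date")
fields = dt_dd_pairs(container)

# Look up the value by its label; .get() returns None if "Date" is absent
print(fields.get("Date"))
```

Using a dictionary keeps the lookup by label explicit, which matches the question's goal of selecting "only the word Date" while still making other fields in the same block easy to reach.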
