Can someone explain how filtering works with Beautiful Soup? I have the HTML below. I am trying to filter out certain data, but I cannot access it. I've tried various approaches, from collecting everything with class=g to capturing only the items of interest in this particular div, but I either get None back or nothing is printed.
Each page has a <div class="srg"> containing several <div class="g"> elements, and the data I want is inside each <div class="g">. Each of them has several nested divs, but I'm only interested in the data in <cite> and <span class="st">. I'm struggling to understand how filtering works, so any help would be appreciated.
I tried going through the divs and grabbing the appropriate fields:
soup = BeautifulSoup(response.text)
main = soup.find('div', {'class': 'srg'})
result = main.find('div', {'class': 'g'})
data = result.find('div', {'class': 's'})
data2 = data.find('div')
for item in data2:
    site = item.find('cite')
    comment = item.find('span', {'class': 'st'})
    print(site)
    print(comment)
I also tried selecting every div with class s directly:
soup = BeautifulSoup(response.text)
s = soup.findAll('div', {'class': 's'})
for result in s:
    site = result.find('cite')
    comment = result.find('span', {'class': 'st'})
    print(site)
    print(comment)
<div class="srg">
<div class="g">
<div class="g">
<div class="g">
<div class="g">
<div class="rc" data="30">
<div class="s">
<div>
<div class="f kv _SWb" style="white-space:nowrap">
<cite class="_Rm">http://www.url.com.stuff/here</cite>
<span class="st">http://www.url.com. Some info on url etc etc
</span>
</div>
</div>
</div>
</div>
<div class="g">
<div class="g">
<div class="g">
</div>
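For reference, here is a minimal sketch of how find and find_all behave on markup shaped like the sample above. The html string is a trimmed, hypothetical copy of that sample, and "html.parser" is an assumed parser choice. find returns only the first match, or None when nothing matches, while find_all returns a list that may be empty. A None result therefore usually means the tag or class is not in the parsed tree at all.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed copy of the sample markup above.
html = """
<div class="srg">
  <div class="g">
    <div class="rc" data="30">
      <div class="s">
        <div>
          <div class="f kv _SWb" style="white-space:nowrap">
            <cite class="_Rm">http://www.url.com.stuff/here</cite>
            <span class="st">http://www.url.com. Some info on url etc etc
            </span>
          </div>
        </div>
      </div>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching Tag, or None if there is no match.
missing = soup.find("div", {"class": "no-such-class"})

# find_all() always returns a list, so looping over it is safe even when empty.
results = []
for block in soup.find_all("div", {"class": "g"}):
    cite = block.find("cite")
    comment = block.find("span", {"class": "st"})
    if cite is None or comment is None:
        continue  # skip blocks that lack either field
    results.append((cite.get_text(strip=True), comment.get_text(strip=True)))

print(missing)
print(results)
```

If results comes back empty on the real page, printing soup.prettify() and searching it for the expected class names is a quick way to see what the parser actually received.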
UPDATE
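Following Alecxe's suggestion, I printed soup, and it does not match the HTML I posted above. response.text comes from requests, and after BeautifulSoup parses response.text the markup is different. This is what soup actually contains for one result: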
<li class="g">
<h3 class="r">
<a href="/url?q=url">context</a>
</h3>
<div class="s">
<div class="kv" style="margin-bottom:2px">
<cite>www.url.com/index.html</cite> #Data I am looking to grab
<div class="_nBb">
<div style="display:inline"snipped">
<span class="_O0"></span>
</div>
<div style="display:none" class="am-dropdown-menu" role="menu" tabindex="-1">
<ul>
<li class="_Ykb">
<a class="_Zkb" href="/url?/search">Cached</a>
</li>
</ul>
</div>
</div>
</div>
<span class="st">Details about URI </span> #Data I am looking to grab
Given this markup, how do I grab the two marked fields?
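from bs4 import BeautifulSoup

soup = BeautifulSoup(response.text, "html.parser")
for cite in soup.select("li.g div.s div.kv cite"):
    # In the markup above, span.st follows div.kv rather than cite itself,
    # so search forward in document order instead of among cite's siblings.
    span = cite.find_next("span", class_="st")
    print(cite.get_text(strip=True))
    print(span.get_text(strip=True))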
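As an alternative, here is a minimal sketch that pairs the two fields per result block rather than starting from each cite. The html string is a trimmed, hypothetical copy of the markup from the UPDATE (the wrapping ol is an assumption), and "html.parser" is an assumed parser choice. Looking up both fields inside the same div.s keeps a cite from being paired with a span from a different result, and the None check skips blocks that lack either field.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed copy of the markup from the UPDATE.
html = """
<ol>
<li class="g">
  <h3 class="r"><a href="/url?q=url">context</a></h3>
  <div class="s">
    <div class="kv" style="margin-bottom:2px">
      <cite>www.url.com/index.html</cite>
    </div>
    <span class="st">Details about URI </span>
  </div>
</li>
</ol>
"""

soup = BeautifulSoup(html, "html.parser")

pairs = []
for block in soup.select("li.g div.s"):
    # Both lookups are scoped to this one result block.
    cite = block.select_one("div.kv cite")
    span = block.find("span", class_="st")
    if cite is None or span is None:
        continue  # skip results that lack either field
    pairs.append((cite.get_text(strip=True), span.get_text(strip=True)))

for site, comment in pairs:
    print(site)
    print(comment)
```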