Hi everyone. I'm having trouble getting links out of nested HTML using Mechanize in Python. Here is my current code (I've tried everything; this is just the latest version that doesn't work), and please forgive my variable names (thing, material):
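    from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3

    # resultsPage: HTML of the search results page (fetched with Mechanize)
    soup = BeautifulSoup(resultsPage)
    if not soup.find(attrs={'class' : 'paging'}):
        print "Only one product listed!"
    else:
        stuff = soup.find('div', attrs={'class' : 'paging'}).ul.li
        for thing in stuff:
            print thing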
Here is the HTML I'm looking at:
    <div class="paging">
      <ul>
        <li>< </li>
        <li class='on'> 1-10 </li>
        <li class=''>
          <a id="ctl00_SPWebPartManager1_g_83a79912_01d8_4726_8a95_2953baaad0ec_ctl01_ucProductInfoPageNavigatorGroupTop_rptPageNavigators_ctl01_hlPage" href="http://www.kraftrecipes.com/products/pages/productinfosearchresults.aspx?catalogtype=1&brandid=22&searchtext=jell-o&pageno=2">11-20</a>
        </li>
        <li class=''>
          <a id="ctl00_SPWebPartManager1_g_83a79912_01d8_4726_8a95_2953baaad0ec_ctl01_ucProductInfoPageNavigatorGroupTop_rptPageNavigators_ctl02_hlPage" href="http://www.kraftrecipes.com/products/pages/productinfosearchresults.aspx?catalogtype=1&brandid=22&searchtext=jell-o&pageno=3">21-30</a>
        </li>
        <li class=''>
          <a id="ctl00_SPWebPartManager1_g_83a79912_01d8_4726_8a95_2953baaad0ec_ctl01_ucProductInfoPageNavigatorGroupTop_rptPageNavigators_ctl03_hlPage" href="http://www.kraftrecipes.com/products/pages/productinfosearchresults.aspx?catalogtype=1&brandid=22&searchtext=jell-o&pageno=4">31-40</a>
        </li>
        <li class=''>
          <a id="ctl00_SPWebPartManager1_g_83a79912_01d8_4726_8a95_2953baaad0ec_ctl01_ucProductInfoPageNavigatorGroupTop_rptPageNavigators_ctl04_hlPage" href="http://www.kraftrecipes.com/products/pages/productinfosearchresults.aspx?catalogtype=1&brandid=22&searchtext=jell-o&pageno=5">41-50</a>
        </li>
        <li class=''>
          <a id="ctl00_SPWebPartManager1_g_83a79912_01d8_4726_8a95_2953baaad0ec_ctl01_ucProductInfoPageNavigatorGroupTop_rptPageNavigators_ctl05_hlPage" href="http://www.kraftrecipes.com/products/pages/productinfosearchresults.aspx?catalogtype=1&brandid=22&searchtext=jell-o&pageno=6">51-60</a>
        </li>
        <li>
          <a id="ctl00_SPWebPartManager1_g_83a79912_01d8_4726_8a95_2953baaad0ec_ctl01_ucProductInfoPageNavigatorGroupTop_lnkNext" href="http://www.kraftrecipes.com/products/pages/productinfosearchresults.aspx?catalogtype=1&brandid=22&searchtext=jell-o&pageno=7">>></a>
        </li>
      </ul>
I need to determine whether there are <li> tags with hyperlinks in them; if there are, I need to save those links so I can follow them later. This is the page the HTML came from, if you're interested: http://www.kraftrecipes.com/Products/ProductInfoSearchResults.aspx?CatalogType=1&BrandId=22&SearchText=Jell-O&PageNo=1 I'm working on something to scrape product information from websites, and I need to be able to navigate the search results.
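A note and a minimal sketch, in case it helps. In BeautifulSoup, `.ul.li` returns only the first `<li>` under the `<ul>` (here the one containing `<`), and iterating over a tag iterates over that tag's children, so the loop in the code above never reaches the other list items. The sketch below uses only Python 3's standard `html.parser` module rather than BeautifulSoup, and assumes the paging `<div>` contains no nested `<div>` elements. `PagingLinkParser`, `paging_links`, and the shortened `results.aspx?pageno=N` URLs are hypothetical names made up for illustration. It collects the text and `href` of every `<a>` inside an `<li>` of `<div class="paging">`; an empty result corresponds to the "only one page" case. With BeautifulSoup 4, something like `soup.find("div", class_="paging").find_all("a", href=True)` should return the same anchors.

```python
from html.parser import HTMLParser


class PagingLinkParser(HTMLParser):
    """Collect (link text, href) for each <a href> inside an <li> of <div class="paging">.

    Assumption: the paging <div> contains no nested <div> elements, so the
    first </div> after it is treated as its end.
    """

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so "&gt;&gt;" arrives as ">>"
        self.in_paging = False
        self.li_depth = 0
        self.href = None  # href of the <a> currently being read, if any
        self.text_parts = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div" and "paging" in classes:
            self.in_paging = True
        elif self.in_paging and tag == "li":
            self.li_depth += 1
        elif self.in_paging and self.li_depth and tag == "a" and attrs.get("href"):
            self.href = attrs["href"]
            self.text_parts = []

    def handle_endtag(self, tag):
        if tag == "div" and self.in_paging:
            self.in_paging = False
        elif tag == "li" and self.li_depth:
            self.li_depth -= 1
        elif tag == "a" and self.href is not None:
            self.links.append(("".join(self.text_parts).strip(), self.href))
            self.href = None

    def handle_data(self, data):
        # Only text inside a link we are tracking is kept.
        if self.href is not None:
            self.text_parts.append(data)


def paging_links(html):
    """Return the pagination links; an empty list means there is only one page."""
    parser = PagingLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Simplified copy of the paging markup (ids dropped, URLs shortened).
paging_html = """
<div class="paging"> <ul>
  <li>&lt; </li>
  <li class='on'> 1-10 </li>
  <li class=''> <a href="results.aspx?pageno=2">11-20</a> </li>
  <li class=''> <a href="results.aspx?pageno=3">21-30</a> </li>
  <li> <a href="results.aspx?pageno=7">&gt;&gt;</a> </li>
</ul> </div>
"""

for text, href in paging_links(paging_html):
    print(text, href)
```

On the real page each saved `href` is already an absolute URL, so it could be passed to Mechanize's `Browser.open()` later.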
I have another quick question. Is it good practice to chain tag lookups like this?
ingredients = soup.find(attrs={'class' : "TitleAndDescription"}).div.find(text=re.compile("Ingredients")).next
I'm just learning Python, but this feels kludgy, and I'd like to know what you think. Here is an example of the HTML I scraped:
    <table>
      <tr>
        <td>
          <div id="contHeader" class="TitleAndDescription">
            <h1>JELL-O - GELATIN DESSERT - RASPBERRY</h1>
            <div class="textArea">
              <strong>Ingredients:</strong> SUGAR, GELATIN, ADIPIC ACID (FOR TARTNESS), CONTAINS LESS THAN 2% OF ARTIFICIAL FLAVOR, DISODIUM PHOSPHATE AND SODIUM CITRATE (CONTROL ACIDITY), FUMARIC ACID (FOR TARTNESS), RED 40.<br/>
              <strong>Size:</strong> 6 OZ<br/><strong>Upc:</strong> 4300020052<br/>
              <br/>
              <br/>
            </div>
          </div>
          ...
        </td>
        ...
      </tr>
      ...
    </table>
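On the chaining question: a chain such as `soup.find(...).div.find(...).next` works as long as every step matches, but `find()` returns `None` when nothing matches, so one missing element turns into an `AttributeError` somewhere further along the chain. One alternative, sketched below with Python 3's standard `html.parser` instead of BeautifulSoup, is to walk the `textArea` `<div>` once and collect every `<strong>Label:</strong> value<br/>` pair into a dictionary. It assumes that markup pattern holds on every product page and that `textArea` contains no nested `<div>`; `LabeledFieldParser` and `labeled_fields` are hypothetical names.

```python
from html.parser import HTMLParser


class LabeledFieldParser(HTMLParser):
    """Collect "<strong>Label:</strong> value<br/>" pairs inside <div class="textArea">.

    Assumption: the textArea <div> contains no nested <div> elements.
    """

    def __init__(self):
        super().__init__()
        self.in_area = False
        self.in_label = False
        self.label = None  # label whose value is currently being read
        self.label_parts = []
        self.value_parts = []
        self.fields = {}

    def _finish_field(self):
        # Store the value read since the last label, collapsing whitespace.
        if self.label:
            self.fields[self.label] = " ".join("".join(self.value_parts).split())
        self.label = None
        self.value_parts = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "textArea" in classes:
            self.in_area = True
        elif self.in_area and tag == "strong":
            self._finish_field()
            self.in_label = True
            self.label_parts = []
        elif self.in_area and tag == "br":
            # <br/> reaches here via HTMLParser's default handle_startendtag.
            self._finish_field()

    def handle_endtag(self, tag):
        if tag == "strong" and self.in_label:
            self.in_label = False
            self.label = "".join(self.label_parts).strip().rstrip(":").strip()
        elif tag == "div" and self.in_area:
            self._finish_field()
            self.in_area = False

    def handle_data(self, data):
        if self.in_label:
            self.label_parts.append(data)
        elif self.label:
            self.value_parts.append(data)


def labeled_fields(html):
    parser = LabeledFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


# The product markup from the question, without the surrounding table.
product_html = """
<div id="contHeader" class="TitleAndDescription">
  <h1>JELL-O - GELATIN DESSERT - RASPBERRY</h1>
  <div class="textArea">
    <strong>Ingredients:</strong> SUGAR, GELATIN, ADIPIC ACID (FOR TARTNESS), CONTAINS LESS THAN 2% OF ARTIFICIAL FLAVOR, DISODIUM PHOSPHATE AND SODIUM CITRATE (CONTROL ACIDITY), FUMARIC ACID (FOR TARTNESS), RED 40.<br/>
    <strong>Size:</strong> 6 OZ<br/><strong>Upc:</strong> 4300020052<br/> <br/> <br/>
  </div>
</div>
"""

fields = labeled_fields(product_html)
print(fields["Ingredients"])
print(fields["Size"], fields["Upc"])
```

Each field then becomes a dictionary lookup; `fields.get("Ingredients")` returns `None` instead of raising when a page lacks that label.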
Sorry for the wall of text. Let me know if you need more information.
Thanks.
user1074600