I am very puzzled. I have a block of HTML that I pulled out of a larger table. It looks something like this:
<td align="left" class="page">Number:\xc2\xa0<a class="topmenu" href="http://www.example.com/whatever.asp?search=724461">724461</a> Date:\xc2\xa01/1/1999 Amount:\xc2\xa0$2.50 <br/>Person:<br/><a class="topmenu" href="http://www.example.com/whatever.asp?search=LAST&searchfn=FIRST">LAST,\xc2\xa0FIRST </a> </td>
(Actually, it looked worse, but I have cleaned up the lines considerably.)
I need to print the lines and split the Date / Amount line. A good place to start seemed to be finding the children of this HTML block. The block is a string, because that is what the regular expression returned to me. So I did:
text_soup = BeautifulSoup(text)
text_children = text_soup.find('td').childGenerator()
I can iterate over child elements with
for i, each in enumerate(text_soup.find('td').childGenerator()):
    print type(each)
    print i, ":", each
but not with
for i, each in enumerate(text_children):
    ...etc
Shouldn't these two be the same? I am confused.
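One possible explanation, offered as a guess rather than a confirmed diagnosis: a Python generator can be consumed only once. If text_children had already been iterated (for example in an earlier debugging loop), a second loop over it would produce nothing, while calling childGenerator() again inside the loop header would always create a fresh generator. The sketch below shows that behaviour with a plain generator; child_generator and the sample strings are hypothetical stand-ins, not BeautifulSoup's API.

```python
def child_generator(items):
    # Hypothetical stand-in for childGenerator(): yields each child exactly once.
    for item in items:
        yield item

labels = ["Number:", "Date:", "Amount:"]

# Store the generator in a variable, as with text_children.
children = child_generator(labels)

first_pass = list(enumerate(children))   # consumes every item
second_pass = list(enumerate(children))  # the generator is now exhausted

# Calling the function again builds a new generator, so this works every time.
fresh_pass = list(enumerate(child_generator(labels)))

print(first_pass)
print(second_pass)
print(fresh_pass)
```

If this is the cause, then either creating the generator right where it is looped over, or storing list(...) of it instead, should make the two loops behave identically.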