Suppose the following is part of an HTML document. Note that the table is repeated several times, with <a name="1"> becoming "2", "3", "4", etc., and with different text in each table.
<table align="center" width="550"> <tr> <td valign="top" width="300"><b>Product:</b></img></td> <td> <a name="1"></a>1) Text Editor <p>An application for the editing of text files.</p> <br> <b>Application Name: Notepad</b> <br> <b>Type: Writing</b> <br><br></td> </tr> </table>
I want to find the "a" tag whose name equals a given number (in this case 1) and then somehow get the text "1) Text Editor".
I know that once I have parsed the whole document with BeautifulSoup, I could use something like findAll("table") to get all the tables, but I don't know how to get to this value. I can do something like findAll("a"), but how would I specify that "name" must equal a given value (1 in this case)? Even if I could do that, I would not be able to get to the "1) Text Editor" text, because the "a" tag is empty. Nor could I get at things like "<b>Application Name: Notepad</b>".
What is the best way, with Python / BeautifulSoup or something better if it exists, to get the "1) Text Editor", "Application Name", and "Type" values from a table, given that they follow <a name="1"></a>? Example syntax would be great.
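To make the goal concrete, here is a minimal sketch of the kind of lookup I am after. It assumes BeautifulSoup 4 (the bs4 package) with Python's built-in "html.parser". The helper name product_info and the keys of the returned dict are made up for illustration. Because find() and find_all() use the keyword "name" for the tag name, the HTML attribute called "name" is passed through attrs instead.

```python
from bs4 import BeautifulSoup

# Sample table from the question (the stray </img> is dropped here).
html = """
<table align="center" width="550"> <tr>
<td valign="top" width="300"><b>Product:</b></td>
<td> <a name="1"></a>1) Text Editor <p>An application for the editing of text files.</p>
<br> <b>Application Name: Notepad</b> <br> <b>Type: Writing</b> <br><br></td>
</tr> </table>
"""


def product_info(soup, number):
    """Hypothetical helper: collect the fields that follow <a name="number">."""
    # The HTML attribute "name" must go through attrs, because the keyword
    # argument name= of find()/find_all() refers to the tag name.
    anchor = soup.find("a", attrs={"name": str(number)})
    if anchor is None:
        return None

    # The <a> tag itself is empty; the title is the text node right after it.
    title = anchor.next_sibling
    title = title.strip() if isinstance(title, str) else ""

    # The remaining fields are "Key: Value" <b> tags in the same table cell.
    cell = anchor.find_parent("td")
    info = {"title": title}
    for bold in cell.find_all("b"):
        key, sep, value = bold.get_text().partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


soup = BeautifulSoup(html, "html.parser")
print(product_info(soup, 1))
# {'title': '1) Text Editor', 'Application Name': 'Notepad', 'Type': 'Writing'}
```

This relies on the layout shown above: the title text immediately follows the anchor, and the other fields are bold "Key: Value" pairs inside the same td. A different layout would need different navigation.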