How do I get these values using BeautifulSoup?

I have this HTML table:

<table>
    <tr>
        <td class="datax">a</td>
        <td class="datax">b</td>
        <td class="datax">c</td>
        <td class="datax">d</td>
    </tr>
    <tr>
        <td class="datax">e</td>
        <td class="datax">f</td>
        <td class="datax">g</td>
        <td class="datax">h</td>
    </tr>
</table>

How can I get the second and fourth value of each <tr>? If I run:

bs.findAll('td', {'class':'datax'})

I get:

        <td class="datax">a</td>
        <td class="datax">b</td>
        <td class="datax">c</td>
        <td class="datax">d</td>

        <td class="datax">e</td>
        <td class="datax">f</td>
        <td class="datax">g</td>
        <td class="datax">h</td>

That is correct, but I would like to get this result instead:

        <td class="datax">b</td>
        <td class="datax">d</td>

        <td class="datax">f</td>
        <td class="datax">h</td>

So the values I want are: b, d, f, h

(the second and fourth <td> of each <tr>)

Is this possible with the BeautifulSoup module?

Many thanks!

2 answers

This should do it:

final_values=[td.string for td in bs.findAll('td', {'class':'datax'})[1::2]]

(After the clarification in the comments) for your specific case it would be:

final_values=[td.b.a.string for td in bs.findAll('td', {'class':'datax'})[1::2]]
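Note that the [1::2] slice runs over all matching cells in the document at once, so it only lines up with "second and fourth per row" while every row has an even number of such cells. A minimal sketch of a per-row variant is below; it assumes the bs4 package with the built-in "html.parser", and the helper name pick_cells and its positions parameter are made up for illustration, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# The table from the question.
html = """
<table>
    <tr>
        <td class="datax">a</td>
        <td class="datax">b</td>
        <td class="datax">c</td>
        <td class="datax">d</td>
    </tr>
    <tr>
        <td class="datax">e</td>
        <td class="datax">f</td>
        <td class="datax">g</td>
        <td class="datax">h</td>
    </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")


def pick_cells(soup, positions=(1, 3)):
    """Hypothetical helper: collect the text of the cells at the given
    0-based positions from every row, skipping positions a row lacks."""
    values = []
    for row in soup.find_all("tr"):
        cells = row.find_all("td", class_="datax")
        values.extend(
            cells[i].get_text(strip=True) for i in positions if i < len(cells)
        )
    return values


per_row = pick_cells(soup)
print(per_row)  # ['b', 'd', 'f', 'h']

# Alternative with a CSS selector (assumes bs4 4.7 or newer, where select()
# is backed by soupsieve); results come back in document order.
via_css = [
    td.get_text(strip=True)
    for td in soup.select("tr > td:nth-of-type(2), tr > td:nth-of-type(4)")
]
print(via_css)  # ['b', 'd', 'f', 'h']
```

The per-row loop keeps working if some rows have a different number of cells, which the flat slice does not.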

I know that with HTQL this is simple:

<tr>.<td>2,4

Here is an example that uses the HTQL COM component from JavaScript:

<html>
<body>
<script language="JavaScript">
    var a = new ActiveXObject("HtqlCom.HtqlControl");
    a.setUrl("C:\\test_table.html");
    // Select the 2nd and 4th <td> of every <tr>
    a.setQuery("<tr>.<td>2,4");
    for (a.moveFirst(); !a.isEOF(); a.moveNext()) {
        document.write(a.getValueByIndex(1));
    }
</script>
</body>
</html>


Source: https://habr.com/ru/post/1744778/

