Unable to scrape a specific table using BeautifulSoup4 (Python 3)

I would like to scrape a table from the Ligue 1 football site: specifically, the table containing information about referees and the cards they have given.

http://www.ligue1.com/LFPStats/stats_arbitre?competition=D1

I am using the following code:

    import requests
    from bs4 import BeautifulSoup
    import csv

    r = requests.get("http://www.ligue1.com/LFPStats/stats_arbitre?competition=D1")
    soup = BeautifulSoup(r.content, "html.parser")
    table = soup.find_all('table')

This returns a different table from elsewhere in the HTML. I tried to get around this by indexing the result of find_all with [0], [1], etc., but I get nothing useful back. I also searched for tr and td, with similar results. I have no idea why Beautiful Soup ignores this table.

The table I'm looking for is in the HTML below

    <table>
    <thead>
    <tr> <th class="{sorter: false} hide position">Position</th> <th class="{sorter: false} joueur">Referees</th> <th class="chiffre header"><span class="icon icon_carton_jaune">Yellow card</span></th> <th class="chiffre header"><span class="icon icon_carton_rouge">Red card</span></th> <th class="chiffre header">Matches</th> </tr>
    </thead>
    <tbody>
    <tr> <td class="position"></td> <td class="joueur">Benoît BASTIEN</td> <td class="chiffre"><a href="/stats_arbitre_details/245">25</a></td> <td class="chiffre"><a href="/stats_arbitre_details/245">4</a></td> <td class="chiffre">8</td> </tr>
    <tr class="odd"> <td class="position"></td> <td class="joueur">Hakim BEN EL HADJ</td> <td class="chiffre"><a href="/stats_arbitre_details/259">55</a></td> <td class="chiffre"><a href="/stats_arbitre_details/259">4</a></td> <td class="chiffre">10</td> </tr>
    <tr> <td class="position"></td> <td class="joueur">Wilfried BIEN</td> <td class="chiffre"><a href="/stats_arbitre_details/162">44</a></td> <td class="chiffre"><a href="/stats_arbitre_details/162">3</a></td> <td class="chiffre">9</td> </tr>
    <tr class="odd"> <td class="position"></td> <td class="joueur">Ruddy BUQUET</td> <td class="chiffre"><a href="/stats_arbitre_details/269">33</a></td> <td class="chiffre"><a href="/stats_arbitre_details/269">2</a></td> <td class="chiffre">7</td> </tr>
    <tr> <td class="position"></td> <td class="joueur">Tony CHAPRON</td> <td class="chiffre"><a href="/stats_arbitre_details/102">43</a></td> <td class="chiffre"><a href="/stats_arbitre_details/102">1</a></td> <td class="chiffre">8</td> </tr>
    <tr class="odd"> <td class="position"></td> <td class="joueur">Amaury DELERUE</td> <td class="chiffre"><a href="/stats_arbitre_details/343">30</a></td> <td class="chiffre"><a href="/stats_arbitre_details/343">0</a></td> <td class="chiffre">6</td> </tr>
    <tr> <td class="position"></td> <td class="joueur">Saïd ENNJIMI</td> <td class="chiffre"><a href="/stats_arbitre_details/113">27</a></td> <td class="chiffre"><a href="/stats_arbitre_details/113">1</a></td> <td class="chiffre">6</td> </tr>
    <tr class="odd"> <td class="position"></td> <td class="joueur">Fredy FAUTREL</td> <td class="chiffre"><a href="/stats_arbitre_details/338">25</a></td> <td class="chiffre"><a href="/stats_arbitre_details/338">2</a></td> <td class="chiffre">8</td> </tr>
    <tr> <td class="position"></td> <td class="joueur">Antony GAUTIER</td> <td class="chiffre"><a href="/stats_arbitre_details/331">31</a></td> <td class="chiffre"><a href="/stats_arbitre_details/331">8</a></td> <td class="chiffre">9</td> </tr>
    <tr class="odd"> <td class="position"></td> <td class="joueur">Johan HAMEL</td> <td class="chiffre"><a href="/stats_arbitre_details/334">43</a></td> <td class="chiffre"><a href="/stats_arbitre_details/334">7</a></td> <td class="chiffre">9</td> </tr>
    <tr> <td class="position"></td> <td class="joueur">Lionel JAFFREDO</td> <td class="chiffre"><a href="/stats_arbitre_details/124">40</a></td> <td class="chiffre"><a href="/stats_arbitre_details/124">2</a></td> <td class="chiffre">9</td> </tr>
    <tr class="odd"> <td class="position"></td> <td class="joueur">Stéphane JOCHEM</td> <td class="chiffre"><a href="/stats_arbitre_details/294">33</a></td> <td class="chiffre"><a href="/stats_arbitre_details/294">4</a></td> <td class="chiffre">8</td> </tr>
    <tr> <td class="position"></td> <td class="joueur">Stéphane LANNOY</td> <td class="chiffre"><a href="/stats_arbitre_details/127">24</a></td> <td class="chiffre"><a href="/stats_arbitre_details/127">0</a></td> <td class="chiffre">6</td> </tr>
    <tr class="odd"> <td class="position"></td> <td class="joueur">Mikael LESAGE</td> <td class="chiffre"><a href="/stats_arbitre_details/286">38</a></td> <td class="chiffre"><a href="/stats_arbitre_details/286">3</a></td> <td class="chiffre">9</td> </tr>
    <tr> <td class="position"></td> <td class="joueur">Jérôme MIGUELGORRY</td> <td class="chiffre"><a href="/stats_arbitre_details/239">32</a></td> <td class="chiffre"><a href="/stats_arbitre_details/239">1</a></td> <td class="chiffre">10</td> </tr>
    <tr class="odd"> <td class="position"></td> <td class="joueur">Benoît MILLOT</td> <td class="chiffre"><a href="/stats_arbitre_details/287">43</a></td> <td class="chiffre"><a href="/stats_arbitre_details/287">0</a></td> <td class="chiffre">11</td> </tr>
    <tr> <td class="position"></td> <td class="joueur">Sébastien MOREIRA</td> <td class="chiffre"><a href="/stats_arbitre_details/148">38</a></td> <td class="chiffre"><a href="/stats_arbitre_details/148">5</a></td> <td class="chiffre">10</td> </tr>
    <tr class="odd"> <td class="position"></td> <td class="joueur">Nicolas RAINVILLE</td> <td class="chiffre"><a href="/stats_arbitre_details/188">40</a></td> <td class="chiffre"><a href="/stats_arbitre_details/188">7</a></td> <td class="chiffre">10</td> </tr>
    <tr> <td class="position"></td> <td class="joueur">Frank SCHNEIDER</td> <td class="chiffre"><a href="/stats_arbitre_details/247">33</a></td> <td class="chiffre"><a href="/stats_arbitre_details/247">4</a></td> <td class="chiffre">10</td> </tr>
    <tr class="odd"> <td class="position"></td> <td class="joueur">Clément TURPIN</td> <td class="chiffre"><a href="/stats_arbitre_details/333">26</a></td> <td class="chiffre"><a href="/stats_arbitre_details/333">3</a></td> <td class="chiffre">8</td> </tr>
    <tr> <td class="position"></td> <td class="joueur">Bartolomeu VARELA</td> <td class="chiffre"><a href="/stats_arbitre_details/288">35</a></td> <td class="chiffre"><a href="/stats_arbitre_details/288">3</a></td> <td class="chiffre">9</td> </tr>
    </tbody>
    </table>

I also tried searching for td elements with a specific class, which should work, but Beautiful Soup cannot find the table in the first place.

2 answers

The problem (I assume) is that you are looking at the HTML as rendered by the browser. What you are missing is that the table is added to the page by JavaScript after the page loads.

You can confirm this in Chrome (or any other browser): instead of “Inspect”, use “View Page Source”, and you will see that the server response contains no such table.
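The same check can be done from Python. This is only a rough sketch: the two HTML strings are made-up stand-ins for the raw server response and the browser-rendered page, and `count_referee_cells` is a hypothetical helper name, not something from the site or from BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins: what the server sends, and what the browser
# shows after JavaScript has inserted the table.
SERVER_HTML = '<html><body><div id="stats"></div></body></html>'
RENDERED_HTML = (
    '<html><body><div id="stats"><table><tbody><tr>'
    '<td class="joueur">Tony CHAPRON</td>'
    '</tr></tbody></table></div></body></html>'
)


def count_referee_cells(html):
    """Count the td cells with class "joueur" (the referee name column)."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find_all("td", class_="joueur"))


print(count_referee_cells(SERVER_HTML))    # 0
print(count_referee_cells(RENDERED_HTML))  # 1
```

If the count is 0 for `r.text` from your own request, the table is simply not in the response, and no amount of indexing into find_all will recover it.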

The URL that the page actually calls is "http://www.ligue1.com/stats_arbitre?competition=D1", but there is a catch: you must indicate through the HTTP headers that the request is an XHR (XMLHttpRequest). If you open this URL directly in a browser, you get a 500 response.

Try this curl example to check that it returns the table you need:

    curl --header "X-Requested-With: XMLHttpRequest" "http://www.ligue1.com/stats_arbitre?competition=D1"

In your code, do the following:

    import requests
    from bs4 import BeautifulSoup
    import csv

    headers = {'X-Requested-With': 'XMLHttpRequest'}
    r = requests.get('http://www.ligue1.com/stats_arbitre?competition=D1', headers=headers)
    ...
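Once the response contains the table, the rows can be pulled out along these lines. This is a sketch, not tested against the live site: `SAMPLE_HTML` is a two-row excerpt of the markup from the question standing in for `r.text`, the helper names and column names are my own, and the CSV step is only a guess at why the question imports csv.

```python
import csv
import io

from bs4 import BeautifulSoup

# Two rows copied from the question's markup, standing in for r.text.
SAMPLE_HTML = """
<table><thead><tr><th class="joueur">Referees</th></tr></thead><tbody>
<tr> <td class="position"></td> <td class="joueur">Tony CHAPRON</td>
 <td class="chiffre"><a href="/stats_arbitre_details/102">43</a></td>
 <td class="chiffre"><a href="/stats_arbitre_details/102">1</a></td>
 <td class="chiffre">8</td> </tr>
<tr class="odd"> <td class="position"></td> <td class="joueur">Amaury DELERUE</td>
 <td class="chiffre"><a href="/stats_arbitre_details/343">30</a></td>
 <td class="chiffre"><a href="/stats_arbitre_details/343">0</a></td>
 <td class="chiffre">6</td> </tr>
</tbody></table>
"""

FIELDS = ["referee", "yellow", "red", "matches"]  # hypothetical column names


def parse_referee_rows(html):
    """Return one dict per referee row found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        name_cell = tr.find("td", class_="joueur")
        numbers = [td.get_text(strip=True)
                   for td in tr.find_all("td", class_="chiffre")]
        # The header row uses th cells, so it is skipped here.
        if name_cell is None or len(numbers) != 3:
            continue
        yellow, red, matches = (int(n) for n in numbers)
        rows.append({"referee": name_cell.get_text(strip=True),
                     "yellow": yellow, "red": red, "matches": matches})
    return rows


def rows_to_csv(rows):
    """Serialise the parsed rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = parse_referee_rows(SAMPLE_HTML)
print(rows[0])
print(rows_to_csv(rows))
```

In real use you would pass `r.text` instead of `SAMPLE_HTML`, and write the CSV text to a file rather than printing it.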

Hope this helps


Selenium can do it, since it drives a real browser that runs the page's JavaScript.

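    from selenium import webdriver
    import time

    url = "http://www.ligue1.com/LFPStats/stats_arbitre?competition=D1"  # page from the question

    driver = webdriver.Firefox()
    driver.get(url)
    time.sleep(5)  # give the page's JavaScript time to insert the table
    htmlSource = driver.page_source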

Source: https://habr.com/ru/post/1238276/

