I am writing a script to scrape the independence dates of several countries from Wikipedia.
For example, with Kazakhstan:
    import re

    import requests
    from bs4 import BeautifulSoup

    URL_QS = 'https://en.wikipedia.org/wiki/Kazakhstan'
    r = requests.get(URL_QS)
    soup = BeautifulSoup(r.text, 'lxml')

    # Only keep the infobox (top right)
    infobox = soup.find("table", class_="infobox geography vcard")
    if infobox:
        formation = infobox.find_next(text=re.compile("Formation"))
        if formation:
            # Try the lowercase spelling first, then the capitalized one
            independence = formation.find_next(text=re.compile("independence"))
            if independence:
                independ_date = independence.find_next("td").text
            else:
                independence = formation.find_next(text=re.compile("Independence"))
                if independence:
                    independ_date = independence.find_next("td").text

    print(independ_date)
And I have the following output:
Almaty
This result is not located in the infobox but further down, in the body text. It happens because `formation.find_next(text=re.compile("independence"))` matched something outside the infobox, but I do not understand why the search is not restricted to the infobox. How can I search only inside the infobox?
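To illustrate what I suspect is happening, here is a reduced sketch on a made-up HTML snippet (not the real Wikipedia markup; the names `HTML`, `leaky_lookup` and `scoped_lookup` are hypothetical and only for illustration). As far as I understand, `find_next` keeps walking forward through the whole document, whereas `find` called on the infobox tag only looks at its descendants, so the second variant may be closer to what I need:

```python
import re
from bs4 import BeautifulSoup

# Made-up HTML standing in for the page: an infobox followed by body text
HTML = """
<table class="infobox geography vcard">
  <tr><th>Formation</th><td></td></tr>
  <tr><th>Independence from the Soviet Union</th><td>16 December 1991</td></tr>
</table>
<p>After independence the capital was</p>
<table><tr><td>Almaty</td></tr></table>
"""

def leaky_lookup(soup):
    """Same pattern as in the question: find_next is not limited to the infobox."""
    infobox = soup.find("table", class_="infobox geography vcard")
    formation = infobox.find_next(string=re.compile("Formation"))
    # Case-sensitive "independence" skips the capitalized infobox label
    # and matches the paragraph that comes after the table.
    match = formation.find_next(string=re.compile("independence"))
    return match.find_next("td").text

def scoped_lookup(soup):
    """Search only the infobox's descendants, ignoring case."""
    infobox = soup.find("table", class_="infobox geography vcard")
    label = infobox.find(string=re.compile("independence", re.IGNORECASE))
    if label is None:
        return None
    return label.find_next("td").text

soup = BeautifulSoup(HTML, "html.parser")
print(leaky_lookup(soup))
print(scoped_lookup(soup))
```

On this snippet the first call picks up the cell after the paragraph, while the second stays inside the infobox. The final `find_next("td")` in `scoped_lookup` could still leave the table if the label were in its last row, so a stricter version would also check that the returned cell has the infobox among its parents.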
Thank you in advance for your help!
jGsch