BS4: Retrieving Text in a Tag
I am using Beautiful Soup. I have the following tag:
<li><a href="example"> sro, <small>small</small></a></li>
I want to get only the text that sits directly inside the <a> tag, without the contents of <small> in the output; i.e. " sro, "
I tried find('li').text[0] but it does not work.
Is there a command in BS4 that can do this?
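For reference, the attempt fails because .text joins the text of every descendant into a single string, so [0] only picks the first character of that string. A minimal sketch of this, assuming the built-in html.parser backend (the variable names are only for illustration):

```python
from bs4 import BeautifulSoup

data = '<li><a href="example"> sro, <small>small</small></a></li>'
soup = BeautifulSoup(data, "html.parser")

li = soup.find("li")
full_text = li.text       # text of all descendants joined into one str
first_char = li.text[0]   # indexing a str yields a single character

print(repr(full_text))
print(repr(first_char))
```

So the <small> text is already mixed into the string before the index is applied, which is why the text has to be taken from the tag's children instead.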
One option is to get the first element of the <a> tag's contents:
>>> from bs4 import BeautifulSoup
>>> data = '<li><a href="example"> sro, <small>small</small></a></li>'
>>> soup = BeautifulSoup(data, 'html.parser')
>>> print(soup.find('a').contents[0])
 sro, 

Another option is to find the small tag and take its previous sibling:
>>> print(soup.find('small').previous_sibling)
 sro, 

Well, there are also all sorts of alternative, crazier options:
>>> print(next(soup.find('a').descendants))
 sro, 
>>> print(next(iter(soup.find('a'))))
 sro, 

If you want to loop over all the anchor tags in an HTML string or a web page (for a web page, fetch the HTML first, e.g. with urlopen from urllib.request) and print the leading text of each one, this works:
from bs4 import BeautifulSoup

data = '<li><a href="example">sro, <small>small</small></a></li> <li><a href="example">2nd</a></li> <li><a href="example">3rd</a></li>'
soup = BeautifulSoup(data, 'html.parser')
a_tag = soup('a')  # calling the soup object is shorthand for soup.find_all('a')
for tag in a_tag:
    print(tag.contents[0])  # .contents[0] is the first child of each <a> tag, here its text

Output:
sro, 
2nd
3rd

a_tag is a list containing all the anchor tags; collecting them in one list makes it possible to process them as a group when there is more than one <a> tag.
>>> print(a_tag)
[<a href="example">sro, <small>small</small></a>, <a href="example">2nd</a>, <a href="example">3rd</a>]
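The options above depend on where the text sits relative to the <small> tag. If that position is not guaranteed, one possible alternative is to collect only the text nodes that are direct children of the anchor, using find_all with string=True and recursive=False. This is a sketch, not part of the original answers: direct_text is a hypothetical helper name, and the second HTML string is a made-up variant with the text placed after <small>.

```python
from bs4 import BeautifulSoup


def direct_text(tag):
    """Join only the text nodes that are direct children of tag (hypothetical helper)."""
    # string=True matches text nodes; recursive=False restricts the search
    # to the tag's immediate children, so nested tags like <small> are skipped.
    return "".join(tag.find_all(string=True, recursive=False))


original = '<li><a href="example"> sro, <small>small</small></a></li>'
# Assumed variant for illustration: the wanted text comes after <small>.
variant = '<li><a href="example"><small>small</small> sro, </a></li>'

first = direct_text(BeautifulSoup(original, "html.parser").find("a"))
second = direct_text(BeautifulSoup(variant, "html.parser").find("a"))

print(repr(first))
print(repr(second))
```

Both calls return " sro, ", so the result does not change when the <small> tag moves to the front.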