BS4: Retrieving Text in a Tag

Question

I am using Beautiful Soup. I have a tag like this:

<li><a href="example"> sro, <small>small</small></a></li>

I want to get only the text directly inside the <a> anchor, without the contents of <small> in the output; i.e. " sro, ".

I tried find('li').text[0] but it does not work.

Is there a command in BS4 that can do this?

+12

python html parsing html-parsing beautifulsoup

Milano Aug 11 '14 at 20:27

3 answers

Use .children and take its first item:

    >>> print(next(soup.find('a').children))
     sro,

+2

Padraic Cunningham Aug 11 '14 at 20:37

If you want to loop over all the anchor tags in an HTML string or web page and print the leading text of each one, this works (for a web page, fetch the HTML first, e.g. with urlopen from urllib):

    from bs4 import BeautifulSoup

    data = ('<li><a href="example">sro, <small>small</small></a></li> '
            '<li><a href="example">2nd</a></li> '
            '<li><a href="example">3rd</a></li>')
    soup = BeautifulSoup(data, 'html.parser')
    a_tag = soup('a')  # shorthand for soup.find_all('a')
    for tag in a_tag:
        print(tag.contents[0])  # .contents[0] is the first child of each <a> tag

Output:

    sro, 
    2nd
    3rd

a_tag is a list containing all the anchor tags; collecting them in one list lets you process them as a group when there is more than one <a> tag.

    >>> print(a_tag)
    [<a href="example">sro, <small>small</small></a>, <a href="example">2nd</a>, <a href="example">3rd</a>]
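One caveat with contents[0]: it is simply the first child, so it is a tag rather than text when the anchor starts with a nested element, and it raises IndexError on an empty anchor. A minimal sketch of a guarded version follows; the helper name leading_text and the sample HTML are made up for illustration and are not part of the answer above.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Hypothetical sample: one anchor starting with a tag, one empty, one plain.
html = ('<a href="x"><small>s</small> tail</a>'
        '<a href="y"></a>'
        '<a href="z">plain</a>')
soup = BeautifulSoup(html, 'html.parser')

def leading_text(tag):
    """Return the first child of tag if it is a text node, else None."""
    first = next(iter(tag.contents), None)  # None when the tag has no children
    return str(first) if isinstance(first, NavigableString) else None

results = [leading_text(a) for a in soup.find_all('a')]
print(results)  # [None, None, 'plain']
```

Returning None instead of raising keeps the loop going over anchors that do not begin with text; whether that is the right behaviour depends on the page being scraped.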

0

Sumanth Lazarus Mar 20 '19 at 7:56

alecxe · Accepted Answer · 2014-08-11T20:45:59+0000

One option is to get the first element from the contents of the a tag:

    >>> from bs4 import BeautifulSoup
    >>> data = '<li><a href="example"> sro, <small>small</small></a></li>'
    >>> soup = BeautifulSoup(data, 'html.parser')
    >>> print(soup.find('a').contents[0])
     sro,

Another would be to find the small tag and get its previous sibling:

    >>> print(soup.find('small').previous_sibling)
     sro,

Well, there are all sorts of alternative / crazy options:

    >>> print(next(soup.find('a').descendants))
     sro,
    >>> print(next(iter(soup.find('a'))))
     sro,
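All of the options above rely on the wanted text being the first child of the anchor. If text can also appear after the nested tag, a possible alternative is to keep every direct text child and skip the tags. This is only a sketch; own_text is a hypothetical helper name, not a Beautiful Soup API.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

data = '<li><a href="example"> sro, <small>small</small></a></li>'
soup = BeautifulSoup(data, 'html.parser')

def own_text(tag):
    """Join the text nodes that are direct children of tag, skipping nested tags."""
    return ''.join(
        child for child in tag.children
        if isinstance(child, NavigableString)
    )

print(repr(own_text(soup.find('a'))))  # ' sro, '
```

Note that comments inside the anchor are also NavigableString instances in Beautiful Soup, so this sketch would include them; filter on the exact type if that matters.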
