
BS4: Retrieving Text in a Tag

I am using Beautiful Soup and have a tag like this:

<li><a href="example"> sro, <small>small</small></a></li>

I want to get only the text directly inside the <a> tag, without the contents of <small> in the output; i.e. " sro, "

I tried find('li').text[0] but it does not work.

Is there a command in BS4 that can do this?

3 answers

One option is to take the first element of the <a> tag's contents:

 >>> from bs4 import BeautifulSoup
 >>> data = '<li><a href="example"> sro, <small>small</small></a></li>'
 >>> soup = BeautifulSoup(data, 'html.parser')
 >>> print(soup.find('a').contents[0])
  sro, 

Another would be to find the small tag and take its previous sibling:

 >>> print(soup.find('small').previous_sibling)
  sro, 

Well, there are all sorts of alternative / crazy options:

 >>> print(next(soup.find('a').descendants))
  sro, 
 >>> print(next(iter(soup.find('a'))))
  sro, 
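
For context on why the attempt in the question fails: .text joins every nested string, including the one inside <small>, and [0] then takes just the first character of that string. If the anchor may also have text after the nested tag, one more option is to collect only the direct text children. This is a rough sketch, not part of the original answer; direct_text is a made-up helper name, and it assumes a bs4 version that accepts the string= argument (older releases call it text=).

```python
from bs4 import BeautifulSoup


def direct_text(tag):
    """Join only the strings that are direct children of `tag` (hypothetical helper)."""
    # string=True matches text nodes; recursive=False skips text nested in child tags
    return "".join(tag.find_all(string=True, recursive=False))


data = '<li><a href="example"> sro, <small>small</small></a></li>'
soup = BeautifulSoup(data, "html.parser")
a = soup.find("a")

print(repr(a.text))          # all nested text, including the <small> part
print(repr(direct_text(a)))  # only the text sitting directly inside <a>
```

Unlike contents[0], this also picks up text that comes after the <small> tag, which may or may not be what you want.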

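Use .children, which is an iterator over the tag's direct children (in Python 3, call the built-in next() on it instead of the .next() method):

 >>> print(next(soup.find('a').children))
  sro, 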
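
Because .children is an iterator, the built-in next() can also take a default value, which avoids a StopIteration error when a tag has no children at all. A small sketch of that, where the second, empty anchor is invented purely for illustration:

```python
from bs4 import BeautifulSoup

# The empty second <a> is a made-up example, not from the question
data = '<a href="example"> sro, <small>small</small></a><a href="empty"></a>'
soup = BeautifulSoup(data, "html.parser")
first, empty = soup.find_all("a")

first_child = next(first.children, None)  # the leading text node
empty_child = next(empty.children, None)  # None instead of raising StopIteration

print(repr(first_child))
print(empty_child)
```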

If you want to loop over all the anchor tags in an HTML string or a web page and print the leading text of each one (for a web page, first fetch the HTML, for example with urlopen from urllib.request), this works:

 from bs4 import BeautifulSoup

 data = ('<li><a href="example">sro, <small>small</small></a></li> '
         '<li><a href="example">2nd</a></li> '
         '<li><a href="example">3rd</a></li>')
 soup = BeautifulSoup(data, 'html.parser')
 a_tag = soup('a')  # shorthand for soup.find_all('a')
 for tag in a_tag:
     print(tag.contents[0])  # .contents lists a tag's children; [0] is the leading text

Output:

 sro, 
 2nd
 3rd

a_tag is a list containing all the anchor tags; collecting them in one list lets you process them as a group when there is more than one <a> tag.

 >>> print(a_tag)
 [<a href="example">sro, <small>small</small></a>, <a href="example">2nd</a>, <a href="example">3rd</a>]
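
Note that contents[0] raises IndexError for an empty anchor and returns a tag rather than text when the anchor starts with a nested element. A more defensive version of the loop might look like the sketch below; leading_texts is a hypothetical helper, and the last two list items in the sample data are added only to exercise those cases.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString


def leading_texts(html):
    """Return the leading text of every <a> tag, or '' if there is none (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for tag in soup.find_all("a"):
        # First child, or None when the anchor is empty
        first = next(iter(tag.contents), None)
        if isinstance(first, NavigableString):
            results.append(str(first).strip())
        else:
            # Empty anchor, or anchor that starts with a nested tag
            results.append("")
    return results


data = (
    '<li><a href="example">sro, <small>small</small></a></li>'
    '<li><a href="example">2nd</a></li>'
    '<li><a href="example"><small>only small</small></a></li>'  # made-up case
    '<li><a href="example"></a></li>'                           # made-up case
)
texts = leading_texts(data)
print(texts)
```

Stripping the whitespace is a choice made here for tidier output; drop the .strip() call if the original spacing matters.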

Source: https://habr.com/ru/post/973663/

