Python BeautifulSoup get_text() doesn't get all text

Question

I am trying to get all the text from an HTML tag using the Beautiful Soup get_text() method. I am using Python 2.7 and Beautiful Soup 4.4.0. It works most of the time. However, in some cases the method returns only the first paragraph of the tag, and I can't understand why. See the following example.

from bs4 import BeautifulSoup
import urllib2

job_url = "http://www.indeed.com/viewjob?jk=0f5592c8191a21af"
site = urllib2.urlopen(job_url).read()
soup = BeautifulSoup(site, "html.parser")
text = soup.find("span", {"class": "summary"}).get_text()
print text

I want to get all the content from this job description; basically, I want all the text. However, with the code above I only get "Please note that this is a 1-year contract. Candidates cannot start the task until the background check and drug testing are complete." Why am I losing the rest of the text? How can I get all the text from this tag without specifying a subtag?

Many thanks.

+4

python html python-2.7 urllib2 beautifulsoup

Shengjie Zhang 19 . '15 17:00

Joe Young · Accepted Answer · 2015-09-19T17:21:09+0000

Try using the lxml parser instead of html.parser.

Change:

soup = BeautifulSoup(site, "html.parser")

to:

soup = BeautifulSoup(site, "lxml")

Note that lxml must be installed first: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser
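One rough way to check whether the parser is responsible is to parse the same markup with every installed parser and compare what get_text() returns. The sketch below is only an illustration: the helper name summary_text_by_parser and the sample markup are made up for this example and are not part of Beautiful Soup or the original page. It is written to run on Python 3 as well as 2.7.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def summary_text_by_parser(html, parsers=("html.parser", "lxml", "html5lib")):
    """Hypothetical helper: map each available parser name to the text it
    extracts from <span class="summary">, or None if the tag is not found."""
    results = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # Raised by Beautiful Soup when the requested parser is not installed.
            continue
        tag = soup.find("span", {"class": "summary"})
        results[name] = tag.get_text() if tag is not None else None
    return results


# Made-up sample markup standing in for the downloaded job page.
sample = '<span class="summary"><p>First paragraph.</p><p>Second paragraph.</p></span>'

for name, text in sorted(summary_text_by_parser(sample).items()):
    print("%s: %r" % (name, text))
```

On well-formed markup like this sample the parsers should agree. If one parser returns noticeably less text for the real page, that suggests the page's markup is malformed and that parser recovers from it differently.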
