How to save a web page as a text file [Python]

Question

I would like to save a web page (all of its content) as a text file, as if I had right-clicked the page and chosen "Save Page As" -> "Save As Text File" instead of saving it as an HTML file.

I tried using the following code:

import urllib2

url = ''  # URL left out of the question
page = urllib2.urlopen(url)
page_content = page.read()
# This writes the raw HTML, tags and entities included
with open('file_text.txt', 'wb') as f:
    f.write(page_content)

My goal is to save all of the text without the HTML code. (For example, I would like to read "è" instead of "&eacute;".)
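If installing a third-party package is not an option, here is a minimal Python 3 sketch that uses only the standard library's html.parser.HTMLParser. It is an alternative technique, not code from the question or the answer: TextExtractor and html_to_text are hypothetical names, and the BLOCK set of tags that start a new line is an assumption to adjust. With convert_charrefs=True the parser delivers "è" rather than "&eacute;", and the contents of <script> and <style> are dropped.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text of an HTML document (hypothetical helper)."""

    # Elements whose contents are code, not page text.
    SKIP = {"script", "style"}
    # Assumed set of tags that should start a new line in the output.
    BLOCK = {"p", "div", "br", "li", "tr", "title",
             "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        # convert_charrefs=True resolves &eacute; to "è" before handle_data runs.
        super().__init__(convert_charrefs=True)
        self._parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self._parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside <script> or <style>.
        if not self._skip_depth:
            self._parts.append(data)

    def get_text(self):
        return "".join(self._parts)


def html_to_text(html):
    """Return the text of an HTML string with tags removed and entities decoded."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    # Collapse whitespace on each line and drop lines that end up empty.
    lines = (" ".join(line.split()) for line in parser.get_text().splitlines())
    return "\n".join(line for line in lines if line)
```

The result is rougher than a browser's "Save As Text File": it ignores CSS, so elements hidden with display: none still appear, and table layout is not preserved.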

Skipper Feb 03 '16 at 0:03

pnovotnak · Answer 1 · 2016-02-03T00:08:36+0000

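Use the html2text library, as suggested elsewhere. It converts HTML into readable plain text (formatted as Markdown) and, with its unicode_snob option enabled, outputs entities such as &eacute; as the actual character (è):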

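import io
import urllib2
import html2text

url = ''  # URL left out of the question
page = urllib2.urlopen(url)
# Decode the bytes with the charset the server declares, falling back to UTF-8
charset = page.info().getparam('charset') or 'utf-8'
html_content = page.read().decode(charset, 'replace')
# unicode_snob=True keeps accented characters ("è") instead of ASCII stand-ins
h = html2text.HTML2Text()
h.unicode_snob = True
rendered_content = h.handle(html_content)
# io.open with an explicit encoding can write the non-ASCII unicode result
with io.open('file_text.txt', 'w', encoding='utf-8') as f:
    f.write(rendered_content)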
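The answer targets Python 2; in Python 3, urllib2 was split into urllib.request and urllib.error. A minimal Python 3 sketch of the fetching and saving steps, using only the standard library, might look like this. fetch_html and save_text are hypothetical names, and falling back to UTF-8 when the server declares no charset is an assumption.

```python
import urllib.request


def fetch_html(url):
    """Download url and decode the body using the charset the server declares."""
    with urllib.request.urlopen(url) as response:
        # get_content_charset() reads the charset= parameter of Content-Type;
        # falling back to UTF-8 when none is declared is an assumption.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def save_text(text, path):
    """Write text to path as UTF-8 so characters such as "è" survive."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
```

With html2text installed, pass the string returned by fetch_html(url) to the handle method of an html2text.HTML2Text instance (with unicode_snob set to True), then give the result to save_text.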
