You might want to take a look at the Strip-o-Gram HTML-to-text conversion library: http://pypi.python.org/pypi/stripogram/1.5
Usage example from the readme.txt file:

    from stripogram import html2text, html2safehtml

    mylumpofdodgyhtml = "..."  # a lump of dodgy html ;-)

    # Only allow <b>, <a>, <i>, <br>, and <p> tags
    mylumpofcoolcleancollectedhtml = html2safehtml(mylumpofdodgyhtml, valid_tags=("b", "a", "i", "br", "p"))

    # Don't process <img> tags, just strip them out. Use an indent of 4 spaces
    # and a page that is 80 characters wide.
    mylumpoftext = html2text(mylumpofcoolcleancollectedhtml, ignore_tags=("img",), indent_width=4, page_width=80)

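If you would rather not add a dependency, a rough stand-in for the `html2safehtml` whitelisting step can be built on the Python 3 standard library's `html.parser.HTMLParser`. This is a minimal sketch, not Strip-o-Gram's implementation: `WhitelistSanitizer` and `sanitize` are hypothetical names, and the sketch assumes you are fine with dropping every attribute (so `<a>` loses its `href`) and with discarding the contents of `<script>` and `<style>` entirely.

```python
from html import escape
from html.parser import HTMLParser


class WhitelistSanitizer(HTMLParser):
    """Rebuild HTML keeping only whitelisted tags; all attributes are dropped."""

    # Tags whose text content is thrown away, not just unwrapped.
    SKIP_CONTENT = {"script", "style"}

    def __init__(self, valid_tags):
        # convert_charrefs=True turns entities like &lt; into characters in handle_data.
        super().__init__(convert_charrefs=True)
        self.valid_tags = set(valid_tags)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in self.valid_tags:
            self.out.append("<%s>" % tag)  # attrs deliberately ignored

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>; emitted as a plain start tag.
        if self.skip_depth == 0 and tag in self.valid_tags:
            self.out.append("<%s>" % tag)

    def handle_endtag(self, tag):
        if tag in self.SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in self.valid_tags:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Re-escape text so a decoded "&lt;" cannot become a real tag.
            self.out.append(escape(data, quote=False))


def sanitize(html, valid_tags=("b", "a", "i", "br", "p")):
    """Return html with every tag outside valid_tags stripped out."""
    parser = WhitelistSanitizer(valid_tags)
    parser.feed(html)
    parser.close()  # flushes any buffered trailing text
    return "".join(parser.out)
```

For example, `sanitize('<p>Hi <b>there</b><script>alert(1)</script><img src="x.png"></p>')` returns `'<p>Hi <b>there</b></p>'`. For anything facing untrusted input, a maintained sanitizer library is a safer choice than a hand-rolled parser like this one.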
twils