I am trying to "defrontpagify" the HTML of a website created with MS FrontPage, and I am writing a BeautifulSoup script to do it.
However, I am stuck on the part where I try to remove a specific attribute (or a list of attributes) from every tag in the document that has it. Code snippet:
REMOVE_ATTRIBUTES = ['lang','language','onmouseover','onmouseout','script','style','font', 'dir','face','size','color','style','class','width','height','hspace', 'border','valign','align','background','bgcolor','text','link','vlink', 'alink','cellpadding','cellspacing']
It runs without errors, but it does not actually remove any attributes. When I run it without the outer loop and hard-code a single attribute (soup.findAll(style=True)), it works.
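The loop itself isn't shown, so this is an assumption: if it passes the loop variable as a keyword argument, for example soup.findAll(attribute=True), that would explain the symptom. Python binds a keyword argument by its literal name, not by the variable's value. BeautifulSoup therefore searches for tags with an attribute literally called "attribute", finds none, and the loop silently does nothing. The sketch below uses a hypothetical stand-in for `findAll` (not the real BeautifulSoup method) just to show what the callee receives.

```python
# Hypothetical stand-in for soup.findAll, used only to show how Python binds keyword arguments.
def find_all(**kwargs):
    """Return the keyword-argument names exactly as the callee receives them."""
    return sorted(kwargs)


REMOVE_ATTRIBUTES = ['lang', 'style', 'width']

for attribute in REMOVE_ATTRIBUTES:
    # `attribute=True` always sends the literal key "attribute",
    # no matter which string the loop variable currently holds.
    print(find_all(attribute=True))       # ['attribute'] on every iteration

    # Unpacking a dict uses the variable's *value* as the key instead.
    print(find_all(**{attribute: True}))  # ['lang'], then ['style'], then ['width']
```

With BeautifulSoup, the equivalent fix is to pass the name through a dict, e.g. soup.findAll(attrs={attribute: True}) or soup.findAll(**{attribute: True}), and then delete tag[attribute] on each match.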
Does anyone see a problem here?
PS: I also don't like nested loops. If someone knows a more functional, map/filter-style approach, I would love to see it.
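One way to avoid the nested loop is to visit each tag once and rebuild its attribute dict with a comprehension, which filters the attributes rather than looking them up one name at a time. This is a minimal sketch that assumes BeautifulSoup 4 (`bs4`, where `find_all` is the newer name for `findAll`) and the built-in `html.parser`. The helper name `strip_attributes` is hypothetical.

```python
from bs4 import BeautifulSoup

# Same names as in the question (duplicate 'style' dropped); a set gives fast membership tests.
REMOVE_ATTRIBUTES = {
    'lang', 'language', 'onmouseover', 'onmouseout', 'script', 'style', 'font',
    'dir', 'face', 'size', 'color', 'class', 'width', 'height', 'hspace',
    'border', 'valign', 'align', 'background', 'bgcolor', 'text', 'link', 'vlink',
    'alink', 'cellpadding', 'cellspacing',
}


def strip_attributes(soup, remove=REMOVE_ATTRIBUTES):
    """Drop every attribute named in `remove` from every tag in `soup`, in one pass."""
    # find_all(True) matches every tag; each tag's .attrs is a plain dict,
    # so a dict comprehension rebuilds it without the unwanted keys.
    for tag in soup.find_all(True):
        tag.attrs = {name: value for name, value in tag.attrs.items() if name not in remove}
    return soup


html = '<p class="x" id="intro" style="color:red"><font face="Arial" size="2">Hi</font></p>'
soup = strip_attributes(BeautifulSoup(html, 'html.parser'))
print(soup)  # <p id="intro"><font>Hi</font></p>
```

Each tag is visited once, and attributes not in the list (such as id) are left untouched.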