How to remove some attributes like id, style, class, etc. from HTML code?
I thought I could use the lxml.html.clean module, but as it turned out, I can only remove style attributes with Cleaner(style=True).clean_html(code). I would prefer not to use regular expressions for this task, since the attributes may vary.
What I would like:
    from lxml.html.clean import Cleaner

    code = '<tr id="ctl00_Content_AdManagementPreview_DetailView_divNova" class="Extended" style="display: none;">'
    cleaner = Cleaner(style=True, id=True, class=True)
    cleaned = cleaner.clean_html(code)
    print cleaned
    # '<tr>'
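For illustration, here is a minimal sketch of the desired behavior without regular expressions. It swaps lxml for Python 3's standard-library html.parser, so it is not the lxml API asked about above. The names AttributeStripper and strip_attributes are hypothetical helpers made up for this example. lxml's Cleaner also documents safe_attrs_only and safe_attrs options, which may be worth checking for an lxml-based whitelist approach.

```python
from html import escape
from html.parser import HTMLParser


class AttributeStripper(HTMLParser):
    """Hypothetical helper: re-emit HTML, dropping the attributes in `unwanted`."""

    def __init__(self, unwanted):
        # convert_charrefs=False so entities in text are passed through as-is
        super().__init__(convert_charrefs=False)
        self.unwanted = {name.lower() for name in unwanted}
        self.parts = []

    def _render(self, tag, attrs, closing=">"):
        # The parser lowercases attribute names and unescapes their values,
        # so values are escaped again when the tag is written back out.
        kept = [(k, v) for k, v in attrs if k not in self.unwanted]
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in kept
        )
        return f"<{tag}{rendered}{closing}"

    def handle_starttag(self, tag, attrs):
        self.parts.append(self._render(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.parts.append(self._render(tag, attrs, closing=" />"))

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")


def strip_attributes(html, unwanted=("id", "class", "style")):
    """Return `html` with every attribute named in `unwanted` removed."""
    parser = AttributeStripper(unwanted)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


code = '<tr id="ctl00_Content_AdManagementPreview_DetailView_divNova" class="Extended" style="display: none;">'
print(strip_attributes(code))
```

Because the list of unwanted attributes is just a parameter, it can vary per call, e.g. strip_attributes(code, unwanted=("style",)). This sketch works at the token level and does not repair or validate the markup.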
Thanks in advance!