Remove specific attributes from HTML tags

How can I remove certain attributes, such as id, style, and class, from HTML code?

I thought I could use the lxml.html.clean module, but as it turned out, I can only remove style attributes with Cleaner(style=True).clean_html(code). I would prefer not to use regular expressions for this task (the attributes may vary).

What I would like:

    from lxml.html.clean import Cleaner

    code = '<tr id="ctl00_Content_AdManagementPreview_DetailView_divNova" class="Extended" style="display: none;">'

    # Wished-for API: Cleaner has no id= or class= options
    # (and class is a reserved word in Python).
    cleaner = Cleaner(style=True, id=True, class=True)
    cleaned = cleaner.clean_html(code)
    print(cleaned)
    # desired output: '<tr>'

Thanks in advance!

1 answer

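clean.Cleaner accepts a safe_attrs_only option. If it is set to True, only the attributes listed in clean.defs.safe_attrs are kept and all others are stripped. You can remove any or all attributes by changing clean.defs.safe_attrs (just remember to change it back when done), or by passing your own set through the safe_attrs argument so that the module-level default stays untouched.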

    import lxml.html.clean as clean

    code = '<tr id="ctl00_Content_AdManagementPreview_DetailView_divNova" class="Extended" style="display: none;">'

    # An empty whitelist means no attribute is considered safe,
    # so every attribute is removed.
    cleaner = clean.Cleaner(safe_attrs_only=True, safe_attrs=frozenset())
    cleansed = cleaner.clean_html(code)
    print(cleansed)

gives

    <tr></tr>

Source: https://habr.com/ru/post/897605/

