Need help with HTML::TagFilter: getting rid of the contents of a style tag

I have a filter written in Perl that looks like this:

 my $tf = HTML::TagFilter->new(
     allow => {
         img => { src => [] },
         b   => { all => [] },
         i   => { all => [] },
         em  => { all => [] },
         u   => { all => [] },
         s   => { all => [] },
     },
     strip_comments      => 1,
     skip_xss_protection => 1,
 );

Now, when I pass it HTML like this:

 <html> <head> <style><!-- ..hmmessage P { margin:0px=3B padding:0px } body.hmmessage { font-size: 12pt=3B font-family:Calibri } --></style></head> <body class=3D'hmmessage'><div dir=3D'ltr'>Message content here! = </div></body> </html> 

I get this output:

 <!--..hmmessage P{margin:0px;padding:0px}body.hmmessage{font-size: 12pt;font-family:Calibri}-->Message content here 

As the output shows, the content of the style tag is still there. Can someone tell me why it survives the filter?

1 answer

This is an undocumented HTML::TagFilter "feature" that results from its subclassing HTML::Parser. The latter treats the contents of <style> and <script> as CDATA, so by default they pass through as plain text, regardless of the allowed and denied tags:

Script and style tags will always be embedded correctly, as their contents are parsed in CDATA mode.

Source

To solve this problem, just call

 $tf->ignore_elements('style'); 

before parsing your HTML. The parser will then ignore the style element together with its content, which is what you want. Note that if you replace style with foo in your example, no comment is printed.
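For reference, the fix can be sketched end to end roughly as follows. This is a minimal sketch, not tested against every HTML::TagFilter version: the helper name filter_without_style and the shortened sample HTML are made up for illustration, and it assumes filter() is the method used to run the filter, since the question does not show that call.

```perl
use strict;
use warnings;
use HTML::TagFilter;

# Hypothetical helper (not from the original post): builds the filter from
# the question and tells the parser to drop <style> elements entirely.
sub filter_without_style {
    my ($html) = @_;

    my $tf = HTML::TagFilter->new(
        allow => {
            img => { src => [] },
            b   => { all => [] },
            i   => { all => [] },
            em  => { all => [] },
            u   => { all => [] },
            s   => { all => [] },
        },
        strip_comments      => 1,
        skip_xss_protection => 1,
    );

    # ignore_elements() is inherited from HTML::Parser; it has to be set
    # before the document is parsed.
    $tf->ignore_elements('style');

    # Assumption: filter() parses the input and returns the cleaned HTML.
    return $tf->filter($html);
}

# Shortened, made-up sample input in the spirit of the question.
my $html = <<'HTML';
<html><head><style><!-- p { margin:0px } --></style></head>
<body><div>Message content here!</div></body></html>
HTML

print filter_without_style($html), "\n";
```

The same call should work for script if you also want to drop the contents of script elements, since ignore_elements() accepts a list of tag names.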


Source: https://habr.com/ru/post/1494239/

