Wide character error when using the utf8 pragma with HTML::Laundry

I'm having problems with HTML::Laundry. The following snippet demonstrates what happens with and without use utf8. Enabling use utf8 results in an error:

 Wide character in subroutine entry at /usr/local/share/perl/5.14.2/HTML/Laundry.pm line 329 

Without use utf8 the result is correct, but in the context of my program I need the utf8 pragma.

 use utf8;
 use strict;
 use HTML::Laundry;

 # Sample input containing non-ASCII characters (ä, ü, „, “)
 my $snippet = "<p style=\"line-height: 18px; font-family: Verdana, Arial, Helvetica, sans-serif; color: rgb(153, 153, 153); margin: 0px; padding: 0px;\"><br>Sämtliche Produkte von collec entstehen in Zusammenarbeit mit Schweizer Werkstätten. collec setzt sich dafür ein, dass auch Menschen, die an geschützten Arbeitsplätzen tätig sind, hochwertige Produkte herstellen können. collec macht sich stark für die Erhaltung von Handarbeit und Handwerk, denn „Handwerk berührt das Denken.“</p>";

 my $clean = HTML::Laundry->new();
 $clean->remove_acceptable_element(['font','span']);
 $clean->remove_acceptable_attribute(['class','style']);
 print $clean->clean($snippet);

The program file itself is encoded as UTF-8:

 file -i cleantest.pl
 cleantest.pl: text/plain; charset=utf-8
1 answer

Peeking at the source, it looks like HTML::Laundry initializes HTML::Parser with the utf8_mode flag set. This flag causes HTML::Parser to expect its input as a UTF-8 encoded stream of bytes, rather than as a stream of Unicode characters.

You might want to file a bug report or feature request against HTML::Laundry, asking for some way to handle Unicode input correctly. However, there is an obvious workaround: just encode the input as UTF-8 before passing it to HTML::Laundry:

 use Encode qw(encode_utf8);
 print $clean->clean(encode_utf8($snippet));

or

 utf8::encode($snippet);  # encode to UTF-8 in place
 print $clean->clean($snippet);
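
To illustrate the difference between a character string and its UTF-8 byte form, here is a minimal sketch that uses only core Perl and does not call HTML::Laundry at all. The sample string and variable names are made up for illustration. The string is written with escapes so the result does not depend on how the file is saved, but it is the same kind of character string that use utf8 produces from non-ASCII literals.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# Hypothetical sample: U+00E4 ("a" with umlaut) and U+201E (low double quote).
# U+201E is above 255, i.e. a "wide" character.
my $chars = "\x{E4}\x{201E}";

# Encode the characters into UTF-8 bytes, the form a parser running in
# utf8_mode expects to receive.
my $bytes = encode_utf8($chars);

# Decoding turns the bytes back into the original characters.
my $roundtrip = decode_utf8($bytes);

printf "characters: %d\n", length $chars;   # 2
printf "bytes:      %d\n", length $bytes;   # 5 (2 for U+00E4, 3 for U+201E)
printf "round trip: %s\n", $roundtrip eq $chars ? "ok" : "mismatch";
```

If the cleaned HTML needs to be processed further as characters, decoding the return value with decode_utf8 is probably the matching step. This assumes that clean() hands back UTF-8 bytes when given UTF-8 bytes, which is worth verifying against the installed version.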

Source: https://habr.com/ru/post/973403/

