Why does Perl LWP give me a different encoding than the original website?

Let's say I have this code:

use strict;
use LWP qw ( get );

my $content = get ( "http://www.msn.co.il" );

print STDERR $content;

The error log shows something like "\xd7\x9c\xd7\x94\xd7\x93\xd7\xa4\xd7\xa1\xd7\x94", which I guess is UTF-16?

The website declares its encoding with:

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1255">

So why do these characters appear and not windows-1255 characters?

Another oddity is that I have two servers: the first returns CP1255 characters, which I can simply convert to utf8, while the current server gives me these characters and I can do nothing about them...

Is there a configuration file in Apache, Perl, or some module that is messing up the encoding or forcing something?

Update: the issue seems to be on the Perl side, which gives me utf8, and the text comes out as weird UTF characters, for example: "×× ¡'××× ¨ ××:"

In Perl:

my $content = `curl "http://www.anglo-saxon.co.il"`;

returns utf8.

In Bash:

curl "http://www.anglo-saxon.co.il"

returns CP1255 (Windows-1255)...

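Also, when I run the script from Bash I get CP1255, but when it runs from the web I get utf8...

In the end I worked around it by converting from utf8 to CP1255 and then back to utf8: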

use Text::Iconv;

# First pass: utf8 -> CP1255
my $to_cp1255 = Text::Iconv->new("utf8", "CP1255");
$content = $to_cp1255->convert($content);

# Second pass: CP1255 -> utf8
my $to_utf8 = Text::Iconv->new("CP1255", "utf8");
$content = $to_utf8->convert($content);
+3

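As far as I can tell, what you are seeing is UTF-8. Perl's own representation of decoded text is UTF-8 based, and LWP::Simple->get() decodes the response for you, undoing any Content-Encoding and converting the text, so what you print comes out as UTF-8.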

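If you need something else, look at the lower-level interfaces (see HTTP::Message decoded_content, HTTP::Response decoded_content, and LWP::UserAgent). Or convert the result yourself with something like: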

use Encode qw(encode decode);
# ... (rest of the program elided in the original)
my $cp1255_bytes = encode('CP1255', decode('UTF-8', $utf8_bytes));

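What you see in a terminal or log also depends on what that program expects. If it expects UTF-8, it will not display CP1255 bytes properly; if it expects CP1255, give it CP1255, because UTF-8 sent there shows up as UTF-8 garbage. So check both sides.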

+2

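The site is at fault here. The HTML says, in its META tag, that it is windows-1255; in reality it is UTF-8, so the declaration is wrong. That is bad HTML from Microsoft.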

To get the content properly decoded, use:

use LWP::UserAgent;

my $response = LWP::UserAgent->new->get("http://www.msn.co.il/");
my $content  = $response->decoded_content;

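$content is then a Perl character string, already decoded, regardless of what the page claims. If you need bytes in a particular encoding for output, use Encode::encode on it; do not call Encode::decode, because it has already been decoded.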
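The decode-then-encode flow can be tried without touching the network. The following is only a sketch: the response is built by hand as a hypothetical stand-in for the fetched page, with a UTF-8 body and a matching Content-Type header, and it assumes HTTP::Response (part of the HTTP::Message distribution that LWP depends on) is installed.

```perl
use strict;
use warnings;
use Encode qw(encode);
use HTTP::Response;

# Hypothetical stand-in for a fetched page: a hand-built response whose
# body is the UTF-8 byte sequence quoted in the question.
my $response = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html; charset=UTF-8' ],
    "\xd7\x9c\xd7\x94\xd7\x93\xd7\xa4\xd7\xa1\xd7\x94",
);

my $raw   = $response->content;            # bytes exactly as stored
my $chars = $response->decoded_content;    # Perl characters

# Encode explicitly when bytes in a particular encoding are needed.
my $cp1255_bytes = encode('cp1255', $chars);

printf "raw: %d bytes, decoded: %d characters, CP1255: %d bytes\n",
    length $raw, length $chars, length $cp1255_bytes;
```

The raw body and the decoded string differ in length because each Hebrew letter takes two bytes in UTF-8 but only one in CP1255.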

+8

http://www.msn.co.il is served as UTF-8, whatever the META tag says. "\xd7\x9c\xd7\x94\xd7\x93\xd7\xa4\xd7\xa1\xd7\x94" is valid UTF-8 (להדפסה).
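One way to check the byte sequence locally is to decode it strictly as UTF-8 and list the resulting code points. This is a minimal sketch that uses only the core Encode module; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The bytes quoted from the error log.
my $bytes = "\xd7\x9c\xd7\x94\xd7\x93\xd7\xa4\xd7\xa1\xd7\x94";

# FB_CROAK makes decode() die if the input is not valid UTF-8.
# A copy is passed because decode() may modify its input when a
# CHECK argument is given.
my $copy = $bytes;
my $text = decode('UTF-8', $copy, Encode::FB_CROAK);

# Each Hebrew letter occupied two bytes in the UTF-8 input.
my @code_points = map { sprintf 'U+%04X', ord } split //, $text;
print scalar(@code_points), " characters: @code_points\n";
```

If the bytes were not well-formed UTF-8, the decode call would die instead of returning a string.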

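So the fetch itself is fine, and the weird characters come from the bytes being interpreted in the wrong encoding (UTF-8 read as Windows-1252). Check the encoding that your terminal or log viewer is set to.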
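The effect of reading UTF-8 as Windows-1252 can be reproduced with the core Encode module. In this sketch the same bytes are decoded twice, once with the wrong encoding and once with the right one; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $utf8_bytes = "\xd7\x9c\xd7\x94\xd7\x93\xd7\xa4\xd7\xa1\xd7\x94";

# Wrong: treat the UTF-8 bytes as if they were Windows-1252.
# Every byte becomes its own character, hence the repeated U+00D7.
my $mojibake = decode('cp1252', $utf8_bytes);

# Right: treat them as UTF-8.
my $hebrew = decode('UTF-8', $utf8_bytes);

# Encode the output explicitly so both strings print cleanly.
binmode STDOUT, ':encoding(UTF-8)';
print "As Windows-1252: $mojibake\n";
print "As UTF-8:        $hebrew\n";
```

The wrongly decoded string is twice as long as the correct one, and every second character in it is the multiplication sign, which matches the kind of garbage shown in the question.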

+5

First, note that you must import get from LWP::Simple. Secondly, everything works fine:

#!/usr/bin/perl
use strict; use warnings;
use LWP::Simple qw ( getstore );
getstore 'http://www.msn.co.il', 'test.html';

which tells me that the problem is the encoding of the filehandle to which you send the output.
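To illustrate how the filehandle's encoding layer decides which bytes are written, here is a sketch that prints one decoded string through two differently configured handles. In-memory filehandles are used only so that the resulting bytes can be inspected; they are an assumption of the example, not part of the original script.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Decoded text: the six Hebrew letters from the question.
my $text = decode('UTF-8', "\xd7\x9c\xd7\x94\xd7\x93\xd7\xa4\xd7\xa1\xd7\x94");

# Same string, written through a UTF-8 layer ...
open my $utf8_fh, '>:encoding(UTF-8)', \my $utf8_out or die "open: $!";
print {$utf8_fh} $text;
close $utf8_fh;

# ... and through a CP1255 layer.
open my $cp1255_fh, '>:encoding(cp1255)', \my $cp1255_out or die "open: $!";
print {$cp1255_fh} $text;
close $cp1255_fh;

printf "UTF-8 layer:  %d bytes\n", length $utf8_out;
printf "cp1255 layer: %d bytes\n", length $cp1255_out;

# For a real STDERR the equivalent would be, for example:
# binmode STDERR, ':encoding(UTF-8)';
```

Choosing the layer that matches what the terminal, log viewer, or browser expects is what makes the output readable.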

+3

Source: https://habr.com/ru/post/1734537/

