Let's say I have this code:
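```perl
use strict;
use warnings;
use LWP::Simple qw(get);   # get() is exported by LWP::Simple, not by LWP

my $content = get("http://www.msn.co.il");
die "Could not fetch the page\n" unless defined $content;
print STDERR $content;
```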
The error log shows something like "\xd7\x9c\xd7\x94\xd7\x93\xd7\xa4\xd7\xa1\xd7\x94", which I guess is UTF-16?
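A minimal check, assuming the escaped sequence in the log is exactly the raw bytes Perl printed: decoding them with the core Encode module in strict mode succeeds, so they are valid UTF-8. (UTF-16 would store U+05DC as the byte pair 0x05 0xDC or 0xDC 0x05, not as 0xD7 0x9C.)

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# The bytes from the error log, written as a Perl byte string.
my $bytes = "\xd7\x9c\xd7\x94\xd7\x93\xd7\xa4\xd7\xa1\xd7\x94";

# FB_CROAK makes decode() die on malformed UTF-8; LEAVE_SRC stops
# decode() from consuming $bytes in place, which it otherwise does
# whenever a CHECK value is passed.
my $text = decode('UTF-8', $bytes, FB_CROAK() | LEAVE_SRC());

# Print the byte/character counts and each character's code point.
printf "%d bytes -> %d characters\n", length($bytes), length($text);
print join(' ', map { sprintf 'U+%04X', ord } split //, $text), "\n";
```

Run as-is, it prints `12 bytes -> 6 characters` and then `U+05DC U+05D4 U+05D3 U+05E4 U+05E1 U+05D4`, the Hebrew letters of להדפסה.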
The website declares its encoding with:
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1255">
So why do I get these characters instead of windows-1255 characters?
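A sketch of what the META tag implies, using the core Encode module. The input is a hypothetical windows-1255 rendering of the same word, not data fetched from the site. Decoding it from cp1255 and re-encoding as UTF-8 produces exactly the byte sequence seen in the error log, which suggests the text is the same and only its byte encoding differs.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: the Hebrew word "להדפסה" as windows-1255 (CP1255)
# bytes, i.e. the encoding the page's META tag declares.
my $cp1255_bytes = "\xec\xe4\xe3\xf4\xf1\xe4";

# Bytes -> Perl character string, using the declared charset.
my $text = decode('cp1255', $cp1255_bytes);

# Character string -> bytes, in whichever encoding the output needs.
my $utf8_bytes = encode('UTF-8', $text);
my $back       = encode('cp1255', $text);

# Show the UTF-8 bytes in the same \x notation as the error log.
print join('', map { sprintf '\x%02x', ord } split //, $utf8_bytes), "\n";
```

It prints `\xd7\x9c\xd7\x94\xd7\x93\xd7\xa4\xd7\xa1\xd7\x94`, and `$back` holds the original CP1255 bytes again.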
Another oddity is that I have two servers: the first returns CP1255 characters, which I can simply convert to UTF-8, while the current server gives me these characters and I can't do anything with them...
Is there a configuration file in Apache, Perl, or some module that could be messing up the encoding, or forcing a particular one?
With Perl I get UTF-8 (weird UTF chars), for example: "×× ¡'××× ¨ ××:"
In Perl:
```perl
my $content = `curl "http://www.anglo-saxon.co.il"`;
```
returns UTF-8.
In Bash:
```bash
curl "http://www.anglo-saxon.co.il"
```
returns CP1255 (Windows-1255)...
So from a Bash script I get CP1255, but from the web I get UTF-8...
This is the conversion I run on the content (UTF-8 to CP1255, then CP1255 back to UTF-8):
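```perl
use Text::Iconv;

# UTF-8 -> CP1255 (convert() returns undef if the conversion fails)
my $to_cp1255 = Text::Iconv->new("UTF-8", "CP1255");
$content = $to_cp1255->convert($content);

# CP1255 -> UTF-8 (separate variable: redeclaring "my $converter" masks the first)
my $to_utf8 = Text::Iconv->new("CP1255", "UTF-8");
$content = $to_utf8->convert($content);
```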
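A possible alternative to the Text::Iconv round trip, sketched with the core Encode module instead, and assuming the only two encodings that ever arrive are UTF-8 and CP1255. The helper `to_utf8_bytes` is hypothetical. It relies on the fact that CP1255 Hebrew letters (0xE0 to 0xFA) are UTF-8 lead or invalid bytes never followed by continuation bytes, so CP1255 Hebrew text is usually not valid UTF-8 and a strict UTF-8 decode fails on it.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK LEAVE_SRC);

# Hypothetical helper (not part of the original code): return UTF-8 bytes
# whether the input is already UTF-8 or is CP1255.
sub to_utf8_bytes {
    my ($bytes) = @_;

    # Try strict UTF-8 first; FB_CROAK makes decode() die on bad input,
    # and eval turns that into undef. LEAVE_SRC keeps $bytes intact.
    my $text = eval { decode('UTF-8', $bytes, FB_CROAK() | LEAVE_SRC()) };

    # Not valid UTF-8: assume CP1255 (the heuristic's core assumption).
    $text = decode('cp1255', $bytes) unless defined $text;

    return encode('UTF-8', $text);
}

# Usage: the CP1255 and UTF-8 forms of the same word come out identical.
my $from_cp1255 = to_utf8_bytes("\xec\xe4\xe3\xf4\xf1\xe4");
my $from_utf8   = to_utf8_bytes("\xd7\x9c\xd7\x94\xd7\x93\xd7\xa4\xd7\xa1\xd7\x94");
print $from_cp1255 eq $from_utf8 ? "same\n" : "different\n";
```

Unlike converting unconditionally, this never re-encodes text that is already UTF-8; the trade-off is that a short CP1255 string could, in rare cases, happen to be valid UTF-8 and be left as-is.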