Unicode in Perl not working

Question


I have some text files that I am trying to transform with a Perl script on Windows. The files look fine in Notepad+, but all the regular expressions in my script failed to match. Then I noticed that when I open the files in Notepad+, the status bar says "UCS-2 Little Endia" (sic). I assume this corresponds to the UCS-2LE encoding, so I wrote "readFile" and "writeFile" subs in Perl:

use PerlIO::encoding;

my $enc = ':encoding(UCS-2LE)';

sub readFile {
    my ($fName) = @_;
    open my $f, "<$enc", $fName or die "can't read $fName\n";
    local $/;
    my $txt = <$f>;
    close $f;
    return $txt;
}

sub writeFile {
    my ($fName, $txt) = @_;
    open my $f, ">$enc", $fName or die "can't write $fName\n";
    print $f $txt;
    close $f;
}

my $fName = 'someFile.txt';

my $txt = readFile $fName;
# ... transform $txt using s/// ...
writeFile $fName, $txt;

Now the regular expressions match (although less often than I expect), but the output contains long runs of Asian characters alternating with long runs of normal text. Is my code wrong? Or is Notepad+ wrong about the encoding? How should I proceed?

+3

perl unicode

JoelFan 22 . '10 0:05

2

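I don't know about Notepad+, but the notes on size, endianness, and BOM in the Encode::Unicode documentation may be relevant: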

http://perldoc.perl.org/Encode/Unicode.html#Size%2c-Endianness%2c-and-BOM


+1

user181548 22 . '10 0:32

JoelFan · Accepted Answer · 2010-07-23T02:12:47+0000

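The problem is caused by the "encoding..." layer interacting with the CRLF translation that Perl performs on Windows. On input, CRLF is not translated, because the CRLF layer looks for it in the raw bytes before they are decoded, and in UCS-2LE the CR and LF bytes are separated by a zero byte. On output, LF is converted to CRLF after encoding, which inserts a single extra byte and breaks the 16-bit alignment of everything that follows. That misalignment is where the "Asian" characters come from.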
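To make the misalignment concrete, here is a minimal sketch that imitates the effect on a byte string instead of a real file handle, so it behaves the same on any platform. The substitution only stands in for a CRLF layer sitting below the encoding layer; the sample text, the variable names, and the hex_dump helper are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample text: two short lines separated by LF.
my $bytes = encode('UCS-2LE', "ab\ncd");

# Imitate a byte-level CRLF layer *below* the encoding layer:
# each 0x0A byte becomes the two bytes 0x0D 0x0A.
(my $mangled = $bytes) =~ s/\x0A/\x0D\x0A/g;

# Render a byte string as space-separated hex pairs.
sub hex_dump { join ' ', map { sprintf '%02X', $_ } unpack 'C*', $_[0] }

my $clean_hex   = hex_dump($bytes);
my $mangled_hex = hex_dump($mangled);
print "clean:   $clean_hex\n";
print "mangled: $mangled_hex\n";

# Decode the mangled bytes again (minus the dangling odd byte)
# and list the code points a reader of the file would see.
my $even = substr $mangled, 0, length($mangled) - length($mangled) % 2;
my @seen = map { sprintf 'U+%04X', ord($_) } split //, decode('UCS-2LE', $even);
print "decoded: @seen\n";
```

After the inserted 0x0D, every 16-bit unit is read one byte off, so "c" and "d" come back as U+6300 and U+6400, which are CJK ideographs.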

The way around this is to open the file "normally" and then set the layers with "binmode":

open my $f, '<', $fName or die "can't read $fName\n";
binmode $f, ':raw:encoding(UCS-2LE)';

The binmode call first removes the CRLF layer, then adds the encoding layer.
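Applied to the two subs from the question, the same idea might look like the sketch below. It is not the exact code from the answer: the temporary file and the sample text are only there to make the round trip checkable, and the error messages are illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Layers applied after a plain open: drop CRLF handling, then decode/encode.
my $layers = ':raw:encoding(UCS-2LE)';

sub readFile {
    my ($fName) = @_;
    open my $f, '<', $fName or die "can't read $fName: $!\n";
    binmode $f, $layers;
    local $/;                      # slurp the whole file
    my $txt = <$f>;
    close $f;
    return $txt;
}

sub writeFile {
    my ($fName, $txt) = @_;
    open my $f, '>', $fName or die "can't write $fName: $!\n";
    binmode $f, $layers;
    print {$f} $txt;
    close $f or die "can't close $fName: $!\n";
}

# Round trip through a throwaway file (hypothetical sample text).
my ($tmpfh, $tmp) = tempfile(UNLINK => 1);
close $tmpfh;
writeFile($tmp, "ab\ncd");
my $back = readFile($tmp);
my $size = -s $tmp;
print "bytes on disk: $size\n";
print "round trip ", ($back eq "ab\ncd" ? "ok" : "FAILED"), "\n";
```

Five characters at two bytes each should give a 10-byte file, with no extra CR byte sneaking in.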

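The catch is that there is now no CRLF translation at all. Adding :crlf directly after :raw is not a fix: the CRLF layer then sits below the encoding layer again, and the "Asian" characters come back.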
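A layer order that is often suggested for this situation puts :crlf after the encoding layer instead, so that line endings are translated on characters and not on bytes. This is an assumption on my part and not something the answer above confirms; the temporary file and sample text are again just for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Assumption: :crlf is listed *after* the encoding layer, so it sits on top
# of it and sees decoded characters rather than raw bytes.
my $layers = ':raw:encoding(UCS-2LE):crlf';

my ($tmpfh, $tmp) = tempfile(UNLINK => 1);
close $tmpfh;

# Write text containing a bare LF.
open my $out, ">$layers", $tmp or die "can't write $tmp: $!\n";
print {$out} "ab\ncd";
close $out or die "can't close $tmp: $!\n";

# Look at the raw bytes that ended up on disk.
open my $raw, '<:raw', $tmp or die "can't read $tmp: $!\n";
my $bytes = do { local $/; <$raw> };
close $raw;
my $hex = join ' ', map { sprintf '%02X', $_ } unpack 'C*', $bytes;
print "on disk: $hex\n";

# Read the file back through the same layers.
open my $in, "<$layers", $tmp or die "can't read $tmp: $!\n";
my $back = do { local $/; <$in> };
close $in;
print "round trip ", ($back eq "ab\ncd" ? "ok" : "FAILED"), "\n";
```

With this order the line break is stored as the four bytes 0D 00 0A 00, which keeps every character on a 16-bit boundary.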

(Related question: CRLF translation with Unicode in Perl)
