I have text files that I am trying to convert using a Perl script on Windows. The files look fine in Notepad++, but none of the regular expressions in my script matched. Then I noticed that when I open the files in Notepad++, the status bar says "UCS-2 Little Endia" (sic). I assume this corresponds to the UCS-2LE encoding. So in Perl I wrote two subs, "readFile" and "writeFile":
use PerlIO::encoding;
my $enc = ':encoding(UCS-2LE)';
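# Slurp the whole file through the encoding layer and return it as one string.
sub readFile {
    my ($fName) = @_;
    open my $f, "<$enc", $fName or die "can't read $fName: $!\n";
    local $/;    # undef the record separator so <$f> reads everything
    my $txt = <$f>;
    close $f;
    return $txt;
}
# Write the string back out through the same encoding layer.
sub writeFile {
    my ($fName, $txt) = @_;
    open my $f, ">$enc", $fName or die "can't write $fName: $!\n";
    print $f $txt;
    close $f;
}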
my $fName = 'someFile.txt';
my $txt = readFile $fName;
writeFile $fName, $txt;
Now the regular expressions match (although less often than I expect), but the output contains long lines of Asian characters alternating with long lines of regular text. Is my code wrong? Or is Notepad++ wrong about the encoding? How should I proceed?
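
One possible explanation, which I have not confirmed on the real files, is that Windows line-ending translation works on raw bytes underneath the encoding layer and inserts a stray 0x0D byte before each 0x0A byte. The sketch below only simulates that assumption in memory with the core Encode module; the sample string and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Sample text as Perl would see it after decoding: two short lines.
my $text = "ab\ncd\n";

# The bytes an :encoding(UCS-2LE) layer would pass down: 2 bytes per character.
my $bytes = encode('UCS-2LE', $text);

# ASSUMPTION: a byte-level CRLF translation runs below the encoding layer
# and turns every 0x0A byte into 0x0D 0x0A, ignoring the 2-byte units.
(my $mangled = $bytes) =~ s/\x0A/\x0D\x0A/g;

# Decode the mangled bytes again, the way an editor would read the file.
my $decoded = decode('UCS-2LE', $mangled);
my @code_points = map { sprintf('U+%04X', ord($_)) } split //, $decoded;

print "@code_points\n";
# prints: U+0061 U+0062 U+0A0D U+6300 U+6400 U+0D00 U+000A
```

In this simulation "cd" comes back as U+6300 U+6400, which are CJK ideographs, and the second inserted byte shifts the alignment back again. That would match text flipping between Asian characters and normal text at each line break. If this is what is happening, putting the layers in an explicit order such as ":raw:encoding(UCS-2LE):crlf" might be worth trying, though I have not verified that on these files.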