Perl: convert a filehandle in place / by streaming from cp1252 to UTF-8?

I have a filehandle open on a file containing cp1252 characters. I want to hand this open filehandle to a library that expects raw UTF-8 bytes and will send them over the network.

The naive way to do this is to write the file out to a second file with the correct encoding and hand the second file to the library:

    use Fcntl qw/SEEK_SET/;

    # "or" rather than "||": with "||" the die binds to the filename string
    # and never fires when open fails.
    open my $fh_1252, "<:encoding(cp1252)", "1252.txt" or die $!;
    open my $fh_utf8, "+>:encoding(UTF-8)", "utf8.txt" or die $!;

    while (<$fh_1252>) {
        print $fh_utf8 $_;
    }

    seek($fh_utf8, 0, SEEK_SET);
    # now give $fh_utf8 to the library for transmission

This seems like a lot of extra work. Is there a way to just stream it? I know I could use IO::Scalar to avoid writing to disk, but I would still have to read the whole file into memory. It seems there should be a way to do this with a pipe, but I can't think of how to do it right now.

2 answers

You can write your own PerlIO conversion layer and use it with :via(MODULE). Your module can pass the data through Text::Iconv to convert it from one encoding to another.

This method is described in the PerlIO::via(3pm) manual page. In short, you need to create your own module, for example PerlIO::via::Example: create the PerlIO/via directory and put Example.pm there with the following contents:

    package PerlIO::via::Example;

    use strict;
    use warnings;
    use Text::Iconv;

    my $converter = Text::Iconv->new("windows-1252", "utf-8");

    sub PUSHED {
        my ($class, $mode, $fh) = @_;
        # When writing we buffer the data
        my $buf = '';
        return bless \$buf, $class;
    }

    sub FILL {
        my ($obj, $fh) = @_;
        my $line = <$fh>;
        # 'converted: ' is added here for debugging purposes
        return (defined $line) ? 'converted: ' . $converter->convert($line) : undef;
    }

    sub WRITE {
        my ($obj, $buf, $fh) = @_;
        $$obj .= $buf;    # we do nothing here
        return length($buf);
    }

    sub FLUSH {
        my ($obj, $fh) = @_;
        print $fh $$obj or return -1;
        $$obj = '';
        return 0;
    }

    1;

and then use it in open, like this:

    use strict;
    use warnings;
    use PerlIO::via::Example;

    open(my $fh, "<:via(Example)", "input.txt") or die $!;
    while (<$fh>) {
        print;
    }
    close $fh;

You can use an external program to convert the input file. See perldoc -f open for more details.

    open(my $ft, '-|', "iconv -f CP1252 -t UTF-8 1252.txt") || die $!;

P.S. There are simpler solutions that use Perl libraries. The above is the most common one, IMHO.
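For a library-only variant that needs no external program, here is a minimal sketch using the forking form of open ('-|' with no command), which is described in perldoc -f open. The helper name open_cp1252_as_utf8 is made up for illustration and is not part of either answer. The sketch assumes a Unix-like system where fork is available. The child process decodes cp1252 and writes UTF-8 to the pipe, so the parent gets a handle that yields raw UTF-8 bytes without a temporary file and without holding the whole file in memory.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answers above): returns a read handle
# that yields the UTF-8 bytes of a cp1252-encoded file, converted on the fly.
sub open_cp1252_as_utf8 {
    my ($path) = @_;

    # Forking open: the parent receives a read handle connected to the
    # child's STDOUT; in the child, open returns 0.
    my $pid = open(my $reader, '-|');
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: decode cp1252 on input, encode UTF-8 on output.
        open(my $in, '<:encoding(cp1252)', $path) or die "open $path: $!";
        binmode(STDOUT, ':encoding(UTF-8)');
        print while <$in>;    # copies line by line into the pipe
        close $in;
        close STDOUT;
        exit 0;
    }

    binmode($reader);         # parent side: raw bytes, no decoding layer
    return $reader;
}
```

The returned handle can then be passed to the library in place of $fh_utf8. Closing it in the parent also waits for the child process to finish.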


Source: https://habr.com/ru/post/976220/

