UTF-16 perl output

I am writing a script that takes a UTF-16 encoded text file as input and writes a UTF-16 encoded text file as output.

 use open "encoding(UTF-16)";

 open INPUT, "< input.txt"
     or die "cannot open > input.txt: $!\n";
 open(OUTPUT,"> output.txt");

 while(<INPUT>) {
     print OUTPUT "$_\n"
 }

Say my program simply copies everything from input.txt to output.txt.

This works fine in my Cygwin environment, which uses "This is perl 5, version 14, subversion 2 (v5.14.2) built for cygwin-thread-multi-64int".

But in my Windows environment, which uses "This is perl 5, version 12, subversion 3 (v5.12.3) built for MSWin32-x64-multi-thread", every line in output.txt except the first is prefixed with garbage characters.

For instance:

 <FIRST LINE OF TEXT> ਀    γˆ€  γ„€β°€ γˆ€β°€ ε˜€ζ„€ γŒ€ δŒ€ζ €ζ€€ζ„€ 䐀⸀⸀⸀  ε„€η”€ζ„€ζΈ€ζœ€ δ €ξ€€ΰ΄Š<SECOND LINE OF TEXT> ... 

Can anyone explain why it works on Cygwin but not on Windows?

EDIT: After printing the encoding layers as suggested:

In the Windows environment:

 unix
 crlf
 encoding(UTF-16)
 utf8
 unix
 crlf
 encoding(UTF-16)
 utf8

In the Cygwin environment:

 unix
 perlio
 encoding(UTF-16)
 utf8
 unix
 perlio
 encoding(UTF-16)
 utf8

The only difference is the perlio layer on Cygwin versus the crlf layer on Windows.

2 answers

[I was going to wait and give a detailed answer, but it is probably better to give you a quick answer than nothing.]

The problem is that the crlf and encoding layers are in the wrong order. This is not your mistake.

For example, say you do print "a\nb\nc\n"; using UTF-16le (since it’s easier and maybe what you really want). As a result, you get

 61 00 0D 0A 00 62 00 0D 0A 00 63 00 0D 0A 00 

instead of

 61 00 0D 00 0A 00 62 00 0D 00 0A 00 63 00 0D 00 0A 00 
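The two byte sequences above can be reproduced on any platform by doing the two steps by hand. This is only a sketch using the core Encode module: the variable names and the hex_dump helper are made up for illustration, and the byte-level substitution merely imitates a crlf layer sitting below the encoding layer.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text = "a\nb\nc\n";

# Wrong order: encode first, then turn LF into CRLF on the raw bytes.
# This imitates a crlf layer that sits below the encoding layer.
(my $wrong = encode('UTF-16LE', $text)) =~ s/\x0A/\x0D\x0A/g;

# Right order: turn LF into CRLF on the characters, then encode.
(my $chars = $text) =~ s/\n/\r\n/g;
my $right = encode('UTF-16LE', $chars);

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_dump {
    my ($bytes) = @_;
    return uc(join(' ', unpack('(H2)*', $bytes)));
}

print hex_dump($wrong), "\n";
print hex_dump($right), "\n";
```

The first line printed corresponds to the broken output, where a lone 0D byte shifts every following 16-bit unit by one byte, which is why the rest of the file looks like random characters.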

I do not think you can get the correct result with the open pragma or with binmode, but it can be done with open by listing the layers explicitly.

 open(my $fh, '<:raw:encoding(UTF-16):crlf', $qfn) 

IIRC, you need to append :utf8 to the layer list with older versions of Perl.
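As a rough sketch of how the explicit layer order behaves in both directions (assumptions: UTF-16LE is used so there is no BOM and the bytes are easy to inspect, the file lives in a temporary directory, and all variable names are illustrative):

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my $qfn = "$dir/output.txt";

# Layers are listed bottom-up: raw bytes, then the encoding, then crlf on
# top, so "\n" becomes "\r\n" before the characters are encoded.
open(my $out, '>:raw:encoding(UTF-16LE):crlf', $qfn)
    or die "cannot open $qfn for writing: $!\n";
print {$out} "a\nb\nc\n";
close($out) or die "cannot close $qfn: $!\n";

# Reading with the same layer order turns "\r\n" back into "\n".
open(my $in, '<:raw:encoding(UTF-16LE):crlf', $qfn)
    or die "cannot open $qfn for reading: $!\n";
my @lines = <$in>;
close($in);

# Look at the raw bytes that ended up on disk.
open(my $raw, '<:raw', $qfn) or die "cannot open $qfn: $!\n";
my $bytes = do { local $/; <$raw> };
close($raw);

print uc(join(' ', unpack('(H2)*', $bytes))), "\n";
```

Because the layers are named explicitly, this does not depend on which default layers the platform would otherwise add.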

It works on Cygwin because the crlf layer is only added on Windows. On Cygwin you get

 61 00 0A 00 62 00 0A 00 63 00 0A 00 

You have a typo in your encoding. It should be use open ":encoding(UTF-16)"; note the colon. I do not know why this would work on Cygwin but not on Windows; it could also be a difference between 5.12 and 5.14. Perl seems to compensate for the missing colon, but it may be the cause of your problem.

If that is not it, check whether the encoding is actually being applied to your filehandles.

 print map { "$_\n" } PerlIO::get_layers(*INPUT);
 print map { "$_\n" } PerlIO::get_layers(*OUTPUT);

Use lexical filehandles (i.e. open my $fh, "<", $file). Glob filehandles are global, so something else in your program may interfere with them.
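A minimal sketch of that check with a lexical filehandle, assuming a throwaway temporary file and illustrative variable names; the exact list of layers printed depends on the platform and Perl version.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Throwaway file so the example does not depend on input.txt existing.
my ($tmp_fh, $file) = tempfile(UNLINK => 1);
close($tmp_fh);

# Lexical handle: it only exists in this scope, so unrelated code
# cannot reopen it or change its layers by name.
open(my $fh, '>:encoding(UTF-16)', $file)
    or die "cannot open $file: $!\n";

# List the layers on the handle, one per line, bottom layer first.
my @layers = PerlIO::get_layers($fh);
print "$_\n" for @layers;

close($fh);
```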

If all of that checks out and the lexical filehandles do have encoding(UTF-16) applied, let us know and we can try something else.

UPDATE: This may provide your answer: "BOMed UTF files are not suitable for streaming models, and they must be slurped as binary files instead." It looks like you should read the file as binary and do the decoding on the resulting string. Perhaps this was a bug that was fixed in 5.14.

UPDATE 2: Yes, I can confirm that this is a bug that was fixed in 5.14.
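A sketch of the slurp-as-binary approach mentioned in the first update, using the core Encode module. It assumes the input starts with a BOM (which decoding as plain UTF-16 requires) and builds its own sample file in a temporary directory; the file and variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use File::Temp qw(tempdir);

my $dir      = tempdir(CLEANUP => 1);
my $in_file  = "$dir/input.txt";
my $out_file = "$dir/output.txt";

# Create a sample UTF-16 input file (encode() adds a BOM for "UTF-16").
open(my $seed, '>:raw', $in_file) or die "cannot open $in_file: $!\n";
print {$seed} encode('UTF-16', "first line\r\nsecond line\r\n");
close($seed) or die "cannot close $in_file: $!\n";

# Slurp the whole file as bytes, with no encoding or crlf layer involved.
open(my $in, '<:raw', $in_file) or die "cannot open $in_file: $!\n";
my $bytes = do { local $/; <$in> };
close($in);

# Decode to characters; the BOM tells decode() the byte order.
my $text = decode('UTF-16', $bytes);

# ... work on $text as an ordinary character string here ...

# Encode back to UTF-16 and write the bytes out untouched.
open(my $out, '>:raw', $out_file) or die "cannot open $out_file: $!\n";
print {$out} encode('UTF-16', $text);
close($out) or die "cannot close $out_file: $!\n";
```

Since no crlf layer is involved, line endings stay in the string as "\r\n" and have to be handled by the program itself.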


Source: https://habr.com/ru/post/1442556/