Convert file to UTF8 format using Perl

How do I convert a file to UTF-8 using Perl, and how can I check that the converted file really is in UTF-8?

+4
4 answers

Bindings to the iconv library, such as Text::Iconv, are not required, because Perl already ships with its own character encoding library: Encode. Part of it is piconv, an iconv(1) workalike. Use it to batch-convert files to UTF-8. "ANSI" is just a misleading name for the group of windows-125? encodings; most likely, your files are encoded in windows-1252. Example:

 piconv -f windows-1252 -t UTF-8 < input-file > output-file 

If metadata is missing, heuristics should be used to determine the encoding of the file contents. I recommend Encode::Detect .
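
The piconv conversion above can also be sketched directly in Perl with PerlIO encoding layers, which rely on the same Encode library. This is a minimal sketch, not a definitive implementation: the helper name convert_to_utf8 and its parameters are hypothetical, and the source encoding is something the caller is assumed to know already.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in_path to $out_path, re-encoding to UTF-8.
# $from is the assumed source encoding, e.g. 'windows-1252'.
sub convert_to_utf8 {
    my ($in_path, $out_path, $from) = @_;

    # The :encoding layers decode on read and encode on write.
    open my $in,  "<:encoding($from)", $in_path  or die "Cannot read $in_path: $!";
    open my $out, '>:encoding(UTF-8)', $out_path or die "Cannot write $out_path: $!";

    print {$out} $_ while <$in>;

    close $in;
    close $out or die "Cannot close $out_path: $!";
    return;
}
```

A call such as convert_to_utf8('input-file', 'output-file', 'windows-1252') would then mirror the piconv command line shown above.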

+3

To do the conversion, see Text::Iconv:

  use Text::Iconv;

  # "fromcode" and "tocode" are placeholders for encoding names
  my $converter = Text::Iconv->new("fromcode", "tocode");
  my $converted = $converter->convert("Text to convert");
+1

That depends on the string you received. If the file was uploaded, I think this code will help. But if it is text from the web that has already been converted to UTF-8 (because you are working in UTF-8), you will have trouble finding out its original encoding.

I usually use:

use Encode::Guess;

# returns an encoding object on success, or an error message string on failure
my $enc = guess_encoding($string);

and then with the code above, I do:

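use Text::Iconv;
# guess_encoding returns a plain string (an error message) when it fails
die "Cannot guess encoding: $enc" unless ref $enc;
my $converter = Text::Iconv->new($enc->name, "utf-8");
my $converted = $converter->convert("Text to convert");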
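
When guessing is not reliable, a common alternative heuristic is to try a strict UTF-8 decode first and fall back to one assumed legacy encoding. The sketch below uses only the core Encode module in place of Text::Iconv; the helper name to_utf8 and the default fallback of windows-1252 are assumptions for illustration, not part of the answer above.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: return $octets re-encoded as UTF-8.
# $fallback is an assumption about the legacy encoding of the input.
sub to_utf8 {
    my ($octets, $fallback) = @_;
    $fallback //= 'windows-1252';

    # Try strict UTF-8 first; work on a copy because decode may alter its input
    # when a CHECK value is given.
    my $copy = $octets;
    my $text = eval { decode('UTF-8', $copy, Encode::FB_CROAK) };

    # Not well-formed UTF-8: assume the fallback encoding instead.
    $text = decode($fallback, $octets) unless defined $text;

    return encode('UTF-8', $text);
}
```

Input that is already valid UTF-8 passes through unchanged; anything else is treated as the fallback encoding, which can be wrong if the data actually uses a different legacy encoding.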

FYI, lists of UTF-8 characters can be found here:

http://www.fileformat.info/info/charset/UTF-8/list.htm?start=1024

http://www.utf8-chartable.de/unicode-utf8-table.pl?start=1024&number=1024&utf8=string-literal&unicodeinhtml=dec

+1

Using the Encode module, you can easily encode a string into different encodings, e.g.:

 use Encode;
 my $str = "A string in Perl internal format ....";
 # FB_CROAK makes encode die on characters that cannot be encoded
 my $octets = encode("utf-8", $str, Encode::FB_CROAK);

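To check for UTF-8 you can use the is_utf8 function. Note that it reports whether Perl's internal UTF8 flag is set on the string; with a true second argument it also checks that the string contains well-formed UTF-8. It does not validate raw bytes read from a file.

 Encode::is_utf8($str, 1)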
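
To check whether the bytes of a converted file are well-formed UTF-8, one option is to attempt a strict decode and see whether it fails. The following is a minimal sketch using the core Encode module; the helper names is_valid_utf8 and file_is_utf8 are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: true if $octets is well-formed UTF-8.
sub is_valid_utf8 {
    my ($octets) = @_;    # lexical copy, since decode may modify its argument
    # Strict 'UTF-8' decoding with FB_CROAK dies on malformed input.
    return eval { decode('UTF-8', $octets, Encode::FB_CROAK); 1 } ? 1 : 0;
}

# Hypothetical helper: read a file as raw bytes and validate them.
sub file_is_utf8 {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot read $path: $!";
    my $octets = do { local $/; <$fh> };
    close $fh;
    return is_valid_utf8($octets);
}
```

A passing check only shows that the bytes form valid UTF-8. Pure ASCII files pass as well, and it cannot prove that the text was converted from the correct source encoding.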
+1

Source: https://habr.com/ru/post/1309290/

