Question about the utf-8 order

#!/usr/bin/env perl
use warnings;
use 5.012;
use Encode qw(encode);

no warnings qw(utf8);

my $c = "\x{ffff}";

my $utf_8 = encode( 'utf-8', $c );
my $utf8 = encode( 'utf8', $c );

say "utf-8 :  @{[ unpack '(B8)*', $utf_8 ]}";
say "utf8  :  @{[ unpack '(B8)*', $utf8 ]}";

# utf-8 :  11101111 10111111 10111101
# utf8  :  11101111 10111111 10111111

Does "utf-8" behave this way in order to automatically correct my code point to the last interchangeable code point (of the first plane)?

1 answer

See the "UTF-8 vs. utf8 vs. UTF8" section of the Encode documentation.

To summarize, Perl has two different UTF-8 encodings. Its native encoding is called utf8 and basically allows any code point, regardless of what the Unicode standard says about that code point.

The other encoding is called utf-8 (aka utf-8-strict). It allows only code points that the Unicode standard permits.
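
As a quick way to see which of the two encodings a given name resolves to, you can ask Encode directly. This is only a small sketch; the variable names are illustrative, and the canonical names it reports are the ones listed in the Encode documentation.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Encoding names are case-insensitive, and "_" is treated like "-".
my $strict_name = find_encoding('UTF-8')->name;   # the strict encoding
my $alias_name  = find_encoding('utf_8')->name;   # same encoding, different spelling
my $lax_name    = find_encoding('utf8')->name;    # Perl's lax native encoding

print "UTF-8 -> $strict_name\n";
print "utf_8 -> $alias_name\n";
print "utf8  -> $lax_name\n";
```

The point of the check is that the hyphen matters: with it you get the strict encoding, without it you get the lax one.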

\x{FFFF} is not a valid Unicode character. But Perl's utf8 encoding does not care about that.

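By default, the encode function replaces any character that does not exist in the target character set with a substitution character (see the "Handling Malformed Data" section of the Encode documentation). For utf-8, that substitution character is U+FFFD (REPLACEMENT CHARACTER), which is encoded in UTF-8 as 11101111 10111111 10111101 (binary).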
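
If you would rather get an error than a silent substitution, the CHECK argument of encode can be set to Encode::FB_CROAK. The sketch below is an illustration, not part of the original question: it uses the surrogate U+D800 instead of U+FFFF, on the assumption that the handling of noncharacters such as U+FFFF may differ between Encode releases, while surrogates are rejected by the strict encoding. The helper hex_bytes is a made-up name.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(encode);

no warnings qw(utf8);    # silence warnings about the surrogate code point

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', $_ } unpack 'C*', $bytes;
}

my $c = chr 0xD800;      # a surrogate, not allowed by the strict encoding

# Default CHECK: strict utf-8 substitutes U+FFFD, lax utf8 encodes as-is.
my $strict = hex_bytes( encode( 'utf-8', $c ) );
my $lax    = hex_bytes( encode( 'utf8',  $c ) );

# With FB_CROAK, encode dies instead of substituting.
# encode may modify its input when CHECK is set, so pass a copy.
my $copy = $c;
my $died = !eval { encode( 'utf-8', $copy, Encode::FB_CROAK ); 1 };

print "utf-8 : $strict\n";
print "utf8  : $lax\n";
print "FB_CROAK died: ", ( $died ? 'yes' : 'no' ), "\n";
```

Wrapping the call in eval keeps the script running, so you can decide yourself how to handle input that the strict encoding refuses.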


Source: https://habr.com/ru/post/1795563/