
Comparing two Unicode strings with Perl

When I run the following code, it does not enter the "do something here" section:

    my $a = 'µ╫P[┐╬♣3▀═<+·1╪מ└╖"ª';
    my $b = 'µ╫P[┐╬♣3▀═<+·1╪מ└╖"ª';
    if ($a ne $b) {
        # do something here
    }

Is there any other way to compare Unicode strings with Perl?

1 answer

If you have two Unicode strings (i.e. strings of Unicode code points), then you presumably saved your file as UTF-8, and what you actually had was

    use utf8;  # Tell Perl source code is UTF-8.

    my $a = 'µ╫P[┐╬♣3▀═<+·1╪מ└╖"ª';
    my $b = 'µ╫P[┐╬♣3▀═<+·1╪מ└╖"ª';

    if ($a eq $b) {
        print("They're equal.\n");
    } else {
        print("They're not equal.\n");
    }

And that works fine. eq and ne compare the strings code point by code point.
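Because the comparison is done code point by code point, both operands need to be decoded character strings. Here is a minimal sketch, assuming the text arrives as undecoded UTF-8 bytes; the variable names are illustrative and not from the question, and `decode` comes from the core Encode module.

```perl
use strict;
use warnings;
use Encode qw( decode );

# Assumption: $bytes stands in for undecoded UTF-8 input (e.g. read from a
# file without an :encoding layer). These two bytes encode U+00E9.
my $bytes = "\xC3\xA9";
my $chars = decode('UTF-8', $bytes);    # now a single code point, U+00E9

# The undecoded string holds two code points (U+00C3, U+00A9), so it cannot
# equal the one-character string U+00E9.
my $raw_matches     = ($bytes eq "\x{E9}") ? 1 : 0;
my $decoded_matches = ($chars eq "\x{E9}") ? 1 : 0;

print "raw: $raw_matches, decoded: $decoded_matches\n";
```

If one side of a comparison was decoded and the other was not, eq can report a mismatch even though the text looks identical on screen.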

Some graphemes (for example, "é") can be built from code points in more than one way, so you may need to normalize their representation before comparing.

    use utf8;                              # Tell Perl source code is UTF-8.
    use charnames qw( :full );             # For \N{}
    use Unicode::Normalize qw( NFC );

    my $a = NFC("\N{LATIN SMALL LETTER E WITH ACUTE}");
    my $b = NFC("e\N{COMBINING ACUTE ACCENT}");

    if ($a eq $b) {
        print("They're equal.\n");
    } else {
        print("They're not equal.\n");
    }
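If you compare like this in more than one place, the normalization step can be wrapped in a small helper. This is only a sketch: `canon_eq` is a hypothetical name, not something Unicode::Normalize provides, and it assumes both arguments are already decoded character strings.

```perl
use strict;
use warnings;
use Unicode::Normalize qw( NFC );

# Hypothetical helper: true if the two character strings are canonically
# equivalent, i.e. equal once both are brought to NFC.
sub canon_eq {
    my ($x, $y) = @_;
    return NFC($x) eq NFC($y) ? 1 : 0;
}

my $precomposed = "\x{E9}";      # LATIN SMALL LETTER E WITH ACUTE
my $decomposed  = "e\x{301}";    # "e" followed by COMBINING ACUTE ACCENT

my $plain_equal      = ($precomposed eq $decomposed) ? 1 : 0;  # code points differ
my $normalized_equal = canon_eq($precomposed, $decomposed);

print "plain: $plain_equal, normalized: $normalized_equal\n";
```

Normalizing both sides inside the helper means callers do not have to remember which strings were already normalized.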

Finally, Unicode treats some characters as nearly equivalent (compatibility equivalence), and they can be made to compare equal using a different normalization form.

    use utf8;                              # Tell Perl source code is UTF-8.
    use charnames qw( :full );             # For \N{}
    use Unicode::Normalize qw( NFKC );

    my $a = NFKC("2");
    my $b = NFKC("\N{SUPERSCRIPT TWO}");

    if ($a eq $b) {
        print("They're equal.\n");
    } else {
        print("They're not equal.\n");
    }
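To see how the two forms differ, here is a short sketch that runs the same character through both NFC and NFKC. The ligature case is an extra illustration that is not in the answer above.

```perl
use strict;
use warnings;
use Unicode::Normalize qw( NFC NFKC );

my $superscript = "\x{B2}";      # SUPERSCRIPT TWO
my $ligature    = "\x{FB01}";    # LATIN SMALL LIGATURE FI

# NFC only applies canonical equivalence, so the superscript is left alone.
my $nfc_two  = (NFC($superscript)  eq "2")  ? 1 : 0;

# NFKC also applies compatibility mappings, folding these to plain letters/digits.
my $nfkc_two = (NFKC($superscript) eq "2")  ? 1 : 0;
my $nfkc_fi  = (NFKC($ligature)    eq "fi") ? 1 : 0;

print "NFC: $nfc_two, NFKC: $nfkc_two, NFKC ligature: $nfkc_fi\n";
```

NFKC discards distinctions such as superscript versus plain digit, so it is better suited to comparing or searching than to normalizing text you intend to store and display.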

Source: https://habr.com/ru/post/1399879/

