Comparing two Unicode strings with perl

Question

When I run the following code, it does not enter the "do something here" section:

    my $a = 'µ╫P[┐╬♣3▀═<+·1╪מ└╖"ª';
    my $b = 'µ╫P[┐╬♣3▀═<+·1╪מ└╖"ª';
    if ($a ne $b) {
        # do something here
    }

Is there any other way to compare Unicode strings with perl?

perl unicode

smith Mar 05 '12 at 21:23

1 answer

ikegami · Answer 1 · 2012-03-05T22:00:43+0000

If you have two Unicode strings (i.e. strings of Unicode code points), then you presumably saved your file as UTF-8, and what you actually had was

    use utf8;  # Tell Perl source code is UTF-8.

    my $a = 'µ╫P[┐╬♣3▀═<+·1╪מ└╖"ª';
    my $b = 'µ╫P[┐╬♣3▀═<+·1╪מ└╖"ª';
    if ($a eq $b) {
        print("They're equal.\n");
    } else {
        print("They're not equal.\n");
    }

And that works just fine. eq and ne compare the strings code point by code point.
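As a rough illustration of what comparing code point by code point means, the sketch below lists the code points of a few short strings. The helper code_points and the variables $x, $y and $z are hypothetical names introduced only for this example. The strings are written with \x{...} escapes so the example does not depend on how the file is saved.

```perl
use strict;
use warnings;

# Hypothetical helper: list a string's code points in U+XXXX notation.
sub code_points {
    my ($str) = @_;
    return join(' ', map { sprintf('U+%04X', ord($_)) } split(//, $str));
}

my $x = "\x{B5}\x{5DE}";    # MICRO SIGN, HEBREW LETTER MEM
my $y = "\x{B5}\x{5DE}";    # same code points as $x
my $z = "\x{3BC}\x{5DE}";   # GREEK SMALL LETTER MU looks like MICRO SIGN

print(code_points($x), "\n");   # U+00B5 U+05DE
print(code_points($z), "\n");   # U+03BC U+05DE

print("x and y are equal.\n")     if $x eq $y;
print("x and z are not equal.\n") if $x ne $z;
```

Two strings that look alike on screen still compare as different when their code points differ, which is why the normalization step can matter.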

Some graphemes (for example, "é") can be built in several different ways, so you may need to normalize their representation first.

    use utf8;  # Tell Perl source code is UTF-8.

    use charnames qw( :full );  # For \N{}
    use Unicode::Normalize qw( NFC );

    my $a = NFC("\N{LATIN SMALL LETTER E WITH ACUTE}");
    my $b = NFC("e\N{COMBINING ACUTE ACCENT}");
    if ($a eq $b) {
        print("They're equal.\n");
    } else {
        print("They're not equal.\n");
    }
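To see why normalization matters here, the following minimal sketch compares the same two forms of "é" with and without it. The variable names are hypothetical and chosen for illustration. It also uses NFD from the same module to show that either canonical form works, as long as both strings use the same one.

```perl
use strict;
use warnings;
use Unicode::Normalize qw( NFC NFD );

my $composed   = "\x{E9}";     # U+00E9 LATIN SMALL LETTER E WITH ACUTE
my $decomposed = "e\x{301}";   # "e" followed by U+0301 COMBINING ACUTE ACCENT

# Compare the strings as-is, then after each canonical normalization.
my $raw_equal = ($composed eq $decomposed)           ? 1 : 0;
my $nfc_equal = (NFC($composed) eq NFC($decomposed)) ? 1 : 0;
my $nfd_equal = (NFD($composed) eq NFD($decomposed)) ? 1 : 0;

printf("raw: %d, NFC: %d, NFD: %d\n", $raw_equal, $nfc_equal, $nfd_equal);
printf("lengths: %d vs %d\n", length($composed), length($decomposed));
```

Without normalization the strings have different lengths (one code point versus two), so eq reports them as different even though they render identically.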

Finally, Unicode considers some characters to be nearly equivalent, and they can be treated as equal by using a different form of normalization.

    use utf8;  # Tell Perl source code is UTF-8.

    use charnames qw( :full );  # For \N{}
    use Unicode::Normalize qw( NFKC );

    my $a = NFKC("2");
    my $b = NFKC("\N{SUPERSCRIPT TWO}");
    if ($a eq $b) {
        print("They're equal.\n");
    } else {
        print("They're not equal.\n");
    }
