Perl Unicode test passes on OS X but fails on Debian

I have the following test:

    use Test::More;
    use Lingua::EN::NameCase 'nc';
    use utf8;
    my $output = Test::Builder->new->todo_output;
    binmode $output, ':encoding(UTF-8)';
    $output = Test::Builder->new->failure_output;
    binmode $output, ':encoding(UTF-8)';

    my $name = 'Lintão';
    is nc($name), $name, 'nc() should not change a properly namecased name';
    diag nc($name);
    done_testing;

On Mac OS X with Perl 5.10.1, I get the following output:

    nc.t ..
    ok 1 - nc() should not change a properly namecased name
    1..1
    # Lintão
    ok
    All tests successful.
    Files=1, Tests=1, 0 wallclock secs ( 0.02 usr 0.01 sys + 0.04 cusr 0.00 csys = 0.07 CPU)
    Result: PASS

Unfortunately, the same test on Debian Squeeze with Perl 5.10.1 produces this output:

    nc.t ..
    not ok 1 - nc() should not change a properly namecased name
    # Failed test 'nc() should not change a properly namecased name'
    # at nc.t line 10.
    # got: 'LintãO'
    # expected: 'Lintão'
    # LintãO
    1..1
    # Looks like you failed 1 test of 1.
    Dubious, test returned 1 (wstat 256, 0x100)
    Failed 1/1 subtests

    Test Summary Report
    -------------------
    nc.t (Wstat: 256 Tests: 1 Failed: 1)
      Failed test: 1
      Non-zero exit status: 1
    Files=1, Tests=1, 0 wallclock secs ( 0.01 usr 0.00 sys + 0.03 cusr 0.00 csys = 0.04 CPU)
    Result: FAIL

The offending line in the nc() routine is as follows:

 s{ \b (\w) }{\u$1}gox ; # Uppercase first letter of every word. 

So the same version of Perl on Debian is getting the word boundary wrong: it treats the position between ã and o as the start of a new word. Can someone help me debug this further?

1 answer

The locale on your Linux box does not consider ã a word character. Lingua::EN::NameCase has use locale; in effect, so it uses the current LC_CTYPE setting to classify characters. With perlbrewed perls from 5.8.1 to 5.18.1, I consistently get this output on Ubuntu 12.04 LTS with the en_GB.UTF-8 locale:

    $ perl -Mutf8 -le 'print 0+("ã" =~ /\w/); use locale; print 0+("ã" =~ /\w/)'
    1
    0
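
To see why that classification produces 'LintãO', here is a minimal sketch that reproduces the effect without depending on any system locale. It is an illustration only, not the module's code: ascii_only_titlecase and unicode_titlecase are hypothetical helpers, and the ASCII-only character class is an assumption standing in for a locale that does not treat ã as a word character.

```perl
use strict;
use warnings;
use utf8;                               # string literals below are UTF-8

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: treats only ASCII letters, digits and underscore as
# word characters, imitating a locale in which "ã" is NOT a word character.
sub ascii_only_titlecase {
    my ($s) = @_;
    # Uppercase every ASCII word character not preceded by another one.
    $s =~ s{ (?<![A-Za-z0-9_]) ([A-Za-z0-9_]) }{\u$1}gx;
    return $s;
}

# Hypothetical helper: the substitution from the question, applied outside
# the scope of "use locale" to a character string.
sub unicode_titlecase {
    my ($s) = @_;
    utf8::upgrade($s);                  # ensure Unicode rules on older perls
    $s =~ s{ \b (\w) }{\u$1}gx;
    return $s;
}

my $name           = 'Lintão';
my $ascii_result   = ascii_only_titlecase($name);
my $unicode_result = unicode_titlecase($name);

print "ASCII-only word characters: $ascii_result\n";    # LintãO
print "Unicode word characters:    $unicode_result\n";  # Lintão
```

In the first case ã counts as a non-word character, so a word boundary appears between ã and o and the trailing o is uppercased, which matches the failing Debian output. In the second case ã is a word character, there is no boundary inside the name, and it comes back unchanged.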

Source: https://habr.com/ru/post/956061/

