Unicode Tweet Compressor in Perl

I want to implement my own tweet compressor, which replaces common letter combinations with single Unicode characters that look like them. However, I am stuck on some Unicode issues.

Here is my script:

#!/usr/bin/env perl
use warnings;
use strict;

print tweet_compress('cc ms ns ps in ls fi fl ffl ffi iv ix vi oy ii xi nj/, "\. " ,", "'),"\n";

sub tweet_compress {
    my $tweet = shift;
    $tweet =~ s/\. ?$//;
    my @orig = ( qw/cc ms ns ps in ls fi fl ffl ffi iv ix vi oy ii xi nj/, ". " ,", ");
    my @new = qw/㏄ ㎳ ㎱ ㎰ ㏌ ʪ fi fl ffl ffi ⅳ ⅸ ⅵ ѹ ⅱ ⅺ nj . ,/;
    $tweet =~ s/$orig[$_]/$new[$_]/g for 0 .. $#orig;
    return $tweet;
}

But this prints garbage to the terminal:

?.?.?.?.?.?.?.f.?.f?.?.?.?.?.?.?.nj/."\..,""

What am I doing wrong?

2 answers

Two problems.

First, you have Unicode characters in the source code. Make sure you save the file as utf8 and use the utf8 pragma.
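Without use utf8, perl reads string literals in the source as raw bytes, so a character like ㏄ counts as three "characters" for length and for regex matching, and a substitution can cut it in half. The minimal sketch below (not from the original answer; it uses the core Encode module and writes ㏄ as the escape \x{33C4}) shows the difference between the byte view and the character view of the same text:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(encode decode);

# U+33C4 is the single character ㏄ (SQUARE CC).
# encode() gives its UTF-8 byte sequence; decode() turns the bytes back into a character.
my $bytes = encode('UTF-8', "\x{33C4}");
my $chars = decode('UTF-8', $bytes);

# Prints "as bytes: 3, as characters: 1"
printf "as bytes: %d, as characters: %d\n", length $bytes, length $chars;
```

A literal ㏄ in a UTF-8 source file without use utf8 behaves like $bytes here; with use utf8 it behaves like $chars.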

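Also, if you print to a console, make sure it can display Unicode. The Windows console may show ? no matter what your data is. I ran this on Mac OS, where Terminal is set to UTF-8.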
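On the output side, a PerlIO layer such as :encoding(UTF-8) turns characters back into UTF-8 bytes as they are printed. This is a small sketch, assuming it is fine to print to an in-memory filehandle instead of STDOUT so that the resulting bytes can be counted:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Two characters: ㏄ (U+33C4) and ⅳ (U+2173).
my $text = "\x{33C4}\x{2173}";

# Open an in-memory filehandle with a UTF-8 encoding layer;
# whatever is printed to it ends up in $buf as encoded bytes.
open my $fh, '>:encoding(UTF-8)', \my $buf or die "open: $!";
print {$fh} $text;
close $fh or die "close: $!";

# Prints "2 characters -> 6 bytes"
printf "%d characters -> %d bytes\n", length $text, length $buf;
```

binmode STDOUT applies the same kind of layer to the terminal, which is what the corrected script below does.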

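Second, "." in @orig is a regex metacharacter that matches any character, so the ". " pattern replaces any character followed by a space. You need to escape it as '\. ' (in single quotes, so the backslash survives).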
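The effect of the unescaped dot is easy to see on a short string. This sketch (an illustration, not part of the original answer) compares the raw pattern with one quoted by \Q...\E, which escapes regex metacharacters the same way quotemeta does:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $s = "ab cd. ef";

# "." matches ANY character, so "b " is replaced too.
(my $bad = $s) =~ s/. /./g;

# \Q...\E quotes the metacharacters, so only a literal ". " matches.
(my $good = $s) =~ s/\Q. \E/./g;

print "$bad\n";    # a.cd.ef
print "$good\n";   # ab cd.ef
```

With \Q you could also keep plain strings in @orig and write s/\Q$orig[$_]\E/$new[$_]/g in the loop instead of hand-escaping each entry.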

#!/usr/bin/env perl
use warnings;
use strict;
use utf8; # the source file is UTF-8: decode string literals into characters

# make sure the data is re-encoded to UTF-8 when output to the terminal
binmode STDOUT, ':encoding(UTF-8)';

print tweet_compress('cc ms ns ps in ls fi fl ffl ffi iv ix vi oy ii xi nj/, "\. " ,", "'),"\n";

sub tweet_compress {
    my $tweet = shift;
    $tweet =~ s/\. ?$//;
    my @orig = ( qw/cc ms ns ps in ls fi fl ffl ffi iv ix vi oy ii xi nj/, '\. ' ,", ");
    my @new = qw/㏄ ㎳ ㎱ ㎰ ㏌ ʪ fi fl ffl ffi ⅳ ⅸ ⅵ ѹ ⅱ ⅺ nj . ,/;
    $tweet =~ s/$orig[$_]/$new[$_]/g for 0 .. $#orig;
    return $tweet;
}
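In the listing, the fi, fl, ffl and ffi entries map to themselves; presumably they were meant to be ligatures such as ﬁ (U+FB01) and ﬃ (U+FB03). In that case the order of the list matters: the loop replaces fi before ffi ever gets a chance to match. One possible variation, sketched here under that assumption, is a single pass with one alternation, longest keys first. The hash %map below is a hypothetical subset of the mapping and tweet_compress_once is a hypothetical name:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical subset of the mapping; keys are plain strings, not regexes.
my %map = (
    'cc'  => "\x{33C4}",   # ㏄
    'fi'  => "\x{FB01}",   # ﬁ ligature
    'ffi' => "\x{FB03}",   # ﬃ ligature
    '. '  => '.',
    ', '  => ',',
);

# Longest keys first, so "ffi" is tried before "fi";
# quotemeta escapes "." and the spaces in the keys.
my $alt = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } keys %map;
my $re = qr/($alt)/;

sub tweet_compress_once {
    my ($tweet) = @_;
    # One pass: replaced text is never rescanned by a later pattern.
    $tweet =~ s/$re/$map{$1}/g;
    return $tweet;
}

print tweet_compress_once('office fix. cc, ok'), "\n";
```

For 'office fix. cc, ok' this should print oﬃce ﬁx.㏄,ok. The single alternation also avoids later patterns running over characters that earlier replacements produced.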

In Perl, if the script itself contains Unicode characters, you need to add use utf8.


Source: https://habr.com/ru/post/1755438/

