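use strict;
use warnings;
use LWP::UserAgent;

# Printing wide characters needs an encoding layer on STDOUT.
binmode STDOUT, ':encoding(UTF-8)';

my $ua = LWP::UserAgent->new;
my $response = $ua->get("http://ru.wikipedia.org/wiki/Perl");
die $response->status_line unless $response->is_success;

# decoded_content returns character data, so the code-point range below applies.
my $content = $response->decoded_content;

# Lookarounds leave the delimiting whitespace unconsumed, so adjacent words all match.
my @russian = $content =~ /(?<!\S)([\x{0400}-\x{052F}]+)(?!\S)/g;
print map { "$_\n" } @russian;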
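I believe the Cyrillic block begins at U+0400 and the Cyrillic Supplement block ends at U+052F, so this range should catch a lot of words. Note that the pattern only captures runs of Cyrillic letters that are delimited by whitespace, so words touching punctuation or HTML tags are skipped.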
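As an alternative to the hard-coded range, Perl's Unicode script property `\p{Cyrillic}` could be used. This is only a minimal offline sketch: the helper name `cyrillic_words` and the sample string are made up for illustration and stand in for the downloaded page. Unlike the range above, the property also covers Cyrillic characters outside U+0400..U+052F (for example the Cyrillic Extended blocks), and this version does not require surrounding whitespace.

```perl
use strict;
use warnings;
use utf8;                               # this source file contains literal Cyrillic
binmode STDOUT, ':encoding(UTF-8)';     # avoid "Wide character" warnings on print

# Hypothetical helper: return every run of Cyrillic-script characters in $text.
sub cyrillic_words {
    my ($text) = @_;
    my @found = $text =~ /(\p{Cyrillic}+)/g;
    return @found;
}

# Made-up sample text standing in for the fetched page content.
my $sample = "Perl это язык программирования, созданный Larry Wall.";
print "$_\n" for cyrillic_words($sample);
```

Because the match is not anchored on whitespace, a word followed by a comma or wrapped in markup is still picked up, at the cost of splitting hyphenated words into separate pieces.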