How can I generate a list of words from a group of letters using Perl?

Question

I was looking for a module, regex or something else that might be applicable to this problem.

How can I programmatically parse a string and produce known English and/or Spanish words, given that I have a dictionary table against which I can check every permutation the algorithm generates?

Given a character group: EBLAIDL KDIOIDSI ADHFWB

The program should return: BLADE AID KID KIDS FIDDLE , etc.

I also want to be able to specify the minimum and maximum word length, as well as the number of syllables.

The length of the input does not matter. It should contain only letters, and any punctuation can be ignored.

Thanks for any help

EDIT
The letters in the input string can be reused.

For example, if the input is: ABLED , then the output may contain: BALL or BLEED

string string-matching regex perl parsing

CheeseConQueso Feb 02 '12 at 0:12

4 answers

ikegami · Answer 1 · 2012-02-02T02:00:40+0000

You did not specify, so I assume that each letter in the input can be used only once.

[It has since been specified that letters in the input can be used several times, but I'm going to leave this post here in case someone finds it useful.]

The key to doing this efficiently is to sort the letters of each word.

    abracadabra => AAAAABBCDRR
    abroad      => AABDOR
    drab        => ABDR

Then it becomes clear that "drab" is in "abracadabra".

    abracadabra => AAAAABBCDRR
    drab        => A    B  DR

And that "abroad" is not.

    abracadabra => AAAAABBCD RR
    abroad      => AA   B  DOR

Call the sorted letters of a word its "signature". Word B is contained in word A if you can remove letters from the signature of A to get the signature of B. This is easy to check with a regex pattern.

 sig('drab') =~ /^A?A?A?A?A?B?B?C?D?R?R?\z/

Or, if we eliminate needless backtracking for efficiency, we get

 sig('drab') =~ /^A?+A?+A?+A?+A?+B?+B?+C?+D?+R?+R?+\z/

Now that we know which pattern we want, it's just a matter of building it.

    use strict;
    use warnings;
    use feature qw( say );

    sub sig { join '', sort grep /^\pL\z/, split //, uc $_[0] }

    my $key = shift(@ARGV);

    my $pat = sig($key);
    $pat =~ s/.\K/?+/sg;
    my $re = qr/^(?:$pat)\z/s;

    my $shortest = 9**9**9;
    my $longest  = 0;
    my $count    = 0;
    while (my $word = <>) {
        chomp($word);
        next if !length($word);  # My dictionary starts with a blank line!!
        next if sig($word) !~ /$re/;
        say $word;
        ++$count;
        $shortest = length($word) if length($word) < $shortest;
        $longest  = length($word) if length($word) > $longest;
    }

    say "Words: $count";
    if ($count) {
        say "Shortest: $shortest";
        say "Longest: $longest";
    }

Example:

    $ perl script.pl EBLAIDL /usr/share/dict/words
    A
    Abe
    Abel
    Al
    ...
    libel
    lid
    lie
    lied
    Words: 117
    Shortest: 1
    Longest: 6
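To try the containment test on its own, the signature and pattern steps can be wrapped in a small function. This is only a sketch: `is_contained` is a hypothetical helper name, not part of the script above, and it rebuilds the pattern on every call for simplicity.

```perl
use strict;
use warnings;

# Sorted, upper-cased letters of a word (non-letters are dropped).
sub sig { join '', sort grep { /^\pL\z/ } split //, uc $_[0] }

# Hypothetical helper: true if $word can be spelled from the letters of
# $key, using each letter of $key at most once.
sub is_contained {
    my ($key, $word) = @_;
    my $pat = sig($key);
    $pat =~ s/.\K/?+/sg;    # "ABDR" becomes "A?+B?+D?+R?+"
    return sig($word) =~ /^(?:$pat)\z/s ? 1 : 0;
}

for my $word (qw( drab abroad )) {
    printf "%-6s %s\n", $word,
        is_contained('abracadabra', $word) ? 'contained' : 'not contained';
}
```

With "abracadabra" as the key, "drab" is reported as contained and "abroad" is not, because the signature of "abroad" includes an O that the key cannot supply.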

Wes hardaker · Answer 2 · 2012-02-02T00:47:33+0000

Well, the regexp is pretty easy. Then you just need to iterate over the words in the dictionary. For example, assuming a standard Linux system:

 # perl -n -e 'print if (/^[EBLAIDL]+$/);' /usr/share/dict/words

It will quickly return all words in this file that consist only of those letters.

    A
    AA
    AAA
    AAAA
    AAAAAA
    AAAL
    AAE
    AAEE
    AAII
    AB
    ...

As you can see, you need a dictionary file that is worth having. In particular, /usr/share/dict/words on my Fedora system contains a bunch of entries such as AAAA, which may or may not be what you want. So choose your dictionary carefully.

For the minimum and maximum lengths, you can also get those quickly:

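    $min = 9999;
    $max = -1;
    while (<>) {
        # Note: without a leading ^, this matches any word that merely ends
        # in these letters, which is why the output below includes ZWEI.
        if (/[EBLAIDL]+$/) {
            print;
            chomp;
            if (length($_) > $max) { $max = length($_); $maxword = $_; }
            if (length($_) < $min) { $min = length($_); $minword = $_; }
        }
    }
    print "longest: $maxword\n";
    print "shortest: $minword\n";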

Will produce:

    ZI
    ZMRI
    ZWEI
    longest: TANSTAAFL
    shortest: A
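The pattern in the loop above has no leading `^`, so it accepts any word that merely ends in the given letters, which explains entries like ZWEI and TANSTAAFL. A minimal sketch of an anchored, case-insensitive variant follows. `reusable_matches` is a hypothetical helper, and the sketch assumes plain A-Z input, so accented Spanish letters would need a wider character class.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the words made only of the given letters
# (letters may repeat), plus the shortest and longest of them.
sub reusable_matches {
    my ($letters, @words) = @_;
    (my $class = uc $letters) =~ s/[^A-Z]//g;    # keep A-Z only (assumption)
    return ([], undef, undef) if !length $class;
    my $re = qr/^[$class]+\z/i;                   # anchored at both ends

    my @found = grep { $_ =~ $re } @words;
    my ($min, $max);
    for my $word (@found) {
        $min = $word if !defined $min || length($word) < length($min);
        $max = $word if !defined $max || length($word) > length($max);
    }
    return (\@found, $min, $max);
}

# A tiny stand-in word list; a real run would read a dictionary file.
my ($found, $min, $max) =
    reusable_matches('EBLAIDL', qw( ball bleed kid zwei a Abel ));
print "$_\n" for @$found;
print "longest: $max\n";
print "shortest: $min\n";
```

On this stand-in list the matches are ball, bleed, a and Abel. A minimum and maximum word length could be enforced in the same pattern by replacing `+` with a bounded quantifier such as `{3,6}`.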

Breaking words into pieces and counting syllables is very language-specific, as mentioned in the comments above.
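As a rough illustration of why this is language-specific, here is a crude English-oriented heuristic that counts runs of consecutive vowels. `estimate_syllables` is a hypothetical name, and the approach is an assumption for illustration, not a reliable syllable counter: it miscounts words with a silent "e" or with adjacent vowels that are pronounced separately.

```perl
use strict;
use warnings;

# Rough heuristic (assumption): one syllable per run of consecutive vowels.
sub estimate_syllables {
    my ($word) = @_;
    my $count = () = lc($word) =~ /[aeiouy]+/g;    # count the vowel runs
    return $count;
}

printf "%-6s %d\n", $_, estimate_syllables($_) for qw( kid fiddle aid );
```

For example, it gives "blade" two syllables and "idea" two, both of which are wrong, so a real solution would need per-language rules or a pronunciation dictionary.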

David W. · Answer 3 · 2012-02-02T03:25:37+0000

The only way I can imagine this working is to go through all possible combinations of the letters and compare them against a dictionary. The quickest way to compare them against a dictionary is to turn that dictionary into a hash. That way, you can quickly check whether a candidate is a real word.

I key my dictionary hash on the lowercased word, with any non-alphabetic characters removed just to be safe. For the value, I store the actual dictionary word. For instance:

    cant   => "can't",
    google => "Google",

That way, I can display a correctly spelled word.
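The key normalization described above can be sketched in a few lines. `dict_key` is a hypothetical helper name, and the three-word list stands in for a real dictionary file.

```perl
use strict;
use warnings;

# Hypothetical helper: lower-case a word and strip non-letters to form a key.
sub dict_key {
    my ($word) = @_;
    (my $key = lc $word) =~ s/[^[:alpha:]]//g;
    return $key;
}

# A tiny stand-in word list; a real run would read a dictionary file.
my %dictionary;
$dictionary{ dict_key($_) } = $_ for ("can't", "Google", "blade");

for my $candidate (qw( cant google bldae )) {
    my $status = exists $dictionary{$candidate}
        ? "=> $dictionary{$candidate}"
        : "is not a word";
    print "$candidate $status\n";
}
```

Each lookup is a single hash access, so checking a candidate costs about the same however large the dictionary is.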

I found Math::Combinatorics, which looked nice, but didn't quite work the way I had hoped. You give it a list of letters, and it returns all combinations of those letters of the size you specify. So I thought that all I had to do was split the input into a list of individual letters and loop through all the possible combinations!

No ... It gives me all the unordered combinations. What I then needed to do with each combination was list all the possible permutations of its letters. Blah! Ptooy! Yech!

So, the infamous loop within a loop. Actually, three loops:

* The outer loop simply counts through the combination sizes, from 1 to the number of letters in the word.
* The next loop finds all unordered combinations of the letters for each of those sizes.
* Finally, the innermost loop takes each unordered combination and returns the list of permutations of that combination.
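The three loops can be sketched without any CPAN module. In this sketch, `combinations` and `permutations` are hypothetical core-Perl stand-ins for what Math::Combinatorics provides, and the four-word hash stands in for a real dictionary.

```perl
use strict;
use warnings;

# All unordered selections of $k items from @items (by position, so a
# repeated letter such as the two L's in EBLAIDL is treated as distinct).
sub combinations {
    my ($k, @items) = @_;
    return ([]) if $k == 0;
    return ()   if @items < $k;
    my ($first, @rest) = @items;
    return (
        ( map { [ $first, @$_ ] } combinations($k - 1, @rest) ),
        combinations($k, @rest),
    );
}

# All orderings of @items.
sub permutations {
    my (@items) = @_;
    return ([]) if !@items;
    my @result;
    for my $i (0 .. $#items) {
        my @rest = @items;
        my ($picked) = splice @rest, $i, 1;
        push @result, map { [ $picked, @$_ ] } permutations(@rest);
    }
    return @result;
}

my @letters = split //, 'ABD';
my %dictionary = map { $_ => 1 } qw( ab ad bad dab );    # stand-in word list
my %found;

for my $size (1 .. scalar @letters) {                    # loop 1: sizes
    for my $combo (combinations($size, @letters)) {      # loop 2: combinations
        for my $perm (permutations(@$combo)) {           # loop 3: permutations
            my $candidate = lc join '', @$perm;
            $found{$candidate} = 1 if exists $dictionary{$candidate};
        }
    }
}
print join("\n", sort keys %found), "\n";
```

The number of candidates grows factorially with the number of letters, so this brute-force enumeration is only practical for short inputs.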

Now I can finally take these permutations of the letters and compare them against my dictionary of words. Surprisingly, the program ran much faster than I expected, considering that it has to turn a dictionary of 235,886 words into a hash and then cycle through a triple-nested loop to find all permutations of all combinations of every possible number of letters. The entire program ran in less than two seconds.

    #! /usr/bin/env perl
    #
    use strict;
    use warnings;
    use feature qw(say);
    use autodie;
    use Data::Dumper;
    use Math::Combinatorics;

    use constant {
        LETTERS    => "EBLAIDL",
        DICTIONARY => "/usr/share/dict/words",
    };

    #
    # Create Dictionary Hash
    #
    open my $dict_fh, "<", DICTIONARY;
    my %dictionary;
    foreach my $word (<$dict_fh>) {
        chomp $word;
        # /g so that every non-alpha character is removed, not just the first
        (my $key = $word) =~ s/[^[:alpha:]]//g;
        $dictionary{lc $key} = $word;
    }

    #
    # Now take the letters and create a Perl list of them.
    #
    my @letter_list = split // => LETTERS;
    my %valid_word_hash;

    #
    # Outer Loop: This is a range from one letter combinations to the
    # maximum letters combination
    #
    foreach my $num_of_letters (1..scalar @letter_list) {

        #
        # Now we generate a reference to a list of lists of all letter
        # combinations of $num_of_letters long. From there, we need to
        # take the Permutations of all those letters.
        #
        foreach my $letter_list_ref (combine($num_of_letters, @letter_list)) {
            my @letter_list = @{$letter_list_ref};

            # For each combination of letters $num_of_letters long,
            # we now generate a permutation of all of those letter
            # combinations.
            #
            foreach my $word_letters_ref (permute(@letter_list)) {
                my $word = join "" => @{$word_letters_ref};

                #
                # This $word is just a possible candidate for a word.
                # We now have to compare it to the words in the dictionary
                # to verify it is a word
                #
                $word = lc $word;
                if (exists $dictionary{$word}) {
                    my $dictionary_word = $dictionary{$word};
                    $valid_word_hash{$word} = $dictionary_word;
                }
            }
        }
    }

    #
    # I got lazy here... Just dumping out the list of actual words.
    # You need to go through this list to find your longest and
    # shortest words. Number of syllables? That's trickier, you could
    # see if you can divide on CVC and CVVC divides where C = consonant
    # and V = vowel.
    #
    say join "\n", sort keys %valid_word_hash;

Running this program:

    $ ./test.pl | column
    a       al      balei   bile    del     i       lai
    ab      alb     bali    bill    delia   iba     laid
    abdiel  albe    ball    billa   dell    ibad    lea
    abe     albi    balled  billed  della   id      lead
    abed    ale     balli   blad    di      ida     leal
    abel    alible  be      blade   dial    ide     led
    abide   all     bea     blae    dib     idea    leda
    abie    alle    bead    d       die     ideal   lei
    able    allie   beal    da      dieb    idle    leila
    ad      allied  bed     dab     dill    ie      lelia
    ade     b       beid    dae     e       ila     li
    adib    ba      bel     dail    ea      ill     liable
    adiel   bad     bela    dal     ed      l       libel
    ae      bade    beld    dale    el      la      lid
    ai      bae     belial  dali    elb     lab     lida
    aid     bail    bell    dalle   eld     label   lide
    aide    bal     bella   de      eli     labile  lie
    aiel    bald    bid     deal    elia    lad     lied
    ail     baldie  bide    deb     ell     lade    lila
    aile    bale    bield   debi    ella    ladle   lile

Rafael · Answer 4 · 2012-02-04T15:50:33+0000

Perhaps it would help to create a separate table containing the 26 letters of the alphabet. You would then write a query that searches this second table for any letter you specify. It is important that the query ensures each result is unique.

So, you have a table containing your words, and it has a many-to-many relationship with another table that contains all the letters of the alphabet. You query this second table and make the results unique. A similar approach could work for the number of letters.

You can use the same approach for the number of letters and syllables. That way, you make a single query that combines all the necessary information. Put the right indexes on the database to help performance, use appropriate caching, and, if it comes to that, parallelize the search.
