Find the word with most letters along with other words

I want Perl (5.8.8) to figure out which word has the majority of the letters along with the other words in the array - but only the letters that are in the same place. (And preferably without using libs.)

Take this list of words as an example:

  • Baker
  • Mototechnics
  • Baler
  • CARER
  • RUFFR

Her BALER is a word that has most in common with others. It corresponds to BAxER in BAKER, xALER in SALER, xAxER in CARER and xxxxR in RUFFR.

I want Perl to find this word for me in an arbitrary list of words with the same length and case. It seems I hit the wall here, so help is very much appreciated!

What I tried so far

At the moment, there is actually not a lot of script:

use strict; use warnings; my @wordlist = qw(BAKER SALER MALER BARER RUFFR); foreach my $word (@wordlist) { my @letters = split(//, $word); # now trip trough each iteration and work magic... } 

If there is a comment, I tried several kinds of code, heavy with for-loops and ++ varables. So far, none of my attempts have done what I need.

So, to better explain: I need to check word for word against the list, for each letter position, find the word that has the majority of the letters along with the rest in the list, the position in this letter.

One possible way might be to first check which word (s) has the most common meaning in the position of the letter 0, then check the letter position 1, etc., until you find the word that has the most letters in total along with other words on the list. Then I would like to print the list as a matrix with grades for each letter position plus a total score for each word, unlike what DavidO offers.

As a result, you will receive a matrix for each word, with an estimate for each position of the letter and a total score for each word in the matrix.

Program Purpose

Hehe, I could also say this: the program is designed to hack the terminals in the game Fallout 3 .: D I think this is a great way to learn Perl, as well as have fun playing games.

Here is one of the Fallout 3 terminal hacking tutorials that I used for research: FALLOUT 3: Hacking FAQ v1.2 , you already made a program to shorten the list of words, for example:

 #!/usr/bin/perl # See if one word has equal letters as the other, and how many of them are equal use strict; use warnings; my $checkword = "APPRECIATION"; # the word to be checked my $match = 4; # equal to the match you got from testing your checkword my @checkletters = split(//, $checkword); #/ my @wordlist = qw( PARTNERSHIPS REPRIMANDING CIVILIZATION APPRECIATION CONVERSATION CIRCUMSTANCE PURIFICATION SECLUSIONIST CONSTRUCTION DISAPPEARING TRANSMISSION APPREHENSIVE ENCOUNTERING ); print "$checkword has $match letters in common with:\n"; foreach my $word (@wordlist) { next if $word eq $checkword; my @letters = split(//, $word); my $length = @letters; # determine length of array (how many letters to check) my $eq_letters = 0; # reset to 0 for every new word to be tested for (my $i = 0; $i < $length; $i++) { if ($letters[$i] eq $checkletters[$i]) { $eq_letters++; } } if ($eq_letters == $match) { print "$word\n"; } } # Now to make a script on to find the best word to check in the first place... 

This script will give CONSTRUCTION and TRANSMISSION as a result, just like in the game FAQ. However, the trick with the original question (and what I couldn’t figure out on my own) is how to find the best word to try first, i.e. APPRECIATION .

OK, I have now provided my own solution based on your help and closed this topic. Many, many thanks to all contributors. You helped a lot, and along the way I also learned a lot.: D

+6
source share
8 answers

As a starting point, you can effectively check how many letters they have:

 $count = ($word1 ^ $word2) =~ y/\0//; 

But this is only useful if you loop all possible pairs of words, which is not necessary in this case:

 use strict; use warnings; my @words = qw/ BAKER SALER BALER CARER RUFFR /; # you want a hash to indicate which letters are present how many times in each position: my %count; for my $word (@words) { my @letters = split //, $word; $count{$_}{ $letters[$_] }++ for 0..$#letters; } # then for any given word, you get the count for each of its letters minus one (because the word itself is included in the count), and see if it is a maximum (so far) for any position or for the total: my %max_common_letters_count; my %max_common_letters_words; for my $word (@words) { my @letters = split //, $word; my $total; for my $position (0..$#letters, 'total') { my $count; if ( $position eq 'total' ) { $count = $total; } else { $count = $count{$position}{ $letters[$position] } - 1; $total += $count; } if ( ! $max_common_letters_count{$position} || $count >= $max_common_letters_count{$position} ) { if ( $max_common_letters_count{$position} && $count == $max_common_letters_count{$position} ) { push @{ $max_common_letters_words{$position} }, $word; } else { $max_common_letters_count{$position} = $count; $max_common_letters_words{$position} = [ $word ]; } } } } # then show the maximum words for each position and in total: for my $position ( sort { $a <=> $b } grep $_ ne 'total', keys %max_common_letters_count ) { printf( "Position %s had a maximum of common letters of %s in words: %s\n", $position, $max_common_letters_count{$position}, join(', ', @{ $max_common_letters_words{$position} }) ); } printf( "The maximum total common letters was %s in words(s): %s\n", $max_common_letters_count{'total'}, join(', ', @{ $max_common_letters_words{'total'} }) ); 
+5
source

Here is one way. After reading your spec a couple of times, I think this is what you are looking for.

It is worth noting that it is possible that there will be more than one word with an equal upper score. There is only one winner from your list, but it is possible that a longer list will have several equally winning words. This decision concerns this. In addition, as I understand it, you consider letters to match only if they occur in the same column per word. If so, then here is a working solution:

 use 5.012; use strict; use warnings; use List::Util 'max'; my @words = qw/ BAKER SALER BALER CARER RUFFR /; my @scores; foreach my $word ( @words ) { my $score; foreach my $comp_word ( @words ) { next if $comp_word eq $word; foreach my $pos ( 0 .. ( length $word ) - 1 ) { $score++ if substr( $word, $pos, 1 ) eq substr( $comp_word, $pos, 1); } } push @scores, $score; } my $max = max( @scores ); my ( @max_ixs ) = grep { $scores[$_] == $max } 0 .. $#scores; say "Words with most matches:"; say for @words[@max_ixs]; 

This decision counts how many times for each column of letters each word matches other words. For example:

 Words: Scores: Because: ABC 1, 2, 1 = 4 A matched once, B matched twice, C matched once. ABD 1, 2, 1 = 4 A matched once, B matched twice, D matched once. CBD 0, 2, 1 = 3 C never matched, B matched twice, D matched once. BAC 0, 0, 1 = 1 B never matched, A never matched, C matched once. 

This gives you the winners of ABC and ABD, each with a score of four positional matches. That is, cumulative moments when one column is one, row one corresponds to column one row two, three and four, etc. For subsequent columns. It may be able to be optimized and reformulated to be shorter, but I tried to make the logic pretty easy to read. Enjoy it!

UPDATE / EDIT I thought about this and realized that although my existing method does exactly what your original question asked, it did it O (n ^ 2) times, which is relatively slow. But if we use hash keys for each letter of the column (one letter per key) and count how many times each letter appears in the column (as the value of the hash element), we could do our sums in O (1) and our traversal of the list in O (n * c) time (where c is the number of columns, n is the number of words). There is also setup time (creating a hash). But we still have big improvements. Here is a new version of each technique, as well as a comparative comparison of each of them.

 use strict; use warnings; use List::Util qw/ max sum /; use Benchmark qw/ cmpthese /; my @words = qw/ PARTNERSHIPS REPRIMANDING CIVILIZATION APPRECIATION CONVERSATION CIRCUMSTANCE PURIFICATION SECLUSIONIST CONSTRUCTION DISAPPEARING TRANSMISSION APPREHENSIVE ENCOUNTERING /; # Just a test run for each solution. my( $top, $indexes_ref ); ($top, $indexes_ref ) = find_top_matches_force( \@words ); print "Testing force method: $top matches.\n"; print "@words[@$indexes_ref]\n"; ( $top, $indexes_ref ) = find_top_matches_hash( \@words ); print "Testing hash method: $top matches.\n"; print "@words[@$indexes_ref]\n"; my $count = 20000; cmpthese( $count, { 'Hash' => sub{ find_top_matches_hash( \@words ); }, 'Force' => sub{ find_top_matches_force( \@words ); }, } ); sub find_top_matches_hash { my $words = shift; my @scores; my $columns; my $max_col = max( map { length $_ } @{$words} ) - 1; foreach my $col_idx ( 0 .. $max_col ) { $columns->[$col_idx]{ substr $_, $col_idx, 1 }++ for @{$words}; } foreach my $word ( @{$words} ) { my $score = sum( map{ $columns->[$_]{ substr $word, $_, 1 } - 1 } 0 .. $max_col ); push @scores, $score; } my $max = max( @scores ); my ( @max_ixs ) = grep { $scores[$_] == $max } 0 .. $#scores; return( $max, \@max_ixs ); } sub find_top_matches_force { my $words = shift; my @scores; foreach my $word ( @{$words} ) { my $score; foreach my $comp_word ( @{$words} ) { next if $comp_word eq $word; foreach my $pos ( 0 .. ( length $word ) - 1 ) { $score++ if substr( $word, $pos, 1 ) eq substr( $comp_word, $pos, 1); } } push @scores, $score; } my $max = max( @scores ); my ( @max_ixs ) = grep { $scores[$_] == $max } 0 .. $#scores; return( $max, \@max_ixs ); } 

Conclusion:

 Testing force method: 39 matches. APPRECIATION Testing hash method: 39 matches. APPRECIATION Rate Force Hash Force 2358/s -- -74% Hash 9132/s 287% -- 

I understand that your initial specification changed after you saw some other options, and the nature of innovation to a certain extent, but the puzzle is still alive in my mind. As you can see, my hash method is 287% faster than the original method. More fun in less time!

+7
source

Here is the full script. It uses the same idea as yht (although I had it independently). Use bitwise xor to concatenate strings, and then count the number of NULs in the result. As long as your strings are ASCII, this will tell you how many matching letters were. (This comparison is case sensitive, and I'm not sure what will happen if the lines are UTF-8. Probably nothing good.)

 use strict; use warnings; use 5.010; use List::Util qw(max); sub findMatches { my ($words) = @_; # Compare each word to every other word: my @matches = (0) x @$words; for my $i (0 .. $#$words-1) { for my $j ($i+1 .. $#$words) { my $m = ($words->[$i] ^ $words->[$j]) =~ tr/\0//; $matches[$i] += $m; $matches[$j] += $m; } } # Find how many matches in the best word: my $max = max(@matches); # Find the words with that many matches: my @wanted = grep { $matches[$_] == $max } 0 .. $#matches; wantarray ? @$words[@wanted] : $words->[$wanted[0]]; } # end findMatches my @words = qw( BAKER SALER BALER CARER RUFFR ); say for findMatches(\@words); 
+4
source

Don't touch perl after a while, so this is pseudo-code. This is not the fastest algorithm, but it will work just fine for a few words.

 totals = new map #eg an object to map :key => :value for each word a for each word b next if a equals b totals[a] = 0 for i from 1 to a.length if a[i] == b[i] totals[a] += 1 end end end end return totals.sort_by_key.last 

Sorry for the lack of perl, but if you point it to perl, it should work like a charm.

A quick note about the runtime: this will run over time number_of_words ^ 2 * length_of_words , so in a list of 100 words each of 10 characters will execute in 100,000 cycles, which is suitable for most applications.

+2
source

Here is my attempt to answer. It will also allow you to see each individual match if you need it. (i.e. BALER matches 4 characters in BAKER). EDIT : now it catches all matches if there is a connection between the words (I added β€œCAKER” to the list to check).

 #! usr/bin/perl use strict; use warnings; my @wordlist = qw( BAKER SALER BALER CARER RUFFR CAKER); my %wordcomparison; #foreach word, break it into letters, then compare it against all other words #break all other words into letters and loop through the letters (both words have same amount), adding to the count of matched characters each time there a match foreach my $word (@wordlist) { my @letters = split(//, $word); foreach my $otherword (@wordlist) { my $count; next if $otherword eq $word; my @otherwordletters = split (//, $otherword); foreach my $i (0..$#letters) { $count++ if ( $letters[$i] eq $otherwordletters[$i] ); } $wordcomparison{"$word"}{"$otherword"} = $count; } } # sort (unnecessary) and loop through the keys of the hash (words in your list) # foreach key, loop through the other words it compares with #Add a new key: total, and sum up all the matched characters. foreach my $word (sort keys %wordcomparison) { foreach ( sort keys %{ $wordcomparison{$word} }) { $wordcomparison{$word}{total} += $wordcomparison{$word}{$_}; } } #Want $word with highest total my @max_match = (sort { $wordcomparison{$b}{total} <=> $wordcomparison{$a}{total} } keys %wordcomparison ); #This is to get all if there is a tie: my $maximum = $max_match[0]; foreach (@max_match) { print "$_\n" if ($wordcomparison{$_}{total} >= $wordcomparison{$maximum}{total} ) } 

The result is simple: CAKER BALER and BAKER.

The %wordcomparison hash is as follows:

 'SALER' { 'RUFFR' => 1, 'BALER' => 4, 'BAKER' => 3, 'total' => 11, 'CARER' => 3 }; 
+1
source

Here's a version that relies on word wrap to count the same characters. I used the words from your original comparison, not the code.

This should work with any long words and with any list of lengths. Exit:

 Word score ---- ----- BALER 12 SALER 11 BAKER 11 CARER 10 RUFFR 4 

The code:

 use warnings; use strict; my @w = qw(BAKER SALER BALER CARER RUFFR); my @tword = t_word(@w); my @score; push @score, str_count($_) for @tword; @score = t_score(@score); my %total; for (0 .. $#w) { $total{$w[$_]} = $score[$_]; } print "Word\tscore\n"; print "----\t-----\n"; print "$_\t$total{$_}\n" for (sort { $total{$b} <=> $total{$a} } keys %total); # transpose the words sub t_word { my @w = @_; my @tword; for my $word (@w) { my $i = 0; while ($word =~ s/(.)//) { $tword[$i++] .= $1; } } return @tword; } # turn each character into a count sub str_count { my $str = uc(shift); while ( $str =~ /([AZ])/ ) { my $chr = $1; my $num = () = $str =~ /$chr/g; $num--; $str =~ s/$chr/$num /g; } return $str; } # sum up the character counts # while reversing the transpose sub t_score { my @count = @_; my @score; for my $num (@count) { my $i = 0; while( $num =~ s/(\d+) //) { $score[$i++] += $1; } } return @score; } 
+1
source

You can do this using the dirty regex trick to execute code if the letter matches its place, but not otherwise, fortunately, it’s pretty easy to create regular expressions along the way:

Regular expression example:

 (?:(C(?{ $c++ }))|.)(?:(A(?{ $c++ }))|.)(?:(R(?{ $c++ }))|.)(?:(E(?{ $c++ }))|.)(?:(R(?{ $c++ }))|.) 

It may or may not be fast.

 use 5.12.0; use warnings; use re 'eval'; my @words = qw(BAKER SALER BALER CARER RUFFR); my ($best, $count) = ('', 0); foreach my $word (@words) { our $c = 0; foreach my $candidate (@words) { next if $word eq $candidate; my $regex_str = join('', map {"(?:($_(?{ \$c++ }))|.)"} split '', $word); my $regex = qr/^$regex_str$/; $candidate =~ $regex or die "did not match!"; } say "$word $c"; if ($c > $count) { $best = $word; $count = $c; } } say "Matching: first best: $best"; 

Using xor trick will be quick, but suggests a lot about the range of characters you may encounter. There are many ways in which utf-8 will break with this case.

0
source

Many thanks to all the contributors! Of course, you showed me that I still have a lot to learn, but you also helped me a lot in developing my own answer. I just put it here for reference and possible feedback, as there are probably better ways to do this. For me, it was the simplest and most direct approach that I could find on my own. Enjoy! :)

 #!/usr/bin/perl use strict; use warnings; # a list of words for testing my @list = qw( BAKER SALER BALER CARER RUFFR ); # populate two dimensional array with the list, # so we can compare each letter with the other letters on the same row more easily my $list_length = @list; my @words; for (my $i = 0; $i < $list_length; $i++) { my @letters = split(//, $list[$i]); my $letters_length = @letters; for (my $j = 0; $j < $letters_length; $j++) { $words[$i][$j] = $letters[$j]; } } # this gives a two-dimensionla array: # # @words = ( ["B", "A", "K", "E", "R"], # ["S", "A", "L", "E", "R"], # ["B", "A", "L", "E", "R"], # ["C", "A", "R", "E", "R"], # ["R", "U", "F", "F", "R"], # ); # now, on to find the word with most letters in common with the other on the same row # add up the score for each letter in each word my $word_length = @words; my @letter_score; for my $i (0 .. $#words) { for my $j (0 .. $#{$words[$i]}) { for (my $k = 0; $k < $word_length; $k++) { if ($words[$i][$j] eq $words[$k][$j]) { $letter_score[$i][$j] += 1; } } # we only want to add in matches outside the one we're testing, therefore $letter_score[$i][$j] -= 1; } } # sum each score up my @scores; for my $i (0 .. $#letter_score ) { for my $j (0 .. $#{$letter_score[$i]}) { $scores[$i] += $letter_score[$i][$j]; } } # find the highest score my $max = $scores[0]; foreach my $i (@scores[1 .. $#scores]) { if ($i > $max) { $max = $i; } } # and print it all out :D for my $i (0 .. $#letter_score ) { print "$list[$i]: $scores[$i]"; if ($scores[$i] == $max) { print " <- best"; } print "\n"; } 

When launched, the script returns the following:

 BAKER: 11 SALER: 11 BALER: 12 <- best CARER: 10 RUFFR: 4 
0
source

Source: https://habr.com/ru/post/892425/


All Articles