Algorithm and data structure for checking whether a word can be made from a given set of letters

I have a dictionary of 200,000 words and a set of letters. I need an algorithm to check whether all the letters of a word are in this set of letters. Checking the words one by one is very slow. Since I have to process a huge number of words, I need a suitable data structure. Any ideas? Thanks!

For example: I have the set of letters {b, g, e, f, t, u, i, t, g, n, c, m, m, w, c, s}, and I want to check the words "big" and "buff". All the letters of "big" are a subset of the original set, so "big" is what I want; "buff" is not, because there is only one "f" in the original set.

3 answers

This is something like Scrabble or Boggle, right? What you do is preprocess the dictionary by sorting the letters in each word, so "word" becomes "dorw". Then you insert all of these into a Trie data structure. In your Trie, the sequence "dorw" will point to the value "word".

[Note that since we sorted the words, they lose their uniqueness, so a single sorted sequence can correspond to several different words, i.e. your Trie must store a list or array on its data nodes.]

You can save this structure so that it loads quickly later, without repeating the sorting step.

What you do then is take your input letters and sort them. Then you walk through your Trie recursively. If the current letter matches an existing path in the Trie, you follow it. Since a letter may go unused, you can also skip the current letter.

And it's that simple. Every time you come across a node in your Trie that has a value, it is a word you can make from the letters you used to get there. You simply add these words to a list as you find them, and when the recursion is complete, you have found all possible words.

If you have duplicate letters in your input, you may need additional logic to prevent multiple instances of the same word (if you do not want them). This logic can be applied during the step that "skips" a letter: you simply skip over all duplicates of it and move on to the next distinct letter.
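A minimal Python sketch of the Trie approach described above. The names build_trie and find_words, and the dictionary-based node layout, are assumptions made for illustration; they are not from the answer itself.

```python
def build_trie(words):
    """Build a trie keyed by each word's letters in sorted order."""
    root = {"children": {}, "words": []}
    for word in words:
        node = root
        for ch in sorted(word):
            node = node["children"].setdefault(ch, {"children": {}, "words": []})
        # Several words can share one sorted key, so each node keeps a list.
        node["words"].append(word)
    return root


def find_words(trie, letters):
    """Return every stored word that can be made from the given letters."""
    letters = sorted(letters)
    found = []

    def walk(node, start):
        found.extend(node["words"])
        previous = None
        for j in range(start, len(letters)):
            ch = letters[j]
            if ch == previous:
                continue  # skip duplicate letters so no word is reported twice
            previous = ch
            child = node["children"].get(ch)
            if child is not None:
                walk(child, j + 1)  # use this letter, continue with the rest

    walk(trie, 0)
    return found


trie = build_trie(["big", "buff", "bug", "gib", "tub"])
print(sorted(find_words(trie, "bgeftuitgncmmwcs")))
```

Skipping a letter is expressed here by the loop advancing j past letters that are not used at the current node.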


[edit] It seems you want to do the opposite. My solution above finds all possible words that can be made from a set of letters. But you want to check a specific word to see if it is allowed, given the set of letters that you have.

It's simple.

Store the available letters as a histogram. That is, for each letter you store the number of copies you have. Then you go through each letter of the test word, building a second histogram as you go. As soon as a count in the word's histogram exceeds the count for that letter in your available letters, the word cannot be made. If you get to the end, the word can be made.
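The histogram check can be sketched in Python as follows. The function name can_make is hypothetical, and collections.Counter is used here as one convenient way to hold the histogram.

```python
from collections import Counter


def can_make(word, available):
    """Return True if `word` can be made from the letters counted in `available`."""
    used = Counter()
    for ch in word:
        used[ch] += 1
        # A Counter returns 0 for letters it has never seen.
        if used[ch] > available[ch]:
            return False
    return True


available = Counter("bgeftuitgncmmwcs")
print(can_make("big", available))   # True
print(can_make("buff", available))  # False: only one "f" is available
```

The available histogram is built once and reused for every word, so each check costs time proportional to the length of the word.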


You can use an array to represent the set of letters. Each element of the array stands for one letter. To convert a letter to an element position, subtract the ASCII code of "a" (or of "A" for uppercase letters). The first element then means "a", the second "b", and so on; if you also store uppercase letters, the 27th element means "A". The value of an element is the number of occurrences of that letter. For example, the array {2, 0, 1, 0, ...} represents the letters {a, c, a}. The pseudocode could be:

  for each word
         copy the array to a new one
         for each letter in the word
             get the element position of the letter: position = letter - 'a'
             decrease the element value in the new array by one: new_array[position]--
             if the value is negative, return not found: if new_array[position] < 0 {return not found;}
         if no value went negative, the word can be made: return found
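A runnable Python version of the pseudocode above, as a sketch. It assumes the input contains only the lowercase letters "a" to "z"; the names letter_counts and can_make_word are made up for this example.

```python
def letter_counts(letters):
    """Count occurrences of each lowercase letter in a 26-element array."""
    counts = [0] * 26
    for ch in letters:
        counts[ord(ch) - ord("a")] += 1  # assumes ch is in "a".."z"
    return counts


def can_make_word(word, counts):
    """Return True if `word` can be made from the letters counted in `counts`."""
    remaining = counts.copy()  # work on a copy so `counts` can be reused
    for ch in word:
        position = ord(ch) - ord("a")
        remaining[position] -= 1
        if remaining[position] < 0:
            return False
    return True


counts = letter_counts("bgeftuitgncmmwcs")
print(can_make_word("big", counts))   # True
print(can_make_word("buff", counts))  # False
```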

Sort the set of letters, then sort the letters of each word and perform a merge-style pass over the two sorted sequences.
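One way to read this suggestion, sketched in Python: walk both sorted sequences in step, as in the merge phase of merge sort, and fail as soon as a letter of the word has no unused match. The function name is_sub_multiset is an assumption for illustration.

```python
def is_sub_multiset(word, sorted_letters):
    """Return True if every letter of `word` can be matched in `sorted_letters`.

    `sorted_letters` must already be sorted; each entry is used at most once.
    """
    i = 0
    for ch in sorted(word):
        # Advance past available letters that are smaller than the one needed.
        while i < len(sorted_letters) and sorted_letters[i] < ch:
            i += 1
        if i == len(sorted_letters) or sorted_letters[i] != ch:
            return False
        i += 1  # consume the matched letter
    return True


sorted_letters = sorted("bgeftuitgncmmwcs")
print(is_sub_multiset("big", sorted_letters))   # True
print(is_sub_multiset("buff", sorted_letters))  # False
```

The letter set is sorted once; each word still has to be sorted before its own pass.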


Source: https://habr.com/ru/post/1487396/

