Partially filled word matching algorithm

I am writing a game that, given a partially filled word, searches for a dictionary and returns all the relevant words. For this, I am trying to find an algorithm that can be used for the indicated purpose. For example, given - - a -, the algorithm will search for a dictionary for all words with a length of 4 and have "a" as the third letter.

Is there such an algorithm? If not, can someone give an approximate idea of ​​how to develop such an algorithm?

Thanks at Advance.

+4
source share
5 answers

Well, it does not exist yet, but it has already been researched on SO, for problems with crosswords.

The essence of the solution I propose was indexing by letter and indexes, which Python gives:

class Index: def __init__(self, list): self.data = defaultdict(set) for word in list: self.add(word) def add(self, word): for l in range(0, len(word)): self.data[(l, word[l])].insert(word) def look(self, letters): """letters is a list of tuples (position, letter)""" result = None for (p,l) in letters: set = self.data[(p,l)] if result == None: result = set else: result = result.insersection(set) return result 

The idea is simple: you have a large index that has a set of words for each pair (position,letter) . In your case, this can be expanded to have one index per word length, which will reduce the size of word sets and therefore be faster.

To search, you simply cross all the sets to have a common set of words corresponding to all known letters.

+3
source

Another solution may be to structure the dictionary as a prefix tree . Then your algorithm will have to go through this tree. For each node, you know which letter is connected and the position in the word, so you know if it matches the letter you are looking for. If you do not stop and do not miss your children. You also know when you go along the length of your request. Each sheet you access can be added to the list of results.

This solution can be quite effective in terms of memory consumption.

+1
source
 test = '--a-'; for each (words as word) { if ((word.length == test.length) && (test.index(0) == '-' || (word.index(0) == test.index(0))) && (test.index(1) == '-' || (word.index(1) == test.index(1))) && (test.index(2) == '-' || (word.index(2) == test.index(2))) && (test.index(3) == '-' || (word.index(3) == test.index(3)))) { // match } } 

This is what you need? Obviously, it is slightly modified to work at different lengths.

0
source

From what I understood, you cannot use a regex query? in the example above, the pattern is similar to ??a?

Then you need to go through all the words and check if there is a match.

0
source

If you are working on a sufficiently powerful computer (compared to downloading), then PierrOz has a good answer: save the dictionary as a tree of prefixes. Then you can perform a breadth-first search, cropping only when you reach the level at which you really know the letter.

If you need an even faster solution, you need to limit the depth of your search. One possibility is to collect answers. For example, you can start by grouping words by length; then you only need to view lists of words of a certain length. Then you can subgroup with words containing specific letters - all pairs of letters are probably enough. This gives you an array of 13,000 elements that you can quickly index: count the number of letters in your word, then select the rarest letter or two in the word and use this to index into a mini-prefix tree that only contains words of this length with these letters. In most cases, this strategy should bring you up to several hundred words per bin, and finding the prefix tree should be quick, even if you select most of the width of the tree.

0
source

Source: https://habr.com/ru/post/1305826/


All Articles