What is an efficient search algorithm for automatic completion?

I have a list of 10,000 keywords. What is an efficient search algorithm that autocompletes this list?

+4
source share
5 answers

Using Trie is an option, but they are inefficient in size. They can be made more convenient to use using a modified version known as the Radix Tree or Patricia Tree.

A triple search tree is likely to be the best option. Here's an article on this topic: β€œ Effective autocomplete with a triple search tree. ” Another great article about using a triple search tree for spelling correction (a similar problem for autocompletion): β€œ Using ternary DAGs for spelling correction. ”

+6
source

I think binary search works just fine for 10,000 entries.

+4
source

A trie: http://en.wikipedia.org/wiki/Trie gives you O (N) search time whenever you type a letter (I assume you want a new sentence when a letter is typed). This should be effective enough if you have small words and the search space will be reduced with each new letter.

+3
source

As already mentioned, you save your words in the database (see "Automatic tips for technologies and parameters" ), create an index of these words and let the database work. They know how to do it effectively.

0
source

A fairly roundabout method for SO for crosswords was proposed.

It can be adapted here quite easily :)

The idea is simple, but quite effective: it consists in indexing words, building one index per letter position. It should be noted that after 4/5 letters, the subset of available words is so small that brute force is probably best ... this, of course, needs to be measured.

As for the idea, there is a Python way here:

class AutoCompleter: def __init__(self, words): self.words = words self.map = defaultdict(set) self._map() def _map(self): for w in words: for i in range(0,len(w)): self.map[(i,w[i])].insert(w) def words(self, firstLetters): # Gives all the sets for each letter sets = [self.map[(i, firstLetters[i])] for i in range(0, len(firstLetters))] # Order them so that the smallest set is first sets.sort(lambda x,y: cmp(len(x),len(y))) # intersect all sets, from left to right (smallest to biggest) return reduce(lambda x,y: intersection(x,y), sets) 

The memory requirement is quite strict: one entry for each letter in each position. However, the entry means that the word exists with a letter in this position, which does not apply to everyone.

The speed seems to be good if you want to automatically fill in a 3-letter word (the classic threshold for starting autocompletion):

  • 3 hash search
  • 2 intersections of sets (definitely one spot), but ordered as efficient as possible.

I would definitely need to try this against the triple tree and approach to see how it works.

0
source

Source: https://habr.com/ru/post/1302331/


All Articles