Memory-efficient data structure for implementing a trie

I am implementing a trie in Python. So far I have come across two different ways:

  • use a Node class (similar to a struct Node in C++) with these data members:

    char - stores the character
    is_end - marks the end of a word (True or False)
    prefix_count - stores the number of words with the current prefix
    child - dict of Node objects (for storing the child nodes, i.e. for the 26 letters)

    class Node(object):
        def __init__(self):
            self.char = ''
            self.word = ''
            self.is_end = False
            self.prefix_count = 0
            self.child = {}

  • use a dictionary to store all the data:

    words = {'foo', 'bar', 'baz', 'barz'}

    {'b': {'a': {'r': {'_end_': '_end_', 'z': {'_end_': '_end_'}}, 'z': {'_end_': '_end_'}}}, 'f': {'o': {'o': {'_end_': '_end_'}}}}
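For comparison, here is a minimal sketch of how both options could be populated from the same word set. The helper names (`node_insert`, `node_count_prefix`, `make_trie`, `dict_contains`) are hypothetical, the Node class is trimmed to the fields described above, and the words are assumed to be distinct so that `prefix_count` is not counted twice.

```python
class Node(object):
    """Option 1: one object per character (trimmed-down fields)."""
    def __init__(self, char=''):
        self.char = char
        self.is_end = False
        self.prefix_count = 0
        self.child = {}


def node_insert(root, word):
    """Insert a word, bumping prefix_count on every node along its path."""
    node = root
    for ch in word:
        if ch not in node.child:
            node.child[ch] = Node(ch)
        node = node.child[ch]
        node.prefix_count += 1
    node.is_end = True


def node_count_prefix(root, prefix):
    """Return how many inserted words start with a non-empty prefix."""
    node = root
    for ch in prefix:
        node = node.child.get(ch)
        if node is None:
            return 0
    return node.prefix_count


_END = '_end_'  # sentinel key marking the end of a word (option 2)


def make_trie(words):
    """Option 2: build a nested-dict trie from an iterable of words."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[_END] = _END
    return root


def dict_contains(trie, word):
    """Check whether a complete word is stored in the nested-dict trie."""
    node = trie
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return _END in node


words = {'foo', 'bar', 'baz', 'barz'}

node_root = Node()
for w in words:
    node_insert(node_root, w)

dict_root = make_trie(words)
```

The dict version stores nothing per node beyond the mapping itself, while the Node version pays for an extra object per character but can carry counters such as `prefix_count`.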

What is the standard data structure that is both memory-efficient and fast for traversal and other trie operations on a large dataset?

+6
3 answers

Why not both? Just yesterday I implemented a similar data structure for storing and retrieving a hierarchy of objects and considered this exact situation. I ended up using a Node object with a dictionary of children. Having the main Node be an object lets you define your own methods for printing it or retrieving data, and if necessary you can even have lazy initialization (you did mention a large dataset, right?)

    import collections

    class Node(object):
        def __init__(self):
            # Missing children are created lazily on first access.
            self.children = collections.defaultdict(lambda: self.__class__())
            self.stuff = ...

        def __str__(self, depth=0):
            if self.children:
                return str(self.stuff) + '\n' + '\n'.join(
                    [' ' * depth + k + ' ' + v.__str__(depth + 1)
                     for k, v in self.children.items()])
            else:
                return str(self.stuff)

        def get_path(self, path):
            node = self
            while path:
                node = node.children[path.pop(0)]
            ...
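A rough, self-contained sketch of how such a lazily initialized node might be used as a trie follows. The `is_end` flag and the `insert`/`contains` methods are assumptions standing in for the elided parts of the answer, not the author's actual code. One thing worth noting: indexing a `defaultdict` creates the missing entry, so the read-only lookup below uses `.get()` to avoid growing the trie on failed searches.

```python
import collections


class Node(object):
    def __init__(self):
        # Children are created lazily the first time they are indexed.
        self.children = collections.defaultdict(self.__class__)
        # Hypothetical payload; the answer only says `stuff`.
        self.is_end = False

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children[ch]  # creates the child if it is missing
        node.is_end = True

    def contains(self, word):
        node = self
        for ch in word:
            # .get() does not trigger the default factory, so a failed
            # lookup leaves no empty nodes behind.
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_end


root = Node()
for w in ('foo', 'bar', 'baz', 'barz'):
    root.insert(w)
```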
+1

A direct replacement would be a nested list.

However, a nested tuple would be [arguably] more Pythonic, more compact in memory, and therefore faster to search.

Of course, updating such a trie becomes O(log N), since you have to recreate every ancestor of the node you change. However, if your searches are much more frequent than your updates, it might be worth it.
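One possible reading of the nested-tuple idea, as a sketch only: the answer does not specify a layout, so the `(is_end, children)` shape with `children` as a sorted tuple of `(char, child)` pairs, and the names `EMPTY`, `tuple_contains` and `tuple_insert`, are all assumptions. An insert returns a new root and rebuilds only the nodes along the word's path; untouched subtrees are shared with the old version.

```python
# A node is (is_end, children); children is a tuple of (char, node) pairs.
EMPTY = (False, ())


def tuple_contains(node, word):
    """Walk the nested tuples; return True if word is stored."""
    for ch in word:
        for key, child in node[1]:
            if key == ch:
                node = child
                break
        else:
            return False  # no child for this character
    return node[0]


def tuple_insert(node, word):
    """Return a new trie that also contains word; the input is unchanged."""
    is_end, children = node
    if not word:
        return (True, children)
    ch, rest = word[0], word[1:]
    old_child = dict(children).get(ch, EMPTY)
    new_child = tuple_insert(old_child, rest)
    # Siblings are reused as-is; only the node on the path is rebuilt.
    others = [(k, c) for k, c in children if k != ch]
    pairs = sorted(others + [(ch, new_child)], key=lambda kc: kc[0])
    return (is_end, tuple(pairs))


t1 = EMPTY
for w in ('foo', 'bar', 'baz'):
    t1 = tuple_insert(t1, w)

t2 = tuple_insert(t1, 'barz')  # t1 is still valid and unchanged
```

Because tuples are immutable, every update pays for rebuilding each ancestor on the path, which is the trade-off described above.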

+1

A trie fares poorly when it comes to space complexity: it tends to use a lot of memory to build and to operate on. To avoid this problem, there are what are known as compressed data structures. Try implementing one here.

See here for more details.

-2

Source: https://habr.com/ru/post/985467/

