Reducing memory usage when creating a trie / suffix tree in C ++

I am trying to do trie in C ++, now my main data structure looks like this:

struct node{ int count; no of times this node has been visited. struct node* child[ALPHABET_SIZE]; // Let ALPHABET_SIZE be 26 } 

When the line size becomes large, most of the allocated memory is lost. For example, if we insert "he" Our tree will be

 root---->h--->e |--->e 

We see that only 2/26th allocated memory is used in the root. How to improve? .

+4
source share
2 answers

Some very simple tips:

  • If your branching ratio is expected to be low, consider using something other than an array for children. For example, you can have an array of letters in node * pairs and either perform a linear or binary search on them (if they are ordered). You can also use any card.
  • You can also play with smaller integer sizes for counting, if you do not expect counting in millions / billions.
  • Instead of dynamically allocating nodes, you can get them from an arena-based allocator (i.e., an object pool), avoiding the overhead of allocating the heap, which is often added to objects allocated on the heap.
+2
source

Instead of creating a fixed-size array for each node, create an array with 1 element and resize it (replace it with a new array of size + 1) when you insert the child. The insert will be slower, so you can test and change the resizing algorithm (size + 1 or size * 2 or size + size / 2) so that there are fewer distributions if it gets too slow.

0
source

Source: https://habr.com/ru/post/1496041/


All Articles