If your inserts are small, you can create a suffix tree or a suffix array (using a lazy implementation). Since the inserts are <k you only need to build the tree to this depth, and the structure will occupy limited memory.
edit: if you need to save suffix identifiers (= integers), they will not fit into memory if the text does not apply unfortunately
The suffix tree (or the suffix array, which is more compact) then represents all the substrings of your text, and then you can do a simple search:
Is a substring in a tree?
Yes β return the suffix (which is in the leaves of the tree).
No -> add it and add text to the source file.
I want to delve into this, but first I need to know about the size of the templates.
EDIT: note that the insert then only takes O (k) time!
EDIT2: If the templates are not limited in length, you may need to build a complete tree that is O (N) in space and time, the problem is that you usually have a coefficient> 10 bytes / char. Regards irW
source share