I have a 50 GB text file of random strings, and I want to count the number of occurrences of a substring in it. I need to do this many times, for different random substrings that are not known in advance.
I was wondering if there is another approach to solving the problem.
A probabilistic way:
Something like a Bloom filter, but with a probabilistic count instead of a probabilistic membership check. The data structure would be used for count estimates.
Some other statistical method?
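One structure that matches the "Bloom filter, but for counts" idea is a count-min sketch. Below is a minimal sketch of it in Python, under an assumption that is not part of the question: every query substring has the same fixed length `k`, so the file can be indexed as overlapping k-grams. The names `CountMinSketch` and `index_kgrams`, and the default `width` and `depth`, are illustrative only.

```python
import hashlib


class CountMinSketch:
    """Approximate counter: estimates never undercount, but may overcount."""

    def __init__(self, width=2048, depth=4):
        self.width = width    # counters per row (more columns, fewer collisions)
        self.depth = depth    # number of independent hash rows
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        data = item.encode("utf-8")
        for row in range(self.depth):
            # Salting the hash with the row number gives one hash function per row.
            digest = hashlib.blake2b(
                data, digest_size=8, salt=row.to_bytes(8, "little")
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item):
        for row, col in self._cells(item):
            self.table[row][col] += 1

    def estimate(self, item):
        # The smallest counter is the one least inflated by collisions.
        return min(self.table[row][col] for row, col in self._cells(item))


def index_kgrams(text, k, sketch):
    """Add every overlapping substring of length k to the sketch (assumed fixed k)."""
    for i in range(len(text) - k + 1):
        sketch.add(text[i:i + k])


sketch = CountMinSketch()
index_kgrams("abracadabra", 3, sketch)
print(sketch.estimate("abr"))  # an upper bound on the true count of "abr"
```

A query touches only `depth` counters, so it takes constant time regardless of file size. The price is accuracy: with `width` of about e/ε and `depth` of about ln(1/δ), the overcount is at most ε times the total number of k-grams inserted, with probability at least 1 − δ. For 50 GB of input that total is very large, so the table would need to be sized accordingly.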
Is there any naive method I could use to estimate the number of occurrences of a string in a text file? I am open to alternatives.
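As an illustration of what such a naive estimate could look like, here is a rough sketch that counts matches in a few randomly chosen blocks of the file and scales the result up to the whole file. It is only one possible approach, not an established method for this problem; the function names, `block_size`, and `samples` are hypothetical choices.

```python
import os
import random


def count_starts(data, needle, limit):
    """Count (possibly overlapping) matches that start before index `limit`."""
    count = 0
    pos = data.find(needle)
    while pos != -1 and pos < limit:
        count += 1
        pos = data.find(needle, pos + 1)
    return count


def estimate_count(path, needle, block_size=1 << 20, samples=64, seed=None):
    """Estimate occurrences of `needle` (bytes) from randomly sampled blocks."""
    m = len(needle)
    positions = os.path.getsize(path) - m + 1   # possible match start offsets
    if m == 0 or positions <= 0:
        return 0.0
    num_blocks = -(-positions // block_size)    # ceiling division
    rng = random.Random(seed)
    hits = covered = 0
    with open(path, "rb") as f:
        for _ in range(samples):
            start = rng.randrange(num_blocks) * block_size
            span = min(block_size, positions - start)
            f.seek(start)
            # Read m - 1 extra bytes so matches straddling the block end count.
            data = f.read(span + m - 1)
            hits += count_starts(data, needle, span)
            covered += span
    # Scale the observed match rate up to the whole file.
    return hits * positions / covered
```

Each query reads `samples * block_size` bytes, so the cost is independent of the file size but still linear in the amount sampled. For substrings that occur rarely, most samples will contain no match and the estimate will often be 0, so this is only useful for reasonably frequent substrings.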
It would be nice if each query could be answered in logarithmic time or better, since I will be doing the same task many times.