I was recently asked this question during a C# interview:
How would you efficiently find the number of occurrences of a word in a huge text, such as a big book (the Bible, a dictionary, etc.)?
I wonder what the most efficient data structure for storing the contents of the book would be. The dirtiest solution I could think of was to store it in a StringBuilder and count the substring matches, but I'm sure there must be a much better way to do this.
For a string of modest size there are several ways to do this (substrings, regular expressions, etc.), but I'm asking about the most efficient way to handle a very large string.
Update: I am looking for the following:
Assume there is a text file, again say the Bible, about 20 MB in size, and I want to find the number of times the word “Jesus” appears in it. Other than loading the whole 20 MB into a string or StringBuilder and using substring search or a regular expression to find matches, is there any other data structure that could be used to hold the entire contents of the text? The actual search can be done in several ways; what I'm looking for is the most efficient “data structure” for temporary storage.
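For reference, the baseline I'm describing (load everything into one string, then count substring matches) would look roughly like the sketch below. The names `WordCounter`, `CountOccurrences`, and `CountInFile` are made up for illustration, and the file path is whatever the caller supplies; this is only the naive approach I'd like to improve on, not a proposed answer.

```csharp
using System;
using System.IO;

public static class WordCounter
{
    // Counts non-overlapping, case-sensitive occurrences of `word` in `text`.
    // Note: this matches substrings, so "Jesus" is also counted inside "Jesus's".
    public static int CountOccurrences(string text, string word)
    {
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(word))
        {
            return 0;
        }

        int count = 0;
        int index = 0;

        // Repeatedly search forward from just past the previous match.
        while ((index = text.IndexOf(word, index, StringComparison.Ordinal)) != -1)
        {
            count++;
            index += word.Length;
        }

        return count;
    }

    // Hypothetical helper: reads the whole file into memory, then counts matches.
    public static int CountInFile(string path, string word)
    {
        string text = File.ReadAllText(path);
        return CountOccurrences(text, word);
    }
}
```

This keeps the entire file in memory as a single string and rescans it on every query, which is exactly the cost I'd like to avoid if a better storage structure exists.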