This kind of thing is usually implemented with Lucene, especially if you intend to run your application repeatedly or do not have a lot of memory. Lucene also supports many other goodies.
However, if you want to "roll your own" code and you have enough memory (possibly 1 GB), your application can:
- parse the file into a sequence of words,
- filter out stop words,
- create a "reverse index" (an inverted index) like
HashMap<String, List<Integer>>, where the String keys are the unique words and the List<Integer> values give the offsets at which each word occurs in the file.
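
The steps above could be sketched roughly as follows. This is only a minimal illustration, not a tuned implementation: the class name InvertedIndex, the method build, the tiny STOP_WORDS set, and the choice to treat a "word" as a run of letters are all assumptions made for the example, and it assumes the file contents have already been read into a String (Java 9+ for Set.of).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InvertedIndex {
    // Hypothetical stop-word list for illustration; a real one would be much longer.
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "the", "of", "to", "in", "is");

    // Assumption: a word is any run of Unicode letters.
    private static final Pattern WORD = Pattern.compile("\\p{L}+");

    // Maps each non-stop word (lower-cased) to the character offsets where it starts.
    public static Map<String, List<Integer>> build(String text) {
        Map<String, List<Integer>> index = new HashMap<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            String word = m.group().toLowerCase(Locale.ROOT);
            if (STOP_WORDS.contains(word)) {
                continue; // skip stop words
            }
            // Create the offset list on first sight of the word, then append.
            index.computeIfAbsent(word, k -> new ArrayList<>()).add(m.start());
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> index = build("The cat and the dog chased the cat");
        System.out.println(index.get("cat")); // prints [4, 31]
    }
}
```

Looking up a word is then a single index.get(word) call, which returns null for words that never occur or were filtered as stop words. Storing word positions (first word, second word, ...) instead of character offsets would work the same way with a counter in place of m.start().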