Java: counting the occurrences of a word in a huge text file

Question


I have a 115 MB text file containing 20 million words. I need to treat this file as a word collection and look up how many times each user-supplied word occurs in it. This lookup is a small part of a larger project. I need a fast and accurate way to count the occurrences of the given words, since I may run it repeatedly in a loop. I am looking for an API I can use, or some other approach that performs the task quickly. Any recommendations are appreciated.

+3

java full-text-search

Naveen Feb 09 '11 at 6:45


1 answer

Stephen C · Accepted Answer · 2011-02-09T07:15:29+0000

This kind of thing is usually implemented using Lucene, especially if you intend to run your application repeatedly or do not have a lot of memory. Lucene supports many other goodies too.

However, if you want to "roll your own" code and you have enough memory (possibly 1 GB), your application could:

parse the file into a sequence of words,
filter out stop words, and
create a "reverse index" as a HashMap<String, List<Integer>>, where the String keys are the unique words and the List<Integer> values give the offsets of each word's occurrences in the file.
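
The three steps above could be sketched roughly as follows. This is only a minimal illustration, not a tuned implementation: the class name WordIndex, the methods build and count, the tokenizing regex, and the lower-casing of words are all assumptions, and the "offsets" stored here are 0-based word positions rather than byte offsets.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical helper class; all names here are illustrative only.
final class WordIndex {

    // Builds a reverse index: word -> positions of that word in the text.
    // Positions are 0-based word positions (an assumption of this sketch);
    // stop words still advance the position but are not stored.
    static Map<String, List<Integer>> build(Stream<String> lines, Set<String> stopWords) {
        Map<String, List<Integer>> index = new HashMap<>();
        int position = 0;
        for (Iterator<String> it = lines.iterator(); it.hasNext();) {
            // Step 1: split each line into words on runs of non-letter, non-digit characters.
            for (String token : it.next().split("[^\\p{L}\\p{N}]+")) {
                if (token.isEmpty()) {
                    continue; // a leading separator yields one empty token
                }
                String word = token.toLowerCase(Locale.ROOT);
                int offset = position++;
                // Step 2: filter out stop words.
                if (stopWords.contains(word)) {
                    continue;
                }
                // Step 3: record the occurrence in the reverse index.
                index.computeIfAbsent(word, k -> new ArrayList<>()).add(offset);
            }
        }
        return index;
    }

    // The number of occurrences is the length of the word's position list.
    static int count(Map<String, List<Integer>> index, String word) {
        return index.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptyList()).size();
    }
}
```

For a real file, the lines could come from Files.lines(path) opened in a try-with-resources block, so the index is built once and every later lookup is a single hash-map access. If only the counts are needed and not the positions, a Map<String, Integer> of counters would likely use far less memory than lists of boxed offsets.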

