Java: the most efficient way to check if a string is in a list of words

I have an array of strings String[] words and 28,000 words Word-list.

I want to check if there is any member of the String array in WordList (the list of words is in the text file wordlist.txt)

What is the most efficient way to do this?

+4
source share
8 answers

Put the strings directly in the HashSet<String> and not in the array, and iterate over the file using the contains in the set to check the contents. You will not improve access to O (1). This will also simulate the memory used to store the Strings if any duplicates exist.

+9
source

You can try the array (tree) suffix algorithm, but you need to implement it, look:

The longest palindrome in a string using the suffix tree

+2
source

Step1: Do not use a string array. Instead of using a HashSet.

Step 2. Upload the contents of the file (ie wordlist.txt) to another HashSet

Step 3:

 Set<String> set1 = new HashSet<String>(); //Load the string array into set Set<String> set2 = new HashSet<String>(); //load the file contents into set for (String str : set1) { for (String str2 : set2) { if (str.equalsIgnoreCase(str2)) { break; } } } 
+1
source

You can use HashSet<String> or ArrayList<String> , which has a contains method. It will check if your string is saved or not.
The difference between a HashSet and an ArrayList - hashset will not allow duplication of value, and it will not maintain order, while arraylist will allow you to duplicate and ordered collection. But a HashSet is more efficient than an arraylist for performing search operations.

0
source

Create a HashSet strings as

 HashSet<String> wordSet = new HashSet<String>(Arrays.asList(words)); 

And check the word in the HashSet for the HashSet.contains (Object o) method , where word is the word you want to check if exists.

0
source

Store the serialized HashSet instead of the original .txt word. As a separate step from launching the application.

Then the application only needs to download the hash set.

0
source

HashSet add() returns false if the word is already present in the set.

 for (String str : words) { if (!wordSet.add(str)) { System.out.println("The word " + str + " is already contained."); } } 

This is a bit more complex and less low level than contains() .

0
source

A HashSet will be enough if your list of words can fit in memory.

If memory size is a problem, use BloomFilter . Although the flowering filter may not give the correct answer, you can adjust the probability with which this occurs.

0
source

Source: https://habr.com/ru/post/1500924/


All Articles