Check if the keyword is in the line

Question

Check if the keyword is in the line

I have a list of keywords and I want to find out if a string contains any of these keywords. Now I have an O (n) solution. Is there a faster way to do this search without focusing on each keyword and making comparisons / contains?

i.e. Keywords = "cat", "hat", "mat", "bat", "fat", "perched", "rat", "pat", "foo bar", "foo-bar", String = " There's a cat in the box. " The result of this is that "cat" matches one of the words in the "keywords"

EDIT: I guess I wasn’t so clear when I said O (n). I mean O (n), where n = the number of keywords.

+6

java

user459811 Dec 02 '13 at 3:37

source share

5 answers

k = # of chars in sentence n = # of keywords m = # of words in sentence

You can get O(k + n) time complexity by hashing words in sentence .

Separating the sentence from the words, take O(k) . Creating a HashSet also takes O(k) . Checking the hash n times takes n*O(1) = O(n) , so the total time complexity is O(k + n) .

Edit1: Hashing all n keywords is technically n*O(k/m) , where k/m is avg. word length. However, k/m does not scale with the size of the input, so it still gives O(n) .

Edit2: FYI, Boyer-Moore will match any substring, not just keywords; For instance. A cat will match a catepillar. Also, since it is more general, it has a worse run time than a simple word match, O(KN) as @SteveP. has in his answer.

So, if you only need word matching, as opposed to substring, stick to hashing as above.

+1

bcorso Dec 02 '13 at 3:45

source share

You can try using contains ().

Get a string; String passed = "there is a cat in the field";

use for loop to view your keywords. if the keywords are an array.

 for(int i = 0; i < keywords.length; i++){ if(passed.toLowerCase().contains(keywords[i]){ //set true; }else{ //set false; } }

Going through a loop or checking each word separately, I don’t think you will get much better than O (n)

+1

MGhandy Dec 02 '13 at 4:01

source share

not sure if he will find inO (n).

but the decision to find the element could be like this:

  List<String> keywords = new ArrayList<String> (Arrays.asList("cat", "hat", "mat", "bat", "fat", "sat", "rat", "pat", "foo bar", "foo-bar")); String search= "There is a cat in the box." ; List<String> searchWords = new ArrayList<String> (Arrays.asList(search.split(" "))); System.out.println(!Collections.disjoint(keywords,searchWords));

0

Prabhakaran Dec 02 '13 at 3:46

source share

You probably will not be better than O (n), since there is a linear component there - you need to trawl a string in some form, form or fashion.

Consider using Set :

Constant time for adding all elements (N for N elements can be amortized)
Constant search time

 public boolean inPhrase(String phrase, String searchWord) { Set<String> phraseSet = new HashSet<>(); // remove the punctuation and split the words on white space. for(String s: phrase.replaceAll("[.,?!;"'], "").split(" ")) { phraseSet.add(s); } return phraseSet.contains(searchWord); }

0

Makoto Dec 02 '13 at 3:49

source share

Steve P. · Accepted Answer · 2013-12-02T03:47:52+0000

You can use Boyer-Moore , which includes pre-processing the string, but you cannot defeat the worst case O (KN), where K is the sum of the lengths of the keywords and N is the length of the string. The best case, of course, is sublinear, but you cannot have sublinear performance in the worst case.

Please note that comparisons are not free. This is not like you can compare two strings in O (1), to make sure they are equal, you need to iterate over the characters. Hashing gives you what you need to compare in constant time, but doesn't help anymore, since two different lines can have the same hash. This does not mean that hashing is not good, but it does not change the complexity of execution in the worst case.

In the end, you need to compare the characters, and Boyer-Moore provides a very good way to do this. Of course, if you use some kind of hash-based assembly, you can exclude certain keywords in an arithmetic constant, but this does not change the fact that in the worst case (and in many other cases) you need to compare characters.

Also note that depending on what we take in relation to the data, and how we build our indexing structure (indices), you can achieve very good actual runtimes. Just because the complexity of the worst case is not sublinear does not mean that the actual execution time will not be very fast. There is no single-element simple or correct solution; the problem can be approached in many ways. There is never a quick and dirty answer to solve all your problems when it comes to getting information.

Check if the keyword is in the line

More articles: