NLP algorithm for "filling" search queries

Question


I am trying to write an algorithm (which I assume will rely on natural language processing techniques) to "fill out" a list of search terms. There is probably a common name for this kind of problem that I am not aware of. What is this problem called, and what kind of algorithm will give me the following behavior?

Input:

docs = [
    "I bought a ticket to the Dolphin Watching cruise",
    "I enjoyed the Dolphin Watching tour",
    "The Miami Dolphins lost again!",
    "It was good going to that Miami Dolphins game",
]
search_term = "Dolphin"

Output:

 ["Dolphin Watching", "Miami Dolphins"]

It should fundamentally understand that if "Dolphin" appears at all, it is almost always in either the "Dolphin Watching" or the "Miami Dolphins" bigram. Solutions in Python are preferred.

+6

python nlp n-gram

Trindaz Sep 29 '11 at 23:30


2 answers

I used the Natural Language Toolkit (NLTK) in my NLP class at university with decent success. I think it has some taggers that can help you identify which words are nouns and help you parse the text into a tree. I don't remember much, but that is where I would start.

0

mpen Sep 29 '11 at 23:49


Fred foo · Accepted Answer · 2011-09-30T09:28:39+0000

It should fundamentally understand that if "Dolphin" appears at all, it is almost always in either the "Dolphin Watching" or the "Miami Dolphins" bigram.

It sounds like you want to identify the collocations that "Dolphin" occurs in. There are various methods for finding collocations, the most popular of which is to compute the pointwise mutual information (PMI) between the terms in your corpus and then select the terms with the highest PMI for "Dolphin". You may recall PMI from the sentiment analysis algorithm that I suggested earlier.
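To make the PMI idea concrete, here is a minimal pure-Python sketch. PMI for a bigram is log2 of P(w1, w2) / (P(w1) * P(w2)), with the probabilities estimated from raw counts. The function name `top_collocations`, the regex tokenizer, the prefix match (so that "Dolphin" also catches "Dolphins"), and the `min_count` cutoff are all assumptions made for illustration and are not part of the question. The cutoff matters because PMI tends to overrate bigrams that were seen only once.

```python
import math
import re
from collections import Counter

docs = [
    "I bought a ticket to the Dolphin Watching cruise",
    "I enjoyed the Dolphin Watching tour",
    "The Miami Dolphins lost again!",
    "It was good going to that Miami Dolphins game",
]


def tokenize(text):
    # Crude lowercase word tokenizer (an assumption; swap in a real one).
    return re.findall(r"[a-z']+", text.lower())


def top_collocations(docs, search_term, top_n=2, min_count=2):
    """Rank bigrams that contain a word starting with search_term by PMI."""
    prefix = search_term.lower()
    unigrams = Counter()
    bigrams = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        unigrams.update(tokens)
        # Count bigrams per document so none spans a document boundary.
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare bigrams get inflated PMI scores
        if not (w1.startswith(prefix) or w2.startswith(prefix)):
            continue  # keep only bigrams involving the search term
        p_joint = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        pmi = math.log2(p_joint / (p_w1 * p_w2))
        scored.append((pmi, (w1, w2)))

    # Highest PMI first; ties are broken alphabetically by bigram.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [" ".join(bigram) for _, bigram in scored[:top_n]]


result = top_collocations(docs, "Dolphin")
print(result)
```

On the four example documents, "dolphin watching" and "miami dolphins" outrank "the dolphin", because "the" is more frequent overall and so its co-occurrence with "dolphin" is less surprising. The results come back lowercased; restoring the original casing would need an extra lookup step.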

Python implementations of various collocation-finding methods are included in NLTK as nltk.collocations. This area is covered in some depth in Manning and Schütze's FSNLP (1999, but still current for this topic).
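As a rough sketch of how nltk.collocations might be applied to the example from the question: the helper name `nltk_collocations`, the regex tokenizer, the prefix match on the search term, and the frequency cutoff of 2 are assumptions chosen for illustration. It needs NLTK installed but no corpus downloads.

```python
import re

from nltk.collocations import BigramCollocationFinder
from nltk.metrics import BigramAssocMeasures

docs = [
    "I bought a ticket to the Dolphin Watching cruise",
    "I enjoyed the Dolphin Watching tour",
    "The Miami Dolphins lost again!",
    "It was good going to that Miami Dolphins game",
]


def nltk_collocations(docs, search_term, top_n=2, min_count=2):
    """Return the top_n bigrams involving search_term, ranked by PMI."""
    prefix = search_term.lower()
    # Crude regex tokenization (an assumption; nltk.word_tokenize is an
    # alternative but requires downloading tokenizer data first).
    token_lists = [re.findall(r"[a-z']+", doc.lower()) for doc in docs]

    finder = BigramCollocationFinder.from_documents(token_lists)
    # Drop bigrams seen fewer than min_count times; PMI overrates them.
    finder.apply_freq_filter(min_count)
    # The filter removes every bigram for which the function returns True,
    # so this keeps only bigrams with a word starting with the search term.
    finder.apply_ngram_filter(
        lambda w1, w2: not (w1.startswith(prefix) or w2.startswith(prefix))
    )
    best = finder.nbest(BigramAssocMeasures.pmi, top_n)
    return [" ".join(bigram) for bigram in best]


nltk_result = nltk_collocations(docs, "Dolphin")
print(nltk_result)
```

BigramAssocMeasures also offers other scoring functions, such as likelihood_ratio and chi_sq, which can be passed to nbest in the same way if PMI proves too sensitive to low counts on a larger corpus.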