How can I efficiently extract relevant keywords from a string? My keyword list is predefined. For example, in an article about Michelle Obama that also mentions Barack Obama, I want to extract both Michelle Obama and Barack Obama, with the keyword Michelle Obama getting a higher relevance score (both Michelle Obama and Barack Obama are in my keyword list).
Checking the string for the number of occurrences of each keyword does not seem very efficient. My application is written in PHP, but any language is fine as long as I can do this efficiently.
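To make the baseline concrete, here is a minimal sketch of the occurrence-counting approach, written in Python for brevity rather than PHP. The function name `rank_keywords` and the sample article are purely illustrative, and using the raw count as the relevance score is an assumption made for the sake of the example.

```python
import re
from collections import Counter

def rank_keywords(text, keywords):
    """Count whole-phrase, case-insensitive matches of each keyword and rank by count."""
    counts = Counter()
    for kw in keywords:
        # One full scan of the text per keyword, so the cost grows with
        # len(text) * len(keywords). This is the part that looks inefficient.
        pattern = r"\b" + re.escape(kw) + r"\b"
        n = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if n:
            counts[kw] = n
    # Highest count first; the raw count stands in for "relevance" here.
    return counts.most_common()

# Illustrative data only.
article = (
    "Michelle Obama spoke on Tuesday. Michelle Obama was joined by "
    "Barack Obama at the event."
)
ranked = rank_keywords(article, ["Michelle Obama", "Barack Obama", "Joe Biden"])
print(ranked)
```

With a large keyword list, every keyword triggers its own pass over the article, which is why I am looking for something better.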
I tried OpenCalais, but it does not detect most of my keywords. Can I extract keywords using Lucene?