Efficient keyword detection / retrieval with a predefined keyword set

How can I efficiently extract relevant keywords from a string? My keyword list is predefined. For example, in an article about Michelle Obama that also mentions Barack Obama, I want to extract both Michelle Obama and Barack Obama, with the keyword Michelle Obama receiving the higher relevance score (both Michelle Obama and Barack Obama are in my keyword list).

Scanning the string once per keyword to count its occurrences does not seem very efficient. My application is written in PHP, but any language is fine as long as the extraction is efficient.

I tried OpenCalais, but it does not detect most of my keywords. Can I retrieve keywords using Lucene?

1 answer

Apache Lucene is a good fit for this. Alternatively, if your text has a title and paragraphs, you can filter out stop words, give a higher rank to words that appear in the title, and then match those words or their inflected forms in the paragraphs. If you want to implement this yourself, articles on text summarization are worth consulting.
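
The title-weighting idea can be sketched in a few lines of plain Python. This is only an illustration under assumptions, not Lucene and not a tested production approach: the function name `rank_keywords` and the `title_weight` parameter are hypothetical, keywords are assumed to start and end with word characters, and matching is exact (case-insensitive) without stemming or other word forms. All keywords are compiled into one regular expression so that each text is scanned once instead of once per keyword.

```python
import re
from collections import Counter


def rank_keywords(title, body, keywords, title_weight=3.0):
    """Score predefined keywords by weighted occurrence count.

    Hypothetical helper: a match in the title counts `title_weight`
    times as much as a match in the body.
    """
    if not keywords:
        return []
    # Try longer keywords first so a long phrase is not shadowed
    # by a shorter keyword that is a prefix of it.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in ordered) + r")\b",
        re.IGNORECASE,
    )
    # Map the lowercased match back to the keyword as spelled in the list.
    canonical = {k.lower(): k for k in keywords}
    scores = Counter()
    for text, weight in ((title, title_weight), (body, 1.0)):
        for match in pattern.finditer(text):
            keyword = canonical.get(match.group(0).lower())
            if keyword is not None:
                scores[keyword] += weight
    # Highest score first; keywords that never occur are omitted.
    return scores.most_common()


title = "Michelle Obama visits London"
body = "Michelle Obama spoke about education. Barack Obama joined her later."
keywords = ["Michelle Obama", "Barack Obama", "Angela Merkel"]
print(rank_keywords(title, body, keywords))
```

With the sample text above, Michelle Obama scores 4.0 (one title match plus one body match) and Barack Obama scores 1.0. For very large keyword lists, a dedicated multi-pattern matcher or an inverted index such as Lucene's is likely to scale better than a single regular expression.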


Source: https://habr.com/ru/post/1789239/

