Algorithm for finding related words in text

I have a word (for example, "Apple") and want to process one or more texts to extract the terms related to it. For example: process a document for "Apple" and find iPod, iPhone, and Mac, which are terms associated with "Apple".

Any idea on how to solve this problem?

+6
5 answers

As a starting point: your question is related to text processing.

There are two approaches: a statistical approach and one based on natural language processing (NLP).

I don't know much about NLP, but I can say something about the statistical approach:

I can recommend the following books:

+9

Like everything in AI, this is a very hard problem. You should read up on natural language processing to learn about some of the issues involved.

One very, very simplified approach would be to build a 2D table of words and, for each pair of words, record the average distance (in words) at which they appear in the text. Obviously, you need to cap the maximum distance considered, and possibly the number of words. Then, after processing a large amount of text, you will have an indicator of how often certain words appear in the same context.
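A minimal sketch of that table, assuming plain-text input and a crude regex tokenizer. The names `average_distances`, `related`, `max_distance`, and `min_count` are made up for illustration and are not from any library:

```python
import re
from collections import defaultdict


def average_distances(text, max_distance=5):
    """Map each unordered word pair to (average distance in words, pair count)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    totals = defaultdict(int)  # pair -> sum of observed distances
    counts = defaultdict(int)  # pair -> number of times the pair was seen
    for i, word in enumerate(words):
        # Only look ahead up to max_distance words (the cap mentioned above).
        for j in range(i + 1, min(i + max_distance + 1, len(words))):
            if words[j] == word:
                continue
            pair = tuple(sorted((word, words[j])))
            totals[pair] += j - i
            counts[pair] += 1
    return {pair: (totals[pair] / counts[pair], counts[pair]) for pair in totals}


def related(table, word, min_count=2):
    """Words paired with `word` at least min_count times, most frequent first."""
    rows = []
    for (a, b), (avg, n) in table.items():
        if word in (a, b) and n >= min_count:
            rows.append((-n, avg, b if a == word else a))
    # Sort by count (descending), then by average distance (ascending).
    return [other for _, _, other in sorted(rows)]


table = average_distances("Apple sells the iPhone. Apple sells the Mac.", max_distance=3)
print(related(table, "apple"))
```

On a text this small the result is dominated by common words such as "the"; the table only becomes meaningful after a large amount of text, and a stop-word list would probably help.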

+2

What I would do is take all the words in the text and build a frequency list (how often each word appears). Perhaps also add a heuristic factor for how far the word is from "Apple". Then read several documents and cross out the words that are not common to all of them. Then rank the rest by frequency and distance from the keyword. Of course, you will get a lot of garbage and perhaps miss some relevant words, but by tuning the heuristics you should get at least a few decent matches.
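One possible reading of those steps, as a rough sketch. The 1/distance weighting, the `window` size, and the function names `score_document` and `related_terms` are assumptions chosen for illustration; the answer does not specify a particular heuristic:

```python
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def score_document(text, keyword, window=10):
    """Score words near the keyword: each occurrence adds 1/distance (assumed heuristic)."""
    words = tokenize(text)
    keyword = keyword.lower()
    scores = Counter()
    for i, word in enumerate(words):
        if word != keyword:
            continue
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        for j in range(lo, hi):
            if words[j] != keyword:
                # Frequent and close words accumulate the highest scores.
                scores[words[j]] += 1.0 / abs(j - i)
    return scores


def related_terms(documents, keyword, window=10, top=10):
    """Keep only words scored in every document, ranked by total score."""
    per_doc = [score_document(doc, keyword, window) for doc in documents]
    if not per_doc:
        return []
    common = set(per_doc[0])
    for scores in per_doc[1:]:
        common &= set(scores)  # cross out words missing from any document
    totals = Counter()
    for scores in per_doc:
        for word in common:
            totals[word] += scores[word]
    return [word for word, _ in totals.most_common(top)]


docs = [
    "Apple released the iPhone today",
    "The new iPhone from Apple",
    "Apple iPhone sales rose",
]
print(related_terms(docs, "Apple"))
```

Requiring a word to appear in every document is strict; with more documents you would probably relax it to "appears in most of them", which is one of the heuristics to tune.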

+2

The technique you are looking for is called latent semantic analysis (LSA). It is sometimes called latent semantic indexing. It is based on the idea that related concepts occur together in text, and it uses statistics to build relationships between words. Given a large enough collection of documents, it will definitely solve your problem of finding related words.

+1

Take a look at vector space models.
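To give a feel for the idea, here is a small sketch of one simple vector space model: each word becomes a vector of its counts per document, and two words are compared by cosine similarity. The helper names (`term_vectors`, `cosine`, `most_similar`) and the toy documents are invented for this example, and real systems usually add weighting such as TF-IDF:

```python
import math
import re
from collections import Counter


def term_vectors(documents):
    """Represent each word as a vector of its counts in each document."""
    counts = [Counter(re.findall(r"[a-z0-9']+", doc.lower())) for doc in documents]
    vocabulary = set().union(*counts)
    return {word: [c[word] for c in counts] for word in vocabulary}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(vectors, word, top=5):
    """Other words ranked by cosine similarity to `word`, highest first."""
    target = vectors[word]
    ranked = sorted(
        ((cosine(target, vec), other) for other, vec in vectors.items() if other != word),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [(other, score) for score, other in ranked[:top]]


docs = ["apple iphone", "apple mac iphone", "banana fruit", "banana fruit salad"]
vectors = term_vectors(docs)
print(most_similar(vectors, "apple", top=3))
```

Words that tend to occur in the same documents end up with similar vectors, so "iphone" ranks above "banana" for "apple" here.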

0

Source: https://habr.com/ru/post/898005/

