What is the best way to pick the text excerpt that best matches given keywords?

When you search on Stack Overflow, each result shows the excerpt of the question body that best matches your search terms, with those terms highlighted.

I wonder how best to do this manually in C#, that is, without the help of a full-text search engine.

The main problem: how do I quickly select the best part of the text?

What I have done so far:

  • I collect the indices of the spaces in the text. This tells me where words start, so that I can start substrings there.
  • From each space index, I take the next 300 characters and count how many keyword occurrences they contain.
  • I treat the 300-character window with the most occurrences as the best one and cut it out of the source text.
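
The steps above can be sketched roughly as follows. This is only an illustration of the approach described, not tuned code: the names `SnippetPicker`, `CountOccurrences`, and `BestSnippet` are hypothetical, matching is assumed to be a case-insensitive substring search, and index 0 is added as a candidate start so the first word is not skipped.

```csharp
using System;
using System.Collections.Generic;

public static class SnippetPicker
{
    // Counts non-overlapping, case-insensitive occurrences of every keyword in text.
    public static int CountOccurrences(string text, IEnumerable<string> keywords)
    {
        int count = 0;
        foreach (string keyword in keywords)
        {
            if (string.IsNullOrEmpty(keyword)) continue;
            int pos = 0;
            while ((pos = text.IndexOf(keyword, pos, StringComparison.OrdinalIgnoreCase)) >= 0)
            {
                count++;
                pos += keyword.Length;
            }
        }
        return count;
    }

    // Tries a window at index 0 and after every space; keeps the first window
    // with the highest keyword count.
    public static string BestSnippet(string text, IReadOnlyList<string> keywords, int length = 300)
    {
        if (text.Length <= length) return text;

        // Candidate starts: the beginning of the text and the character after each space.
        var starts = new List<int> { 0 };
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == ' ' && i + 1 < text.Length) starts.Add(i + 1);
        }

        int bestStart = 0, bestCount = -1;
        foreach (int start in starts)
        {
            int len = Math.Min(length, text.Length - start);
            int count = CountOccurrences(text.Substring(start, len), keywords);
            if (count > bestCount)
            {
                bestCount = count;
                bestStart = start;
            }
        }
        return text.Substring(bestStart, Math.Min(length, text.Length - bestStart));
    }
}
```

Each candidate window is rescanned from scratch here, so the cost grows with the number of words times the window length times the number of keywords. That is what makes the speed question below relevant.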

Is this a good approach? Is there a faster way? Is counting the number of occurrences the best way to find the most relevant part?

+3
1 answer

With this approach, the best-scoring excerpt will often have its keywords right at the beginning or end, which means you won't have much context for those keywords. I would add a condition that there must be at least n words on either side of the keywords nearest the beginning and end of the excerpt.
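
One way to express that extra condition is a check that a candidate excerpt has at least n words before its first keyword hit and after its last one. This is a sketch under assumptions: `SnippetContext`, `HasContext`, and `minWords` are hypothetical names, words are split on spaces only, and a keyword matched inside a longer word still counts as a hit.

```csharp
using System;
using System.Collections.Generic;

public static class SnippetContext
{
    // Returns true when the window contains a keyword and has at least minWords
    // words before the first keyword hit and after the last keyword hit.
    public static bool HasContext(string window, IEnumerable<string> keywords, int minWords)
    {
        int first = -1, lastEnd = -1;
        foreach (string keyword in keywords)
        {
            if (string.IsNullOrEmpty(keyword)) continue;
            int f = window.IndexOf(keyword, StringComparison.OrdinalIgnoreCase);
            if (f < 0) continue; // this keyword does not occur in the window
            int l = window.LastIndexOf(keyword, StringComparison.OrdinalIgnoreCase);
            if (first < 0 || f < first) first = f;
            lastEnd = Math.Max(lastEnd, l + keyword.Length);
        }
        if (first < 0) return false; // no keyword at all

        return CountWords(window.Substring(0, first)) >= minWords
            && CountWords(window.Substring(lastEnd)) >= minWords;
    }

    // Counts space-separated, non-empty tokens.
    private static int CountWords(string s) =>
        s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length;
}
```

A window-selection loop could skip candidates for which `HasContext` returns false, or fall back to the plain best count when no candidate passes.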

term frequency - , .

+1

Source: https://habr.com/ru/post/1731947/

