Similarity of two texts (adaptive local keyword alignment?)

I have two texts of varying length (at most 4000 characters each), and I need a similarity coefficient that accounts for (partial) rephrasing. Note that the same passage may appear at different positions in the two texts, so Levenshtein distance is not a solution.

The comparison should also:

  • not grow exponentially in cost as the text size increases
  • be performance friendly. :)

It seems that "adaptive local keyword alignment" is a possible solution.

Do you have an example implementation? PHP is the preferred language, but I can translate. :)

Do you have any other solutions, ideas, or experience on this topic?

Thanks for your great help.

+3
2 answers

Take a look at PHP's built-in functions levenshtein and similar_text, which might make your life easier.

EDIT: @Toto pointed out that they may not be suitable for this application, see his comments below.
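
To illustrate why plain edit distance can be a poor fit when passages move around, here is a minimal sketch. It is written in Python rather than PHP, the helper name normalized_similarity is hypothetical, and it is not meant to reproduce the exact behaviour of PHP's levenshtein; it only shows the classic dynamic-programming distance and how reordering lowers the score.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insert, delete, substitute) using two DP rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Hypothetical helper: 1.0 for identical strings, lower for more edits."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    original = "the quick brown fox. a lazy dog sleeps."
    reordered = "a lazy dog sleeps. the quick brown fox."
    # Same sentences, different order: the score drops well below 1.0.
    print(normalized_similarity(original, original))
    print(normalized_similarity(original, reordered))
```

The running time is proportional to the product of the two lengths, so it is quadratic rather than exponential, but the score still penalises text that was merely moved.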

+4

Needleman-Wunsch worked well enough for me in an application where I had to match names that different people had entered for the same person.
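
For reference, a minimal sketch of the Needleman-Wunsch global alignment score, in Python rather than PHP. The function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration, not the settings used in the application mentioned above.

```python
def needleman_wunsch_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Return the optimal global alignment score of a and b.

    Only two rows of the DP table are kept, so memory is O(len(b));
    time is O(len(a) * len(b)).
    """
    # Row 0: aligning the empty prefix of a against prefixes of b costs gaps only.
    previous = [j * gap for j in range(len(b) + 1)]
    for i, ch_a in enumerate(a, start=1):
        current = [i * gap]  # column 0: prefix of a against empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            diagonal = previous[j - 1] + (match if ch_a == ch_b else mismatch)
            up = previous[j] + gap        # gap inserted in b
            left = current[j - 1] + gap   # gap inserted in a
            current.append(max(diagonal, up, left))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    # Two spellings of the same name differ by one inserted letter.
    print(needleman_wunsch_score("Jon Smith", "John Smith"))
```

Because this is a global alignment, it suits short strings such as names; for long texts whose passages appear in a different order, a local alignment variant may be a closer fit to the question.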

0

Source: https://habr.com/ru/post/1715523/

