The similarity of the two texts (adaptive local keyword alignment?)

Question

I have two texts of varying length (up to 4000 characters each), and I need a similarity coefficient that reflects (partial) rephrasing. Note that the same passage may appear at different positions in each text, so Levenshtein distance is not a solution.
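One way to get a score that ignores where a passage sits is to compare the texts as bags of words. The following is a minimal sketch under assumptions not taken from the question: tokens are lower-cased `\w+` runs, raw term counts are used, and there is no stemming or stop-word removal. It is not "adaptive local keyword alignment", and `tokenize` and `cosine_similarity` are hypothetical helper names.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens (assumed: \\w+ runs)."""
    return re.findall(r"\w+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity (0.0 to 1.0) of the word-count vectors of two texts.

    Word order is ignored entirely, so a passage that moves to another
    position contributes exactly the same counts.
    """
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0  # empty text: no similarity
```

Swapping the two sentences of "the cat sat. the dog ran." leaves the score at essentially 1.0, because only word counts are compared. The flip side is that this measure cannot tell a genuine rephrasing from a random shuffle of the same words.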

The comparison should also:

not grow exponentially with text size;
be performance-friendly. :)
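Keeping some local word order while staying roughly linear in text length is possible with word shingles (overlapping windows of k consecutive words) compared using the Jaccard index. This is a swapped-in technique, not one named in the question; `shingles`, `jaccard_similarity` and the default `k=3` are assumptions for illustration. Each shingle set is built in one pass over the words and set intersection/union are hash-based, so for a fixed `k` the cost grows roughly linearly with the number of words.

```python
import re


def shingles(text, k=3):
    """Set of k-word shingles (overlapping windows of k consecutive words)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Text shorter than one window: use the whole word list as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(a, b, k=3):
    """Jaccard index: shared shingles divided by all distinct shingles."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two texts with no words: treat as identical
    return len(sa & sb) / len(sa | sb)
```

For "one two three four five six" against "four five six one two three", both intact three-word blocks are still found: 2 shared shingles out of 6 distinct ones (about 0.33). Only the shingles spanning the seam where the blocks were swapped are lost.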

It seems that "adaptive local keyword alignment" might be a possible solution.
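No reference implementation of "adaptive local keyword alignment" is given here, so the sketch below uses a related, well-known technique instead: Smith-Waterman local alignment, applied to word tokens rather than characters. The scoring values (`match=2`, `mismatch=-1`, `gap=-1`), the normalisation in `local_similarity`, and all function names are assumptions. It finds the single best-matching region of the two texts in O(n·m) time (polynomial, not exponential; a few hundred thousand cells for texts of a few hundred words) and O(m) memory.

```python
import re


def words(text):
    """Lower-cased word tokens of text (assumed tokenisation: \\w+ runs)."""
    return re.findall(r"\w+", text.lower())


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between the word sequences of a and b.

    Scores are clamped at zero, so an alignment may start and end anywhere.
    Only the previous DP row is kept, giving O(len(b)) memory.
    """
    wa, wb = words(a), words(b)
    best = 0
    prev = [0] * (len(wb) + 1)
    for i in range(1, len(wa) + 1):
        cur = [0] * (len(wb) + 1)
        for j in range(1, len(wb) + 1):
            diag = prev[j - 1] + (match if wa[i - 1] == wb[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def local_similarity(a, b, match=2):
    """Scale the score to 0..1 by the best score the shorter text could reach."""
    shorter = min(len(words(a)), len(words(b)))
    return smith_waterman(a, b, match=match) / (match * shorter) if shorter else 0.0
```

This scores only one region. Accounting for several moved passages would require repeating the search after masking the region already found, which this sketch does not do.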

Do you have an example implementation? PHP is preferred, but I can translate. :)

Do you have any other solutions, ideas, or experience on this topic?

Thanks for your great help.

Tags: algorithm, similarity

Toto Aug 19 '09 at 12:08

2 answers

karim79 · Answer 1 · 2009-08-19T12:11:01+0000

Take a look at the functions levenshtein and similar_text, which might make your life easier.
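To see why edit distance struggles with moved passages, here is a sketch of the standard Levenshtein recurrence in Python. It follows the textbook definition, not PHP's internal code; the function name simply mirrors PHP's `levenshtein`.

```python
def levenshtein(a, b):
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]
```

`levenshtein("abcdef", "defabc")` is 6, as large as if every character had been replaced, even though both strings consist of the same two blocks.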

EDIT: @Toto pointed out that they may not be suitable for this application; see his comments below.

jilles de wit · Answer 2 · 2009-08-19T12:31:15+0000

Needleman-Wunsch worked well enough for an application where I had to match names that different people had assigned to the same persons.
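A minimal Needleman-Wunsch sketch for short strings such as names, assuming simple scores (`match=1`, `mismatch=-1`, `gap=-1`). The answer does not say which scoring scheme was used, so these values and the function name are illustrative only. Like Levenshtein, it is a global alignment, so it suits short fields better than long texts with reordered passages.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of strings a and b (Needleman-Wunsch)."""
    prev = [j * gap for j in range(len(b) + 1)]  # align "" with b[:j]: j gaps
    for i in range(1, len(a) + 1):
        cur = [i * gap]  # align a[:i] with "": i gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]
```

`needleman_wunsch("Jon Smith", "John Smith")` gives 8: nine matching characters and one gap.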
