Fuzzy substring text-matching function

I am looking for a way to implement a fuzzy substring function. Here is what I mean:

  • Two strings are given.
  • One is usually longer than the other. Call them "short" and "long".
  • We want to score how much of "short" appears in "long".
  • We also want to take order and proximity into account: when the elements of "short" appear in "long", they should preferably appear in the same order and close to each other.
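One simple way to turn these requirements into a number is to compare word sequences instead of characters: count how many words of "short" can be found in "long" in the same relative order (the longest common subsequence of the two word lists) and divide by the number of words in "short". This is a minimal sketch, assuming lower-cased, regex-based word tokenization; the names `tokenize` and `ordered_coverage` are hypothetical, and proximity is deliberately ignored here.

```python
import re

# Example text taken from the question.
LONG = ("Cultures designed with a bacterial gene, thanks to which herbicide-resistant "
        "plants can grow while weeds are destroyed, and genetically modified crops "
        "that can resist destructive insects reduce the need for chemical insecticides.")


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens (hyphenated words kept whole)."""
    return re.findall(r"[\w-]+", text.lower())


def ordered_coverage(short: str, long: str) -> float:
    """Fraction of the short string's words that occur in the long string
    in the same relative order (word-level longest common subsequence)."""
    s, l = tokenize(short), tokenize(long)
    if not s:
        return 0.0
    # dp[i][j] = LCS length of s[:i] and l[:j]
    dp = [[0] * (len(l) + 1) for _ in range(len(s) + 1)]
    for i in range(1, len(s) + 1):
        for j in range(1, len(l) + 1):
            if s[i - 1] == l[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(s)][len(l)] / len(s)


print(ordered_coverage("weeds destroyed", LONG))          # 1.0
print(ordered_coverage("weeds will be destroyed", LONG))  # 0.5
print(ordered_coverage("destroy is be weeds", LONG))      # 0.25
```

Under these assumptions the order requirement is met (Example 3 scores low), but every missing word costs its full share, so Example 2 gets 0.5 rather than 0.8, and matched words are not penalized for being far apart in "long".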

Example 1:

  • Short: "weeds destroyed"
  • Long: "Cultures designed with a bacterial gene, thanks to which herbicide-resistant plants can grow while weeds are destroyed, and genetically modified crops that can resist destructive insects reduce the need for chemical insecticides."

This is an exact match and should have a score of 1.0.

Example 2:

  • Short: "weeds will be destroyed"
  • Long: Same as above.

This is a fuzzy match: “weeds” and “destroyed” appear in the text, but “will” and “be” do not. Nevertheless, it should get a high score (say 0.8).

Example 3:

If we set “short” to “destroy is be weeds”, then even though “destroyed” and “weeds” appear in the source text, the score should be very low, because their order is reversed.


  • short(0): indexOf
  • short(n): a) indexOf in long; b) (…) short(0) (…) indexOf, (…) short(n-1) indexOf.
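The indexOf idea above can be sketched iteratively: look up each word of "short" in "long" starting just after the previous match (the word-level equivalent of calling indexOf with a start offset), and give less credit the more words were skipped since that match. This is a minimal sketch under assumptions the answer does not state: word-level tokenization, a hypothetical `slack` parameter (gaps of up to `slack` words are free), and a credit of 1 / (1 + gap - slack) for larger gaps. The function names are hypothetical.

```python
import re

# Example text taken from the question.
LONG = ("Cultures designed with a bacterial gene, thanks to which herbicide-resistant "
        "plants can grow while weeds are destroyed, and genetically modified crops "
        "that can resist destructive insects reduce the need for chemical insecticides.")


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens (hyphenated words kept whole)."""
    return re.findall(r"[\w-]+", text.lower())


def greedy_proximity_score(short: str, long: str, slack: int = 1) -> float:
    """Find the short string's words left to right, each one searched only
    after the previous match. A found word earns 1.0 if at most `slack`
    long-string words were skipped since the previous match, less otherwise."""
    s, l = tokenize(short), tokenize(long)
    if not s:
        return 0.0
    credit = 0.0
    pos = -1  # index in l of the previous match; -1 means no match yet
    for word in s:
        try:
            idx = l.index(word, pos + 1)  # first occurrence after pos
        except ValueError:
            continue  # not found after pos: this word earns nothing
        gap = 0 if pos < 0 else idx - pos - 1
        credit += 1.0 / (1 + max(0, gap - slack))
        pos = idx
    return credit / len(s)


print(greedy_proximity_score("weeds destroyed", LONG))  # 1.0
print(greedy_proximity_score("destroyed weeds", LONG))  # 0.5
print(greedy_proximity_score("weeds insects", LONG))    # 0.55
```

"weeds insects" keeps the right order but the two words are ten words apart, so the second one earns only 0.1. Because the search is greedy (it always takes the first occurrence after the previous match), it can miss a better alignment further along; a dynamic-programming alignment avoids that at O(len(short) * len(long)) cost.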

(… this) … similarity_of_dependency_kind … similarity_of_destination_words (… WordNet).
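The answer also mentions scoring word similarity (similarity_of_destination_words, with WordNet). As a stand-in that needs no external corpus, the sketch below swaps WordNet for character-level similarity from Python's `difflib.SequenceMatcher`, so that "destroy" can still partially match "destroyed", and places it inside an order-preserving alignment (a weighted longest common subsequence). This is not the answer's method; the `threshold` of 0.75 and all function names are assumptions for illustration.

```python
import re
from difflib import SequenceMatcher

# Example text taken from the question.
LONG = ("Cultures designed with a bacterial gene, thanks to which herbicide-resistant "
        "plants can grow while weeds are destroyed, and genetically modified crops "
        "that can resist destructive insects reduce the need for chemical insecticides.")


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens (hyphenated words kept whole)."""
    return re.findall(r"[\w-]+", text.lower())


def word_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for a WordNet measure."""
    return SequenceMatcher(None, a, b).ratio()


def soft_ordered_score(short: str, long: str, threshold: float = 0.75) -> float:
    """Best total similarity of an in-order alignment of short's words to
    long's words, divided by the number of words in short."""
    s, l = tokenize(short), tokenize(long)
    if not s:
        return 0.0
    # dp[i][j] = best total similarity aligning s[:i] with l[:j] in order
    dp = [[0.0] * (len(l) + 1) for _ in range(len(s) + 1)]
    for i in range(1, len(s) + 1):
        for j in range(1, len(l) + 1):
            best = max(dp[i - 1][j], dp[i][j - 1])  # skip a word on either side
            sim = word_similarity(s[i - 1], l[j - 1])
            if sim >= threshold:  # only sufficiently similar words may be paired
                best = max(best, dp[i - 1][j - 1] + sim)
            dp[i][j] = best
    return dp[len(s)][len(l)] / len(s)


print(soft_ordered_score("weeds destroy", LONG))  # 0.9375
print(soft_ordered_score("destroy weeds", LONG))  # 0.5
```

With exact word matching, "weeds destroy" would only get 0.5; here "destroy" pairs with "destroyed" at similarity 0.875, while the reversed "destroy weeds" stays at 0.5 because the alignment must preserve order. A WordNet-based similarity (for example via NLTK) could be plugged into `word_similarity` instead.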


Source: https://habr.com/ru/post/1670243/