Does Solr use cosine similarity?

I wrote a small search engine as my weekly project. It ranks results by the cosine similarity between the query vector and the document vector, where each vector is built from the tf-idf weights of its tokens.
I learned about Apache Solr, which is a full-text search engine. My question is: does Solr use cosine similarity internally when ranking search results?

+4
2 answers

Yes, Solr (which runs on top of Lucene) uses cosine similarity. From the Lucene documentation:

The VSM score of document d for query q is the cosine similarity of the weighted query vectors V(q) and V(d):

cosine-similarity(q, d) = V(q) · V(d) / (|V(q)| |V(d)|)

https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
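To make the formula concrete, here is a minimal sketch of plain cosine similarity over sparse tf-idf vectors, roughly what the question describes. The function names `tfidf_vector` and `cosine_similarity` and the `log(N / df)` idf variant are assumptions chosen for illustration; they are not Lucene or Solr code.

```python
import math
from collections import Counter


def tfidf_vector(tokens, doc_freq, num_docs):
    """Build a sparse tf-idf vector (term -> weight) for a list of tokens.

    Assumes a plain idf of log(num_docs / df); terms unseen in the
    collection are skipped.
    """
    counts = Counter(tokens)
    return {
        term: tf * math.log(num_docs / doc_freq[term])
        for term, tf in counts.items()
        if doc_freq.get(term, 0) > 0
    }


def cosine_similarity(vq, vd):
    """Return V(q) . V(d) / (|V(q)| |V(d)|) for two sparse vectors."""
    dot = sum(w * vd.get(term, 0.0) for term, w in vq.items())
    norm_q = math.sqrt(sum(w * w for w in vq.values()))
    norm_d = math.sqrt(sum(w * w for w in vd.values()))
    if norm_q == 0.0 or norm_d == 0.0:
        return 0.0  # an all-zero vector has no direction
    return dot / (norm_q * norm_d)
```

Because both vectors are divided by their Euclidean length, scaling every weight in a document by the same factor leaves the result unchanged.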

+5

No. Solr uses something similar to cosine similarity, but not exactly the same - there are some key differences.

If you go to the same link ( https://lucene.apache.org/core/4_10_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html ) and scroll down, you will see the "Lucene Conceptual Scoring Formula" and the "Lucene Practical Scoring Function", which give more detailed information.
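For comparison with plain cosine similarity, here is a rough sketch of the practical scoring function as I read it on that page, using what I understand to be the DefaultSimilarity choices in Lucene 4.x: tf = sqrt(freq), idf = 1 + log(numDocs / (docFreq + 1)), lengthNorm = 1 / sqrt(numTerms). The Python function names are hypothetical, all boosts are assumed to be 1, and the lossy one-byte encoding of the stored norm is ignored, so treat it as an illustration rather than Lucene's actual code.

```python
import math


def tf(freq):
    # Term-frequency factor (assumed DefaultSimilarity): sqrt of raw count.
    return math.sqrt(freq)


def idf(doc_freq, num_docs):
    # Inverse document frequency (assumed DefaultSimilarity).
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def length_norm(num_terms):
    # Document length normalization: depends on the number of terms in
    # the field, not on the Euclidean length of the tf-idf vector.
    return 1.0 / math.sqrt(num_terms)


def practical_score(query_terms, doc_term_freqs, doc_num_terms,
                    doc_freqs, num_docs):
    """coord * queryNorm * sum over matching terms of tf * idf^2 * norm.

    query_terms is assumed to be a list of distinct terms; boosts are 1.
    """
    if not query_terms:
        return 0.0
    idfs = {t: idf(doc_freqs.get(t, 0), num_docs) for t in query_terms}
    sum_sq = sum(w * w for w in idfs.values())
    query_norm = 1.0 / math.sqrt(sum_sq) if sum_sq > 0 else 0.0
    matched = [t for t in query_terms if doc_term_freqs.get(t, 0) > 0]
    coord = len(matched) / len(query_terms)  # fraction of query terms matched
    norm = length_norm(doc_num_terms)
    total = sum(tf(doc_term_freqs[t]) * idfs[t] ** 2 * norm for t in matched)
    return coord * query_norm * total
```

In this sketch the same term counts in a longer field produce a lower score, whereas cosine similarity would divide by the Euclidean norm of the document vector instead.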

/ , :

1.

tf-idf "doc-len-norm". (DefaultSimilarity) 1/sqrt ( ), 1/sqrt (sum (tf)), .. tf - count doc - , , idf . , , . , .

2. ""

, , :   , / .

(), , . , , tf-idf - , (0, , 1, ) , .

+5

Source: https://habr.com/ru/post/1547812/

