Does Solr use cosine similarity?

Question

I wrote a small search engine as my weekly project. It is based on the cosine similarity between the query vector and the document vector. Each vector is computed using tf-idf token weights.
I then learned about Apache Solr, which is a full-text search engine. My question is: does Solr use cosine similarity internally when ranking search results?

+4

search-engine lucene solr

Haider ali Jul 9 '14 at 18:49

2 answers

No. Solr uses something similar to cosine similarity, but not exactly the same - there are some key differences.

If you go to the same link ( https://lucene.apache.org/core/4_10_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html ) and scroll down, you will see the "Lucene Conceptual Scoring Formula" and the "Lucene Practical Scoring Formula", which give more detail.

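In short, the key differences are:

1. Document vector normalization

The tf-idf document vector is not normalized to unit length; a "doc-len-norm" is used instead. By default (DefaultSimilarity) this is 1/sqrt(number of terms in the field), i.e. 1/sqrt(sum(tf)), where tf is the count of each term in the doc. The idf weights are not part of this norm.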

2. ""

, , : , / .

(), , . , , tf-idf - , (0, , 1, ) , .
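The first difference can be illustrated with a minimal Python sketch. It assumes the default length norm is 1/sqrt(number of tokens), as described above, and ignores field boosts and index-time norm encoding. The function names and the idf values are hypothetical and are not part of Lucene or Solr.

```python
import math
from collections import Counter

def tfidf(tokens, idf):
    """Sparse tf-idf vector: term -> tf * idf."""
    return {t: c * idf.get(t, 0.0) for t, c in Counter(tokens).items()}

def scale(vec, factor):
    """Multiply every weight in a sparse vector by a constant."""
    return {t: w * factor for t, w in vec.items()}

def unit_norm(vec):
    """1 / |V(d)|, the document factor in true cosine similarity."""
    length = math.sqrt(sum(w * w for w in vec.values()))
    return 1.0 / length if length else 0.0

def doc_len_norm(tokens):
    """1 / sqrt(sum(tf)): depends only on the token count, not on idf."""
    return 1.0 / math.sqrt(len(tokens)) if tokens else 0.0

idf = {"solr": 2.0, "search": 1.0}  # hypothetical idf values
short_doc = ["solr", "search"]
long_doc = short_doc * 8            # same term proportions, 16 tokens

short_vec, long_vec = tfidf(short_doc, idf), tfidf(long_doc, idf)

# Unit normalization (cosine) versus token-count normalization.
cosine_short = scale(short_vec, unit_norm(short_vec))
cosine_long = scale(long_vec, unit_norm(long_vec))
len_normed_short = scale(short_vec, doc_len_norm(short_doc))
len_normed_long = scale(long_vec, doc_len_norm(long_doc))
```

With unit normalization the two documents end up with identical vectors, so all length information is erased. With the token-count norm they stay different, which is why the resulting score is related to cosine similarity but is not the same thing.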

+5

Brian 18 . '14 4:23

John petrone · Accepted Answer · 2014-07-09T19:47:50+0000

Yes, Solr (which runs on top of Lucene) uses cosine similarity. From the Lucene documentation:

The VSM score of document d for query q is the cosine similarity of the weighted query vectors V(q) and V(d):
cosine-similarity(q,d) = V(q) · V(d) / (|V(q)| |V(d)|)

https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
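For reference, the quoted formula can be sketched in a few lines of Python over sparse tf-idf vectors. This is only an illustration of plain cosine similarity, not Lucene's scoring code; the helper names and the idf table are made up for the example.

```python
import math
from collections import Counter

def tfidf_vector(tokens, idf):
    """Map each term to tf * idf; terms missing from idf get weight 0."""
    tf = Counter(tokens)
    return {term: count * idf.get(term, 0.0) for term, count in tf.items()}

def cosine_similarity(vq, vd):
    """V(q) . V(d) / (|V(q)| |V(d)|) for sparse dict vectors."""
    dot = sum(w * vd.get(term, 0.0) for term, w in vq.items())
    norm_q = math.sqrt(sum(w * w for w in vq.values()))
    norm_d = math.sqrt(sum(w * w for w in vd.values()))
    if norm_q == 0.0 or norm_d == 0.0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_q * norm_d)

# Hypothetical idf table, for illustration only.
idf = {"solr": 2.0, "lucene": 1.5, "search": 1.0}
query = tfidf_vector(["solr", "search"], idf)
doc_a = tfidf_vector(["solr", "search", "solr", "search"], idf)
doc_b = tfidf_vector(["lucene", "search"], idf)

score_a = cosine_similarity(query, doc_a)  # same direction as the query
score_b = cosine_similarity(query, doc_b)  # shares only "search"
```

Here doc_a repeats the query terms in the same proportions, so its score is at the maximum, while doc_b overlaps on a single low-weight term and scores much lower.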
