Why are cosine similarity and TF-IDF used together?

TF-IDF and cosine similarity are a commonly used combination for text clustering. Each document is represented by a vector of TF-IDF weights.

This is what my textbook says.

Using cosine similarity, you can then calculate the similarity between these documents.
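For concreteness, here is a minimal sketch of the pipeline the textbook describes, in plain Python. The toy documents, the whitespace tokenizer, and the unsmoothed `tf * log(N / df)` weighting are my own assumptions for illustration; real libraries typically use smoothed variants and better tokenization.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy corpus, not taken from the textbook.
docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock prices fell on the news",
]
vecs = tfidf_vectors(docs)
sim_01 = cosine(vecs[0], vecs[1])  # the two cat sentences
sim_02 = cosine(vecs[0], vecs[2])  # cat sentence vs. stock sentence
print(sim_01, sim_02)
```

In this toy corpus "the" and "on" occur in every document, so their IDF is zero and they contribute nothing. The first and third documents then share no weighted terms, while the two cat sentences still overlap on "cat".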

But why exactly are these methods used together?
What is the advantage?

Can I use Jaccard similarity instead?

I know how this works, but I want to know why exactly these methods are chosen.

1 answer

TF-IDF is used for weighting.

Cosine is the similarity measure used.

You can use cosine without weighting, but the results are usually worse. Jaccard works on sets, so it is not clear how to use the weights without turning it into something else, or without making it effectively the same as cosine.
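To illustrate the difference, here is a rough sketch in plain Python that compares cosine on raw counts, cosine on TF-IDF weights, and plain set-based Jaccard. The toy documents and the unsmoothed `tf * log(N / df)` weighting are assumptions made for this example. The `weighted_jaccard` function shows one common generalization (sum of minima over sum of maxima, sometimes called Ruzicka similarity); it is included only to show what "turning it into something else" can look like, not as a recommendation.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

def jaccard(a, b):
    """Plain Jaccard on sets: |A & B| / |A | B|. Weights are ignored."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def weighted_jaccard(u, v):
    """Generalized Jaccard on non-negative weights: sum(min) / sum(max)."""
    terms = set(u) | set(v)
    num = sum(min(u.get(t, 0.0), v.get(t, 0.0)) for t in terms)
    den = sum(max(u.get(t, 0.0), v.get(t, 0.0)) for t in terms)
    return num / den if den else 0.0

# Hypothetical toy corpus for illustration only.
docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "the stock fell on the news",
]
tokens = [d.split() for d in docs]
counts = [Counter(t) for t in tokens]
df = Counter(term for t in tokens for term in set(t))
tfidf = [{term: c[term] * math.log(len(docs) / df[term]) for term in c}
         for c in counts]

# Unrelated documents 0 and 2 share only the stop words "the" and "on".
raw_cos = cosine(counts[0], counts[2])    # inflated by "the" and "on"
tfidf_cos = cosine(tfidf[0], tfidf[2])    # stop words weighted down to zero
jac = jaccard(tokens[0], tokens[2])       # counts every shared word equally
wj = weighted_jaccard(tfidf[0], tfidf[1])
print(raw_cos, tfidf_cos, jac, wj)
```

Without weighting, the two unrelated documents look fairly similar because frequent words dominate the raw counts. Plain Jaccard also counts those shared words, since it only sees set membership. With TF-IDF weights the similarity between the unrelated pair drops to zero in this example.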


Source: https://habr.com/ru/post/1242647/