Tf-idf: Is this approach right?

I would like to calculate word frequency weights using tf-idf. I wrote down the following equation, which should give the tf-idf value on the left-hand side. Is it correct?

Tf-idf for DOCUMENT:

tf-idf(WORD) = occurrences(WORD,DOCUMENT) / number-of-words(DOCUMENT) * log10 ( documents(ALL) / ( 1 + documents(WORD, ALL) ) )
  • occurrences(WORD,DOCUMENT): number of occurrences of WORD in DOCUMENT
  • number-of-words(DOCUMENT): number of words in DOCUMENT
  • documents(ALL): number of documents in the database
  • documents(WORD, ALL): number of documents in the database containing WORD
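
The formula and definitions above can be sketched in Python as follows. This is only an illustration under some assumptions: documents are represented as lists of already-tokenized words, the database is a non-empty list of such documents, and the names `tf_idf`, `document`, and `corpus` are made up for this example.

```python
import math

def tf_idf(word, document, corpus):
    """tf-idf of `word` in `document`, following the formula in the question.

    Assumptions (not from the question): `document` is a list of words and
    `corpus` is a non-empty list of such documents (the "database").
    """
    if not document:
        return 0.0
    # occurrences(WORD, DOCUMENT) / number-of-words(DOCUMENT)
    tf = document.count(word) / len(document)
    # documents(WORD, ALL): documents that contain the word at least once
    docs_with_word = sum(1 for doc in corpus if word in doc)
    # log10(documents(ALL) / (1 + documents(WORD, ALL)))
    idf = math.log10(len(corpus) / (1 + docs_with_word))
    return tf * idf

# Toy database of four documents, invented for illustration.
corpus = [
    "the cat sat on the mat".split(),
    "the dog barked".split(),
    "a bird sang".split(),
    "fish swim in the sea".split(),
]

print(round(tf_idf("cat", corpus[0], corpus), 4))  # 0.0502
print(tf_idf("the", corpus[0], corpus))            # 0.0
```

Here "cat" occurs once among 6 words and in 1 of 4 documents, so the value is (1/6) * log10(4/2). "the" occurs in 3 of 4 documents, so log10(4/4) = 0 and its weight vanishes.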

It would be great if you could help me. Thank you so much in advance!

1 answer

According to the Wikipedia article, this is correct. Using 1 + documents(WORD, ALL) in the denominator, and not just documents(WORD, ALL), is an adjustment the Wikipedia article itself suggests: it avoids a division by zero when WORD does not appear in any document.

TF-IDF on Wikipedia
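
To see what the 1 + adjustment changes in practice, here is a small Python comparison of the two denominators. The function names `idf_plain` and `idf_plus_one` and the document counts are invented for this illustration.

```python
import math

def idf_plain(n_docs, n_docs_with_word):
    # log10(documents(ALL) / documents(WORD, ALL))
    return math.log10(n_docs / n_docs_with_word)

def idf_plus_one(n_docs, n_docs_with_word):
    # log10(documents(ALL) / (1 + documents(WORD, ALL)))
    return math.log10(n_docs / (1 + n_docs_with_word))

# A word that appears in none of 10 documents:
# the plain form divides by zero, the adjusted form does not.
try:
    idf_plain(10, 0)
except ZeroDivisionError:
    print("plain form: division by zero")
print(idf_plus_one(10, 0))

# A word that appears in all 10 documents:
# the plain form gives 0, the adjusted form is slightly negative.
print(idf_plain(10, 10))
print(idf_plus_one(10, 10) < 0)  # True
```

One side effect worth knowing: with the adjusted denominator, a word that appears in every document gets a slightly negative weight, since log10(N / (1 + N)) is below zero.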


Source: https://habr.com/ru/post/1715764/

