With your great help here, I've already figured out how to calculate trending topics (standard score + floating average).
My next problem: I have terms (1-3 words each) in my database, each stored with the time it was mentioned. But the trending topics always end up being single-word terms, since one part of a term is ALWAYS mentioned at least as often as the full term. Example: yesterday 3 news articles were about “Barack Obama” and today 148. So “Barack Obama” rises, of course. But “Barack” alone rises as well, and so it shows up as the trend instead.
How can I take the number of words in a term into account when calculating trending topics? I don't want to switch to another algorithm; I'm completely satisfied with the one above. Could I simply multiply the score of every two-word term by 1.5 or so?
Detailed example: my top trends are Microsoft, China, Hillary Clinton, Dallas Mavericks. What I mean is that “Hillary Clinton” and “Dallas Mavericks” can never reach rank 1 or rank 2, because they are two-word terms. “Microsoft” and “China” are single-word terms, so they are always ranked higher. Is there any way to solve this problem?
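To make the multiplier idea concrete, here is a minimal sketch of what I have in mind. The function names, the `boost_per_extra_word` parameter, and the scores are all made up for illustration; the real scores would come from the existing standard score + floating average calculation.

```python
def length_weighted(scores, boost_per_extra_word=0.5):
    """Scale each score by 1 + boost * (number of words - 1).

    With the default boost of 0.5, two-word terms are multiplied by 1.5
    and three-word terms by 2.0. Single-word terms are left unchanged.
    """
    weighted = {}
    for term, score in scores.items():
        n_words = len(term.split())
        weighted[term] = score * (1 + boost_per_extra_word * (n_words - 1))
    return weighted


def rank(scores):
    """Return the terms ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical trend scores, not real data.
raw = {
    "Microsoft": 4.0,
    "China": 3.6,
    "Hillary Clinton": 3.0,
    "Dallas Mavericks": 2.8,
}

boosted = length_weighted(raw)
print(rank(raw))      # ranking before the length boost
print(rank(boosted))  # ranking after the length boost
```

One thing I'm unsure about: a plain multiplier only makes sense while the scores are positive. If the score can go negative, multiplying would push multi-word terms further down instead of up.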
I hope you can help me. Thanks in advance!