What is the best way to normalize points for ranking?

I am wondering how to normalize numbers for a ranking algorithm.

Let's say I want to rank links by importance, and I have two columns to work with.

So the table will look like this:

url | comments | views

Now I want to rate comments higher than views, so my first thought was to multiply comments by 3 or so as their weight. However, if the view count is large, for example 40,000 views and only 4 comments, then the weight of the comments is drowned out.

So, I think I need to normalize these scores to a more equal playing field before I can weigh them. Any ideas or pointers on how this is usually done?

Thanks.

+3
3 answers

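For each URL, you can first normalize comments and views to a common 0-to-1 scale. Strictly speaking this is min-max scaling rather than a true percentile rank, although the names below say "percentile". For instance,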

 comment_percentile = (comments - min(comments)) / (max(comments) - min(comments))
 views_percentile = (views - min(views)) / (max(views) - min(views))

You can then assign a weight to each of the normalized values to calculate the total score.

 url_score = (comment_percentile_weight * comment_percentile) + (views_percentile_weight * views_percentile)
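
As a rough illustration of the two steps above, here is a minimal Python sketch. The function names, the default weights (3.0 and 1.0), and the sample rows are assumptions made up for the example; they are not from the question.

```python
def min_max(values):
    """Scale a list of numbers to the 0..1 range (min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal: avoid dividing by zero and score everything 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def url_scores(rows, comment_weight=3.0, views_weight=1.0):
    """rows is a list of (url, comments, views); returns {url: score}."""
    comments_norm = min_max([r[1] for r in rows])
    views_norm = min_max([r[2] for r in rows])
    return {
        url: comment_weight * c + views_weight * v
        for (url, _, _), c, v in zip(rows, comments_norm, views_norm)
    }


# Hypothetical sample data, invented for this sketch.
rows = [
    ("a.example", 4, 40000),
    ("b.example", 30, 1000),
    ("c.example", 0, 0),
]
scores = url_scores(rows)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Because both columns are scaled to 0..1 before weighting, the 40,000 views no longer swamp the comment count; with these made-up rows the heavily commented link ranks first.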

Additional strategies may include eliminating outliers if the values cluster at one end of the range.
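
One possible way to handle such outliers, sketched here in Python, is to cap values at an upper percentile before normalizing. Note that this caps extreme values rather than removing them, which is a swapped-in variant of what the answer describes; the function name, the 90% cutoff, and the sample data are assumptions for illustration only.

```python
def clip_to_percentile(values, upper_pct=95):
    """Cap every value at the nearest-rank upper_pct percentile."""
    ordered = sorted(values)
    # Nearest-rank percentile using integer ceiling division.
    rank = max(1, (upper_pct * len(ordered) + 99) // 100)
    cap = ordered[rank - 1]
    return [min(v, cap) for v in values]


# Hypothetical view counts with one extreme outlier.
views = [10, 12, 15, 20, 25, 30, 35, 40, 50, 40000]
clipped = clip_to_percentile(views, upper_pct=90)
```

After capping, the single 40,000 entry no longer stretches the range, so the remaining values spread across 0..1 instead of bunching near zero.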

+5

Importance is really a way of telling users how interesting a forum topic or blog post is likely to be to them. In that case, you cannot just multiply two numbers by different factors and add them up :)

What can you say about a blog post with 2000 views and just one comment? Perhaps it is a spam post, or it was viewed by web crawlers, or it is so boring that no one bothered to comment on it.

In that case the ratio of comments to views is 1/2000, whereas for a post with 28 views and 1 comment it is 1/28.
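
The 1/2000 and 1/28 figures above read as comments-to-views ratios. Under that assumption, a minimal Python sketch of the comparison could look like this; the function name and the zero-views fallback are hypothetical choices, not something the answer specifies.

```python
def comment_ratio(comments, views):
    """Comments per view; returns 0.0 when there are no views."""
    return comments / views if views else 0.0


busy = comment_ratio(1, 2000)   # many views, little engagement
quiet = comment_ratio(1, 28)    # few views, relatively more engagement
```

Here `quiet` is larger than `busy`, which matches the point that raw view counts alone say little about how engaging a post is.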


+1


Source: https://habr.com/ru/post/1750422/

