How to implement a Reddit-style post ranking algorithm?

I am trying to learn how to code a ranking algorithm for a website like Reddit.com, where thousands of posts need to be ranked. Their ranking algorithm works like this (you do not need to read it; my question is more general): http://amix.dk/blog/post/19588

Right now I have posts stored in the database. I record their dates, and each has an upvotes and a downvotes field, so the raw data is there. What I want to know is how you store the rankings. Specific posts have ranking values, but those values change over time, so how do you keep the rankings up to date?

If the rankings are not stored, do you recompute the rank of every post each time a user loads a page?

If you do store them, when? Do you run a cron job that assigns each post a new value every x minutes? Do you store the value only temporarily, perhaps until the post drops below some minimum score and is forgotten?

+4
3 answers

I would definitely not recalculate the ranks every time you display the posts.

A simple, though not very efficient, solution is to cache the post rankings and clear or update the cache whenever one of the rankings changes.

This is not ideal, but it works.
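A minimal sketch of that cache-and-invalidate idea, assuming an in-memory dictionary of posts and a pluggable scoring function. The `RankingCache` class and its field names are hypothetical, made up for illustration:

```python
class RankingCache:
    """Caches the sorted post list; any vote invalidates it (hypothetical helper)."""

    def __init__(self, posts, score_fn):
        self.posts = posts          # dict: post id -> {"ups": int, "downs": int}
        self.score_fn = score_fn    # any ranking function that takes a post dict
        self._ranked = None         # cached list of post ids, best first
        self.recomputations = 0     # counter, only here to show when work happens

    def vote(self, post_id, up=True):
        self.posts[post_id]["ups" if up else "downs"] += 1
        self._ranked = None         # a ranking changed, so drop the cache

    def ranked(self):
        if self._ranked is None:    # rebuild only after an invalidation
            self._ranked = sorted(
                self.posts,
                key=lambda pid: self.score_fn(self.posts[pid]),
                reverse=True,
            )
            self.recomputations += 1
        return self._ranked


cache = RankingCache(
    {"a": {"ups": 3, "downs": 1}, "b": {"ups": 5, "downs": 0}},
    score_fn=lambda p: p["ups"] - p["downs"],
)
first = cache.ranked()    # computed
second = cache.ranked()   # served from the cache
for _ in range(4):
    cache.vote("a")       # each vote invalidates the cache
third = cache.ranked()    # computed again, once
```

Clearing the whole cache on every vote is the weak point: on a busy site the list would be rebuilt almost continuously, which is why this approach only suits low traffic.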

Another way would be to do as you suggested: calculate the ranks and store them in the database (and ideally cache them as well), then refresh them with a cron job every x minutes.
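A rough sketch of the stored-rank approach, using SQLite so it runs on its own. The table layout, the `rank_score` column, and `placeholder_score` are assumptions for illustration; `recompute_ranks` is the part a cron job would call:

```python
import sqlite3


def placeholder_score(ups, downs, age_hours):
    # Stand-in ranking function (an assumption); swap in your own algorithm.
    return (ups - downs) / (age_hours + 2) ** 1.5


def recompute_ranks(conn, now):
    """Intended to be run from a cron job every x minutes."""
    rows = conn.execute("SELECT id, ups, downs, created_at FROM posts").fetchall()
    updates = [
        (placeholder_score(ups, downs, (now - created_at) / 3600), post_id)
        for post_id, ups, downs, created_at in rows
    ]
    conn.executemany("UPDATE posts SET rank_score = ? WHERE id = ?", updates)
    conn.commit()


def front_page(conn, limit=25):
    # Page loads only read the stored, indexed value; nothing is recomputed here.
    query = "SELECT id FROM posts ORDER BY rank_score DESC LIMIT ?"
    return [row[0] for row in conn.execute(query, (limit,))]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, ups INTEGER, downs INTEGER, "
    "created_at INTEGER, rank_score REAL DEFAULT 0)"
)
conn.execute("CREATE INDEX idx_posts_rank ON posts(rank_score)")

now = 1_700_000_000  # fixed timestamp so the example is reproducible
conn.executemany(
    "INSERT INTO posts (id, ups, downs, created_at) VALUES (?, ?, ?, ?)",
    [
        (1, 100, 10, now - 48 * 3600),  # popular but two days old
        (2, 20, 2, now - 3600),         # fresh and well received
        (3, 5, 5, now - 600),           # brand new, net score of zero
    ],
)
recompute_ranks(conn, now)
page = front_page(conn)
```

The trade-off is staleness: the front page can lag behind the votes by up to one cron interval.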

Again, these are only the basic approaches to what you want to do. You can build on them over time.

Which algorithm you choose will depend heavily on your needs.

You also need to estimate how much traffic your site will receive, since that determines how far you have to go to get the algorithm right.

+6

I would calculate the score for each individual vote immediately, on a time-weighted scale. I would then either push that score onto a queue or use it to increment a field, whichever works better for you.

At a regular interval, I would take all the articles currently published and all the articles that received votes during the time window, and rescore all the ranked articles, followed by all the articles in the queue, in descending order of score, until I had calculated enough to fill my ranking quota.

The ranked list is then cached and used until the next ranking cycle. You will need to tune the queue retention period (perhaps everything that was active in the last N queues gets rescored), the number of articles kept, and so on, based on your site's load, but this should be a workable starting point.
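One possible reading of this scheme, sketched under loud assumptions: the `Ranker` class, the exponential time weighting, `HALF_LIFE`, and `EPOCH` are all invented for illustration. The rescore step is also simplified to a plain sort of the candidates rather than an early-stopping scan:

```python
HALF_LIFE = 6 * 3600    # assumed: a vote cast 6 hours later counts twice as much
EPOCH = 1_700_000_000   # assumed reference time for the weighting


class Ranker:
    def __init__(self, quota):
        self.quota = quota
        self.scores = {}    # article id -> accumulated time-weighted score
        self.queue = set()  # articles voted on since the last cycle
        self.ranked = []    # cached list, served until the next cycle

    def vote(self, article_id, direction, when):
        # Score the single vote immediately and add it to the article's field.
        weight = 2 ** ((when - EPOCH) / HALF_LIFE)
        self.scores[article_id] = self.scores.get(article_id, 0.0) + direction * weight
        self.queue.add(article_id)

    def run_cycle(self):
        # Consider only the currently ranked articles plus the queued ones.
        candidates = set(self.ranked) | self.queue
        ordered = sorted(candidates, key=lambda a: self.scores[a], reverse=True)
        self.ranked = ordered[: self.quota]
        self.queue.clear()
        return self.ranked


ranker = Ranker(quota=2)
for _ in range(3):
    ranker.vote("old", +1, EPOCH)            # three early votes, weight 1 each
ranker.vote("new", +1, EPOCH + 12 * 3600)    # one later vote, weight 4
ranker.vote("meh", -1, EPOCH + 3600)         # one early downvote
first_cycle = list(ranker.run_cycle())

ranker.vote("meh", +1, EPOCH + 18 * 3600)    # a late upvote, weight 8
second_cycle = list(ranker.run_cycle())
```

Because newer votes carry more weight, old scores never need to be decayed; only articles that were ranked or voted on are touched in each cycle.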

+2

If you use the exact Reddit algorithm, you only need to update the ranking field when an item is voted up or down, and even then only when the difference between upvotes and downvotes changes by an order of magnitude. This article explains a bit more about how their ranking works:

http://bibwild.wordpress.com/2012/05/08/reddit-story-ranking-algorithm/

In principle, voting up and down only serves to "shift" posts in time. If D is the difference between the number of upvotes and downvotes, the post is shifted forward or backward by about 12.5 hours for each order of magnitude of D. Other than that, it is just a simple ranking by submission date.

If you want to use your own ranking system in which the age of a post counts in some non-linear way, you will either have to create an indexed field and recalculate the ranking at intervals, as was already suggested, or just put the sort expression in your SQL query, as I said in my comment. Most likely, though, you can find a formulation that does not need to be recalculated over and over.
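For reference, the formula as described in the two linked write-ups can be sketched in a few lines. The constants (the epoch offset 1134028003 and the 45000-second divisor, which is about 12.5 hours) are the ones given in those descriptions; the function and variable names here are my own:

```python
from datetime import datetime, timedelta, timezone
from math import log10


def hot(ups, downs, submitted):
    """Hot score as described in the linked articles (a sketch, not Reddit's code)."""
    d = ups - downs
    order = log10(max(abs(d), 1))   # each tenfold change in |D| adds 1
    sign = (d > 0) - (d < 0)        # 1, 0 or -1
    seconds = submitted.timestamp() - 1134028003
    return round(sign * order + seconds / 45000, 7)


t = datetime(2012, 5, 8, 12, 0, tzinfo=timezone.utc)
base = hot(1, 0, t)
tenfold = hot(10, 0, t)                              # ten times the vote difference
later = hot(1, 0, t + timedelta(seconds=45000))      # same votes, posted 12.5 h later
```

Here `tenfold - base` and `later - base` both come out as 1 (up to rounding), which is the "shift" described above. Since the submission time never changes, the stored value only needs updating when a vote arrives.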
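A small sketch of putting the sort directly in the SQL query, run through SQLite so it is self-contained. The table layout and the decay expression (net votes divided by the square of the age in hours plus two) are assumptions chosen only because they need no special SQL functions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, ups INTEGER, downs INTEGER, "
    "created_at INTEGER)"
)

now = 1_700_000_000  # fixed timestamp so the example is reproducible
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?, ?)",
    [
        (1, 200, 20, now - 72 * 3600),  # many votes, three days old
        (2, 30, 5, now - 2 * 3600),     # two hours old
        (3, 4, 0, now - 1800),          # half an hour old
    ],
)

# The ranking is computed at query time, so nothing has to be stored or refreshed.
QUERY = """
SELECT id
FROM posts
ORDER BY (ups - downs) * 1.0
         / (((:now - created_at) / 3600.0 + 2) * ((:now - created_at) / 3600.0 + 2)) DESC
LIMIT :limit
"""
ids = [row[0] for row in conn.execute(QUERY, {"now": now, "limit": 10})]
```

An expression that depends on the current time cannot use an ordinary index, so the database sorts every row on each request. That is fine for small tables; beyond that, the stored-field approach scales better.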

+1

Source: https://habr.com/ru/post/1437095/

