Where does the math in Reddit's ranking algorithms come from?

I recently looked at the Reddit algorithm that determines what makes a post hot and which content is suitable for the Reddit homepage.

The article I read is here: http://amix.dk/blog/post/19588

I noticed that they use logarithms, and that they created some kind of mathematical function to determine the hotness / relevance of a post.

In the formulas used, where does each of the mathematical components come from, and how did they know to use them?

Thank you!

- Bakz

EDIT: Just to clarify, I just graduated from high school, and I apologize if the answer to this question seems pretty obvious. Thanks again!

+6
2 answers

I will consider the first formula, for the "hotness" of posts. Formulas like this come from requirements. The designers of Reddit had ideas about what they wanted to achieve, and designed the formulas accordingly. I can't say exactly what requirements they had in mind, but I can look at the implementation and guess that they were along these lines:

  • Scores should not need to be recomputed unless the number of votes changes. This reduces the number of updates to the database and makes it easier to achieve consistency if the data is replicated. (So any scoring system in which a story's score gets lower as the story ages is not going to be any good.)

  • If two stories are equally old, then the one with more upvotes should rank higher. (So there must be a contribution from the votes.)

  • The more upvotes a story gets, the longer it should stay near the top of the rankings.

  • Old stories should not stay at the top of the rankings forever, even if they had many upvotes. Pretty soon (within a day or two) newer stories should get ahead of them. (So there must be a contribution from the date, and it must come to outweigh the contribution from the votes pretty soon, no matter how many votes a story gets.)

  • Stories with more downvotes than upvotes should not appear in the rankings at all.

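Now consider the formula log z + yt / 45000 and see how it satisfies these requirements. Here log is the base-10 logarithm, z is the absolute value of the net vote count (upvotes minus downvotes), or 1 if that is zero, y is the sign of the net vote count (1, 0 or -1), and t is the time the story was submitted, in seconds since a fixed reference date.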

  • If the number of votes does not change, then z, y and t do not change (t is the submission time, not the age). So the score does not change. This satisfies requirement (1).

  • If two stories have the same age, then they have the same value of t. But the one with more upvotes has a higher z, and since log is monotonically increasing, it has a higher score. This satisfies requirement (2).

  • The more upvotes a story has, the higher its z, and the longer it takes before a newer story with a higher t can overtake it. This satisfies requirement (3).

  • The logarithm is a function that grows more and more slowly as its argument increases (take a look at its graph). So a story needs more and more upvotes as time goes on to keep up with newer stories. This satisfies requirement (4).

  • If the story has more downvotes than upvotes, then y = -1, so the term yt / 45000 is negative. Since t / 45000 is far larger than log z for any realistic vote count, the score is negative. This satisfies requirement (5).
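The checks above can be tried out numerically. The following is a minimal sketch of the formula as described in this answer, not Reddit's actual code; the function name hot_score, the sample vote counts and the reference time t0 are made up for illustration.

```python
from math import log10

def hot_score(ups: int, downs: int, t: float) -> float:
    """Sketch of log10(z) + y*t/45000.

    t is the submission time in seconds since some fixed reference date
    (an assumption for this sketch; it is not the age of the story).
    """
    x = ups - downs              # net votes
    y = (x > 0) - (x < 0)        # sign of the net votes: 1, 0 or -1
    z = max(abs(x), 1)           # at least 1, so the logarithm is defined
    return log10(z) + y * t / 45000

DAY = 86_400
t0 = 100 * DAY                   # hypothetical submission time

# Requirement (2): same age, more upvotes ranks higher.
same_age_more_votes = hot_score(200, 0, t0) > hot_score(20, 0, t0)

# Requirement (4): a story two days newer with 5 net votes
# gets ahead of an older story with 5000 net votes.
newer_overtakes = hot_score(5, 0, t0 + 2 * DAY) > hot_score(5000, 0, t0)

# Requirement (5): more downvotes than upvotes gives a negative score.
downvoted_is_negative = hot_score(3, 10, t0) < 0

print(same_age_more_votes, newer_overtakes, downvoted_is_negative)
```

Requirement (1) holds by construction: the function depends only on the vote counts and the submission time, so nothing needs recomputing as the story ages.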

The constant 45,000 is a scale factor that sets the trade-off between votes and age. There are 86,400 seconds in a day, so a story submitted one day later has a t that is larger by that amount. Dividing 86,400 by 45,000 gives 1.92, which means that one day of relative newness is worth a factor of 10^1.92 ≈ 83 in votes, and two days of relative newness is worth a factor of about 7,000.
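The arithmetic in the previous paragraph can be reproduced in a few lines. This is only a sketch of that calculation; the helper name votes_equivalent is invented here.

```python
SECONDS_PER_DAY = 86_400
SCALE = 45_000  # the constant from the formula

def votes_equivalent(days: float) -> float:
    """Factor by which an older story's net votes must exceed a newer
    story's to cancel a head start of `days` days in submission time."""
    return 10 ** (days * SECONDS_PER_DAY / SCALE)

one_day = votes_equivalent(1)    # about 83
two_days = votes_equivalent(2)   # about 6,900
print(round(one_day), round(two_days))
```

Because the votes enter through a logarithm, each extra day multiplies the required votes by the same factor instead of adding a fixed amount.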

+22

They don't come from anywhere. There is no absolute truth to them and they don't prove anything. They are just a way of quantifying an attribute in the way that seemed most sensible to the development team.

You would use a log if you want something to be a factor, but a less important one (since large values still grow, only very slowly). They could have chosen a cube root instead, though.
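To get a feel for the difference between the two choices, here is a small sketch that tabulates both functions for a few vote counts. The sample values and the helper name damping_table are arbitrary.

```python
from math import log10

def damping_table(vote_counts):
    """Return (votes, log10(votes), cube root of votes) for each count."""
    return [(v, log10(v), v ** (1 / 3)) for v in vote_counts]

rows = damping_table([10, 1_000, 100_000])
for votes, log_value, cube_root in rows:
    print(f"{votes:>7}  log10 = {log_value:.2f}  cube root = {cube_root:.2f}")
```

Both damp large values, but the cube root keeps growing noticeably faster than the log, so it would let very popular stories hold their position longer.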

The formulas are simply a representation of the factors that one would guess characteristically point to something being "hot", and they are put together so that each factor is weighted in a suitable proportion (for example, square the values that matter a lot and take the log of those that matter less).

Once they came up with the formula, they probably thought up 10 or 15 different kinds of posts, plugged in the numbers, saw that the results made good sense all round, and so stuck with it. In fact, the first few attempts probably did not work out that well, and they arrived at this formula only after a few tries with the numbers.

+2

Source: https://habr.com/ru/post/891925/

