Web Application Time Scalability

My goal is to generate a system similar to the system on the first reddit page.

I have things, and for simplicity, these things have voices. The best system I generated uses time decay. With half the time in 7 days, if today the vote costs 20 points, then after seven days it costs 10 points, and after 14 days it will cost 5 points.

The problem is that although this gives results, I am very pleased that it does not scale. Each vote requires me to effectively recalculate the value of every other vote.

So, I thought I could undo this idea. Voting today is worth 1 point. Voting after seven days costs 2 points, and after 14 days it costs 4 points and so on. This works well because for each vote I only need to update one line. The problem is that by the end of the year I need a data type that can contain fantastically huge numbers.

So, I tried to use linear growth, which caused terrible ratings. I tried polynomial growth (squaring and cubing the number of days since the launch and submission of the site), and this gave somewhat better results. However, as I get somewhat better results, I quickly return to unattainable numbers.

So, I come to you stackoverflow. Who got a brilliant idea or a link to an idea on how to model this system so that it scales well for a web application.

+4
source share
4 answers

I also tried to do this. I found what seems like a solution, but unfortunately I forgot how to do the math, so it's hard for me to figure it out.

The idea is to keep a journal of your account and sort by it, so the numbers will not overflow.

This document describes math. https://docs.google.com/View?id=dg7jwgdn_8cd9bprdr

And a comment where I found it here: http://blog.notdot.net/2009/12/Most-popular-metrics-in-App-Engine#comment-25910828

+2
source

Well, I thought of one decision to do this at every vote. The catch is that storing votes requires a linked list with atomic pop / click on both sides (for example, a Redis list, but you probably don't want it in RAM).

The damping interval is also required to be constant (e.g. 1 hour)

This happens as follows:

  • For each vote, update the score, click the next time the breakdown of this vote falls into the tail of the list.
  • Then enter the first vote from the list heading
  • If he is not old enough to decompose, press him back to the head.
  • Otherwise, subtract the required amount from the total score and click the updated information on the tail.
  • Repeat from step 2 until you press a fairly fresh voice (step 3).

You still have to check your heads in the background to clear messages that no one else votes, of course.

0
source

It's late here, so I hope someone can check my math. I think this is equivalent to exponential decay.

MySQL has a BIGINT max 2 ^ 64

For simplicity, use 1 day as our time interval. Let n be the number of days since the site was launched.

  • Create an integer variable. Lets call it X and start at 0
  • If the add operation would give a score of more than 2 ^ 64, first update each point by dividing it by 2 ^ n, then set X to n.
  • For each vote, add 2 ^ (nX) to the score.

So, mentally, that makes sense to me using base 10. When we add things, our number gets longer and longer. We stop caring about the numbers in the lower digits of the digits, because the values ​​that we increase have many digits. This means that lower numbers stop counting very strongly. Therefore, if they are not taken into account, why not just move the decimal place to the point at which we care, and at some point truncate the numbers below the decimal point. To do this, we need to shift the decimal point by the amount that we add each time.

I cannot help but feel that something is wrong with this.

0
source

Here are two possible pseudo-queries that you could use. I know that they cannot handle scalability, but I think they provide methods so you can

SELECT article.title AS title, SUM(vp.point) AS points FROM article LEFT JOIN (SELECT 1 / DATEDIFF(NOW(), vote.created_at) as point, article_id FROM vote GROUP BY article_id) AS vp ON vp.article_id = article.id 

or (not in a compound that will be a little faster, I think, but harder to hydrate)

 SELECT SUM(1 / DATEDIFF(NOW(), created_at)) AS points, article_id FROM vote WHERE article_id IN (...) GROUP BY article_id 

The advantage of these queries is that they can be run at any time with the same data, and they will always return the same answers. They do not destroy any data.

If you need, you can also run queries in the background job, and they will still produce the same result.

0
source

Source: https://habr.com/ru/post/1389607/


All Articles