I think you need to step back and look at some of your questions from a different angle in order to get answers.
"how often is the background thread running?" To answer this, you need to answer these questions: how much data can you lose? What is the reason that the data is in MySQL, and how often is this data available? For example, if a database is required only once a day to receive a report, you may need to update it only once a day. On the other hand, what if a Redis instance dies? How many increments can you lose and still be "good"? They will provide answers to the question of how to update your MySQL instance frequently, and we cannot answer for you.
I would use a completely different approach to storing this in Redis. For the sake of discussion, let's assume you decide you need to flush to the DB every hour.
Store every hit in hashes, using key names along these lines:
    interval_counter:DD:HH
    interval_counter:total
Use the page identifier (for example, the MD5 sum of the URI, the URI itself, or whatever identifier you currently use) as the hash field, and on each page hit perform two increments, one in each hash. This gives you both the running total for each page and the subset of pages that need updating.
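To make that concrete, here is a minimal sketch in Python with redis-py, assuming the key scheme above; `record_hit`, the `page_id` argument, and the local connection details are illustrative, not part of your existing code.

```python
from datetime import datetime

import redis

r = redis.Redis(decode_responses=True)  # assumed local instance

def record_hit(page_id):
    """Increment both the current day/hour bucket and the running total for a page."""
    bucket = "interval_counter:" + datetime.utcnow().strftime("%d:%H")  # e.g. interval_counter:07:14
    r.hincrby(bucket, page_id, 1)                    # count for this hour's hash
    r.hincrby("interval_counter:total", page_id, 1)  # all-time count for the page
```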
You would then have a cron job that runs a minute or so after the top of the hour to pull all pages with updated view counts by grabbing the previous hour's hash. This gives you a very fast way of retrieving the data for updating your MySQL database, while avoiding any need to do math or play tricks with timestamps. Because you are pulling data from a key that is no longer being incremented, you also avoid race conditions due to clock skew.
You could set the interval keys to expire, but I would rather have the cron job delete them once it has successfully updated the database. That way your data still exists if the cron job fails or only partially succeeds. It also gives the front end a complete set of known counter data via key names that do not change. If you wanted to, you could even keep the data around to show windows of how popular a page is. For example, if you kept each daily hash for 7 days by having the cron job set an expiration instead of deleting, you could display how much traffic each page had per day over the last week.
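A rough sketch of that cron job follows, under the same assumptions as before; `update_mysql_row` is a hypothetical stand-in for however you write the counts into MySQL.

```python
from datetime import datetime, timedelta

import redis

r = redis.Redis(decode_responses=True)  # assumed local instance

def flush_previous_hour(update_mysql_row):
    """Push the previous hour's counts into MySQL, then remove the hash."""
    previous = datetime.utcnow() - timedelta(hours=1)
    key = "interval_counter:" + previous.strftime("%d:%H")  # no longer being incremented
    counts = r.hgetall(key)                                 # page_id -> hits for that hour
    for page_id, hits in counts.items():
        update_mysql_row(page_id, int(hits))
    # Delete only after MySQL has accepted the data, so a failed run can be retried.
    # To keep a 7-day window instead, replace the delete with: r.expire(key, 7 * 86400)
    r.delete(key)
```

Because the source hash is no longer changing, a failed run can simply be retried later against the same data.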
The two hincr operations can be done either individually or pipelined; either way this performs quite well and is more efficient than doing the calculations and massaging the data in code.
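If you do pipeline them, the write from the earlier sketch might look like this (same assumed connection and key scheme):

```python
from datetime import datetime

import redis

r = redis.Redis(decode_responses=True)  # assumed local instance

def record_hit_pipelined(page_id):
    """Send both increments to Redis in a single round trip."""
    bucket = "interval_counter:" + datetime.utcnow().strftime("%d:%H")
    pipe = r.pipeline()
    pipe.hincrby(bucket, page_id, 1)
    pipe.hincrby("interval_counter:total", page_id, 1)
    pipe.execute()
```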
Now for the question of expiring low-traffic pages and of memory usage. First, a data set like yours is not one that will require huge amounts of memory. Of course, a lot depends on how you identify each page: if you have a numeric identifier, the memory requirements will be quite small. If you still find yourself using too much memory, you can tune this through configuration, and if necessary you can even use a 32-bit Redis build to reduce memory usage significantly. For example, the scheme I describe in this answer is what I used to manage one of the ten highest-traffic forums on the Internet, and it consumed less than 3 GB of data. I also stored the counters in far more "temporal windows" than I describe here.
That said, in this setup Redis is a cache. If you are still using too much memory after the options above, you can set an expiration on the keys and add an expire command for each hit. More specifically, if you follow the pattern above, for each hit you would do the following:
    hincr -> total
    hincr -> daily
    expire -> total
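Here is a sketch of that per-hit sequence; the TTL value is an assumption you would tune to how long an idle page should stay cached.

```python
from datetime import datetime

import redis

r = redis.Redis(decode_responses=True)  # assumed local instance

def record_hit_with_ttl(page_id, ttl_seconds=86400):
    """Increment both hashes and refresh the TTL on the totals hash on every hit."""
    bucket = "interval_counter:" + datetime.utcnow().strftime("%d:%H")
    pipe = r.pipeline()
    pipe.hincrby("interval_counter:total", page_id, 1)   # hincr -> total
    pipe.hincrby(bucket, page_id, 1)                     # hincr -> daily (the dated interval hash)
    pipe.expire("interval_counter:total", ttl_seconds)   # expire -> total
    pipe.execute()
```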
This lets you keep anything that is actively used fresh by extending its expiration on every access. Of course, to do this you will need to wrap your display call so that a null answer from the hget against the totals hash is caught and populated from the MySQL database before incrementing. You could even do both as a single increment. This preserves the structure above, and it would most likely be the same code base you would need to repopulate a Redis server from the MySQL DB if you ever had to rebuild the Redis node. For that you will need to consider and decide which data source is authoritative.
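A sketch of that read path, where `get_views_from_mysql` is a hypothetical helper for your authoritative store; seeding via hincrby is the "do both as a single increment" idea.

```python
import redis

r = redis.Redis(decode_responses=True)  # assumed local instance

def get_total_views(page_id, get_views_from_mysql):
    """Return the page's total views, repopulating Redis from MySQL on a miss."""
    views = r.hget("interval_counter:total", page_id)
    if views is None:
        # The key expired or the node was rebuilt: seed it from MySQL.
        views = r.hincrby("interval_counter:total", page_id,
                          get_views_from_mysql(page_id))
    return int(views)
```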
You can tune the cron job's performance by changing its interval to match the data-integrity requirements you settled on from the earlier questions. To speed the cron job up, you shrink the window; with a smaller window you have a smaller set of pages to update. The big advantage here is that you do not need to work out which keys need updating and then go fetch them: you can do an hgetall and iterate over the hash fields to do the update. This also saves a lot of round trips by retrieving all the data at once. In either case, you will probably want to consider a second Redis instance, slaved to the first, to serve your reads. You would still perform the deletes against the master, but those operations are much quicker and less likely to introduce delays on your write instance.
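A sketch of splitting the cron job's reads and writes that way (compare the flush sketch earlier); the host names are placeholders for your own topology.

```python
from datetime import datetime, timedelta

import redis

master = redis.Redis(host="redis-master", decode_responses=True)    # placeholder host
replica = redis.Redis(host="redis-replica", decode_responses=True)  # placeholder host

def flush_previous_hour_via_replica(update_mysql_row):
    """Read the previous hour's counts from the replica, write to MySQL, delete on the master."""
    key = "interval_counter:" + (datetime.utcnow() - timedelta(hours=1)).strftime("%d:%H")
    for page_id, hits in replica.hgetall(key).items():  # heavy read served by the replica
        update_mysql_row(page_id, int(hits))
    master.delete(key)  # the delete still has to go to the master
```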
If you need Redis persisted to disk, then by all means do that on the slave instance. Otherwise, if you have a lot of frequently changing data, your RDB dumps will be running constantly.
Hope this helps. There are no "canned" answers, because to use Redis correctly you first need to think about how you will access the data, and that varies widely from user to user and project to project. Here I have based the route taken on this description: two consumers accessing the data, one to display it and one to determine what to update in the other data source.