Caching a Django model count for Prometheus

I have a Django monitoring application with a model called Sample, and I export metrics to Prometheus.

I want to expose Sample.objects.count() as a Prometheus metric and cache the value for a configurable interval, to avoid running a costly COUNT(*) query against the database on every scrape.

From the documentation at https://github.com/prometheus/client_python#custom-collectors I gather that I need to write a custom collector.

What is the best approach to achieve this? Is there a way in Django to cache the result of Sample.objects.count() and refresh it every K seconds? I also use Redis in my application — should I store the value there? Or should I run a separate thread to keep the cached count up to date?

+6
2 answers

First of all, it is worth noting that you may not need to cache the result of a COUNT(*) query at all.

Although different RDBMSs handle counting differently, most have to scan the whole table, so COUNT(*) is slow on large tables. But one thing they have in common is an alternative to SELECT COUNT(*), provided by the RDBMS itself, that is effectively a cached result. Well, sort of.

You have not mentioned which RDBMS you use, so let's see how this plays out in the ones popular with Django.

MySQL

If you have a primary key on your table and you are using MyISAM, SELECT COUNT(*) is very fast in MySQL and scales well. But chances are you are using InnoDB, and that is the right storage engine for various reasons. InnoDB is transactional and cannot handle COUNT(*) as cheaply as MyISAM; the query slows down as the table grows.

Running a count query on a table with 2M records took 0.2317 seconds. The following query took 0.0015 seconds:

 SELECT table_rows FROM information_schema.tables WHERE table_name='for_count'; 

but it reported a value of 1997289 instead of 2 million — an estimate, but close enough!

Thus, you do not need your own caching system.

Sqlite

SQLite COUNT(*) queries are not terribly slow, but they do not scale: as the table grows, the count query slows down. Using a table similar to the one used for MySQL, SELECT COUNT(*) FROM for_count takes 0.042 seconds to complete.

There is no shortcut here. The sqlite_master table does not contain a row count, and neither does PRAGMA table_info.

You need your own system to cache the result of SELECT COUNT(*).
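A minimal sketch of such a cache in plain Python, using a time-based refresh. The `ttl_cached` helper and its parameter names are my own invention, not part of Django; in a real app the `fetch` callable would be something like `Sample.objects.count`:

```python
import time


def ttl_cached(fetch, ttl=30.0):
    """Wrap a zero-argument callable (e.g. a COUNT(*) query) so the real
    call runs at most once every `ttl` seconds; otherwise the last value
    is returned."""
    state = {"value": None, "at": float("-inf")}

    def cached():
        now = time.monotonic()
        if now - state["at"] >= ttl:
            state["value"] = fetch()  # expensive call: refresh from the database
            state["at"] = now
        return state["value"]

    return cached
```

Usage would look like `count = ttl_cached(Sample.objects.count, ttl=30)`, after which calling `count()` hits the database at most twice a minute.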

Postgresql

Although it is a fully featured open source RDBMS, postgresql does not handle COUNT(*) well either: the query is slow and does not scale. In this respect it is no different from InnoDB.

The count query took 0.194 seconds in postgresql. On the other hand, the following query took 0.003 seconds:

 SELECT reltuples FROM pg_class WHERE relname = 'for_count' 

You do not need your own caching system. (Note that reltuples is an estimate, kept up to date by VACUUM and ANALYZE.)

SQL Server

The COUNT query on SQL Server took 0.160 seconds on average, but the timing fluctuated quite wildly. For all the databases discussed here, the first COUNT(*) query was rather slow, while subsequent queries were faster because the data file was cached by the operating system.

I am not an expert on SQL Server, so before answering this question I did not know how to look up the row count from the schema metadata. I found this Q&A useful. One of the queries I tried returned in 0.004 seconds:

 SELECT t.name, s.row_count from sys.tables t JOIN sys.dm_db_partition_stats s ON t.object_id = s.object_id AND t.type_desc = 'USER_TABLE' AND t.name ='for_count' AND s.index_id = 1 

You do not need your own caching system.

Integrate in Django

As you can see, all the databases examined except SQLite provide a built-in "cached query count". There is no need to build our own; we can simply create a custom manager that uses this feature.

    class CustomManager(models.Manager):
        def quick_count(self):
            from django.db import connection
            with connection.cursor() as cursor:
                cursor.execute(
                    "SELECT table_rows FROM information_schema.tables "
                    "WHERE table_name='for_count'"
                )
                row = cursor.fetchone()
                return row[0]

    class Sample(models.Model):
        ....
        objects = CustomManager()

The above example uses the MySQL query, but the same manager works for postgresql or SQL Server simply by swapping in the corresponding query from above.

Prometheus

How do you connect this to django-prometheus? I leave that as an exercise.

+7

A custom collector that returns the previous value if it is not too old, and fetches a fresh one otherwise, would be the way to go. I would keep all of this in-process.
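A sketch of such a collector, assuming the prometheus_client package is available. The metric name `django_sample_rows` and the `fetch`/`ttl` parameters are illustrative choices of mine; in a real app `fetch` would be `Sample.objects.count` (or the `quick_count` from the other answer):

```python
import time

from prometheus_client.core import GaugeMetricFamily


class CachedCountCollector:
    """Custom collector that re-runs the expensive count at most
    once per `ttl` seconds, returning the cached value otherwise."""

    def __init__(self, fetch, ttl=30.0):
        self.fetch = fetch          # e.g. Sample.objects.count
        self.ttl = ttl
        self._value = 0.0
        self._fetched_at = float("-inf")

    def collect(self):
        now = time.monotonic()
        if now - self._fetched_at >= self.ttl:
            # Cached value expired: hit the database once, then remember when.
            self._value = float(self.fetch())
            self._fetched_at = now
        yield GaugeMetricFamily(
            "django_sample_rows",
            "Approximate number of rows in the Sample table",
            value=self._value,
        )
```

Register it once at startup, e.g. `prometheus_client.REGISTRY.register(CachedCountCollector(Sample.objects.count, ttl=30))`; every scrape then calls collect(), which only touches the database when the cached value has expired.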

If you use MySQL, you can also look at the collectors that mysqld_exporter offers, since it has a table-size collector that should be cheaper.

+1

Source: https://habr.com/ru/post/1013640/
