How to efficiently trim a SQLite database to a given size?

I am using SQLite 3.7.2 on Windows. My database stores log data that is generated 24/7. The schema is essentially:

    CREATE TABLE log_message(id INTEGER PRIMARY KEY AUTOINCREMENT, process_id INTEGER, text TEXT);
    CREATE TABLE process(id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT);

The log_message.process_id field references process.id, associating each log message with the process that produced it.

Now, sooner or later the database will grow too large, and I want to delete the oldest records (those with the lowest log_message.id values) until the database falls below a given size (for example, 1 GB). To do this, I currently run

    PRAGMA page_count;
    PRAGMA page_size;

after every few log messages to compute the database size (page_count × page_size). If it exceeds my limit, I delete a chunk (currently 100 messages) of the oldest log messages as follows:

    BEGIN TRANSACTION;
    DELETE FROM log_message WHERE id IN (SELECT id FROM log_message ORDER BY id LIMIT 100);
    DELETE FROM process WHERE id IN (SELECT id FROM process EXCEPT SELECT process_id FROM log_message);
    COMMIT;
    VACUUM;

The final DELETE removes all process entries that are no longer referenced by any log message. I repeat this procedure until the file size is acceptable again.
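For reference, the whole procedure can be sketched with Python's stdlib sqlite3 module (a minimal sketch; the function name, the batch parameter, and the empty-table guard are mine):

```python
import sqlite3

def trim_to_limit(conn: sqlite3.Connection, limit_bytes: int, batch: int = 100) -> None:
    """Delete the oldest log messages in batches until the file fits the limit."""
    while True:
        page_count = conn.execute("PRAGMA page_count").fetchone()[0]
        page_size = conn.execute("PRAGMA page_size").fetchone()[0]
        if page_count * page_size <= limit_bytes:
            break
        with conn:  # one transaction per batch
            cur = conn.execute(
                "DELETE FROM log_message WHERE id IN "
                "(SELECT id FROM log_message ORDER BY id LIMIT ?)", (batch,))
            conn.execute(
                "DELETE FROM process WHERE id NOT IN "
                "(SELECT process_id FROM log_message)")
        if cur.rowcount == 0:
            break  # table is empty; deleting more cannot shrink the file
        conn.execute("VACUUM")  # shrink the file so page_count reflects the deletes
```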

This suffers from at least two problems:

  • Deleting 100 log messages at a time is fairly arbitrary; I arrived at that number through experimentation. I would like to know in advance how many entries I have to delete.
  • The VACUUM calls can take quite a while (the SQLite homepage says VACUUM can take up to half a second per MB on Linux, and I assume it is no faster on Windows).

Does anyone have any other suggestions on how to do this?

+6
5 answers

When you have a "right-sized" database, count the number of rows in log_message:

    SELECT COUNT(*) FROM log_message;

Save this number.

When you want to shrink the file, run the count query again, compute the difference, delete that many rows from the database, then VACUUM.

This is only approximate, but it will get you close to 1 GB fairly quickly. If you are not quite there yet, you can fall back to the 100-rows-at-a-time method.
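A minimal sketch of this count-based approach (Python's stdlib sqlite3; the function name is mine, and baseline_rows is whatever count you measured on the right-sized database):

```python
import sqlite3

def trim_by_count(conn: sqlite3.Connection, baseline_rows: int) -> int:
    """Delete however many of the oldest rows exceed the known-good row count."""
    current = conn.execute("SELECT COUNT(*) FROM log_message").fetchone()[0]
    surplus = max(0, current - baseline_rows)
    if surplus:
        with conn:
            conn.execute(
                "DELETE FROM log_message WHERE id IN "
                "(SELECT id FROM log_message ORDER BY id LIMIT ?)", (surplus,))
        conn.execute("VACUUM")  # reclaim the freed pages in one pass
    return surplus
```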

+2
    CREATE TABLE log_messages (
        id INTEGER PRIMARY KEY,   -- no AUTOINCREMENT here
        event_time DATETIME,      -- for retrieving the last id
        message CHAR(248)         -- fixed field size
    );

Suppose the integer field is 4 bytes, the datetime field is also 4 bytes, and each character takes one byte. Then each record is 256 bytes long, and 1 KB holds 4 records.

Initialize the table with sequential identifiers:

    1 | 2011-05-01 23:00:01 | null
    2 | 2011-05-01 23:00:01 | null
    3 | 2011-05-01 23:00:01 | null
    4 | 2011-05-01 23:00:01 | null

When your program starts, you run a query such as:

    SELECT id FROM log_messages ORDER BY event_time DESC LIMIT 1;

Suppose the result of this query is 4. Now add 1; since the maximum number of records is also 4, the id wraps around: 4 + 1 → 1, so record 1 is the one to overwrite:

    UPDATE log_messages SET message = 'new message', event_time = datetime('now') WHERE id = 1;

For the next entry, you simply add 1 to the last identifier you keep in memory.

Hope you get the idea.
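The circular log above can be sketched like this (Python's stdlib sqlite3; the class name, the id tie-break in the startup query, and the use of datetime('now') are my own choices):

```python
import sqlite3

class RingLogger:
    """Circular log over a pre-filled, fixed-size log_messages table."""

    def __init__(self, conn: sqlite3.Connection, size: int = 4):
        self.conn, self.size = conn, size
        row = conn.execute(
            "SELECT id FROM log_messages "
            "ORDER BY event_time DESC, id DESC LIMIT 1").fetchone()
        # Resume after the most recently written slot (ties broken by id).
        self.last_id = row[0] if row else size

    def write(self, text: str) -> int:
        # Wrap around once the end of the fixed id range is reached.
        self.last_id = 1 if self.last_id >= self.size else self.last_id + 1
        with self.conn:
            self.conn.execute(
                "UPDATE log_messages SET message = ?, event_time = datetime('now') "
                "WHERE id = ?", (text, self.last_id))
        return self.last_id
```

Overwriting fixed slots keeps the file size constant, so no VACUUM is ever needed.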

+2

If you have filesystem access, the best way, I think, would be to start a new log database periodically and apply some kind of rotation to the db files (deleting the oldest ones).
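A sketch of such file rotation (Python stdlib; the log.N.db naming scheme, the size cap, and the file count are made-up conventions, not anything from the question):

```python
import os
import sqlite3

def rotate(log_dir: str, max_files: int = 5) -> None:
    """Shift log.N.db -> log.(N+1).db, discarding the oldest file."""
    oldest = os.path.join(log_dir, "log.%d.db" % (max_files - 1))
    if os.path.exists(oldest):
        os.remove(oldest)
    for n in range(max_files - 2, -1, -1):
        src = os.path.join(log_dir, "log.%d.db" % n)
        if os.path.exists(src):
            os.rename(src, os.path.join(log_dir, "log.%d.db" % (n + 1)))

def open_current(log_dir: str, max_bytes: int, max_files: int = 5) -> sqlite3.Connection:
    """Open the active log db, rotating first if it has grown past the cap."""
    path = os.path.join(log_dir, "log.0.db")
    if os.path.exists(path) and os.path.getsize(path) >= max_bytes:
        rotate(log_dir, max_files)
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS log_message("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, process_id INTEGER, text TEXT)")
    return conn
```

Old messages disappear a whole file at a time, so there is no per-row DELETE and no VACUUM.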

+1

Divide the desired maximum file size by the page size (as reported by PRAGMA page_size) to get the maximum number of pages the database may allocate. Then set that value with PRAGMA max_page_count.

That way, INSERT fails with SQLITE_FULL whenever the maximum size is reached. When that happens, run your DELETE procedure to discard the oldest entries; afterwards you can retry the INSERT until the database is full again. And so on.

This does not trim the database down to a given size, but doing that is inefficient anyway. Instead, it is better to impose a maximum size that must never be exceeded, and then keep the database file at that size so that SQLite can reuse the already-allocated disk space rather than repeatedly growing and shrinking the file.
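A sketch of this scheme (Python's stdlib sqlite3; the function names and the 100-row batch are mine; the substring test matches the "database or disk is full" message sqlite3 raises for SQLITE_FULL):

```python
import sqlite3

def open_capped(path: str, limit_bytes: int) -> sqlite3.Connection:
    """Cap the database file at roughly limit_bytes via max_page_count."""
    conn = sqlite3.connect(path)
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    conn.execute("PRAGMA max_page_count = %d" % (limit_bytes // page_size))
    return conn

def insert_log(conn: sqlite3.Connection, process_id: int, text: str,
               batch: int = 100) -> None:
    """Insert one message; on SQLITE_FULL, drop the oldest rows and retry."""
    while True:
        try:
            with conn:
                conn.execute(
                    "INSERT INTO log_message(process_id, text) VALUES (?, ?)",
                    (process_id, text))
            return
        except sqlite3.OperationalError as exc:
            if "full" not in str(exc):
                raise
            # Freed pages go to the freelist and are reused by later inserts,
            # so the file never grows past the cap and never needs VACUUM.
            with conn:
                conn.execute(
                    "DELETE FROM log_message WHERE id IN "
                    "(SELECT id FROM log_message ORDER BY id LIMIT ?)", (batch,))
```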

+1

Four years later and probably a bit late, but have you considered giving "Id" a fixed range, resetting back to the minimum once it reaches the maximum, and then doing UPDATEs on the database instead of inserts and deletes?
I appreciate that you would need to persist the last used "Id" in case the program shuts down, so that you can resume from the right point on restart, but that seems relatively trivial.
Set it up this way and, with a fixed record size, the size of your database is determined by the number of records in the "Id" range.
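One way to persist the last used id is a one-row state table updated in the same transaction as the message (a sketch; the table and key names are mine, and INSERT OR REPLACE stands in for the newer upsert syntax since it already exists in SQLite 3.7):

```python
import sqlite3

def load_last_id(conn: sqlite3.Connection) -> int:
    """Read the last used id back after a restart (0 if never written)."""
    conn.execute("CREATE TABLE IF NOT EXISTS log_state("
                 "key TEXT PRIMARY KEY, last_id INTEGER)")
    row = conn.execute("SELECT last_id FROM log_state WHERE key = 'ring'").fetchone()
    return row[0] if row else 0

def save_entry(conn: sqlite3.Connection, last_id: int, text: str,
               ring_size: int = 100000) -> int:
    """Overwrite the next slot in the fixed id range and persist the position."""
    next_id = 1 if last_id >= ring_size else last_id + 1
    with conn:  # message and position are committed atomically
        conn.execute("INSERT OR REPLACE INTO log_message(id, process_id, text) "
                     "VALUES (?, 0, ?)", (next_id, text))
        conn.execute("INSERT OR REPLACE INTO log_state(key, last_id) "
                     "VALUES ('ring', ?)", (next_id,))
    return next_id
```

Call load_last_id once at startup, then thread the returned id through successive save_entry calls.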

0

Source: https://habr.com/ru/post/887990/

