SQLite: linear performance degradation when adding data, is it solvable?

I have a test case that writes rows to a database. Each transaction inserts 10,000 rows; there are no updates. Each step takes longer than the previous one, growing linearly. The first eleven steps required the following times, in ms, to commit:

568, 772, 942, 1247, 1717, 1906, 2268, 2797, 2922, 3816, 3945

By the time it reaches adding 10,000 rows to a table of 500,000 rows, the commit takes 37,149 ms.
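Each step looks roughly like this (a sketch; the column values are placeholder test data rather than my exact harness):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.util.UUID;

    // Sketch of one timed step: insert 10,000 rows in a single transaction.
    // Assumes `conn` is a sqlite-jdbc connection with autocommit disabled.
    static long insertStep(Connection conn) throws Exception {
        String sql = "insert into EventLog (DocumentId, Event, Content, TransactionId, Date, User)"
                   + " values (?, ?, ?, ?, ?, ?)";
        long start = System.currentTimeMillis();
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (int i = 0; i < 10000; i++) {
                ps.setString(1, UUID.randomUUID().toString());   // random GUID keys
                ps.setString(2, "Created");
                ps.setString(3, "{}");
                ps.setString(4, UUID.randomUUID().toString());
                ps.setLong(5, System.currentTimeMillis());
                ps.setString(6, "tester");
                ps.addBatch();
            }
            ps.executeBatch();
        }
        conn.commit();
        return System.currentTimeMillis() - start;               // elapsed ms for this step
    }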

  • I have no foreign key restrictions.

  • I found that using WAL improves performance (it gives the numbers above), but the degradation is still linear

  • PRAGMA synchronous = OFF has no effect

  • PRAGMA locking_mode = EXCLUSIVE has no effect

  • Ran both with and without the additional indexes. The difference was an approximately constant amount of time, so the degradation was still linear.

I have some other settings, applied per connection as sketched after this list:

  • setAutoCommit(false)
  • PRAGMA page_size = 4096
  • PRAGMA journal_size_limit = 104857600
  • PRAGMA count_changes = OFF
  • PRAGMA cache_size = 10000
  • The schema has an Id INTEGER PRIMARY KEY ASC whose values are sequential and generated by SQLite
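A sketch of how these settings are applied per connection (the database path is a placeholder):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    // Sketch: open a sqlite-jdbc connection and apply the PRAGMAs listed above.
    static Connection openConnection() throws Exception {
        Class.forName("org.sqlite.JDBC");
        Connection conn = DriverManager.getConnection("jdbc:sqlite:eventlog.db");
        try (Statement st = conn.createStatement()) {
            st.execute("PRAGMA journal_mode = WAL");   // per the WAL bullet above
            st.execute("PRAGMA page_size = 4096");
            st.execute("PRAGMA journal_size_limit = 104857600");
            st.execute("PRAGMA count_changes = OFF");
            st.execute("PRAGMA cache_size = 10000");
        }
        conn.setAutoCommit(false);                     // transactions committed explicitly
        return conn;
    }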

The full schema is as follows (I ran both with and without the indexes; they are shown enabled here):

    create table if not exists [EventLog] (
        Id INTEGER PRIMARY KEY ASC,
        DocumentId TEXT NOT NULL,
        Event TEXT NOT NULL,
        Content TEXT NOT NULL,
        TransactionId TEXT NOT NULL,
        Date INTEGER NOT NULL,
        User TEXT NOT NULL
    );

    create index if not exists DocumentId ON EventLog (DocumentId);
    create index if not exists TransactionId ON EventLog (TransactionId);
    create index if not exists Date ON EventLog (Date);

This uses sqlite-jdbc-3.7.2 in a Windows environment.

+4
2 answers

SQLite tables and indexes are internally organized as B-trees. In tables, the rowid is the sort key. (Your INTEGER PRIMARY KEY is an alias for the rowid.)


If the identifiers you insert are not larger than the largest identifier already in the table, records are not appended at the end but inserted somewhere in the middle of the tree. When enough records are inserted in one transaction, and the identifiers are randomly distributed, this means that almost every page in the database has to be rewritten.

To avoid this,

  • insert identifiers in ascending order; or
  • insert NULL as the identifier so that SQLite chooses the next value itself; or
  • keep SQLite from using the Id field as the rowid by declaring it INTEGER UNIQUE (or just INTEGER if you do not need the extra check/index), so that the table stays ordered by the implicit rowid regardless of your identifiers; see the sketch after this list.
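A sketch of the third option, reusing the columns from the question (the EventLog2 table name is only for illustration):

    import java.sql.Connection;
    import java.sql.Statement;

    // Sketch: Id is an ordinary INTEGER column with a UNIQUE index rather than
    // the rowid, so the table stays ordered by the implicit rowid no matter
    // what identifier values are inserted.
    static void createEventLog2(Connection conn) throws Exception {
        try (Statement st = conn.createStatement()) {
            st.execute("create table if not exists EventLog2 ("
                     + " Id INTEGER UNIQUE,"
                     + " DocumentId TEXT NOT NULL,"
                     + " Event TEXT NOT NULL,"
                     + " Content TEXT NOT NULL,"
                     + " TransactionId TEXT NOT NULL,"
                     + " Date INTEGER NOT NULL,"
                     + " User TEXT NOT NULL)");
        }
    }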

The same applies to indexes: inserting values with a random distribution means the index must be updated at arbitrary positions. As with tables, when enough records are inserted in one transaction, almost every page of the index has to be rewritten.

When you load large amounts of data, it is recommended to do so without any indexes and to recreate them afterwards. (Unlike some other databases, SQLite has no way to temporarily disable indexes; you can only drop them.)
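For the schema above, that pattern looks roughly like this (a sketch; the batched inserts would run where indicated):

    import java.sql.Connection;
    import java.sql.Statement;

    // Sketch: drop the secondary indexes, bulk-load, then recreate them.
    static void bulkLoad(Connection conn) throws Exception {
        try (Statement st = conn.createStatement()) {
            st.execute("drop index if exists DocumentId");
            st.execute("drop index if exists TransactionId");
            st.execute("drop index if exists Date");
        }
        // ... run the batched inserts here ...
        try (Statement st = conn.createStatement()) {
            st.execute("create index if not exists DocumentId ON EventLog (DocumentId)");
            st.execute("create index if not exists TransactionId ON EventLog (TransactionId)");
            st.execute("create index if not exists Date ON EventLog (Date)");
        }
        conn.commit();
    }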

+2

FYI: although I did not constrain the schema with respect to the key content, in 99.999% of cases it will be a GUID. So, to solve the performance problem, I wrote an algorithm that generates sequential GUIDs, using the time value for the first 8 hexadecimal digits. This worked very well, even when blocks of GUIDs had been generated with earlier time values.
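The idea looks roughly like this (a sketch; the makeSequentialGuid name and the layout of the non-time digits are illustrative, not my exact algorithm):

    import java.util.UUID;

    // Sketch: a GUID whose first 8 hex digits come from the current time in
    // seconds, so values generated close together also sort close together.
    static String makeSequentialGuid() {
        long seconds = System.currentTimeMillis() / 1000L;
        long msb = UUID.randomUUID().getMostSignificantBits();
        long lsb = UUID.randomUUID().getLeastSignificantBits();
        return String.format("%08x-%04x-%04x-%04x-%012x",
                seconds & 0xFFFFFFFFL,            // time-based prefix
                (msb >>> 48) & 0xFFFFL,           // remaining digits stay random
                (msb >>> 32) & 0xFFFFL,
                (lsb >>> 48) & 0xFFFFL,
                lsb & 0xFFFFFFFFFFFFL);
    }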

0

Source: https://habr.com/ru/post/1437251/

