SQLite: what are the practical limitations?

Before you mark this question as a duplicate, please hear me out! I have already read the questions asked here about how to improve performance, e.g. Improve SQLite performance? and What are sqlite performance characteristics with very large database files? - just to name a few.

I am struggling to make SQLite work with a database file that is 5 gigabytes in size. On the contrary, there are people who claim that SQLite works "great" for them even when the database size is as big as 160 GB. I have not tried it myself, but from the questions asked, I guess that all the benchmarking is perhaps done with only one table in the database.

I am using a database with
- 20 or so tables
- half of the tables have more than 15 columns
- most of these 15-or-so-column tables have 6/7 foreign key columns
- a few of these tables have already grown to 27 million records within a month

The development machine I am using is a 3 GHz quad-core machine with 4 gigabytes of RAM, and yet it takes more than 3 minutes just to query the row count of these big tables.

I could not find a way to partition the data horizontally. The best shot I have is to split the data across multiple database files, one for each table. But in that case, as far as I know, foreign key column constraints can't be used, so I will have to create a sufficiently denormalized table myself (without any foreign keys).

So my questions are:
a) Am I using the wrong database for the job?
b) Where do you think I am going wrong?
c) I haven't added indexes on the foreign keys yet, but if a simple row count query takes four minutes, how would foreign key indexes help me?

EDIT: To provide more information, even though nobody asked for it :) I am using SQLite version 3.7.9 with system.data.sqlite.dll version 1.0.77.0.

EDIT2: I THINK where I differ from the guys with 160-gigabyte databases is that they may select a single record or a small range of records. But I have to load all 27 million rows in my table, join them with other tables, group the records as requested by the user, and return the results. Any input on the best way to optimize the database for such results is welcome.

I cannot cache the results of a previous query, since that doesn't make sense in my case. The chances of hitting the cache would be fairly low.

+6
3 answers

There is a lot to consider here, but my first piece of advice would be not to take others' performance statistics at face value. Database performance depends on many things, including how your database is structured, the complexity of your queries, which indexes you have defined (or not), and often just the sheer amount of data in them. Many reported performance numbers come from a lot of trial and error, and/or from matching the database to the task at hand. To put it another way, the performance you are going to get out of any given DBMS cannot be meaningfully compared to another application's performance unless your data sets and structures are nearly identical - they are certainly a guide, and perhaps an ideal to strive for, but you are not necessarily going to get insane performance "out of the box".

As a starting point, I would index the data on those really large tables (it looks, from the comments, like you haven't), and see what happens. Sure, a count taking four minutes is a pretty long time, but don't stop there. Add some indexes, change them, ask whether you are storing data you don't need to store, and look at other application queries, not just the count query, to judge performance. Look for other applications and blog posts that use SQLite with large numbers of rows, and see what they did to address it (which may include changing databases). Basically, try things and then make a judgment. Don't let the initial fear stop you, thinking you are going down the wrong path. Maybe you are, maybe you aren't, but don't stop at the COUNT query. Any way you slice it, 27 million records in a table is a crap-ton.
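As a minimal sketch of that index-then-measure loop, here is what it looks like in Python's built-in sqlite3 module (the asker uses System.Data.SQLite, but the SQL is identical); the `users`/`events` schema and the index name are made up for illustration:

```python
import sqlite3

# Hypothetical schema, standing in for one big table with a foreign key column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE events(id INTEGER PRIMARY KEY,
                        user_id INTEGER REFERENCES users(id),
                        payload TEXT);
""")

# Without an index on user_id, filtering on it scans the whole events table.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE user_id = 42"
).fetchall()
print(plan)  # expect a full SCAN of events

# Index the foreign key column, then re-check the plan.
conn.execute("CREATE INDEX idx_events_user_id ON events(user_id)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE user_id = 42"
).fetchall()
print(plan)  # expect a SEARCH using idx_events_user_id
```

The point is the before/after check with EXPLAIN QUERY PLAN: once the plan switches from SCAN to SEARCH on the new index, you know the index is actually being used. That is the kind of measurement worth repeating for each candidate index and each real query.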

Finally, one specific piece of advice: in SQLite, don't split the database into multiple files - I don't see how that would help, because then you would have to do lots of extra query work, and then manually join your separate tables after the results come back from multiple queries. That is reinventing what the RDBMS does for you, and it's a crazy idea. You are not going to somehow figure out a way to do joins faster than the creators of the RDBMS - you would definitely be wasting your time there.

+4

SELECT COUNT(*) in SQLite will always be slower than in other DBMSes, because it performs a table scan for that particular request. It does not have a statistics table to help. That doesn't mean your application's queries will be slow. You need to test your queries to really say what you can expect.
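One common workaround for that missing statistics table, sketched here in Python's sqlite3 (the table and trigger names are hypothetical), is to keep a counter row maintained by triggers, so reading the count never has to scan the big table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE big_table(id INTEGER PRIMARY KEY, data TEXT);

    -- Single-row counter kept up to date by triggers, so reading the
    -- count is O(1) instead of a full table scan.
    CREATE TABLE row_counts(table_name TEXT PRIMARY KEY, n INTEGER);
    INSERT INTO row_counts VALUES ('big_table', 0);

    CREATE TRIGGER big_table_ins AFTER INSERT ON big_table
    BEGIN
        UPDATE row_counts SET n = n + 1 WHERE table_name = 'big_table';
    END;
    CREATE TRIGGER big_table_del AFTER DELETE ON big_table
    BEGIN
        UPDATE row_counts SET n = n - 1 WHERE table_name = 'big_table';
    END;
""")

conn.executemany("INSERT INTO big_table(data) VALUES (?)",
                 [("row %d" % i,) for i in range(1000)])
conn.execute("DELETE FROM big_table WHERE id <= 10")

count = conn.execute(
    "SELECT n FROM row_counts WHERE table_name = 'big_table'"
).fetchone()[0]
print(count)  # 990
```

The triggers add a small cost to every insert and delete, so this trade only makes sense when counts are read far more often than rows are written.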

Some general guidelines: indexing is an absolute must, because navigating a subset of the data in a binary tree is much faster than traversing an entire table at huge sizes. To help load time, you should sort your data on a unique index, and if you don't have a unique index, on your largest index. If you can drop the indexes before loading and re-create them afterwards, it will be faster. If these techniques can't meet your operational parameters and SLAs, then it is time to do horizontal partitioning and use ATTACH to cover the range of data you need. SQLite can support up to 10 attached databases by default. I know some say that partitioning is the tool's job, not the developers', but when you are facing physical limits you have to roll up your sleeves, or perhaps pick a commercial tool that does it under the covers for you.
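A rough sketch of that ATTACH-based horizontal partitioning, in Python's sqlite3 (the file layout and the `events` schema are invented for the example):

```python
import os
import sqlite3
import tempfile

# Hypothetical setup: three partition files, each holding one range of rows.
tmp = tempfile.mkdtemp()
paths = [os.path.join(tmp, "part_%d.db" % i) for i in range(3)]

# Write 100 rows into each partition file.
for i, path in enumerate(paths):
    c = sqlite3.connect(path)
    c.execute("CREATE TABLE events(id INTEGER PRIMARY KEY, val INTEGER)")
    c.executemany("INSERT INTO events(val) VALUES (?)",
                  [(i * 100 + j,) for j in range(100)])
    c.commit()
    c.close()

# One connection can ATTACH the partitions (the default limit is 10)
# and query across all of them with UNION ALL.
conn = sqlite3.connect(":memory:")
for i, path in enumerate(paths):
    conn.execute("ATTACH DATABASE ? AS part_%d" % i, (path,))

union = " UNION ALL ".join("SELECT val FROM part_%d.events" % i
                           for i in range(3))
total = conn.execute(
    "SELECT COUNT(*), MAX(val) FROM (%s)" % union
).fetchone()
print(total)  # (300, 299)
```

The application still has to know which partition covers which range, so the query-routing logic the first answer warns about does not disappear - ATTACH just lets a single connection see all the files at once.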

0

If you have a 50 MB or larger db deployed directly on the client side, you are probably doing something wrong. Try moving to the server side, keeping only the key, important data on the client (references only). You won't get real time, but at least you will end up with a workable solution. "Server side" is the answer to your question - that is, if you drop or relax the real-time requirement, because that is what you have (based on your description). Anyway, SQLite can handle almost anything, but from personal experience, just try to simplify things as much as possible, even at the cost of real time.

0

Source: https://habr.com/ru/post/910505/

