Database Selection: High Write, Low Read

I am building a component for recording historical data. Initially I expect it to handle about 30 writes per second and fewer than one read per second.

Data is never updated; only new records are appended. Reads will mostly touch the newest entries.

Demand is likely to grow rapidly; I expect to reach around 80 writes per second within a year.

I could handle distribution in my component myself and use a conventional database such as MySQL, or I could go with a distributed database such as MongoDB. Either way, I would like the database to handle writes very well.

The database should be free; open source would be a plus :-)

Note: a record is plain text of variable size, usually between 50 and 500 words.

2 answers

Your question can be approached in several different ways, so let me break it down and look at the individual requirements you have laid out:

  • Writes - The main part of what you are doing is append-only inserts at a relatively low volume (80 records per second). Almost any reasonable database product on the market will handle this. You are looking at 50-500 "words" of stored data per record. I'm not sure what a word is here, but for the sake of argument let's say a word averages 8 characters, so each record is some metadata (a key, a timestamp, and so on) plus 400-4,000 bytes of text. Barring implementation-specific details of particular RDBMSs, that is still quite ordinary: we are writing at most (overhead included) about 4,100 bytes per record. That comes to 328,000 bytes per second, or, as I like to call it, not a lot of writing.

  • Deletes - You did not mention needing to delete data, and there is not much to say here anyway: deletes are deletes.

  • Reads - This is where things get complicated. You note that reads are mostly by primary key and hit the newest data. I'm not sure the second part matters much: if you are doing key lookups (as in "give me record 8675309"), life is good and you can use just about anything.

  • Joins - If you need real joins, where the database itself processes them, you have ruled out most of the non-relational products.

  • Data size / data lifetime - This is where it gets fun. You estimated 80 records per second, and I figure 4,100 bytes per record, or 328,000 bytes per second. There are 86,400 seconds in a day, which gives us 28,339,200,000 bytes. Terrifying! That is 27,675,000 KB, about 27,026 MB, or roughly 26 GB per day. Even if you keep your data for only a year, that is about 9,633 GB, call it 10 TB, of data. You can rent 1 TB from a cloud hosting provider for about US$250 per month, or buy it from a SAN vendor such as EqualLogic for around $15,000.
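The arithmetic in the bullets above is easy to sanity-check. A minimal sketch, where the inputs (500 words maximum, 8 bytes per word on average, ~100 bytes of per-record overhead, 80 writes per second) are the answer's own assumptions:

```python
# Back-of-the-envelope check of the write-volume and storage estimates.
BYTES_PER_RECORD = 500 * 8 + 100      # 4,100 bytes, metadata included
WRITES_PER_SEC = 80                   # projected rate after a year
SECONDS_PER_DAY = 86_400

bytes_per_sec = BYTES_PER_RECORD * WRITES_PER_SEC   # 328,000 B/s
bytes_per_day = bytes_per_sec * SECONDS_PER_DAY     # 28,339,200,000 B/day
gb_per_day = bytes_per_day / 1024**3                # ~26.4 GB/day
gb_per_year = gb_per_day * 365                      # ~9,633 GB/year

print(f"{bytes_per_sec:,} B/s, {gb_per_day:.1f} GB/day, {gb_per_year:,.0f} GB/year")
```

Any of the inputs can be swapped for your real record sizes; the point is that even the worst case lands around 10 TB per year.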

Conclusion: I can think of only a few databases that could not handle this load. 10 TB makes things a little more complicated and takes a bit of administrative skill, and you may need to learn some data lifecycle management techniques, but almost any DBMS is up to the task. Likewise, almost any non-relational/NoSQL database is up to the task. In fact, almost any database of any type is up to the task.

If you (or your team members) already have skills in a particular product, just stick with it. If there is a specific product that excels in your problem domain, use that.

This is not the kind of problem that calls for any sort of distributed magical unicorn.


MySQL will do fine. I would advise using InnoDB with no indexes beyond the primary key; if you can skip even that, all the better for keeping the insert stream uninterrupted.

Indexes speed up reads but hurt write throughput.

You can also use PostgreSQL. There, too, you should skip the indexes; you will not have a choice of storage engine, but its write performance is also very strong.
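The index-free, append-only pattern described above can be sketched as follows. This uses Python's built-in sqlite3 as a self-contained stand-in for MySQL/InnoDB or PostgreSQL, and the table and column names are illustrative, not from the original post:

```python
import sqlite3
import time

# In-memory database stands in for the real server.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE history_log (
        id   INTEGER PRIMARY KEY,   -- the only index we keep
        ts   REAL NOT NULL,         -- write timestamp
        body TEXT NOT NULL          -- 50-500 words of plain text
    )
""")

# Batched inserts keep the write stream cheap: there are no
# secondary indexes to maintain on each insert.
rows = [(time.time(), f"record body {i}") for i in range(1000)]
with conn:  # one transaction for the whole batch
    conn.executemany("INSERT INTO history_log (ts, body) VALUES (?, ?)", rows)

count, = conn.execute("SELECT COUNT(*) FROM history_log").fetchone()
print(count)  # 1000
```

The same schema and batching approach carry over to MySQL or PostgreSQL with their respective client libraries; only the connection setup changes.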

The approach you want is actually used in some real systems, but with two database servers, or at least two databases. The first one receives the flood of new data (your case); the second periodically pulls from the first and stores the data in a well-structured form (with indexes, constraints, and so on). Then, when you need to read or take a snapshot of the data, you go to the second server (or second database), where you can use transactions and the rest.
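The two-database split can be sketched as below. Two sqlite3 connections stand in for the two servers, and all names (tables, the `sync` helper, the index) are illustrative assumptions, not anything from the original post:

```python
import sqlite3

# Ingest store: append-only, no secondary indexes, cheap writes.
ingest = sqlite3.connect(":memory:")
ingest.execute("CREATE TABLE raw (id INTEGER PRIMARY KEY, body TEXT)")

# Reporting store: indexed for reads; writes here happen in batches.
reporting = sqlite3.connect(":memory:")
reporting.execute("CREATE TABLE archive (id INTEGER PRIMARY KEY, body TEXT)")
reporting.execute("CREATE INDEX idx_body ON archive (body)")

ingest.executemany("INSERT INTO raw (body) VALUES (?)",
                   [("event %d" % i,) for i in range(100)])

def sync(last_seen: int) -> int:
    """Copy rows newer than last_seen into the indexed store; return new cursor."""
    rows = ingest.execute(
        "SELECT id, body FROM raw WHERE id > ?", (last_seen,)).fetchall()
    with reporting:
        reporting.executemany(
            "INSERT INTO archive (id, body) VALUES (?, ?)", rows)
    return max((r[0] for r in rows), default=last_seen)

cursor_pos = sync(0)
count, = reporting.execute("SELECT COUNT(*) FROM archive").fetchone()
print(cursor_pos, count)  # 100 100
```

In production the two stores would be separate servers and `sync` would run on a schedule, but the shape of the pattern is the same.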

You could also take a look at Oracle Express (I think that is the name) and SQL Server Express Edition. Both perform well but come with some limitations, so read up on them to get the full picture.


Source: https://habr.com/ru/post/892583/
