MySQL Partitioning / Sharding / Splitting - which way to go?

We have an InnoDB database that is about 70 GB, and we expect it to grow to several hundred GB in the next 2-3 years. About 60% of the data belongs to a single table. Currently the database works quite well, since we have a server with 64 GB of RAM, so almost the entire database fits into memory, but we are worried about the future, when the amount of data will be much larger. Right now we are considering some way of splitting up the tables (especially the one that accounts for most of the data), and I am now wondering what the best way to do this would be.

The options I currently know of are:

  • Using MySQL Partitioning, which ships with version 5.1
  • Using some third-party library that encapsulates the partitioning of the data (e.g., Hibernate Shards)
  • Implementing this in our application

Our application is built on J2EE and EJB 2.1 (we hope that one day we will switch to EJB 3).

What would you suggest?

EDIT (2011-02-11):
Just an update: currently the database size is 380 GB, the data size of our "large" table is 220 GB, and its index size is 36 GB. Thus, while the entire table no longer fits into memory, the index does.
The system is still working fine (still on the same hardware), and we are still thinking about partitioning the data.

EDIT (2014-06-04): Another update: the size of the entire database is 1.5 TB, and the size of our "large" table is 1.1 TB. We upgraded our server to a 4-processor machine (Intel Xeon E7450) with 128 GB of RAM. The system is still working fine. What we plan to do next is to put our large table on a separate database server (we have already made the necessary changes to our software), while at the same time upgrading to new hardware with 256 GB of RAM.

This setup is supposed to last for two years. After that we will either have to finally start implementing a sharding solution, or just buy servers with 1 TB of RAM, which should keep us going for some time.

EDIT (2016-01-18):

Since then, we have moved our large table into its own database on a separate server. Currently, the size of this database is about 1.9 TB, and the size of the other database (with all tables except the "large" one) is 1.1 TB.

Current hardware setup:

  • HP ProLiant DL580
  • 4 x Intel(R) Xeon(R) CPU E7-4830
  • 256 GB RAM

In this setup, performance is great.

+47
mysql partitioning sharding database-performance
Sep 05 '08 at 13:59
9 answers

If you think that you are going to be I/O or memory bound, I do not think partitioning will help. As usual, benchmarking first will help you figure out the best direction. If you do not have spare servers with 64 GB of memory lying around, you can always ask your vendor for a "demo unit."

I would lean towards sharding if you do not expect to need aggregate reporting in a single query. I assume you would shard the whole database, not just your large table: it is best to keep whole entities together. Well, if your model splits nicely, anyway.

+10
Sep 05 '08 at 15:00

You will certainly start to run into problems with that 42 GB table once it no longer fits in memory. In fact, as soon as it no longer fits into memory, performance will degrade very quickly. One way to test this is to put the table on another machine with less RAM and see how badly it performs.

"First of all, splitting out tables does not matter much unless you also move some of the tables to a separate physical volume."

This is not true. Partitioning (either through the feature in MySQL 5.1, or the same thing using MERGE tables) can provide significant performance benefits, even if the tables are on the same drive.

As an example, let's say that you are running SELECT queries on your large table using a date range. If the table is whole, the query will be forced to scan through the entire table (and at that size, even using indexes can be slow). The advantage of partitioning is that your queries will only run on the partitions where it is absolutely necessary. If each partition is 1 GB in size, and your query only needs to access 5 partitions, the combined 5 GB is much easier for MySQL to deal with than the 42 GB monster.
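
To make the pruning idea concrete, here is a minimal Python sketch (not MySQL itself) of how a range-partitioned table narrows a date-range query down to a few partitions. The monthly layout for 2008 and the names PARTITION_UPPER_BOUNDS and partitions_for_range are assumptions made up for illustration; MySQL does the equivalent internally when the WHERE clause filters on the partitioning column.

```python
from bisect import bisect_right
from datetime import date
from typing import List

# Hypothetical layout: 12 monthly partitions for 2008. Partition i holds rows
# whose date is below PARTITION_UPPER_BOUNDS[i] (and not below the previous
# bound), similar in spirit to RANGE partitioning with "VALUES LESS THAN".
PARTITION_UPPER_BOUNDS = [date(2008, m, 1) for m in range(2, 13)] + [date(2009, 1, 1)]


def partitions_for_range(start: date, end: date) -> List[int]:
    """Return the indexes of partitions that may hold rows with start <= d <= end."""
    # A date belongs to the first partition whose upper bound is greater than it.
    first = bisect_right(PARTITION_UPPER_BOUNDS, start)
    last = bisect_right(PARTITION_UPPER_BOUNDS, end)
    # Dates past the final bound match no partition in this sketch.
    last = min(last, len(PARTITION_UPPER_BOUNDS) - 1)
    return list(range(first, last + 1))


touched = partitions_for_range(date(2008, 3, 10), date(2008, 5, 20))
print(touched)  # [2, 3, 4]
```

In this sketch a query covering 10 March to 20 May touches 3 of the 12 partitions; the other nine are never read.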

One thing you should ask yourself is how you query the data. If it is likely that your queries will only need to access certain chunks of data (for example, a date range or a range of IDs), partitioning of some kind will be beneficial.

I have heard that partitioning in MySQL 5.1 is still somewhat buggy, particularly when it comes to MySQL choosing the correct key. MERGE tables can provide the same functionality, although they carry slightly more overhead.

Hope this helps ... good luck!

+25
Sep 25 '08 at 13:58

This is a great real-world example of what MySQL partitioning can do with huge data flows:

http://web.archive.org/web/20101125025320/http://www.tritux.com/blog/2010/11/19/partitioning-mysql-database-with-high-load-solutions/11/1

Hoping this will be useful for your business.

+6
Nov 21 2018-10-21

A while back, at a Microsoft ArcReady event, I saw a presentation on scaling patterns that may be useful to you. You can view the slides for it online.

+1
Sep 05 '08 at 14:33

I would go for MariaDB with InnoDB + partitions (either by key or by date, depending on your queries).

I did this and now I no longer have database problems.

MySQL can be replaced with MariaDB in seconds ... all database files remain unchanged.

+1
Oct 11 '11 at 10:30

First of all, splitting out tables does not matter much unless you also move some of the tables to a separate physical volume.

Secondly, it is not necessarily the table with the largest physical size that you want to move. You may have a much smaller table that gets more activity, while your large table stays fairly constant or only has data appended to it.

Whatever you do, do not implement it yourself. Let the database system handle it.

0
Sep 05 '08 at 14:15

What makes up the large table?

If you are going to split it, you have several options:
- Split it using the database system (I do not know much about that)
- Split it by row.
- Split it by column.

Splitting by row is only possible if your data can easily be separated into chunks. For example, something like Basecamp has many accounts that are completely separate. You could keep 50% of the accounts in one table and 50% in another table on a different machine.

Splitting by column is useful when the rows contain large text fields or BLOBs. If you have a table with (for example) a user image and a huge block of text, you could move the image into a completely different table (on another machine).

You are breaking normalization here, but I don't think it will cause too many problems.
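
As a rough illustration of the row split described above, here is a minimal Python sketch of routing each account to one of two shards. The shard names and the function shard_for_account are hypothetical, and simple modulo routing is just one assumed way to get the 50/50 split.

```python
# Hypothetical shard names; in practice these would map to connection settings.
SHARDS = ["accounts_db_0", "accounts_db_1"]


def shard_for_account(account_id: int) -> str:
    """Pick the shard that stores every row belonging to this account."""
    # Modulo on a sequential id spreads accounts evenly across the shards.
    return SHARDS[account_id % len(SHARDS)]


print(shard_for_account(41))  # accounts_db_1
print(shard_for_account(42))  # accounts_db_0
```

Note that adding a shard later changes the modulo result for most ids, so a lookup table from account to shard is a common alternative when the number of shards is expected to grow.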
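
A small Python sketch of the column split, under the assumption that the bulky fields are called avatar and bio and that user_id is the primary key (all three names are invented for this example). Each logical row becomes a narrow part and a wide part that share the key, so they can live in different tables or on different machines.

```python
from typing import Any, Dict, Tuple

# Hypothetical column names for the bulky fields.
WIDE_COLUMNS = {"avatar", "bio"}


def split_row(row: Dict[str, Any], key: str = "user_id") -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split one logical row into a narrow part and a wide part sharing the key."""
    narrow = {k: v for k, v in row.items() if k not in WIDE_COLUMNS}
    wide = {k: v for k, v in row.items() if k in WIDE_COLUMNS}
    wide[key] = row[key]  # the key is duplicated so the parts can be rejoined
    return narrow, wide


narrow, wide = split_row({"user_id": 7, "name": "Ann", "avatar": b"\x89PNG", "bio": "long text"})
print(narrow)        # {'user_id': 7, 'name': 'Ann'}
print(sorted(wide))  # ['avatar', 'bio', 'user_id']
```

Queries that only need the small columns then never have to read the large ones.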

0
Sep 05 '08 at 14:35

"As usual, benchmarking first will help you figure out the best direction."

This is what most people tell me, so I guess I will finally have to swallow that pill ...

0
Sep 08 '08 at 18:25

You will probably want to split that large table eventually. You will probably want to put it on a separate hard drive before thinking about a second server. Doing this with MySQL is the most convenient option. If it is capable of it, then go for it.

BUT

It all depends on how your database is used. Statistics.

0
Sep 22 '08 at 20:59
