Efficient MySQL partitioning scheme for a huge data set (7,300,000,000 rows and approximately 80 GB of data)

This is a follow-up to my earlier question, "Efficiently storing 7.300.000.000 rows".

I decided to use MySQL with partitioning, and the preliminary schema looks like this:

CREATE TABLE entity_values (
  entity_id MEDIUMINT UNSIGNED DEFAULT 0 NOT NULL, # 3 bytes = [0 .. 16.777.215]
  date_id SMALLINT UNSIGNED DEFAULT 0 NOT NULL, # 2 bytes = [0 .. 65.535]
  value_1 MEDIUMINT UNSIGNED DEFAULT 0 NOT NULL, # 3 bytes = [0 .. 16.777.215]
  value_2 MEDIUMINT UNSIGNED DEFAULT 0 NOT NULL, # 3 bytes = [0 .. 16.777.215]
  UNIQUE KEY (entity_id, date_id)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 PARTITION BY HASH(entity_id) PARTITIONS 25;

This gives:

  • Rows = 7,300,000,000 (per the requirements set out in the previous question).
  • Size / row = 11 bytes (3 + 2 + 3 + 3)
  • Total size = 7,300,000,000 rows * 11 bytes = 80,300,000,000 bytes = 80.3 GB
  • Partitions = 25 (3.2 GB / partition; the partition size is somewhat arbitrary)
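
The arithmetic above can be checked with a short script. This is only a sketch under stated assumptions: it counts column data alone (no index and no per-row storage overhead), it assumes entity_id values spread evenly over the partitions, and the helper name partition_for is hypothetical. It relies on the documented behaviour that plain HASH partitioning in MySQL places a row in partition MOD(expr, number_of_partitions).

```python
# Back-of-the-envelope sizing for the entity_values table.
# Assumption: column data only; index size and row overhead are ignored.
ROWS = 7_300_000_000
ROW_BYTES = 3 + 2 + 3 + 3  # entity_id + date_id + value_1 + value_2
PARTITIONS = 25


def partition_for(entity_id: int, partitions: int = PARTITIONS) -> int:
    """Hypothetical helper: partition number chosen by PARTITION BY HASH(entity_id).

    Plain HASH partitioning uses MOD(expr, number_of_partitions).
    """
    return entity_id % partitions


total_bytes = ROWS * ROW_BYTES
# Assumption: rows are spread evenly across the partitions.
per_partition_gb = total_bytes / PARTITIONS / 1e9

print(f"total: {total_bytes:,} bytes")
print(f"per partition: {per_partition_gb:.2f} GB")
print(f"entity_id 12345 lands in partition {partition_for(12345)}")
```

Because the partition is derived from entity_id alone, all rows for a given entity land in the same partition, so a lookup that fixes entity_id only has to touch one of the 25 partitions.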

Note that I excluded the primary key from the original design, as the id column will not be used.

- , , - /, ? "", , MySQL?

: , 8.570.532 200 000 000 , 24,7 .

: , , entity_id + date_id, , entity_id.


, , - , . 2M , , . - ( ).

, , , ( , ), , .

hash entity_id , , , , .

MyISAM " ", , concurrency ; " ", , .

, .

, 80G , InnoDB .

, InnoDB, entity_id, date_id, , entity_id. , date_id, .

, , !


( ) , (entity_id, date_id) - .

, , . , . SELECT..ORDER BY DATE, , MySQL 3650 ( ). - .

, INSERT , , (, ), . , , . (RAID0 ).

, , , INSERT. MySQL ALTER TABLE.. ORDER BY... . 182M, ALTER TABLE.. ORDER BY, 2 , .

!

, , - , - , , . , MySQL, . .

. , . 2 , 1 , , , . , , , RAID0 , .

, .

parallelism ( , ) , /, . 2%, , , , ( -, ).

, , MySQL... , . ( "" ), Disk I/O, , . !


, entity_id; , , ( ). , . . , .


Source: https://habr.com/ru/post/1705091/

