MySql - table size and performance

We have an Analytics product. For each of our clients, we give one JavaScript code that they post on their sites. If a user visits our client site, the java script code will go to our server, so we store this page on behalf of this client. Each client has a unique domain name.

we store this page in the MySql table.

The table below is a diagram.

CREATE TABLE `page_visits` (
  `domain` varchar(50) DEFAULT NULL,
  `guid` varchar(100) DEFAULT NULL,
  `sid` varchar(100) DEFAULT NULL,
  `url` varchar(2500) DEFAULT NULL,
  `ip` varchar(20) DEFAULT NULL,
  `is_new` varchar(20) DEFAULT NULL,
  `ref` varchar(2500) DEFAULT NULL,
  `user_agent` varchar(255) DEFAULT NULL,
  `stats_time` datetime DEFAULT NULL,
  `country` varchar(50) DEFAULT NULL,
  `region` varchar(50) DEFAULT NULL,
  `city` varchar(50) DEFAULT NULL,
  `city_lat_long` varchar(50) DEFAULT NULL,
  `email` varchar(100) DEFAULT NULL,
  KEY `sid_index` (`sid`) USING BTREE,
  KEY `domain_index` (`domain`),
  KEY `email_index` (`email`),
  KEY `stats_time_index` (`stats_time`),
  KEY `domain_statstime` (`domain`,`stats_time`),
  KEY `domain_email` (`domain`,`email`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |

We do not have a primary key for this table.

MySql Server Information

This is the Google MySql cloud (version 5.6) and its capacity is 10 TB.

We currently have 350 million rows in our table, and the table size is 300 GB. We store all the data of our customers in one table, even if there is no connection between one client.

1. , , .

1:

SELECT count(DISTINCT sid) AS count,count(sid) AS total FROM page_views WHERE domain = 'aaa' AND stats_time BETWEEN CONVERT_TZ('2015-02-05 00:00:00','+05:30','+00:00') AND CONVERT_TZ('2016-01-01 23:59:59','+05:30','+00:00');
+---------+---------+
| count   | total   |
+---------+---------+
| 1056546 | 2713729 |
+---------+---------+
1 row in set (13 min 19.71 sec)

. 5-10 , ?

2: , 5 , . , . .

, , .

1) , . 30 . , 100 ., 100 . . .

2)

. "" ? :

+4
2

-, , :

  `domain` varchar(50) DEFAULT NULL,  -- normalize to MEDIUMINT UNSIGNED (3 bytes)
  `guid` varchar(100) DEFAULT NULL,  -- what is this for?
  `sid` varchar(100) DEFAULT NULL,  -- varchar?
  `url` varchar(2500) DEFAULT NULL,
  `ip` varchar(20) DEFAULT NULL,  -- too big for IPv4, too small for IPv6; see below
  `is_new` varchar(20) DEFAULT NULL,  -- flag?  Consider `TINYINT` or `ENUM`
  `ref` varchar(2500) DEFAULT NULL,
  `user_agent` varchar(255) DEFAULT NULL,  -- normalize! (add new rows as new agents are created)
  `stats_time` datetime DEFAULT NULL,
  `country` varchar(50) DEFAULT NULL,  -- use standard 2-letter code (see below)
  `region` varchar(50) DEFAULT NULL,  -- see below
  `city` varchar(50) DEFAULT NULL,  -- see below
  `city_lat_long` varchar(50) DEFAULT NULL,  -- unusable in current format; toss?
  `email` varchar(100) DEFAULT NULL,

IP- inet6_aton(), BINARY(16).

country CHAR(2) CHARACTER SET ascii - 2 .

+ + + () latlng - "".

. → → - → .

...

sid,

KEY `domain_statstime` (`domain`,`stats_time`),

KEY dss (domain_id,`stats_time`, sid),

" ", 2713729 - - , 13 . (domain_id .)

DROP it: KEY domain_index (domain)

"" domain?

InnoDB PRIMARY KEY. 3 ; "" - 6- , . , "" , ? BIGINT UNSIGNED. (, 8 , PK.)

WHERE domain = '...', . ( .)

id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
domain_id MEDIUMINT UNSIGNED NOT NULL,   -- normalized to `Domains`
PRIMARY KEY(domain_id, id),  -- clustering on customer gives you the speedup
INDEX(id)  -- this keeps AUTO_INCREMENT happy

pt-online-schema-change . , PRIMARY KEY.

" "? . ; № . 100K.

Sharding

"Sharding" - .

sharding, -, domain, , , . Sharding . , , .

- domain ( domain_id), (1) , (2) ( 100 . ) (3) .

, , 1024 , 1024 , , . , , . , .

"" , . .

PARTITIONing - "".

, - . , . , , .

" ". ? , . (. .) :

A: PRIMARY KEY(domain_id, stats_time, id) . ( PK.)

B: stats_time , , . stats_time id. , . ( ).

C: , stats_time . , WHERE id BETWEEN ... AND stats_time .... ( , .)

? , , , . .

COUNT(DISTINCT sid) . , , . .

+1

, . , , - , , (, ); siteid:date redis (). count sql - .

0

Source: https://habr.com/ru/post/1623731/


All Articles