Massive DB and mysql

Question

Our new project requires a lot of data analysis, and the queries feel very slow. We are looking for ways to improve this by changing our software, our hardware, or both.

We are currently running on an Amazon EC2 instance (Linux):

High-CPU Extra Large Instance

7 GB of memory
20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: c1.xlarge


processor       : 7
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Xeon(R) CPU           E5506  @ 2.13GHz
stepping        : 5
cpu MHz         : 2133.408
cache size      : 4096 KB

MemTotal:      7347752 kB
MemFree:        728860 kB
Buffers:         40196 kB
Cached:        2833572 kB
SwapCached:          0 kB
Active:        5693656 kB
Inactive:       456904 kB
SwapTotal:           0 kB
SwapFree:            0 kB

One part of db is articles and objects and a link table, for example:

mysql> DESCRIBE articles_entities;
+------------+--------------+------+-----+---------+-------+
| Field      | Type         | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| id         | char(36)     | NO   | PRI | NULL    |       | 
| article_id | char(36)     | NO   | MUL | NULL    |       | 
| entity_id  | char(36)     | NO   | MUL | NULL    |       | 
| created    | datetime     | YES  |     | NULL    |       | 
| modified   | datetime     | YES  |     | NULL    |       | 
| relevance  | decimal(5,4) | YES  | MUL | NULL    |       | 
| analysers  | text         | YES  |     | NULL    |       | 
| anchor     | varchar(255) | NO   |     | NULL    |       | 
+------------+--------------+------+-----+---------+-------+
8 rows in set (0.00 sec)

As the count below shows, the table already holds many rows, and it grows by more than 100,000 rows per day.

mysql> SELECT count(*) FROM articles_entities;
+----------+
| count(*) |
+----------+
|  2829138 | 
+----------+
1 row in set (0.00 sec)

A simple query like the one below takes too long (12 seconds):

mysql> SELECT count(*) FROM articles_entities WHERE relevance <= .4 AND relevance > 0;
+----------+
| count(*) |
+----------+
|   357190 | 
+----------+
1 row in set (11.95 sec)

What should we consider to improve search time? Different database storage? Different hardware?

optimization mysql nosql database-design

Lizard Jan 20 '11 at 11:32

3 answers

YoGiN · Answer 1 · 2011-01-20T13:08:17+0000

As mrorigo asked, please post the output of SHOW CREATE TABLE articles_entities, so that the actual indexes are visible.

From the MySQL documentation (http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html):

If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. 
For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).

MySQL cannot use an index if the columns do not form a leftmost prefix of the index.

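So, if relevance is only part of a multi-column index and is not its leftmost column, that index cannot be used for your query.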
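To make the leftmost-prefix rule concrete, here is a minimal sketch against the table from the question. The index name idx_article_relevance and the UUID literal are hypothetical and exist only for illustration. Note that the DESCRIBE output in the question marks relevance as MUL, which means it is already the first column of some non-unique index, so the ALTER TABLE statements demonstrate the rule and are not a recommended change.

```sql
-- Show every index and its column order.
-- Seq_in_index = 1 marks the leftmost column of each index.
SHOW INDEX FROM articles_entities;

-- Hypothetical composite index, for illustration only.
ALTER TABLE articles_entities
  ADD INDEX idx_article_relevance (article_id, relevance);

-- (article_id) is a leftmost prefix, so the optimizer can seek into
-- idx_article_relevance for this query.
EXPLAIN SELECT COUNT(*)
FROM articles_entities
WHERE article_id = '00000000-0000-0000-0000-000000000000'  -- made-up id
  AND relevance <= 0.4 AND relevance > 0;

-- (relevance) alone is not a leftmost prefix of idx_article_relevance,
-- so this query cannot seek into it and needs an index that starts
-- with relevance.
EXPLAIN SELECT COUNT(*)
FROM articles_entities
WHERE relevance <= 0.4 AND relevance > 0;

-- Remove the demonstration index again.
ALTER TABLE articles_entities DROP INDEX idx_article_relevance;
```

In each EXPLAIN result, the key column names the index the optimizer actually chose.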

origo · Answer 2 · 2011-01-20T12:23:37+0000

char(36) keys are not ideal for MySQL. INT-based keys are smaller and faster to compare. Instead of CHAR keys, use INT or BIGINT keys where you can.

If the string keys have to stay, at least keep them as fixed-length CHAR (not VARCHAR).
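As a rough sketch of the INT-key suggestion, the link table could look like the following. The table name articles_entities_int, the index names, and the assumption that articles and entities would also receive integer ids are all hypothetical; nothing in the question confirms that the schema can be changed this way.

```sql
-- Hypothetical variant of the link table with integer keys.
CREATE TABLE articles_entities_int (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  article_id BIGINT UNSIGNED NOT NULL,  -- assumes articles.id is also an integer
  entity_id  BIGINT UNSIGNED NOT NULL,  -- assumes entities.id is also an integer
  created    DATETIME NULL,
  modified   DATETIME NULL,
  relevance  DECIMAL(5,4) NULL,
  analysers  TEXT NULL,
  anchor     VARCHAR(255) NOT NULL,
  PRIMARY KEY (id),
  KEY idx_article (article_id),
  KEY idx_entity (entity_id),
  KEY idx_relevance (relevance)
);
```

Each BIGINT key takes 8 bytes where a char(36) value takes at least 36, so every index on these columns becomes considerably smaller and more of it fits in memory.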

Also, post the SHOW CREATE TABLE output so we can see the keys/indexes, and run EXPLAIN on the slow query.

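PS. SHOW TABLE STATUS LIKE '{table_name}' is also worth posting; it reports the storage engine, the (approximate) row count, and the data and index sizes.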
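For reference, the diagnostic statements named in this answer look like this when run against the table from the question. This is only a sketch of what to run; the output is not shown because it depends on the actual schema.

```sql
-- Full table definition, including every index and the storage engine.
SHOW CREATE TABLE articles_entities;

-- One status row for the table: Engine, Rows (an estimate for some
-- engines), Data_length and Index_length in bytes.
SHOW TABLE STATUS LIKE 'articles_entities';
```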

Pixy misa · Answer 3 · 2011-01-20T12:10:01+0000

The first thing to do is check your indexes. Run EXPLAIN on your queries to find out how MySQL executes them.
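Applied to the slow query from the question, that check might look like the sketch below. The comments describe standard EXPLAIN columns; which values actually appear depends on the indexes and engine in use, which the question does not show.

```sql
EXPLAIN SELECT COUNT(*)
FROM articles_entities
WHERE relevance <= 0.4 AND relevance > 0;
-- Columns worth reading in the result:
--   type  : "range" means an index range scan; "ALL" means a full table scan
--   key   : the index actually chosen (NULL if none)
--   rows  : the optimizer's estimate of how many rows it must examine
--   Extra : "Using index" means the query is answered from the index alone
```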

If that looks reasonable, the next thing to check is memory. How big is your database in total? Memory is cheap these days, and queries served from memory are much faster than queries that have to read from disk.
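One way to compare database size with available memory is sketched below. The question does not say which storage engine the table uses, so both cache settings are shown as an assumption; the sizes reported by information_schema are estimates for some engines.

```sql
-- Total data and index size per schema, in MB.
SELECT table_schema,
       ROUND(SUM(data_length + index_length) / 1024 / 1024) AS total_mb
FROM information_schema.TABLES
GROUP BY table_schema;

-- Memory MySQL is allowed to use for caching:
SHOW VARIABLES LIKE 'key_buffer_size';          -- MyISAM index cache
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';  -- InnoDB data and index cache
```

If total_mb is far larger than these caches, and larger than the 7 GB on the instance, a large share of reads will go to disk.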

Once you have looked into both, if performance is still poor, it may be time to consider other options.
