The new project we are working on requires a lot of data analysis, but we are finding it very slow. We are looking for ways to improve our approach, through changes to either software or hardware.
We are currently running on a single Amazon EC2 instance (Linux):
High-CPU Extra Large Instance
7 GB of memory
20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: c1.xlarge
processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU E5506 @ 2.13GHz
stepping : 5
cpu MHz : 2133.408
cache size : 4096 KB
MemTotal: 7347752 kB
MemFree: 728860 kB
Buffers: 40196 kB
Cached: 2833572 kB
SwapCached: 0 kB
Active: 5693656 kB
Inactive: 456904 kB
SwapTotal: 0 kB
SwapFree: 0 kB
One part of the database consists of articles, entities, and a link table between them, for example:
mysql> DESCRIBE articles_entities;
+------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| id | char(36) | NO | PRI | NULL | |
| article_id | char(36) | NO | MUL | NULL | |
| entity_id | char(36) | NO | MUL | NULL | |
| created | datetime | YES | | NULL | |
| modified | datetime | YES | | NULL | |
| relevance | decimal(5,4) | YES | MUL | NULL | |
| analysers | text | YES | | NULL | |
| anchor | varchar(255) | NO | | NULL | |
+------------+--------------+------+-----+---------+-------+
8 rows in set (0.00 sec)
As the count below shows, the table already has many rows, and it grows by 100,000+ rows per day.
mysql> SELECT count(*) FROM articles_entities;
+----------+
| count(*) |
+----------+
| 2829138 |
+----------+
1 row in set (0.00 sec)
A simple query like the one below takes too long (about 12 seconds):
mysql> SELECT count(*) FROM articles_entities WHERE relevance <= .4 AND relevance > 0;
+----------+
| count(*) |
+----------+
| 357190 |
+----------+
1 row in set (11.95 sec)
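A minimal sketch of the diagnostics that could help narrow this down, using only standard MySQL statements. It assumes the `MUL` key on `relevance` shown by `DESCRIBE` is an ordinary single-column index; no output is included here because it depends on the actual server.

```sql
-- Show the execution plan: whether the relevance index is chosen,
-- and roughly how many rows MySQL expects to examine.
EXPLAIN SELECT count(*)
FROM articles_entities
WHERE relevance <= .4 AND relevance > 0;

-- List the indexes defined on the link table, with their cardinality.
SHOW INDEX FROM articles_entities;

-- Report the storage engine, row count estimate, and data/index sizes.
SHOW TABLE STATUS LIKE 'articles_entities';
```

The range in the slow query matches 357,190 of 2,829,138 rows (about 12.6% of the table), so the plan output is worth checking before assuming the index is being used.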
What should we consider to improve query time? A different database storage engine? Different hardware?