We are introducing a system that analyzes books. The system is written in PHP, and for each book, a cycle is run through the words and each of them is analyzed, setting certain flags (which translate database fields) from various regular expressions and other tests.
The result is a matches table, similar to the example below:
+ ------------------------ + -------------- + ------ + - --- + --------- + ---------------- +
| Field | Type | Null | Key | Default | Extra |
+ ------------------------ + -------------- + ------ + - --- + --------- + ---------------- +
| id | bigint (20) | NO | PRI | NULL | auto_increment |
| regex | varchar (250) | YES | | NULL | |
| description | varchar (250) | NO | | NULL | |
| phonic_description | varchar (255) | NO | | NULL | |
| is_high_frequency | tinyint (1) | NO | | NULL | |
| is_readable | tinyint (1) | NO | | NULL | |
| book_id | bigint (20) | YES | | NULL | |
| matched_regex | varchar (255) | YES | | NULL | |
| [...] | | | | | |
+ ------------------------ + -------------- + ------ + - --- + --------- + ---------------- +
Most missed tinyint fields are either 0 or 1. There are 25 fields in the match table.
There are ~ 2,000,000 rows in the match table; the result of the analysis is ~ 500 books.
Currently, there is a site reporting area that queries the matches table as follows:
SELECT COUNT(*) FROM matches WHERE is_readable = 1 AND other_flag = 0 AND another_flag = 1
However, it currently takes more than a minute to get the main report of the index, since each request takes about 0.7 seconds. I cache this at the request level, but it still takes too much time to load the page.
How am I not very good at how to manage data sets, for example, can someone advise me on a better way to store or query this data? Are there any optimizations that I can use with MySQL to improve the performance of these COUNT s, or am I better off using a different database or data structure?
We are currently using MySQL with MyISAM and VPS tables, so there can be no question of switching to a new database system.