How to speed up this MySQL query?

geneHomology
============
id genome_name  gene_id  homolog_genome_name  homolog_gene_id consider_homolog
1  HomoSap      1007     MusMus               824             1
2  HomoSap      1007     MusMus               825             1
3  HomoSap      1007     MusMus               826             1
4  HomoSap      2890     EColi                2140            1
...

gene
====
genome_name  gene_id  gene_category
MusMus       823      Upregulated
MusMus       824      Downregulated
MusMus       825      Normal
MusMus       826      Normal
MusMus       827      Upregulated
EColi        2140     Normal
...

consider_homologis an enumeration (0,1). genome_nameand gene_idare the primary keys in gene. geneHomologyvery large - about 200 M lines.

My goal is to count for each gene the genesnumber of homologs that it has from each gene_category.

For example, following the above data, it HomoSap 1007has 3 Normalhomologs and 1 Downregulated.

So my query is:

SELECT a.id,a.genome_name,a.gene_id,a.homolog_genome_name,a.homolog_gene_id,COUNT(b.gene_category)
FROM geneHomology a,gene b 
WHERE a.consider_homolog='1' AND a.homolog_genome_name=b.genome_name AND a.homolog_gene_id=b.gene_id 
GROUP BY a.genome_name,a.gene_id,b.gene_category;

He never returns (and I patiently waited more than an hour).

I already indexed gene_categoryin gene.

I'm really new to MySQL, but I have root access to the database, so I could follow your suggestions (carefully ...). I would be happy to provide any further information.

UPDATE This is the output EXPLAINfor the query:

+----+-------------+-------+------+-----------------------+----------------------+---------+----------------------------------------------------------+---------+---------------------------------+
| id | select_type | table | type | possible_keys         | key                  | key_len | ref                                                      | rows    | Extra                           |
+----+-------------+-------+------+-----------------------+----------------------+---------+----------------------------------------------------------+---------+---------------------------------+
|  1 | SIMPLE      | b     | ALL  | PRIMARY,gene_genome   | NULL                 | NULL    | NULL                                                     | 1560695 | Using temporary; Using filesort | 
|  1 | SIMPLE      | a     | ref  | geneHomologyHit_gene  | geneHomologyHit_gene | 54      | my_db_v71.b.gene_id,my_db_v71.b.genome_name              |      13 | Using where                     | 
+----+-------------+-------+------+-----------------------+----------------------+---------+----------------------------------------------------------+---------+---------------------------------+

2

mysql> SHOW INDEX FROM gene;
    +-------+------------+--------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+
    | Table | Non_unique | Key_name                 | Seq_in_index | Column_name         | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
    +-------+------------+--------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+
    | gene  |          0 | PRIMARY                  |            1 | gene_id             | A         |        NULL |     NULL | NULL   |      | BTREE      |         | 
    | gene  |          0 | PRIMARY                  |            2 | genome_name         | A         |     1560695 |     NULL | NULL   |      | BTREE      |         | 
    | gene  |          1 | gene_organism            |            1 | taxon_id            | A         |         392 |     NULL | NULL   |      | BTREE      |         | 
    | gene  |          1 | gene_genome              |            1 | genome_name         | A         |         853 |     NULL | NULL   |      | BTREE      |         | 
    | gene  |          1 | gene_gene_category       |            1 | gene_category       | A         |           5 |     NULL | NULL   |      | BTREE      |         | 
    +-------+------------+--------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+
    5 rows in set (0.01 sec)

3

mysql> SHOW INDEX FROM geneHomology;
+--------------+------------+------------------------+--------------+--------------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table        | Non_unique | Key_name               | Seq_in_index | Column_name              | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------+------------+------------------------+--------------+--------------------------+-----------+-------------+----------+--------+------+------------+---------+
| geneHomology |          0 | PRIMARY                |            1 | id                       | A         |   680326661 |     NULL | NULL   |      | BTREE      |         | 
| geneHomology |          1 | geneHomologyQuery_gene |            1 | gene_id                  | A         |     1498516 |     NULL | NULL   |      | BTREE      |         | 
| geneHomology |          1 | geneHomologyQuery_gene |            2 | genome_name              | A         |     1505147 |     NULL | NULL   |      | BTREE      |         | 
| geneHomology |          1 | geneHomologyHit_gene   |            1 | homolog_gene_id          | A         |    52332820 |     NULL | NULL   |      | BTREE      |         | 
| geneHomology |          1 | geneHomologyHit_gene   |            2 | homolog_genome_name      | A         |    52332820 |     NULL | NULL   |      | BTREE      |         | 
+--------------+------------+------------------------+--------------+--------------------------+-----------+-------------+----------+--------+------+------------+---------+
5 rows in set (0.00 sec)

4 , , , ? LIMIT 1000 LIMIT 10, .

5

mysql> SHOW CREATE TABLE geneHomology;
+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table        | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| geneHomology | CREATE TABLE `geneHomology` (
  `id` bigint(20) NOT NULL auto_increment,
  `genome_name` varchar(20) NOT NULL,
  `gene_id` varchar(30) NOT NULL,
  `homolog_genome_name` varchar(20) NOT NULL,
  `homolog_gene_id` varchar(30) NOT NULL,
  `homolog_length` bigint(20) unsigned NOT NULL,
  `significance` double unsigned NOT NULL,
  `bit_score` double unsigned NOT NULL,
  `percent_identity` double unsigned NOT NULL,
  `start_match` int(10) unsigned NOT NULL,
  `end_match` int(10) unsigned NOT NULL,
  `start_match_percent` double unsigned NOT NULL,
  `end_match_percent` double unsigned NOT NULL,
  `strand` enum('+','-') default NULL,
  `homolog_start_match` int(10) unsigned NOT NULL,
  `homolog_end_match` int(10) unsigned NOT NULL,
  `homolog_start_match_percent` double unsigned NOT NULL,
  `homolog_end_match_percent` double unsigned NOT NULL,
  `homolog_strand` enum('+','-') default NULL,
  `consider_gene_homology` enum('0','1') NOT NULL,
  `reason_not_considered` varchar(50) default NULL,
  `num_hsps` int(10) unsigned NOT NULL,
  `homology_type` varchar(2) NOT NULL,
  PRIMARY KEY  (`id`),
  KEY `geneHomologygene` (`gene_id`,`genome_name`),
  KEY `geneHomologyhomolog_gene` (`homolog_gene_id`,`homolog_genome_name`)
) ENGINE=MyISAM AUTO_INCREMENT=680326662 DEFAULT CHARSET=latin1 | 
+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> SHOW CREATE TABLE gene;
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| gene  | CREATE TABLE `gene` (
  `taxon_id` int(10) unsigned NOT NULL,
  `genome_name` varchar(20) NOT NULL,
  `gene_id` varchar(30) NOT NULL,
  `symbol` varchar(30) default NULL,
  `type` varchar(30) default NULL,
  `product` varchar(300) default NULL,
  `strand` enum('+','-') NOT NULL,
  `start` bigint(20) unsigned NOT NULL,
  `end` bigint(20) unsigned NOT NULL,
  `gene_category` enum('Upregulated','Downregulated','Normal','n/a') NOT NULL,
  `consider_gene` enum('0','1') NOT NULL,
  `reason_not_considered` varchar(50) default NULL,
  `sequence` longblob NOT NULL,
  `additional_info` varchar(300) default NULL,
  PRIMARY KEY  (`gene_id`,`genome_name`),
  KEY `gene_organism` (`taxon_id`),
  KEY `gene_genome` (`genome_name`),
  KEY `gene_gene_category` (`gene_category`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 | 
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
+3
3
SELECT  a.genome_name, a.gene_id,
        cats.gene_category,
        (
        SELECT  COUNT(*)
        FROM    geneHomology ab
        JOIN    gene b
        ON      b.genome_name = ab.homolog_genome_name
                AND b.gene_id = ab.homolog_gene_id
        WHERE   ab.genome_name = a.genome_name
                AND ab.gene_id = a.gene_id
                AND b.gene_category = cats.gene_category
        ) cx
FROM    gene a
CROSS JOIN
        (
        SELECT  'Normal' AS gene_category
        UNION ALL
        SELECT  'Upregulated' AS gene_category
        UNION ALL
        SELECT  'Downregulated' AS gene_category
        ) cats
LIMIT 100

filesort .

gene_categories, cats.

+1

, , gen_category geneHomolgy. , your_homolog + GROUP BY.

0

genome_name WHERE - gene.gene_id gene.genome_name , , - / mroe , / .

, , geneHomology.hit_gene_id. , , . 54 , , . hit_gene_id think_homolog , , , .

How fast does it take to execute the 'SELECT * FROM gene'? How many entries are in ene_homology?

It seems that gene homology breaks down the N: M link between gene and gene (itself) and applies labels.

If the number of values ​​in homolog_genome_name is relatively small, you can consider decomposing it into a gene using a bitmap field. Or perhaps denormalize the connection into a set of 1: 1 comparisons. Alternatively, you can list the homolog clusters.

0
source

Source: https://habr.com/ru/post/1773604/


All Articles