MySQL - Indexing and Optimizing a SELECT Query

I have a table with over 5 million rows. When I run the following SELECT query, it takes about 20 seconds:

    SELECT CompUID, Weburl
    FROM `CompanyTable`
    WHERE (Alias1 = 'match1' AND Alias2 = 'match2')
       OR Alias3 = 'match3'
       OR Alias4 = 'match4';

Here is the table structure:

    CREATE TABLE `CompanyMaster` (
      `CompUID` int(11) NOT NULL AUTO_INCREMENT,
      `Weburl` varchar(150) DEFAULT NULL,
      `CompanyName` varchar(200) DEFAULT NULL,
      `Alias1` varchar(150) DEFAULT NULL,
      `Alias2` varchar(150) DEFAULT NULL,
      `Alias3` varchar(150) DEFAULT NULL,
      `Alias4` varchar(150) DEFAULT NULL,
      `Created` datetime DEFAULT NULL,
      `LastModified` datetime DEFAULT NULL,
      PRIMARY KEY (`CompUID`),
      KEY `Alias` (`Alias1`,`Alias2`,`Alias3`,`Alias4`)
    ) ENGINE=InnoDB AUTO_INCREMENT=5457968 DEFAULT CHARSET=latin1;

Here is the EXPLAIN from this query:

    +----+-------------+--------------+------+---------------+------+---------+------+---------+-------------+
    | id | select_type | table        | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
    +----+-------------+--------------+------+---------------+------+---------+------+---------+-------------+
    |  1 | SIMPLE      | CompanyTable | ALL  | Alias         | NULL | NULL    | NULL | 5255929 | Using where |
    +----+-------------+--------------+------+---------------+------+---------+------+---------+-------------+

I created the composite index `Alias` on (Alias1, Alias2, Alias3, Alias4), but I don't think it is the right one: the EXPLAIN shows it is not used (key is NULL and the whole table is scanned). Please suggest the correct indexing for this query.

3 answers

For the query engine to use a column of a composite index, all the columns to its left must be constrained first. That is, the columns act as filters that narrow down the candidate rows when read from left to right.

The OR Alias3 (and OR Alias4) conditions violate this rule, because each of them says: "I don't care what the columns to my left (Alias1, Alias2 and, for Alias4, Alias3) contain, because I don't depend on them."

A full table scan is therefore required to find any rows whose Alias3 (or Alias4) value matches the condition.
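The leftmost-prefix rule can be observed directly. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL: SQLite words its plans differently from MySQL's EXPLAIN, but its composite B-tree indexes follow the same rule. The table is trimmed to the relevant columns.

```python
import sqlite3

# SQLite stands in for MySQL; the table is trimmed to the relevant columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE CompanyMaster (
           CompUID INTEGER PRIMARY KEY,
           Weburl  TEXT,
           Alias1 TEXT, Alias2 TEXT, Alias3 TEXT, Alias4 TEXT)"""
)
# The same composite index as in the question.
conn.execute("CREATE INDEX Alias ON CompanyMaster (Alias1, Alias2, Alias3, Alias4)")


def plan(sql, params=()):
    """Join the 'detail' text of EXPLAIN QUERY PLAN (always the last column)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Constrains the two leftmost index columns: the index can be searched.
prefix_plan = plan(
    "SELECT CompUID, Weburl FROM CompanyMaster WHERE Alias1 = ? AND Alias2 = ?",
    ("match1", "match2"),
)
# Constrains only Alias3: there is no usable leftmost prefix, so the table is scanned.
skip_plan = plan(
    "SELECT CompUID, Weburl FROM CompanyMaster WHERE Alias3 = ?",
    ("match3",),
)
print(prefix_plan)  # a SEARCH ... USING INDEX Alias line
print(skip_plan)    # a SCAN line, no index search
```

Without ANALYZE statistics, SQLite does not attempt its skip-scan optimization, so the second query falls back to a full scan, just like the type ALL row in the question's EXPLAIN.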

Potentially useful indexes for this condition would be:

  • INDEX (Alias1, Alias2): this composite index covers the Alias1 AND Alias2 condition
  • INDEX (Alias3)
  • INDEX (Alias4)

Whether the planner actually uses them depends on the real statistics and needs further study, but at least the query planner now has indexes it can work with.
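A sketch of the three suggested indexes together with a UNION rewrite of the OR condition, again with SQLite standing in for MySQL and hypothetical rows. The UNION rewrite is a common alternative that the answer does not mention: each branch can search its own index, and UNION removes rows that satisfy more than one branch. (MySQL can also combine separate indexes for an OR through its index merge union access method, but whether it chooses to depends on its cost estimates.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE CompanyMaster (
           CompUID INTEGER PRIMARY KEY, Weburl TEXT,
           Alias1 TEXT, Alias2 TEXT, Alias3 TEXT, Alias4 TEXT)"""
)
# One index per OR branch, as suggested above.
conn.execute("CREATE INDEX idx_alias12 ON CompanyMaster (Alias1, Alias2)")
conn.execute("CREATE INDEX idx_alias3 ON CompanyMaster (Alias3)")
conn.execute("CREATE INDEX idx_alias4 ON CompanyMaster (Alias4)")
conn.executemany(
    "INSERT INTO CompanyMaster VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "a.example", "match1", "match2", None, None),      # AND branch
        (2, "b.example", "match1", "other", None, None),       # no match
        (3, "c.example", None, None, "match3", None),          # Alias3 branch
        (4, "d.example", None, None, None, "match4"),          # Alias4 branch
        (5, "e.example", "match1", "match2", "match3", None),  # two branches
    ],
)

params = ("match1", "match2", "match3", "match4")
or_query = """SELECT CompUID, Weburl FROM CompanyMaster
              WHERE (Alias1 = ? AND Alias2 = ?) OR Alias3 = ? OR Alias4 = ?"""
# Each SELECT can search its own index; UNION (not UNION ALL) removes the
# duplicate produced by row 5, which satisfies two branches.
union_query = """SELECT CompUID, Weburl FROM CompanyMaster WHERE Alias1 = ? AND Alias2 = ?
                 UNION
                 SELECT CompUID, Weburl FROM CompanyMaster WHERE Alias3 = ?
                 UNION
                 SELECT CompUID, Weburl FROM CompanyMaster WHERE Alias4 = ?"""

or_rows = sorted(conn.execute(or_query, params).fetchall())
union_rows = sorted(conn.execute(union_query, params).fetchall())
print(or_rows == union_rows, [uid for uid, _ in union_rows])  # True [1, 3, 4, 5]
```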


That said (and I'm not sure what role an "alias" plays here), it might be prudent to normalize the table. The following changes the semantics a bit, because it drops the "alias position" (which can be added back), so it should be checked for semantic correctness.

    CREATE TABLE `CompanyMaster` (
       `CompUID` int(11) NOT NULL AUTO_INCREMENT
      ,`CompanyName` varchar(200) DEFAULT NULL
      ,PRIMARY KEY (`CompUID`)
    );

    -- (This establishes a unique alias-per-company, which may be incorrect.)
    CREATE TABLE `CompaniesAliases` (
       `CompUID` int(11) NOT NULL
      ,`Alias` varchar(150) NOT NULL
      -- Both CompUID and Alias appear in 'first' positions:
      -- CompUID for the join, Alias for the filter
      ,PRIMARY KEY (`CompUID`, `Alias`)
      ,KEY (`Alias`)
      -- Alternative, which may change plan selection by eliminating options:
      --   ,PRIMARY KEY (`Alias`, `CompUID`)
      --   and no single-column KEY/index on Alias or CompUID
      ,FOREIGN KEY (`CompUID`) REFERENCES `CompanyMaster` (`CompUID`)
    );

It can then be queried in roughly the same way as the original, except that it no longer matters which alias position matches a given value:

    -- AND constructed by joins (could also use GROUP BY ... HAVING COUNT)
    SELECT c.CompUID
    FROM `CompanyMaster` c
    JOIN `CompaniesAliases` ac1
      ON ac1.CompUID = c.CompUID AND ac1.Alias = 'match1'
    JOIN `CompaniesAliases` ac2
      ON ac2.CompUID = c.CompUID AND ac2.Alias = 'match2'
    -- OR constructed by UNION(s)
    UNION
    SELECT c.CompUID
    FROM `CompanyMaster` c
    JOIN `CompaniesAliases` ac1
      ON ac1.CompUID = c.CompUID AND (ac1.Alias = 'match3' OR ac1.Alias = 'match4');

I would expect SQL Server to execute such a query efficiently; YMMV with MySQL.
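The comment in the query mentions GROUP BY ... HAVING COUNT as an alternative to the self-joins for the AND part. Here is a minimal sketch of that variant, with SQLite standing in for MySQL, a schema simplified from the one above, and made-up sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE CompanyMaster (
        CompUID     INTEGER PRIMARY KEY,
        CompanyName TEXT
    );
    CREATE TABLE CompaniesAliases (
        CompUID INTEGER NOT NULL REFERENCES CompanyMaster (CompUID),
        Alias   TEXT    NOT NULL,
        PRIMARY KEY (CompUID, Alias)
    );
    CREATE INDEX idx_alias ON CompaniesAliases (Alias);

    -- Hypothetical sample data.
    INSERT INTO CompanyMaster VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO CompaniesAliases VALUES
        (1, 'match1'), (1, 'match2'),  -- has both AND-ed aliases
        (2, 'match1'),                 -- has only one of them
        (3, 'match4');                 -- satisfies the OR part
    """
)

# AND part: keep companies that have BOTH aliases, i.e. two distinct
# matching alias rows per CompUID.
and_part = """SELECT CompUID FROM CompaniesAliases
              WHERE Alias IN ('match1', 'match2')
              GROUP BY CompUID
              HAVING COUNT(DISTINCT Alias) = 2"""
# OR part: a single matching alias is enough.
or_part = """SELECT CompUID FROM CompaniesAliases
             WHERE Alias IN ('match3', 'match4')"""

rows = conn.execute(and_part + " UNION " + or_part + " ORDER BY 1").fetchall()
matching = [uid for (uid,) in rows]
print(matching)  # [1, 3]
```

Because the primary key already makes each (CompUID, Alias) pair unique, COUNT(*) would work as well; COUNT(DISTINCT Alias) just keeps the intent explicit.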


I would suggest adding a precomputed flag field, ComplexAliasQuery, to the table. This slightly increases the size of your data and makes it redundant, but I find it a simple solution.

1. Create the table with the new ComplexAliasQuery column

    CREATE TABLE `CompanyMaster` (
      `CompUID` int(11) NOT NULL AUTO_INCREMENT,
      `Weburl` varchar(150) DEFAULT NULL,
      `CompanyName` varchar(200) DEFAULT NULL,
      `Alias1` varchar(150) DEFAULT NULL,
      `Alias2` varchar(150) DEFAULT NULL,
      `Alias3` varchar(150) DEFAULT NULL,
      `Alias4` varchar(150) DEFAULT NULL,
      `Created` datetime DEFAULT NULL,
      `LastModified` datetime DEFAULT NULL,
      `ComplexAliasQuery` BOOLEAN DEFAULT FALSE,
      PRIMARY KEY (`CompUID`),
      KEY `Alias` (`Alias1`,`Alias2`,`Alias3`,`Alias4`),
      KEY `AliasQuery` (`ComplexAliasQuery`)
    ) ENGINE=InnoDB AUTO_INCREMENT=5457968 DEFAULT CHARSET=latin1;

2. Populate the new ComplexAliasQuery field

    UPDATE CompanyMaster
    SET ComplexAliasQuery = TRUE
    WHERE (Alias1 = 'match1' AND Alias2 = 'match2')
       OR Alias3 = 'match3'
       OR Alias4 = 'match4';

3. Keep the flag in sync when Alias1, Alias2, Alias3 or Alias4 is updated

Whenever one of these fields changes, recompute ComplexAliasQuery. You can do this with a trigger (http://dev.mysql.com/doc/refman/5.7/en/trigger-syntax.html), or in your application code if you cannot use a trigger, for example because you are running a cluster.

4. Finally, your query becomes simple

    SELECT CompUID, Weburl FROM `CompanyMaster` WHERE ComplexAliasQuery IS TRUE;

which gives this EXPLAIN:

    +----+-------------+---------------+------+---------------+------+---------+------+------+-------------+
    | id | select_type | table         | type | possible_keys | key  | key_len | ref  | rows | Extra       |
    +----+-------------+---------------+------+---------------+------+---------+------+------+-------------+
    |  1 | SIMPLE      | CompanyMaster | ALL  | NULL          | NULL | NULL    | NULL |    1 | Using where |
    +----+-------------+---------------+------+---------------+------+---------+------+------+-------------+
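Step 3 can be sketched with triggers. This is a SQLite version written under stated assumptions: the match values are hard-coded (the flag only makes sense for one fixed query), and the trigger names are hypothetical. SQLite cannot assign to NEW, so each trigger recomputes the flag with an UPDATE after the fact. MySQL does not allow a trigger to update the table it is defined on, so there you would instead use BEFORE INSERT and BEFORE UPDATE triggers that SET NEW.ComplexAliasQuery = ... directly.

```python
import sqlite3

# Hypothetical: the flag is tied to one fixed query, so its values are hard-coded.
FLAG_EXPR = (
    "((NEW.Alias1 = 'match1' AND NEW.Alias2 = 'match2')"
    " OR NEW.Alias3 = 'match3' OR NEW.Alias4 = 'match4') IS 1"  # IS 1 maps NULL to 0
)

conn = sqlite3.connect(":memory:")
conn.executescript(
    f"""
    CREATE TABLE CompanyMaster (
        CompUID INTEGER PRIMARY KEY, Weburl TEXT,
        Alias1 TEXT, Alias2 TEXT, Alias3 TEXT, Alias4 TEXT,
        ComplexAliasQuery INTEGER NOT NULL DEFAULT 0
    );
    CREATE INDEX AliasQuery ON CompanyMaster (ComplexAliasQuery);

    -- SQLite cannot assign to NEW, so each trigger updates the row afterwards.
    CREATE TRIGGER flag_after_insert AFTER INSERT ON CompanyMaster
    BEGIN
        UPDATE CompanyMaster SET ComplexAliasQuery = {FLAG_EXPR}
        WHERE CompUID = NEW.CompUID;
    END;

    -- Fires only when an alias column is updated, so the UPDATE of the
    -- flag column inside it does not fire it again.
    CREATE TRIGGER flag_after_update
    AFTER UPDATE OF Alias1, Alias2, Alias3, Alias4 ON CompanyMaster
    BEGIN
        UPDATE CompanyMaster SET ComplexAliasQuery = {FLAG_EXPR}
        WHERE CompUID = NEW.CompUID;
    END;
    """
)

conn.executemany(
    "INSERT INTO CompanyMaster (CompUID, Weburl, Alias1, Alias2, Alias3, Alias4) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [(1, "a.example", "match1", "match2", None, None),
     (2, "b.example", "match1", None, None, None)],
)
query = "SELECT CompUID FROM CompanyMaster WHERE ComplexAliasQuery = 1 ORDER BY CompUID"
before = [uid for (uid,) in conn.execute(query)]
conn.execute("UPDATE CompanyMaster SET Alias4 = 'match4' WHERE CompUID = 2")
after = [uid for (uid,) in conn.execute(query)]
print(before, after)  # [1] [1, 2]
```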

Another solution

If you do not want the extra field in your CompanyMaster table, you can move it to a separate table, named for example IndexAliasCompanyMaster, and then simply join that table.
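A sketch of that side-table variant, assuming the hypothetical IndexAliasCompanyMaster table stores only the CompUIDs that satisfy the condition (the answer does not specify its columns). SQLite again stands in for MySQL, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE CompanyMaster (
        CompUID INTEGER PRIMARY KEY, Weburl TEXT,
        Alias1 TEXT, Alias2 TEXT, Alias3 TEXT, Alias4 TEXT
    );
    -- Hypothetical side table: holds only the CompUIDs that satisfy the
    -- condition, replacing the ComplexAliasQuery column.
    CREATE TABLE IndexAliasCompanyMaster (
        CompUID INTEGER PRIMARY KEY REFERENCES CompanyMaster (CompUID)
    );

    -- Hypothetical sample data.
    INSERT INTO CompanyMaster VALUES
        (1, 'a.example', 'match1', 'match2', NULL, NULL),
        (2, 'b.example', 'match1', NULL, NULL, NULL),
        (3, 'c.example', NULL, NULL, 'match3', NULL);

    -- One-off population, the counterpart of step 2 above.
    INSERT INTO IndexAliasCompanyMaster (CompUID)
    SELECT CompUID FROM CompanyMaster
    WHERE (Alias1 = 'match1' AND Alias2 = 'match2')
       OR Alias3 = 'match3' OR Alias4 = 'match4';
    """
)

# The query now starts from the small side table and joins back by primary key.
rows = conn.execute(
    """SELECT c.CompUID, c.Weburl
       FROM IndexAliasCompanyMaster AS i
       JOIN CompanyMaster AS c ON c.CompUID = i.CompUID
       ORDER BY c.CompUID"""
).fetchall()
print(rows)  # [(1, 'a.example'), (3, 'c.example')]
```

The side table still has to be kept in sync when the alias columns change, just like the flag column.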


None of the above. Redesign the schema.

If the four aliases are just synonyms for the company, don't spread them across columns of the table; move them into a separate table. (user2864740 got halfway there; I say go all the way.)

    CREATE TABLE `CompanyMaster` (
      `CompUID` int(11) NOT NULL AUTO_INCREMENT,
      `Weburl` varchar(150) DEFAULT NULL,
      `CompanyName` varchar(200) DEFAULT NULL,
      `Created` datetime DEFAULT NULL,
      `LastModified` datetime DEFAULT NULL,
      PRIMARY KEY (`CompUID`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

    CREATE TABLE `CompaniesAliases` (
      `CompUID` int(11) NOT NULL,
      `Alias` varchar(150) NOT NULL,
      PRIMARY KEY (`Alias`),  -- Assuming no two companies can have the same Alias
      KEY (`CompUID`)
    ) ENGINE=InnoDB;

(You really should convert all your tables to InnoDB.)

Your original query then becomes

    SELECT CompUID, Weburl
    FROM `CompanyMaster`
    JOIN `CompaniesAliases` USING (CompUID)
    WHERE Alias IN ('match1', 'match2', 'match3', 'match4');

and it will run much faster.
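A sketch of the normalized query, with SQLite standing in for MySQL and hypothetical data. It also shows two ways in which the IN version differs from the original WHERE clause: it matches any of the four values, so the match1 AND match2 requirement is not preserved, and a company with several matching aliases is returned once per match unless you add DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE CompanyMaster (
        CompUID INTEGER PRIMARY KEY, Weburl TEXT, CompanyName TEXT
    );
    CREATE TABLE CompaniesAliases (
        CompUID INTEGER NOT NULL,
        Alias   TEXT    NOT NULL PRIMARY KEY  -- one company per alias, as assumed
    );
    CREATE INDEX idx_compuid ON CompaniesAliases (CompUID);

    -- Hypothetical sample data.
    INSERT INTO CompanyMaster VALUES
        (1, 'a.example', 'Acme'), (2, 'b.example', 'Globex'), (3, 'c.example', 'Initech');
    INSERT INTO CompaniesAliases VALUES
        (1, 'match3'), (1, 'match4'),  -- two matching aliases
        (2, 'match2'),                 -- match2 without match1
        (3, 'other');                  -- no match
    """
)

query = """SELECT {distinct} CompUID, Weburl
           FROM CompanyMaster
           JOIN CompaniesAliases USING (CompUID)
           WHERE Alias IN ('match1', 'match2', 'match3', 'match4')
           ORDER BY CompUID"""
rows = conn.execute(query.format(distinct="")).fetchall()
distinct_rows = conn.execute(query.format(distinct="DISTINCT")).fetchall()
print(rows)           # [(1, 'a.example'), (1, 'a.example'), (2, 'b.example')]
print(distinct_rows)  # [(1, 'a.example'), (2, 'b.example')]
```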

If you need to display the company name together with its aliases, consider

    SELECT CompanyName, GROUP_CONCAT(Alias) AS 'Also known as'
    FROM `CompanyMaster`
    JOIN `CompaniesAliases` USING (CompUID)
    WHERE ...
    GROUP BY CompUID;
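A minimal sketch of the alias listing in SQLite, which also provides GROUP_CONCAT with a comma as the default separator. The answer's WHERE clause is left out, and the companies and aliases are made up. The order of the concatenated aliases is not guaranteed, so the example sorts them in Python before printing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE CompanyMaster (CompUID INTEGER PRIMARY KEY, CompanyName TEXT);
    CREATE TABLE CompaniesAliases (
        CompUID INTEGER NOT NULL,
        Alias   TEXT    NOT NULL PRIMARY KEY
    );
    -- Hypothetical sample data.
    INSERT INTO CompanyMaster VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO CompaniesAliases VALUES
        (1, 'Acme Corp'), (1, 'ACME'), (2, 'Globex Inc');
    """
)

rows = conn.execute(
    """SELECT CompanyName, GROUP_CONCAT(Alias) AS "Also known as"
       FROM CompanyMaster
       JOIN CompaniesAliases USING (CompUID)
       GROUP BY CompUID
       ORDER BY CompUID"""
).fetchall()
# GROUP_CONCAT does not guarantee an order, so sort the aliases for display.
listing = {name: sorted(aliases.split(",")) for name, aliases in rows}
print(listing)  # {'Acme': ['ACME', 'Acme Corp'], 'Globex': ['Globex Inc']}
```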

Source: https://habr.com/ru/post/1244272/

