MySQL Syntax and OR Performance

Question

This MySQL query works just fine

SELECT o.id FROM descriptions_programs d, titles_programs t, programs o WHERE (d.object_id=o.id AND MATCH (d.text) AGAINST ('+china' IN BOOLEAN MODE) AND d.current=1) AND (t.object_id=o.id AND MATCH (t.text) AGAINST ('+china' IN BOOLEAN MODE) AND t.current=1)

But if I replace one AND with OR, the query runs for a very long time (I have to kill it):

 SELECT o.id FROM descriptions_programs d, titles_programs t, programs o WHERE (d.object_id=o.id AND MATCH (d.text) AGAINST ('+china' IN BOOLEAN MODE) AND d.current=1) OR (t.object_id=o.id AND MATCH (t.text) AGAINST ('+china' IN BOOLEAN MODE) AND t.current=1)

Why is this? Don't get hung up on the simplicity of +china; I just simplified the query for the sake of debugging. Also, if I run it with only one of the MATCH AGAINST tests, it works fine, so each of them is OK on its own. I get the feeling that I'm inadvertently causing a huge join by using the OR, but I just don't get it. I previously used an IN test against a UNION of two subselects, and that worked, but this should work too. Right?

Update: per bobince's request. It's not extremely slow, but at ~500 ms it's not as fast as using a UNION, as discussed here.

 mysql> explain SELECT o.id
     -> FROM programs o
     -> JOIN titles_programs t ON t.object_id=o.id
     -> JOIN descriptions_programs d ON d.object_id=o.id
     -> WHERE MATCH (d.text) AGAINST ('+china' IN BOOLEAN MODE) AND d.current=1
     -> OR MATCH (t.text) AGAINST ('+china' IN BOOLEAN MODE) AND t.current=1
     -> ;
 +----+-------------+-------+-------+----------------+----------------+---------+----------------------+--------+-------------+
 | id | select_type | table | type  | possible_keys  | key            | key_len | ref                  | rows   | Extra       |
 +----+-------------+-------+-------+----------------+----------------+---------+----------------------+--------+-------------+
 |  1 | SIMPLE      | o     | index | PRIMARY        | PRIMARY        | 4       | NULL                 | 148666 | Using index |
 |  1 | SIMPLE      | d     | ref   | object_current | object_current | 4       | haystack.o.id        |      1 |             |
 |  1 | SIMPLE      | t     | ref   | object_current | object_current | 4       | haystack.d.object_id |      1 | Using where |
 +----+-------------+-------+-------+----------------+----------------+---------+----------------------+--------+-------------+

+2

mysql

Doug Kaye Mar 20 '09 at 10:17

2 answers

Jason's answer is spot on. In addition, I'd try using the more modern ANSI join syntax to take the join conditions out of the WHERE clause and so reduce the confusion:

 SELECT o.id FROM programs o JOIN titles_programs t ON t.object_id=o.id JOIN descriptions_programs d ON d.object_id=o.id WHERE MATCH (d.text) AGAINST ('+china' IN BOOLEAN MODE) AND d.current=1 OR MATCH (t.text) AGAINST ('+china' IN BOOLEAN MODE) AND t.current=1

This will stop the unintended cross join that causes the combinatorial explosion; I'd expect it to run in a reasonable time unless the database is really huge.

If not, can you post the results of an EXPLAIN SELECT of the above? Presumably one or both of the fulltext indexes are not being used. I could certainly imagine the query optimiser failing to use the second fulltext index, for example by trying to "fill in" the rows that don't match the first fulltext query instead of going straight to the index, or something like that.

Normally, when you want to fulltext-index two columns in combination, you create one index over both columns. This would be much faster in any case. However, it would mean you'd have to put the titles and descriptions in the same table. This may not be such a hardship: since fulltext only works on MyISAM tables (and you don't usually want your canonical data in MyISAM tables), you can keep the definitive copy of your data in properly normalised InnoDB tables, with an additional MyISAM table containing only stripped and stemmed search bait.
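As a rough illustration of that layout, here is a minimal sketch. The table name programs_search, its column names, and the column types are assumptions made up for the example; only the idea of a single MyISAM table with one fulltext index over both columns comes from the answer.

```sql
-- Hypothetical search-only table: one row per program, holding copies of
-- the current title and description text. Names and types are assumed.
CREATE TABLE programs_search (
    object_id   INT UNSIGNED NOT NULL PRIMARY KEY,
    title       TEXT,
    description TEXT,
    -- One fulltext index covering both columns.
    FULLTEXT KEY ft_title_description (title, description)
) ENGINE=MyISAM;

-- The column list in MATCH must be the same as the index definition.
SELECT object_id
FROM programs_search
WHERE MATCH (title, description) AGAINST ('+china' IN BOOLEAN MODE);
```

Keeping programs_search in step with the normalised tables is left to the application (or to whatever process writes the canonical data); the sketch does not cover that.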

If none of this works... well, I guess I'd go back to the UNIONing you mentioned, combined with an application-level filter to remove duplicate IDs.
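For reference, the UNION approach might look something like the sketch below. It uses only the tables and columns from the question and assumes that every object_id in the two text tables refers to an existing row in programs, so the join to programs can be dropped.

```sql
-- Each branch can use its own fulltext index independently.
SELECT d.object_id AS id
FROM descriptions_programs d
WHERE MATCH (d.text) AGAINST ('+china' IN BOOLEAN MODE)
  AND d.current = 1

UNION

SELECT t.object_id AS id
FROM titles_programs t
WHERE MATCH (t.text) AGAINST ('+china' IN BOOLEAN MODE)
  AND t.current = 1;
```

Plain UNION (as opposed to UNION ALL) already removes duplicate rows, so when only the id is selected, a program that matches in both tables should come back once and the application-level filter may not be needed.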

+2

bobince Mar 21 '09 at 1:32

Jason Cohen · Accepted Answer · 2009-03-20T22:38:35+0000

Your problem is that the joins between o and d and t must apply in all cases. That is, you need:

 SELECT o.id FROM descriptions_programs d, titles_programs t, programs o WHERE d.object_id=o.id AND t.object_id=o.id AND ( ( MATCH (d.text) AGAINST ('+china' IN BOOLEAN MODE) AND d.current=1 ) OR ( MATCH (t.text) AGAINST ('+china' IN BOOLEAN MODE) AND t.current=1 ) )

Why? Because in your first query you can ignore the parentheses: everything is ANDed together, so the tables are joined properly. In your second query this is not true.

Consider what the database does: it takes "all rows in t" and crosses them with "all rows in d", giving t*d rows. Normally you use the join conditions (as I did above) to limit this to a linear list of valid rows.

But in your OR query each branch contains only one of the two join conditions, so a row qualifies if either d joins to o or t joins to o. The other table is left unconstrained in that branch, so for every row in one table that matches, you also select all rows in the other table.
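One way to see the blow-up is to run the first branch of the OR on its own, keeping the same three-table FROM list as the question. This is only a sketch for counting rows, and on a large database it may itself be slow.

```sql
-- First branch of the OR in isolation: nothing here constrains t,
-- so every matching (d, o) pair is repeated once per row of titles_programs.
SELECT COUNT(*)
FROM descriptions_programs d, titles_programs t, programs o
WHERE d.object_id = o.id
  AND MATCH (d.text) AGAINST ('+china' IN BOOLEAN MODE)
  AND d.current = 1;
```

The count is the number of matching d/o pairs multiplied by the total number of rows in titles_programs, and the second branch contributes a similar product in the other direction.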
