MySQL FULLTEXT search across more than one table

I am asking this as a more general case because I think it may be of interest to more people. What is the best way to perform a full-text search across two tables? Suppose there are three tables: one for programs (with submitter_id), and one each for tags and descriptions, with object_id as a foreign key referencing rows in programs. We want the submitter_id of programs with specific text in their tags or descriptions. We have to use MATCH AGAINST for reasons I will not go into here, so please do not dwell on that aspect.

 programs
     id
     submitter_id
 tags_programs
     object_id
     text
 descriptions_programs
     object_id
     text
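For readers who want to reproduce the setup, the schema above could be sketched roughly as follows. The column types, the choice of keys, and the index layout are assumptions made for illustration; the question only gives the table and column names. MATCH ... AGAINST requires a FULLTEXT index on the searched column, which MyISAM supports and InnoDB supports from MySQL 5.6 on.

```sql
-- Hypothetical DDL: only the table and column names come from the question.
CREATE TABLE programs (
    id           INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    submitter_id INT UNSIGNED NOT NULL
);

CREATE TABLE tags_programs (
    object_id INT UNSIGNED NOT NULL,   -- refers to programs.id
    `text`    TEXT NOT NULL,
    KEY (object_id),
    FULLTEXT KEY (`text`)              -- needed for MATCH (text) AGAINST (...)
);

CREATE TABLE descriptions_programs (
    object_id INT UNSIGNED NOT NULL,   -- refers to programs.id
    `text`    TEXT NOT NULL,
    KEY (object_id),
    FULLTEXT KEY (`text`)
);
```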

The following works and runs in about 20 ms:

 SELECT p.submitter_id
 FROM programs p
 WHERE p.id IN (
     SELECT t.object_id FROM titles_programs t WHERE MATCH (t.text) AGAINST ('china')
     UNION ALL
     SELECT d.object_id FROM descriptions_programs d WHERE MATCH (d.text) AGAINST ('china')
 )

but I tried to rewrite this as a JOIN as follows, and it runs for a very long time. I had to kill it after 60 seconds.

 SELECT p.id
 FROM descriptions_programs d, tags_programs t, programs p
 WHERE (d.object_id = p.id AND MATCH (d.text) AGAINST ('china'))
    OR (t.object_id = p.id AND MATCH (t.text) AGAINST ('china'))

Just out of curiosity, I replaced the OR with AND. That version also runs in a few milliseconds, but it is not what I need. What is wrong with the second query above? I can live with the UNION and subselects, but I would like to understand.

+4
4 answers

Combine after filtering (that is, combine the filtered results); do not join first and then filter.

The reason is that otherwise you lose the use of your full-text index.

Clarification in response to a comment: I am using the word "join" here generically, not in the sense of a SQL JOIN, but as a synonym for merging or taking a union.

Essentially, I am saying that you should use the first (faster) query or something like it. The reason is that each of the subqueries is simple enough that the database can use that table's full-text index to make the selection very quickly. Combining the two (presumably much smaller) result sets with UNION is also fast. This means that the whole thing is fast.

The slow version ends up going through a lot of data and checking each row to see whether it is one you want, rather than quickly narrowing the data down to only the rows you are likely to want.
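If a JOIN is still preferred syntactically, one way to keep the "filter first, combine afterwards" shape is to join against the already filtered union as a derived table. This is only a sketch: the alias hits is made up for illustration, the table names follow the fast query in the question, and whether it performs as well as the IN version depends on the MySQL version and should be checked with EXPLAIN.

```sql
-- Each branch of the UNION can use its own FULLTEXT index.
-- UNION (without ALL) removes duplicate object_id values, so a program
-- matching in both tables still yields a single row, as with IN (...).
SELECT p.submitter_id
FROM programs p
JOIN (
    SELECT t.object_id
    FROM titles_programs t
    WHERE MATCH (t.text) AGAINST ('china')
    UNION
    SELECT d.object_id
    FROM descriptions_programs d
    WHERE MATCH (d.text) AGAINST ('china')
) AS hits ON hits.object_id = p.id;
```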

+5

Just in case you do not know: MySQL has a built-in EXPLAIN statement that can be used to see what is happening under the surface. There are many articles about it, so I won't go into details, but for each table it reports an estimate of the number of rows it will need to process. If you look at the "rows" column in the EXPLAIN output for the second query, you will probably see that the number of rows is quite large, and certainly much larger than for the first.
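As a sketch of how that comparison could be run, prefix the slow query from the question with EXPLAIN. No output is shown here because it depends on the data and the MySQL version. The columns worth comparing are rows (the estimated rows examined per table) and type, where a value of fulltext indicates that a FULLTEXT index is being used for that table.

```sql
-- The slow query from the question, unchanged apart from the EXPLAIN prefix.
EXPLAIN
SELECT p.id
FROM descriptions_programs d, tags_programs t, programs p
WHERE (d.object_id = p.id AND MATCH (d.text) AGAINST ('china'))
   OR (t.object_id = p.id AND MATCH (t.text) AGAINST ('china'));
```

Running EXPLAIN on the UNION version in the same way gives the numbers to compare against.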

The web is full of warnings against using subqueries in MySQL, but it turns out that the developer is often smarter than the MySQL optimizer. Filtering the results in some way before joining can lead to significant performance improvements in many cases.

+1

If you join both tables, you get a great many records to check. For example, if both tables have 100,000 records, their cross join (every row paired with every row) gives you 100,000 × 100,000 = 10,000,000,000 records (10 billion!).
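To get a feel for this number on real data, the worst-case count of row combinations for the comma join in the question can be estimated as below. This is an upper bound under the assumption that nothing is filtered before the tables are combined; the optimizer may do better in practice. The alias worst_case_combinations is made up for illustration.

```sql
-- Product of the three table sizes: the number of (d, t, p) combinations
-- a join would have to consider if no filtering happened first.
SELECT (SELECT COUNT(*) FROM descriptions_programs)
     * (SELECT COUNT(*) FROM tags_programs)
     * (SELECT COUNT(*) FROM programs) AS worst_case_combinations;
```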

If you change OR to AND, you allow the engine to first filter out all rows of descriptions_programs that do not match 'china', and only then join titles_programs.

In any case, that is not what you need, so I recommend sticking with the UNION approach.

0

UNION is the right way to go. The join pulls in both full-text indexes at once and can multiply the number of checks that are actually performed.

0

Source: https://habr.com/ru/post/896902/

