Slow full-text MySQL search

I use this query for a full-text search in a MySQL database:

SELECT DISTINCT 
questions.id, 
questions.uniquecode, 
questions.spam,
questions.questiondate,
questions.userid,
questions.description,
users.login AS username,
questions.questiontext,
questions.totalvotes,
MATCH(questions.questiontext, questions.uniquecode) 
AGAINST ('rock guitarist chick*' IN BOOLEAN MODE) AS relevance 

FROM questions 

LEFT JOIN users ON questions.userid = users.id 
LEFT JOIN answer_mapping ON questions.id = answer_mapping.questionid 
LEFT JOIN answers ON answer_mapping.answerid = answers.id
LEFT JOIN tagmapping ON questions.id = tagmapping.questionid
LEFT JOIN tags ON tagmapping.tagid = tags.id 

WHERE questions.spam < 10 

AND 

(
  MATCH(questions.questiontext, questions.uniquecode) 
  AGAINST ('rock guitarist chick*' IN BOOLEAN MODE) 

OR MATCH(answers.answertext) AGAINST ('rock guitarist chick*' IN BOOLEAN MODE) 

OR MATCH (tags.tag) AGAINST ('rock guitarist chick*' IN BOOLEAN MODE)

) GROUP BY questions.id ORDER BY relevance DESC

The results are very relevant, but the search is very slow, and it keeps getting slower as the tables grow.

Table statistics:

questions - 400 entries

indices

  • PRIMARY BTREE - id
  • BTREE - uniquecode
  • BTREE - questiondate
  • BTREE - userid
  • FULLTEXT - questiontext
  • FULLTEXT - uniquecode

answers - 3,635 records

indices

  • PRIMARY - BTREE - id
  • BTREE - answerdate
  • BTREE - questionid
  • FULLTEXT - answertext

answer_mapping - 4,228 records

indices

  • PRIMARY - BTREE - id
  • BTREE - answerid
  • BTREE - questionid
  • BTREE - userid

tags - 1,847 records

indices

  • PRIMARY - BTREE - id
  • BTREE - tag
  • FULLTEXT - tag

tagmapping - 3,389 entries

indices

  • PRIMARY - BTREE - id
  • BTREE - tagid
  • BTREE - questionid



, - . , . .., . , ... 400 ? ... ? . / ?

, , . mysql . myisam.

lucene/solr smax. 50 -100 . - , . . , , , , , ..


To see what the optimizer is doing with the query, run EXPLAIN FORMAT=JSON SELECT ... .
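
As a rough illustration of the EXPLAIN suggestion, here is how it might be applied to just the `questions` part of the search. The column list and search string are copied from the question. Whether a single FULLTEXT index covers both columns together is an assumption, since the listed indices are separate.

```sql
-- Plan for the questions-only part of the search.
-- Assumption: the MATCH column list is the one used in the question.
EXPLAIN FORMAT=JSON
SELECT id
FROM questions
WHERE MATCH (questiontext, uniquecode)
      AGAINST ('rock guitarist chick*' IN BOOLEAN MODE);
```

In the JSON output, an `access_type` of `fulltext` for the table indicates that a FULLTEXT index is being used, while `ALL` indicates a full table scan.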

, , . ( , .)

First, run the 3 FT searches separately, then combine them (UNION) to get the question_ids.

    ( SELECT id AS question_id,   -- `questions` has `id`, not `question_id`
         MATCH (... ) AS relevance
         FROM questions
         WHERE MATCH (questiontext, ...) AGAINST ... )
    UNION ALL
    ( SELECT am.questionid AS question_id,   -- column is `questionid` in the schema
         MATCH (... ) AS relevance
         FROM answers AS a
         JOIN answer_mapping AS am ON am.answerid = a.id
         WHERE MATCH (a.answertext) AGAINST ... )
    UNION ALL
    ( SELECT tm.questionid AS question_id,
         MATCH (... ) AS relevance
         FROM tags AS t
         JOIN tagmapping AS tm ON ...
         WHERE MATCH (t.tag) AGAINST ... )

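Note that each part starts from an FT lookup and ends up with a question_id.

Then collapse the combined rows to one row per question: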

    SELECT question_id,
         MAX(relevance) AS relevance  -- (this fixes the unseen bug)
    FROM ( that query ) AS q1
    GROUP BY question_id
    ORDER BY relevance DESC  -- optional; needed for `LIMIT`
    LIMIT 20          -- to limit the rows, do it at this stage
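
To make the effect of `GROUP BY question_id` with `MAX(relevance)` concrete, here is a toy run on made-up rows. The ids and scores are invented purely for illustration: question 7 is matched twice (say, by its text and by an answer), question 9 once.

```sql
-- Hypothetical input: the literal rows stand in for the UNION ALL result.
SELECT question_id,
       MAX(relevance) AS relevance
FROM ( SELECT 7 AS question_id, 1.0 AS relevance
       UNION ALL SELECT 7, 2.0
       UNION ALL SELECT 9, 1.0 ) AS q1
GROUP BY question_id
ORDER BY relevance DESC;
```

Each question comes out exactly once with its best score: question 7 with 2.0, then question 9 with 1.0.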

Now you have the list of question_ids to show.

Finally, fetch the rest of the columns:

    SELECT ....,  -- the `questions` fields, using `q....`
           ( SELECT login FROM users WHERE q.userid = id ) AS username
        FROM ( the intermediate query ) AS q2
        JOIN questions AS q ON q.id = q2.question_id
        WHERE q.spam < 10
        ORDER BY q2.relevance DESC

Yes, it is JOINing back to questions, but it turns out to be faster.

Note that there is no GROUP BY in this outer query. And if the inner query already has a LIMIT, none is needed here.
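
Putting the three steps above together, the whole thing might look roughly like the sketch below. Table and column names are taken from the question's query and index listing, and the search string is the question's own. The `tagmapping` join condition is inferred from the original `LEFT JOIN`, and the sketch assumes FULLTEXT indexes exist that match each `MATCH` column list.

```sql
SELECT q.id, q.uniquecode, q.spam, q.questiondate, q.userid,
       q.description, q.questiontext, q.totalvotes,
       ( SELECT login FROM users WHERE id = q.userid ) AS username,
       q2.relevance
FROM (
    -- Step 2: one row per question, keeping its best score
    SELECT question_id, MAX(relevance) AS relevance
    FROM (
        -- Step 1: three independent FULLTEXT lookups
        SELECT id AS question_id,
               MATCH (questiontext, uniquecode)
                 AGAINST ('rock guitarist chick*' IN BOOLEAN MODE) AS relevance
        FROM questions
        WHERE MATCH (questiontext, uniquecode)
              AGAINST ('rock guitarist chick*' IN BOOLEAN MODE)
        UNION ALL
        SELECT am.questionid,
               MATCH (a.answertext)
                 AGAINST ('rock guitarist chick*' IN BOOLEAN MODE)
        FROM answers AS a
        JOIN answer_mapping AS am ON am.answerid = a.id
        WHERE MATCH (a.answertext)
              AGAINST ('rock guitarist chick*' IN BOOLEAN MODE)
        UNION ALL
        SELECT tm.questionid,
               MATCH (t.tag)
                 AGAINST ('rock guitarist chick*' IN BOOLEAN MODE)
        FROM tags AS t
        JOIN tagmapping AS tm ON tm.tagid = t.id
        WHERE MATCH (t.tag)
              AGAINST ('rock guitarist chick*' IN BOOLEAN MODE)
    ) AS q1
    GROUP BY question_id
    ORDER BY relevance DESC
    LIMIT 20
) AS q2
-- Step 3: join back to questions for the display columns
JOIN questions AS q ON q.id = q2.question_id
WHERE q.spam < 10
ORDER BY q2.relevance DESC;
```

One design consequence to be aware of: because `spam < 10` is applied after the inner `LIMIT 20`, the result can contain fewer than 20 rows when some of the top matches are filtered out as spam.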

I apologize if I did not get everything quite right; there were more transformations than I expected.


Source: https://habr.com/ru/post/1767701/

