Why doesn't MySQL use the primary key in JOIN plus ORDER?

Here's a neat one for you (MySQL obviously):

  # Setting things up
 DROP DATABASE IF EXISTS index_test_gutza;
 CREATE DATABASE index_test_gutza;
 USE index_test_gutza;

 CREATE TABLE customer_order (
     id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
     invoice MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,
     PRIMARY KEY (id)
 );
 INSERT INTO customer_order
     (id, invoice)
     VALUES
     (1, 1),
     (2, 2),
     (3, 3),
     (4, 4),
     (5, 5);

 CREATE TABLE customer_invoice (
     id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
     invoice_no MEDIUMINT UNSIGNED DEFAULT NULL,
     invoice_pdf LONGBLOB,
     PRIMARY KEY (id)
 );
 INSERT INTO customer_invoice
     (id, invoice_no)
     VALUES
     (1, 1),
     (2, 2),
     (3, 3),
     (4, 4),
     (5, 5);

 # OK, here's the beef
 EXPLAIN
     SELECT co.id
     FROM customer_order AS co;

 EXPLAIN
     SELECT co.id
     FROM customer_order AS co
     ORDER BY co.id;

 EXPLAIN
     SELECT co.id, ci.invoice_no
     FROM customer_order AS co
     LEFT JOIN customer_invoice AS ci ON ci.id = co.invoice;

 EXPLAIN
     SELECT co.id, ci.invoice_no
     FROM customer_order AS co
     LEFT JOIN customer_invoice AS ci ON ci.id = co.invoice
     ORDER BY co.id;

There are four EXPLAIN statements above. The first two both produce exactly what you'd expect:

 +----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
 | id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
 +----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
 |  1 | SIMPLE      | co    | index | NULL          | PRIMARY | 3       | NULL |    5 | Using index |
 +----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+

The third is already interesting - note that the primary key in customer_order is no longer used:

 +----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+-------------+
 | id | select_type | table | type   | possible_keys | key     | key_len | ref                         | rows | Extra       |
 +----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+-------------+
 |  1 | SIMPLE      | co    | ALL    | NULL          | NULL    | NULL    | NULL                        |    5 |             |
 |  1 | SIMPLE      | ci    | eq_ref | PRIMARY       | PRIMARY | 3       | index_test_gutza.co.invoice |    1 | Using index |
 +----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+-------------+

The fourth, however, is the zinger - simply adding ORDER BY on the primary key leads to a filesort on customer_order (which is to be expected, given that it was already confused above):

 +----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+----------------+
 | id | select_type | table | type   | possible_keys | key     | key_len | ref                         | rows | Extra          |
 +----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+----------------+
 |  1 | SIMPLE      | co    | ALL    | NULL          | NULL    | NULL    | NULL                        |    5 | Using filesort |
 |  1 | SIMPLE      | ci    | eq_ref | PRIMARY       | PRIMARY | 3       | index_test_gutza.co.invoice |    1 | Using index    |
 +----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+----------------+

Filesort! And that even though I never use anything but the primary key of customer_order for the ordering, and the primary key of customer_invoice for the JOIN. So, in the name of all that is good and right, why does it suddenly switch to filesort?! And more importantly, how do I avoid this? For the record, I would happily accept a documented answer explaining why this cannot be avoided (if that is the case).

As you probably suspect by now, this is actually happening in production. Although the tables are by no means huge (only hundreds of records), the filesort involving the invoice table (which contains the PDF files) kills the server when I run queries similar to the one above (which I need in order to know which orders have had invoices issued and which haven't).

Before you ask: I designed the database, and I thought it was safe to store the PDF files in that table because I never need to run any search queries on it. I always have its primary key on hand!

Update (summary of comments)

Here is a brief overview of what was suggested in the comments below, so you don't need to read through all of them:

  • You should add a key on customer_order.invoice - I actually tried this in production; it makes no difference (as it shouldn't).
  • You should use USE INDEX - tried that, didn't work. I also tried FORCE INDEX - no result either (nothing changed).
  • You've oversimplified the use case; we need the actual production query - I may have stripped it down too much in the first iteration, so I updated it (I just added , ci.invoice_no to the SELECT for the last couple of queries). For the record, if anyone is really curious, here is the production query, exactly as it is (it retrieves the last page of orders):
  SELECT
     corder.id,
     corder.public_id,
     CONCAT(buyer.fname, " ", buyer.lname) AS buyer_name,
     corder.status,
     corder.payment,
     corder.reserved AS R,
     corder.tracking_id != "" AS A,
     corder.payment_received as pay_date,
     invoice.invoice_no AS inv,
     invoice.receipt_no AS rec,
     invoice.public AS pub_inv,
     proforma.proforma_no AS prof,
     proforma.public AS pub_pf,
     corder.rating,
     corder.rating_comments != "" AS got_comment
 FROM
     corder
 LEFT JOIN user as buyer ON buyer.id = corder.buyer
 LEFT JOIN invoice as invoice ON invoice.id = corder.invoice
 LEFT JOIN invoice as proforma ON proforma.id = corder.proforma
 ORDER BY
     id DESC 
 LIMIT 400, 20;

The query above (which, again, is exactly what I run in production) takes about 14 seconds. Here is the simplified query, matching the use case above, run against the production tables:

  SELECT
     corder.id,
     invoice.invoice_no
 FROM
     corder
 LEFT JOIN invoice ON invoice.id = corder.invoice
 ORDER BY
     corder.id DESC 
 LIMIT 400, 20;

This one takes 13 seconds. Mind you, LIMIT makes no difference as long as we're talking about the last page of results (which we are). That is, there is no significant difference between retrieving the last 12 results and all 412 results when the filesort is involved.

Conclusion

ypercube's answer is not only correct but, unfortunately, seems to be the only legitimate one. I tried to further separate the conditions from the fields, because SELECT * FROM corder may end up involving a lot of data if corder itself contains LONGBLOBs (and duplicating the fields from the main query in the subquery is inelegant), but unfortunately that doesn't seem to work:

  SELECT
     corder.id,
     corder.public_id,
     CONCAT(buyer.fname, " ", buyer.lname) AS buyer_name,
     corder.status,
     corder.payment,
     corder.reserved AS R,
     corder.tracking_id != "" AS A,
     corder.payment_received AS pay_date,
     invoice.invoice_no AS inv,
     invoice.receipt_no AS rec,
     invoice.public AS pub_inv,
     proforma.proforma_no AS prof,
     proforma.public AS pub_pf,
     corder.rating,
     corder.rating_comments != "" AS got_comment
 FROM
     corder
 LEFT JOIN user as buyer ON buyer.id = corder.buyer
 LEFT JOIN invoice AS invoice ON invoice.id = corder.invoice
 LEFT JOIN invoice AS proforma ON proforma.id = corder.proforma
 WHERE corder.id IN (
     SELECT id
     FROM corder
     ORDER BY id DESC
     LIMIT 400, 20
 )
 ORDER BY
     corder.id DESC;

This fails with the following error message:

  ERROR 1235 (42000): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'

I am using MySQL 5.1.61, which is fairly recent in the 5.1 family (and apparently this is also not supported in 5.5.x).
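
A possible way around that restriction, offered as an untested sketch rather than a measured fix: the error applies to LIMIT inside IN/ALL/ANY/SOME subqueries, while a subquery in the FROM clause (a derived table) may use LIMIT. Joining corder back to a derived table that holds only the page's ids keeps the paging condition separate from the field list without resorting to SELECT *. The alias page is a made-up name, and the simplified two-column query stands in for the full production one.

```sql
# Sketch only: "page" is a hypothetical alias, not part of the original query.
SELECT
    corder.id,
    invoice.invoice_no
FROM (
    # The derived table carries only ids, so no wide columns are copied into it.
    SELECT id
    FROM corder
    ORDER BY id DESC
    LIMIT 400, 20
) AS page
JOIN corder ON corder.id = page.id
LEFT JOIN invoice ON invoice.id = corder.invoice
ORDER BY
    corder.id DESC;
```

The outer ORDER BY is still needed, because the row order of a derived table is not guaranteed to survive the joins; it should only have the 20 selected rows left to sort, but running EXPLAIN on the real tables is the only way to confirm what the optimizer actually does.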

1 answer

Can you try this version (it basically finds the first 420 rows of the corder table, keeps the last 20 of them, and then does the 3 outer joins):

 SELECT
     corder.id,
     corder.public_id,
     CONCAT(buyer.fname, " ", buyer.lname) AS buyer_name,
     corder.status,
     corder.payment,
     corder.reserved AS R,
     corder.tracking_id != "" AS A,
     corder.payment_received AS pay_date,
     invoice.invoice_no AS inv,
     invoice.receipt_no AS rec,
     invoice.public AS pub_inv,
     proforma.proforma_no AS prof,
     proforma.public AS pub_pf,
     corder.rating,
     corder.rating_comments != "" AS got_comment
 FROM (
     SELECT *
     FROM corder
     ORDER BY id DESC
     LIMIT 400, 20
 ) AS corder
 LEFT JOIN user AS buyer ON buyer.id = corder.buyer
 LEFT JOIN invoice AS invoice ON invoice.id = corder.invoice
 LEFT JOIN invoice AS proforma ON proforma.id = corder.proforma
 ORDER BY
     corder.id DESC;
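
To see how the optimizer treats this pattern without touching production, the same shape can be tried on the five-row test schema from the question. This is only a sketch: the LIMIT values are arbitrary choices for such a tiny table, and no particular EXPLAIN output is claimed here, since it can vary between MySQL versions.

```sql
# Sketch only: same derived-table pattern, applied to the question's test tables.
USE index_test_gutza;

EXPLAIN
    SELECT co.id, ci.invoice_no
    FROM (
        # Pick the page of orders first, using only customer_order.
        SELECT id, invoice
        FROM customer_order
        ORDER BY id DESC
        LIMIT 2, 2
    ) AS co
    LEFT JOIN customer_invoice AS ci ON ci.id = co.invoice
    ORDER BY co.id DESC;
```

The rows to compare against the fourth EXPLAIN in the question are the one for the inner customer_order read and the one for the derived table: whether the filesort disappears, or merely moves onto the small derived result.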

Source: https://habr.com/ru/post/1443459/

