Here's a neat one for you (MySQL obviously):
# Setting things up
DROP DATABASE IF EXISTS index_test_gutza;
CREATE DATABASE index_test_gutza;
USE index_test_gutza;
CREATE TABLE customer_order (
id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
invoice MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,
PRIMARY KEY (id)
);
INSERT INTO customer_order
(id, invoice)
VALUES
(1, 1),
(2, 2),
(3, 3),
(4, 4),
(5, 5);
CREATE TABLE customer_invoice (
id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
invoice_no MEDIUMINT UNSIGNED DEFAULT NULL,
invoice_pdf LONGBLOB,
PRIMARY KEY (id)
);
INSERT INTO customer_invoice
(id, invoice_no)
VALUES
(1, 1),
(2, 2),
(3, 3),
(4, 4),
(5, 5);
# OK, here's the beef
EXPLAIN
SELECT co.id
FROM customer_order AS co;
EXPLAIN
SELECT co.id
FROM customer_order AS co
ORDER BY co.id;
EXPLAIN
SELECT co.id, ci.invoice_no
FROM customer_order AS co
LEFT JOIN customer_invoice AS ci ON ci.id = co.invoice;
EXPLAIN
SELECT co.id, ci.invoice_no
FROM customer_order AS co
LEFT JOIN customer_invoice AS ci ON ci.id = co.invoice
ORDER BY co.id;
There are four EXPLAIN statements above. The first two both produce exactly what you'd expect:
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | co    | index | NULL          | PRIMARY | 3       | NULL |    5 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
The third is already interesting - note that the primary key in customer_order is no longer used:
+----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                         | rows | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+-------------+
|  1 | SIMPLE      | co    | ALL    | NULL          | NULL    | NULL    | NULL                        |    5 |             |
|  1 | SIMPLE      | ci    | eq_ref | PRIMARY       | PRIMARY | 3       | index_test_gutza.co.invoice |    1 | Using index |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+-------------+
The fourth, however, is the zinger: just adding an ORDER BY on the primary key leads to a filesort on customer_order (which is to be expected, given that it is already confused above):
+----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+----------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                         | rows | Extra          |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+----------------+
|  1 | SIMPLE      | co    | ALL    | NULL          | NULL    | NULL    | NULL                        |    5 | Using filesort |
|  1 | SIMPLE      | ci    | eq_ref | PRIMARY       | PRIMARY | 3       | index_test_gutza.co.invoice |    1 | Using index    |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+----------------+
Filesort! And that even though I never use anything but the primary key of customer_order for the ordering, and the primary key of customer_invoice for the JOIN. So, in the name of all that is good and right, why does it suddenly switch to filesort?! And more importantly, how do I avoid it? For the record, I would happily accept a documented answer explaining why this cannot be avoided (if that's the case).
As you probably suspect, this is actually happening in production, and although the tables are by no means huge (only hundreds of records), the filesort on the invoice table (which contains the PDF files) kills the server when I run queries similar to the ones above (which I need in order to know which orders have had invoices issued and which haven't).
Before you ask: I designed the database, and I thought it was safe to store PDF files in this table because I never need to run any search queries on it. I always have its primary key on hand!
Update (comment review)
Here is a brief overview of what was suggested in the comments below, so you don't need to read through all of them:
- *You should add a key on customer_order.invoice* - I actually tried this in production; it makes no difference (as it shouldn't)
- *You should use USE INDEX* - tried that, didn't work. I also tried FORCE INDEX - no result either (no change whatsoever)
- *You oversimplified the use case, we need the actual production query* - Perhaps I stripped it down too much in the first iteration, so I updated it (I just added , ci.invoice_no to the SELECT for the last couple of queries). For the record, if anyone is really interested, here is the production query, exactly as it is (this retrieves the last page of orders):
SELECT
corder.id,
corder.public_id,
CONCAT(buyer.fname, " ", buyer.lname) AS buyer_name,
corder.status,
corder.payment,
corder.reserved AS R,
corder.tracking_id != "" AS A,
corder.payment_received as pay_date,
invoice.invoice_no AS inv,
invoice.receipt_no AS rec,
invoice.public AS pub_inv,
proforma.proforma_no AS prof,
proforma.public AS pub_pf,
corder.rating,
corder.rating_comments != "" AS got_comment
FROM
corder
LEFT JOIN user as buyer ON buyer.id = corder.buyer
LEFT JOIN invoice as invoice ON invoice.id = corder.invoice
LEFT JOIN invoice as proforma ON proforma.id = corder.proforma
ORDER BY
id DESC
LIMIT 400, 20;
The query above (which, again, is exactly what I'm running in production) takes about 14 seconds. Here is the simplified query, along the lines of the use case above, as executed in production:
SELECT
corder.id,
invoice.invoice_no
FROM
corder
LEFT JOIN invoice ON invoice.id = corder.invoice
ORDER BY
corder.id DESC
LIMIT 400, 20;
It takes 13 seconds. Keep in mind that LIMIT makes no difference as long as we're talking about the last page of results (which we are). That is, there is no significant difference between retrieving the last 12 results and retrieving all 412 results when filesort is involved.
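One way to check the LIMIT claim against the toy schema from the top of the question is to compare the two EXPLAIN statements below. This is only a sketch: the offsets (LIMIT 3, 2, i.e. the "last page" of the five sample rows) are picked here for illustration, and no output is shown because it may vary between MySQL versions.

```sql
-- Joined query, paginated: check whether Extra for `co` still reports a filesort
EXPLAIN
SELECT co.id, ci.invoice_no
FROM customer_order AS co
LEFT JOIN customer_invoice AS ci ON ci.id = co.invoice
ORDER BY co.id DESC
LIMIT 3, 2;

-- Key-only query with the same pagination, for comparison
EXPLAIN
SELECT co.id
FROM customer_order AS co
ORDER BY co.id DESC
LIMIT 3, 2;
```

If the first plan still shows a filesort on co while the second one reads the primary key, the sort is being done on the whole table before the offset is applied, which is why the page size barely matters.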
Conclusion
ypercube's answer is not only correct, but unfortunately it seems to be the only legitimate one. I tried to further separate the conditions from the fields, because SELECT * FROM corder may end up involving a lot of data if corder itself contains LONGBLOBs (and duplicating fields from the main query in the subquery is inelegant), but unfortunately that doesn't seem to work:
SELECT
corder.id,
corder.public_id,
CONCAT(buyer.fname, " ", buyer.lname) AS buyer_name,
corder.status,
corder.payment,
corder.reserved AS R,
corder.tracking_id != "" AS A,
corder.payment_received AS pay_date,
invoice.invoice_no AS inv,
invoice.receipt_no AS rec,
invoice.public AS pub_inv,
proforma.proforma_no AS prof,
proforma.public AS pub_pf,
corder.rating,
corder.rating_comments != "" AS got_comment
FROM
corder
LEFT JOIN user as buyer ON buyer.id = corder.buyer
LEFT JOIN invoice AS invoice ON invoice.id = corder.invoice
LEFT JOIN invoice AS proforma ON proforma.id = corder.proforma
WHERE corder.id IN (
SELECT id
FROM corder
ORDER BY id DESC
LIMIT 400, 20
)
ORDER BY
corder.id DESC;
This fails with the following error message:
ERROR 1235 (42000): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
I am using MySQL 5.1.61, which is fairly recent in the 5.1 family (and apparently this is also not supported in 5.5.x).
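A possible variation, shown here only as an untested sketch against the simplified query: the restriction behind error 1235 concerns subqueries used with IN/ALL/ANY/SOME, while a subquery in the FROM clause (a derived table) may carry its own LIMIT. Selecting just the ids of the page in a derived table and joining back to corder keeps the field list in one place. The alias page is a name introduced here for illustration, and no timing has been measured for this form.

```sql
SELECT
    corder.id,
    invoice.invoice_no
FROM
    (
        -- Pick the ids of the requested page using only the primary key
        SELECT id
        FROM corder
        ORDER BY id DESC
        LIMIT 400, 20
    ) AS page                                  -- hypothetical alias
    INNER JOIN corder ON corder.id = page.id   -- fetch the full rows for those ids only
    LEFT JOIN invoice ON invoice.id = corder.invoice
ORDER BY
    corder.id DESC;
```

The outer ORDER BY may still be reported as a filesort by EXPLAIN, but under this sketch it would apply to at most 20 rows rather than to the whole table.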