Which MySQL query is faster?

Which of these queries will run faster, and which one is the correct way to write it?

SELECT COUNT(*) AS count FROM students WHERE status = 1 AND classes_id IN( SELECT id FROM classes WHERE departments_id = 1 ); 

Or

 SELECT COUNT(*) AS count FROM students s LEFT JOIN classes c ON c.id = s.classes_id WHERE status = 1 AND c.departments_id = 1 

I ran both queries and they give the same result. Now I want to know which one will execute faster and which one is correct.

+1
6 answers

You should always use EXPLAIN to determine how your query will be executed.

Unfortunately, MySQL will execute your subquery as a DEPENDENT SUBQUERY, which means that the subquery will be executed for each row in the outer query. You would think MySQL would be smart enough to detect that the subquery is not correlated and run it only once; alas, it is not that smart yet.

So, MySQL scans all rows in students, runs the subquery for each row, and does not use any indexes for the outer query.
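As a rough sketch of how to check this yourself, you can prefix each form with EXPLAIN and compare the plans. The table and column names are the ones from the question; what the output looks like depends on your MySQL version and data, so treat the comments as things to look for rather than guaranteed results. As far as I know, MySQL 5.6 and later can also turn an IN subquery like this into a semijoin or materialize it, in which case DEPENDENT SUBQUERY may not appear at all.

```sql
-- Plan for the subquery form. In the select_type column, a value of
-- DEPENDENT SUBQUERY means the inner SELECT is re-evaluated per outer row.
EXPLAIN
SELECT COUNT(*) AS count
FROM students
WHERE status = 1
  AND classes_id IN (SELECT id FROM classes WHERE departments_id = 1);

-- Plan for the join form. The key column shows which index, if any,
-- was chosen for each table; NULL means no index is used.
EXPLAIN
SELECT COUNT(*) AS count
FROM students s
JOIN classes c ON c.id = s.classes_id AND c.departments_id = 1
WHERE s.status = 1;
```

Comparing the type, key and rows columns of the two plans usually tells you more than guessing from the query text.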

Writing a query as a JOIN will allow MySQL to use indexes, and the following query will be the best way to write it:

 SELECT COUNT(*) AS count FROM students s JOIN classes c ON c.id = s.classes_id AND c.departments_id = 1 WHERE s.status = 1 

This would use the following indexes:

 students(`status`)
 classes(`id`, `departments_id`) : multi-column index
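
If those indexes do not exist yet, they could be created along these lines. This is only a sketch: the index names are made up for illustration, and the column is spelled departments_id as in the question.

```sql
-- Hypothetical index names; use whatever fits your naming scheme.
CREATE INDEX ix_students_status ON students (status);

-- Multi-column index covering the join column and the department filter.
CREATE INDEX ix_classes_id_departments ON classes (id, departments_id);

-- List the indexes that now exist on each table.
SHOW INDEX FROM students;
SHOW INDEX FROM classes;
```

Whether the optimizer actually picks them is something EXPLAIN has to confirm for your data.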
+5

In terms of design and clarity, I would avoid inner selects like the first one. It is true that to be 100% sure whether and how each query will be optimized, and which one will perform "better", you need to see how the SQL server you use handles it and what plan it produces. In MySQL, use EXPLAIN.

However ... even without seeing the plan, my money is still on the single join version. The inner-select version has to run the entire inner select before it can determine the values used inside the "IN" clause. I know that is true when you pass things to functions, and I'm pretty sure it is true when you pass a select as the argument to IN. I also know that it is a good way to completely neutralize any benefit that indexes on the tables inside the inner select might have.

I'm generally of the opinion that inner selects are really only needed in very rare query situations. Usually, those who use them a lot think like traditional iterative flow programmers who don't think in terms of relational DB result sets ...

+3

Run EXPLAIN on both queries individually.

The difference between the two queries is subqueries vs. joins.

Joins are likely to be faster than subqueries. A JOIN creates an execution plan and predicts what data will be processed, and therefore saves time. Subqueries, on the other hand, run all of the queries until all the data has been loaded. Most people use subqueries because they are more readable than JOINs, but where performance matters, a JOIN is the better solution.

+2

The best way to find out is to measure it:

No index

  • Query 1: 0.9s
  • Query 2: 0.9s

With index

  • Query 1: 0.4s
  • Query 2: 0.2s

Conclusion:

  • If you don't have indexes, then it doesn't matter which query you use.
  • The join is faster if you have the right index.
  • The effect of adding the right index is greater than the effect of choosing the right query. If performance matters, make sure you have the right indexes.

Of course, your results may vary depending on the version of MySQL and the distribution of data that you have.
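
If you want to measure on your own server, one option is EXPLAIN ANALYZE, which runs the query and reports the time actually spent in each step of the plan. This is a sketch that assumes MySQL 8.0.18 or later, where that statement is available; on older versions you would time the plain SELECT from your client instead.

```sql
-- Assumes MySQL 8.0.18+. EXPLAIN ANALYZE executes the query for real,
-- so expect it to take as long as the query itself.
EXPLAIN ANALYZE
SELECT COUNT(*) AS count
FROM students
WHERE status = 1
  AND classes_id IN (SELECT id FROM classes WHERE departments_id = 1);

EXPLAIN ANALYZE
SELECT COUNT(*) AS count
FROM students s
LEFT JOIN classes c ON c.id = s.classes_id
WHERE s.status = 1 AND c.departments_id = 1;
```

Run each statement a few times, since the first run may be slower while data is still being read into memory.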

Here's how I tested it:

  • 1,000,000 students (25% with status 1).
  • 50,000 classes.
  • 10 departments.

Here is the SQL I used to create the test data:

 CREATE TABLE students (id INT PRIMARY KEY AUTO_INCREMENT, status int NOT NULL, classes_id int NOT NULL);
 CREATE TABLE classes (id INT PRIMARY KEY AUTO_INCREMENT, departments_id INT NOT NULL);
 CREATE TABLE numbers(id INT PRIMARY KEY AUTO_INCREMENT);
 INSERT INTO numbers VALUES (),(),(),(),(),(),(),(),(),();
 INSERT INTO numbers SELECT NULL FROM numbers AS n1 CROSS JOIN numbers AS n2 CROSS JOIN numbers AS n3 CROSS JOIN numbers AS n4 CROSS JOIN numbers AS n5 CROSS JOIN numbers AS n6;
 INSERT INTO classes (departments_id) SELECT id % 10 FROM numbers WHERE id <= 50000;
 INSERT INTO students (status, classes_id) SELECT id % 4 = 0, id % 50000 + 1 FROM numbers WHERE id <= 1000000;
 SELECT COUNT(*) AS count FROM students WHERE status = 1 AND classes_id IN (SELECT id FROM classes WHERE departments_id = 1);
 SELECT COUNT(*) AS count FROM students s LEFT JOIN classes c ON c.id = s.classes_id WHERE status = 1 AND c.departments_id = 1;
 CREATE INDEX ix_students ON students(status, classes_id);
+2

The two queries may not produce the same results:

 SELECT COUNT(*) AS count FROM students WHERE status = 1 AND classes_id IN( SELECT id FROM classes WHERE departments_id = 1 ); 

... will return the number of rows in the students table with a status of 1 whose classes_id also appears in the classes table with a departments_id of 1.

 SELECT COUNT(*) AS count FROM students s LEFT JOIN classes c ON c.id = s.classes_id WHERE status = 1 AND c.departments_id = 1 

... will return the number of joined rows where the status field is 1 and the class belongs to department 1, which can be more than the first query returns, depending on how your data is organized.

If you want the queries to return the same thing, you need to change the LEFT JOIN to an INNER JOIN so that it matches only rows that satisfy both conditions.
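
A quick way to see whether the three forms agree on your own data is to compute all of the counts side by side. This is just a sketch using the table and column names from the question; the column aliases are made up for illustration. Keep in mind that a WHERE condition on a column of the right-hand table discards the NULL-filled rows a LEFT JOIN adds for unmatched students, so any difference in the counts would come from several classes rows sharing the same id rather than from students without a class.

```sql
-- Compare the three formulations in one result row.
SELECT
  (SELECT COUNT(*)
     FROM students
    WHERE status = 1
      AND classes_id IN (SELECT id FROM classes WHERE departments_id = 1)
  ) AS in_subquery_count,
  (SELECT COUNT(*)
     FROM students s
     LEFT JOIN classes c ON c.id = s.classes_id
    WHERE s.status = 1 AND c.departments_id = 1
  ) AS left_join_count,
  (SELECT COUNT(*)
     FROM students s
     INNER JOIN classes c ON c.id = s.classes_id
    WHERE s.status = 1 AND c.departments_id = 1
  ) AS inner_join_count;

-- Look for duplicate ids in classes, which would inflate the join counts.
SELECT id, COUNT(*) AS copies
FROM classes
GROUP BY id
HAVING COUNT(*) > 1;
```

If classes.id is a primary key, the second query returns no rows and the join cannot multiply student rows.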

+1

Run EXPLAIN SELECT ... on both queries and see what each one does :)

0

Source: https://habr.com/ru/post/918532/