Faster alternatives for "IN" statements?

I am not very good at MySQL, so I often end up writing suboptimal queries that work but that I know are terribly inefficient. I am hoping you could give me some guidance on why the following query performs so poorly and what approach I should use for queries like this.

I have the following table structure:

TABLE Files
    files_id   => INT(12), PRIMARY, AUTO INCREMENT, NOT NULL
    files_name => VARCHAR(255), NOT NULL
    (some other fields such as file type etc)

TABLE File_Permissions
    perm_id       => INT(12), PRIMARY, AUTO INCREMENT, NOT NULL
    perm_files_id => INT(12), NOT NULL
    perm_users_id => INT(12), NOT NULL

I retrieve a list of files that the user can view with the following SQL:

 SELECT files_name FROM Files WHERE files_id IN (SELECT perm_files_id FROM File_Permissions WHERE perm_users_id = 'xxxxxx'); 

As far as I can tell, this goes through each of the thousands of rows in the Files table and, for each one, runs a subquery that selects from the File_Permissions table to check for the user ID.

Each query takes almost 2 seconds. I'm sure that something is fundamentally wrong with this, I just don't know what it is.

Thank you for your help!

+4
7 answers

Most queries that use an IN clause with a subquery can be rewritten to use a join. In your case:

 SELECT files_name FROM Files JOIN File_Permissions ON files_id = perm_files_id WHERE perm_users_id = 'xxxxxx'; 

In the query above, the join between the two tables is built first and the result is then filtered by the WHERE condition. That takes two passes instead of N + 1.
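One caveat, assuming File_Permissions may hold more than one row for the same file and user (the question does not say whether it can): the join returns a file name once per matching permission row, whereas the IN version returns it only once. A rough check for such duplicates, using only the columns from the question (the alias row_count is made up for illustration):

```sql
-- List (file, user) pairs that appear more than once in File_Permissions.
-- An empty result means the JOIN and the IN query return the same rows.
SELECT perm_files_id, perm_users_id, COUNT(*) AS row_count
FROM File_Permissions
GROUP BY perm_files_id, perm_users_id
HAVING COUNT(*) > 1;
```

If this does return rows, adding DISTINCT to the join query is one way to get the same result as the IN version.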

+1

For this kind of query, you can use JOIN, WHERE ... IN or WHERE EXISTS. The IN approach you posted should perform well if you have the appropriate indexes.

Just to compare with something else, here is an example of WHERE EXISTS:

 SELECT files_name FROM Files WHERE EXISTS ( SELECT * FROM File_Permissions WHERE perm_users_id = 'xxxxxx' AND files_id = perm_files_id ) 

But most importantly: add the appropriate indexes! This can significantly affect performance. If you are not sure whether you have the correct indexes, look at the output of the following statements to find out which indexes exist and which ones the query uses:

  • EXPLAIN SELECT ...your query here...
  • SHOW CREATE TABLE Files
  • SHOW CREATE TABLE File_Permissions
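Applied to the query from the question, the statements in this list might look like the sketch below. The 'xxxxxx' value is the question's own placeholder for a user ID. In the EXPLAIN output, a `type` of `ALL` indicates a full table scan, and a `select_type` of `DEPENDENT SUBQUERY` indicates that the subquery is re-evaluated for each outer row.

```sql
-- Show the execution plan for the query from the question.
EXPLAIN
SELECT files_name
FROM Files
WHERE files_id IN (
    SELECT perm_files_id
    FROM File_Permissions
    WHERE perm_users_id = 'xxxxxx'
);

-- Show the column definitions and existing indexes of both tables.
SHOW CREATE TABLE Files;
SHOW CREATE TABLE File_Permissions;
```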

If you are still not sure, edit the question to include the output of each of the statements above, as well as the following:

  • SELECT COUNT(*) FROM Files
  • SELECT COUNT(*) FROM File_Permissions
  • SELECT COUNT(*) FROM (SELECT ...your query here...) T1
+1

You can rewrite your query as described above, but you could also try adding an index on perm_users_id first. Most likely, that will speed things up.
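A minimal sketch of adding that index, assuming the table and column names from the question. The index name idx_perm_users_id is hypothetical, so pick whatever fits your naming scheme:

```sql
-- Index the column used in the WHERE clause of the subquery.
-- idx_perm_users_id is an illustrative name, not taken from the question.
CREATE INDEX idx_perm_users_id ON File_Permissions (perm_users_id);

-- Confirm that the index now exists.
SHOW INDEX FROM File_Permissions;
```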

0

Your tables need indexes. Your query shows that you need the following:

The Files table needs an index on files_id.

The File_Permissions table needs an index on perm_users_id.

This will make the query much faster.
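Note that files_id is declared PRIMARY in the question, so it already has an index; only File_Permissions should need a new one. One possible variant, which goes slightly beyond what this answer states, is a composite index on both columns the query touches, so that the permission lookup can be answered from the index alone. The index name below is hypothetical:

```sql
-- Composite index: filter by user first, then read the file id
-- straight from the index without touching the table rows.
-- idx_perm_user_file is an illustrative name.
ALTER TABLE File_Permissions
    ADD INDEX idx_perm_user_file (perm_users_id, perm_files_id);
```

Running EXPLAIN on the query afterwards should show whether the optimizer actually picks this index (see the `key` column).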

0

I'm not sure why you are not just using a standard join, as follows:

 SELECT <required fields> FROM (Files, File_Permissions) WHERE files_id = perm_files_id AND perm_users_id = 'xxxxx' 

In addition, you must make sure that the appropriate indexes are in place.

Implicit joins are evil - see comments below. :-)

0

Try:

 SELECT files_name FROM Files JOIN File_Permissions ON files_id = perm_files_id AND perm_users_id = 'xxxxx' 

Note that this must be an inner join: a LEFT JOIN here would return every row of Files, including files the user has no permission for. Indexing the joined columns will also help, so an index on perm_files_id should improve performance.

0

There are two general alternatives:

 SELECT files_name FROM Files f WHERE EXISTS ( SELECT * FROM File_Permissions WHERE f.files_id = perm_files_id AND perm_users_id = 'xxxxxx'); 

and

 SELECT DISTINCT files_name fn FROM Files f JOIN File_Permissions fp ON f.files_id = fp.perm_files_id WHERE perm_users_id = 'xxxxxx'; 
0

Source: https://habr.com/ru/post/1342895/

