How to improve ORDER BY performance with joins in MySQL

I am working on a social network tracking application. The joins perform fine with proper indexing, but when I add an ORDER BY clause, the overall query takes more than 100 times as long to complete. I use the following query to get Twitter users without an ORDER BY clause:

SELECT DISTINCT `tracked_twitter`.id FROM tracked_twitter INNER JOIN `twitter_content` ON `tracked_twitter`.`id` = `twitter_content`.`tracked_twitter_id` INNER JOIN `tracker_twitter_content` ON `twitter_content`.`id` = `tracker_twitter_content`.`twitter_content_id` AND `tracker_twitter_content`.`tracker_id` = '88' LIMIT 20 

Showing rows 0 - 19 (20 total, query took 0.0714 sec)

But when I add ORDER BY (on an indexed column):

 SELECT DISTINCT `tracked_twitter`.id FROM tracked_twitter INNER JOIN `twitter_content` ON `tracked_twitter`.`id` = `twitter_content`.`tracked_twitter_id` INNER JOIN `tracker_twitter_content` ON `twitter_content`.`id` = `tracker_twitter_content`.`twitter_content_id` AND `tracker_twitter_content`.`tracker_id` = '88' ORDER BY tracked_twitter.followers_count DESC LIMIT 20 

Showing rows 0 - 19 (20 total, query took 13.4636 sec)

EXPLAIN output: (image not available)

When I apply the ORDER BY clause to the table on its own, it does not take much time:

 SELECT * FROM `tracked_twitter` WHERE 1 order by `followers_count` desc limit 20 

Showing rows 0 - 19 (20 total, query took 0.0711 sec) [followers_count: 68236387 - 10525612]

The CREATE TABLE statement is as follows:

 CREATE TABLE IF NOT EXISTS `tracked_twitter` ( `id` varchar(255) COLLATE utf8_unicode_ci NOT NULL, `handle` varchar(255) COLLATE utf8_unicode_ci NOT NULL, `name` varchar(255) COLLATE utf8_unicode_ci NOT NULL, `location` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL, `description` text COLLATE utf8_unicode_ci, `profile_image` varchar(255) COLLATE utf8_unicode_ci NOT NULL, `followers_count` int(11) NOT NULL, `is_influencer` tinyint(1) NOT NULL DEFAULT '0', `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00', `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00', `gender` enum('Male','Female','Other') COLLATE utf8_unicode_ci DEFAULT NULL, PRIMARY KEY (`id`), KEY `followers_count` (`followers_count`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; 

So the join alone does not slow down the query, and the ORDER BY works well when I run it on the table alone. How can I improve performance when the two are combined?

UPDATE 1

@GordonLinoff's method solves the problem if I only need a result set from the parent table. What I also want to know is the number of tweets per person (the number of twitter_content rows that correspond to each tracked_twitter row). How can I change the query for that? And if I want to apply math functions to the tweet content, how do I do it?

 SELECT `tracked_twitter` . * , COUNT( * ) AS twitterContentCount, retweet_count + favourite_count + reply_count AS engagement FROM `tracked_twitter` INNER JOIN `twitter_content` ON `tracked_twitter`.`id` = `twitter_content`.`tracked_twitter_id` INNER JOIN `tracker_twitter_content` ON `twitter_content`.`id` = `tracker_twitter_content`.`twitter_content_id` WHERE `is_influencer` != '1' AND `tracker_twitter_content`.`tracker_id` = '88' AND `tracked_twitter_id` != '0' GROUP BY `tracked_twitter`.`id` ORDER BY twitterContentCount DESC LIMIT 20 OFFSET 0 
3 answers

Try to get rid of DISTINCT. It is a performance killer. I'm not sure why your first query is fast; MySQL is probably smart enough to optimize it.

I would try:

 SELECT tt.id FROM tracked_twitter tt WHERE EXISTS (SELECT 1 FROM twitter_content tc INNER JOIN tracker_twitter_content ttc ON tc.id = ttc.twitter_content_id WHERE ttc.tracker_id = 88 AND tt.id = tc.tracked_twitter_id ) ORDER BY tt.followers_count DESC ; 

For this version you need indexes: tracked_twitter(followers_count, id), twitter_content(tracked_twitter_id, id) and tracker_twitter_content(twitter_content_id, tracker_id).
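
The indexes listed above could be created roughly as follows. This is a sketch, not a tested migration: the index names (idx_...) are made up for illustration, and the column types of twitter_content and tracker_twitter_content are not shown in the question, so long string keys there may need shortening. The LIMIT 20 is carried over from the question, and EXPLAIN is included only to check whether the followers_count index is actually used.

```sql
-- Index names are hypothetical; the column lists come from the answer above.
CREATE INDEX idx_tt_followers_id
    ON tracked_twitter (followers_count, id);
CREATE INDEX idx_tc_tracked_id
    ON twitter_content (tracked_twitter_id, id);
CREATE INDEX idx_ttc_content_tracker
    ON tracker_twitter_content (twitter_content_id, tracker_id);

-- Inspect the plan of the EXISTS query with the question's LIMIT added.
EXPLAIN
SELECT tt.id
FROM tracked_twitter tt
WHERE EXISTS (
    SELECT 1
    FROM twitter_content tc
    INNER JOIN tracker_twitter_content ttc
        ON tc.id = ttc.twitter_content_id
    WHERE ttc.tracker_id = 88
      AND tc.tracked_twitter_id = tt.id
)
ORDER BY tt.followers_count DESC
LIMIT 20;
```

If the plan still shows "Using temporary; Using filesort" for tracked_twitter, the optimizer is probably not walking the index in followers_count order.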


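Wrap the parent table in a parenthesized subquery with a LIMIT. Note that the LIMIT is then applied before the joins, so fewer than 20 rows may come back if some of the 20 most-followed users have no content for tracker 88.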

 SELECT DISTINCT `tracked_twitter`.id FROM (SELECT id,followers_count FROM tracked_twitter ORDER BY followers_count DESC LIMIT 20) AS tracked_twitter INNER JOIN `twitter_content` ON `tracked_twitter`.`id` = `twitter_content`.`tracked_twitter_id` INNER JOIN `tracker_twitter_content` ON `twitter_content`.`id` = `tracker_twitter_content`.`twitter_content_id` AND `tracker_twitter_content`.`tracker_id` = '88' ORDER BY tracked_twitter.followers_count DESC 

The main problem is that, even though you have relatively few rows, you use varchar(255) COLLATE utf8_unicode_ci as the primary key (instead of an integer) and therefore as a foreign key in the other tables. I suspect twitter_content.id has the same problem. This results in a large number of long string comparisons and in a large amount of extra memory being reserved for temporary tables.
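
A hedged sketch of what switching to an integer key might look like. It assumes every stored id is a plain decimal number without leading zeros that fits in 64 bits, which the question does not confirm, so the first statement checks for non-numeric values before anything is changed. Columns in other tables that reference this id (such as twitter_content.tracked_twitter_id) would need the same conversion, and any foreign key constraints would have to be dropped and recreated around it.

```sql
-- Count ids that are not purely numeric; converting is only safe if this is 0.
SELECT COUNT(*) AS non_numeric_ids
FROM tracked_twitter
WHERE id NOT REGEXP '^[0-9]+$';

-- Assumption: the check above returned 0 and all values fit in 64 bits.
-- Store the id as an integer instead of a 255-character Unicode string.
ALTER TABLE tracked_twitter
    MODIFY id BIGINT UNSIGNED NOT NULL;
```

Try this on a copy of the data first, since the ALTER rebuilds the table.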

As for the query itself, yes, it should be a query that walks the followers_count index and checks the condition against the related tables. This can be done as Gordon Linoff suggested, or with the help of index hints.
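
One way to push the optimizer toward the followers_count index is an index hint such as FORCE INDEX. The sketch below combines it with the EXISTS form from the earlier answer; followers_count is the index name from the CREATE TABLE statement in the question, and the LIMIT 20 is taken from the original query. Whether the hint actually helps depends on the data, so compare the EXPLAIN output with and without it.

```sql
-- FORCE INDEX asks MySQL to read tracked_twitter via the followers_count index,
-- so rows arrive already sorted and the scan can stop after 20 matches.
SELECT tt.id
FROM tracked_twitter tt FORCE INDEX (followers_count)
WHERE EXISTS (
    SELECT 1
    FROM twitter_content tc
    INNER JOIN tracker_twitter_content ttc
        ON tc.id = ttc.twitter_content_id
    WHERE ttc.tracker_id = 88
      AND tc.tracked_twitter_id = tt.id
)
ORDER BY tt.followers_count DESC
LIMIT 20;
```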


Source: https://habr.com/ru/post/1271527/

