Very slow MySQL query

My table has the following columns:

  • gamelogs_id (auto_increment primary key)
  • player_id (int)
  • player_name (varchar)
  • game_id (int)
  • season_id (int)
  • points (int)

The table has the following indexes:

 +-----------------+------------+--------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
 | Table           | Non_unique | Key_name           | Seq_in_index | Column_name        | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
 +-----------------+------------+--------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
 | player_gamelogs |          0 | PRIMARY            |            1 | player_gamelogs_id | A         |      371330 | NULL     | NULL   |      | BTREE      |         |               |
 | player_gamelogs |          1 | player_name        |            1 | player_name        | A         |        3375 | NULL     | NULL   | YES  | BTREE      |         |               |
 | player_gamelogs |          1 | points             |            1 | points             | A         |         506 | NULL     | NULL   | YES  | BTREE      |         |               |
 | player_gamelogs |          1 | game_id            |            1 | game_id            | A         |       37133 | NULL     | NULL   | YES  | BTREE      |         |               |
 | player_gamelogs |          1 | season             |            1 | season             | A         |          30 | NULL     | NULL   | YES  | BTREE      |         |               |
 | player_gamelogs |          1 | team_abbreviation  |            1 | team_abbreviation  | A         |          70 | NULL     | NULL   | YES  | BTREE      |         |               |
 | player_gamelogs |          1 | player_id          |            1 | game_id            | A         |       41258 | NULL     | NULL   | YES  | BTREE      |         |               |
 | player_gamelogs |          1 | player_id          |            2 | player_id          | A         |      371330 | NULL     | NULL   | YES  | BTREE      |         |               |
 | player_gamelogs |          1 | player_id          |            3 | dk_points          | A         |      371330 | NULL     | NULL   | YES  | BTREE      |         |               |
 | player_gamelogs |          1 | game_player_season |            1 | game_id            | A         |       41258 | NULL     | NULL   | YES  | BTREE      |         |               |
 | player_gamelogs |          1 | game_player_season |            2 | player_id          | A         |      371330 | NULL     | NULL   | YES  | BTREE      |         |               |
 | player_gamelogs |          1 | game_player_season |            3 | season_id          | A         |      371330 | NULL     | NULL   |      | BTREE      |         |               |
 +-----------------+------------+--------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

I am trying to calculate a player's average points for the season prior to each game. So for the third game of the season, avg_points would be the mean of games 1 and 2. The game IDs are sequential, so an earlier game always has a smaller ID than a later one. I also have the option of using a date field, but I figured a numeric comparison would be faster.

My query is as follows:

 SELECT game_id,
        player_id,
        player_name,
        (SELECT avg(points)
           FROM player_gamelogs t2
          WHERE t2.game_id < t1.game_id
            AND t1.player_id = t2.player_id
            AND t1.season_id = t2.season_id) AS avg_points
   FROM player_gamelogs t1
  ORDER BY player_name, game_id;

EXPLAIN produces the following output:

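 +----+--------------------+-------+------+--------------------------------------+------+---------+------+--------+-------------------------------------------------+
 | id | select_type        | table | type | possible_keys                        | key  | key_len | ref  | rows   | Extra                                           |
 +----+--------------------+-------+------+--------------------------------------+------+---------+------+--------+-------------------------------------------------+
 |  1 | PRIMARY            | t1    | ALL  | NULL                                 | NULL | NULL    | NULL | 371330 | Using filesort                                  |
 |  2 | DEPENDENT SUBQUERY | t2    | ALL  | game_id,player_id,game_player_season | NULL | NULL    | NULL | 371330 | Range checked for each record (index map: 0xC8) |
 +----+--------------------+-------+------+--------------------------------------+------+---------+------+--------+-------------------------------------------------+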

I am not sure whether the slowness comes from the nature of the task or from an inefficient query. Thanks for any suggestions!

3 answers

Please consider this query:

 SELECT t1.season_id,
        t1.game_id,
        t1.player_id,
        t1.player_name,
        AVG(COALESCE(t2.points, 0)) AS average_player_points
   FROM player_gamelogs t1
        LEFT JOIN player_gamelogs t2
          ON t1.game_id > t2.game_id
         AND t1.player_id = t2.player_id
         AND t1.season_id = t2.season_id
  GROUP BY t1.season_id, t1.game_id, t1.player_id, t1.player_name
  ORDER BY t1.player_name, t1.game_id;

Notes:

  • For optimal performance, you will need an additional index on (season_id, game_id, player_id, player_name).
  • Even better would be a players table from which you can look up the name by ID. It seems redundant to me to pull the player's name from a log table, all the more so if it also has to be part of an index.
  • In MySQL versions before 8.0, GROUP BY already sorts by the grouped columns. If you can, avoid ordering afterwards, as it creates useless overhead. As pointed out in the comments, this is not officially guaranteed behavior (MySQL 8.0 removed the implicit sort), so weigh the saving against the risk of suddenly losing the ordering.
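
The first two notes could be sketched as follows. This is only an illustration: it assumes the first column of the suggested index is season_id, the index name and the players table (including its column sizes) are hypothetical, and a long player_name column may need a prefix length to fit within the index key limit.

```sql
-- Hypothetical index name; column order taken from the first note,
-- assuming its first column is season_id.
ALTER TABLE player_gamelogs
    ADD INDEX idx_season_game_player_name (season_id, game_id, player_id, player_name);

-- Hypothetical lookup table for the second note, so that the name
-- no longer has to be stored (and indexed) in the log table.
CREATE TABLE players (
    player_id   INT          NOT NULL PRIMARY KEY,
    player_name VARCHAR(100) NOT NULL  -- length is an assumption
);
```

With a players table in place, the log query would join on player_id to fetch the name, and player_name could be dropped from the index.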

Your query is fine as written:

 SELECT game_id,
        player_id,
        player_name,
        (SELECT avg(t2.points)
           FROM player_gamelogs t2
          WHERE t2.game_id < t1.game_id
            AND t1.player_id = t2.player_id
            AND t1.season_id = t2.season_id) AS avg_points
   FROM player_gamelogs t1
  ORDER BY player_name, game_id;

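But for optimal performance, you need two composite indexes: (player_id, season_id, game_id, points) and (player_name, game_id, season_id).

The first index should speed up the subquery: it matches the two equality conditions, then the range condition on game_id, and it also contains points, so the subquery can be answered from the index alone. The second is for the outer ORDER BY.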
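
As a sketch, the two indexes could be created like this. The index names are made up for illustration; only the column lists come from the answer.

```sql
-- Hypothetical index names; column order as given above.
-- Serves the correlated subquery: equality on player_id and season_id,
-- range on game_id, with points read from the index itself.
ALTER TABLE player_gamelogs
    ADD INDEX idx_player_season_game_points (player_id, season_id, game_id, points);

-- Matches the outer ORDER BY player_name, game_id.
ALTER TABLE player_gamelogs
    ADD INDEX idx_name_game_season (player_name, game_id, season_id);
```

Running EXPLAIN on the query again afterwards shows whether the optimizer actually picks these indexes up.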


As your query stands now, you are processing EVERY game, plus all the games before it, for every player. So, for example, if you had 10 games per person, you would get the following results per season / person:

 Game 10, Game 10 points, avg of games 1-9
 Game 9, Game 9 points, avg of games 1-8
 ...
 ...
 ...
 Game 2, Game 2 points, avg of game 1 only

You stated that you want the most recent game together with the average of everything before it. However, I am assuming that you DO NOT care about each of the earlier games per person.

You are also running the query across ALL seasons. If a season is over, do you still care about the old seasons, or only about the current one? Otherwise, you are going through all seasons, all players ...

All that said, I propose the following. First, restrict the query to just the latest season with a WHERE clause, but I am PURPOSELY leaving the season in the query / GROUP BY in case you do want other seasons. Then I get the MAXIMUM game for a given person / season as the basis for the one final row (per person / season), and then get the average of everything before it. So, in the example scenario of games 10 down to 2, I would not pull the underlying rows for games 9-2, only return game 10.

 SELECT pgMax.player_id,
        pgMax.season_id,
        pgMax.mostRecentGameID,
        pgl3.points AS mostRecentGamePoints,
        pgl3.player_name,
        COALESCE(AVG(pgl2.points), 0) AS AvgPointsPriorToCurrentGame
   FROM ( -- latest game per player / season
          SELECT pgl1.player_id,
                 pgl1.season_id,
                 MAX(pgl1.game_id) AS mostRecentGameID
            FROM player_gamelogs pgl1
           WHERE pgl1.season_id = JustOneSeason  -- placeholder: substitute the season_id you want
           GROUP BY pgl1.player_id, pgl1.season_id ) pgMax
        -- the row for the most recent game itself
        JOIN player_gamelogs pgl3
          ON pgMax.player_id = pgl3.player_id
         AND pgMax.season_id = pgl3.season_id
         AND pgMax.mostRecentGameID = pgl3.game_id
        -- all earlier games; LEFT JOIN keeps players with a single game (average falls back to 0)
        LEFT JOIN player_gamelogs pgl2
          ON pgMax.player_id = pgl2.player_id
         AND pgMax.season_id = pgl2.season_id
         AND pgMax.mostRecentGameID > pgl2.game_id
  GROUP BY pgMax.player_id, pgMax.season_id, pgMax.mostRecentGameID,
           pgl3.points, pgl3.player_name
  ORDER BY pgMax.player_id;

Now, to optimize the query, the best choice is a composite index on (player_id, season_id, game_id, points). HOWEVER, if you are only interested in the current season, build the index on (season_id, player_id, game_id, points) instead, putting the SEASON ID in the first position so that it pre-qualifies the WHERE clause.
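
A minimal sketch of the season-first variant, with a made-up index name:

```sql
-- Hypothetical index name. season_id leads so that a filter such as
-- WHERE season_id = ... narrows the index range before player_id and
-- game_id are considered; points is included so it can be read from the index.
ALTER TABLE player_gamelogs
    ADD INDEX idx_season_player_game_points (season_id, player_id, game_id, points);
```

Whether this or the player-first column order works better depends on whether the query really is restricted to one season, so it is worth comparing the EXPLAIN output for both.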


Source: https://habr.com/ru/post/1239535/

