MySQL - average of the most recent rows in another table

I have two tables: "servers" and "stats".

"servers" has a column called "id" that auto-increments. "stats" has a column called "server" that corresponds to a row in the servers table, a column "time" that represents the time the row was added, and a column "votes" that I would like to average.

I would like to get all the servers ( SELECT * FROM servers ) together with the average number of votes over the last 24 rows corresponding to each server. I think this is a greatest-n-per-group problem.

This is what I tried, but it gave me 24 rows in total, not 24 rows for each group:

    SELECT servers.*, IFNULL(AVG(stats.votes), 0) AS avgvotes
    FROM servers
    LEFT OUTER JOIN (
        SELECT server, votes
        FROM stats
        GROUP BY server
        ORDER BY time DESC
        LIMIT 24
    ) AS stats ON servers.id = stats.server
    GROUP BY servers.id

As I said, I would like to get the last 24 rows for each server, not the last 24 rows overall.

+6
3 answers

This is a different approach.

This query will suffer from the same performance problems as the other queries that return the correct results, because its execution plan requires a SORT operation on EVERY row of the stats table. Since there is no predicate (restriction) on the time column, EVERY row in the stats table is considered. For a REALLY large stats table, this will exhaust all the available temporary space before it dies a horrible death. (More performance notes below.)

    SELECT r.*
         , IFNULL(s.avg_votes,0)
      FROM servers r
      LEFT
      JOIN ( SELECT t.server
                  , AVG(t.votes) AS avg_votes
               FROM ( SELECT CASE WHEN u.server = @last_server
                                  THEN @i := @i + 1
                                  ELSE @i := 1
                             END AS i
                           , @last_server := u.server AS `server`
                           , u.votes AS votes
                        FROM (SELECT @i := 0, @last_server := NULL) i
                        JOIN ( SELECT v.server, v.votes
                                 FROM stats v
                                ORDER BY v.server DESC, v.time DESC
                             ) u
                    ) t
              WHERE t.i <= 24
              GROUP BY t.server
           ) s
        ON s.server = r.id

This query sorts the stats table by server and by time, both descending. (Inline view aliased as u .)

With the sorted result set, we assign row numbers 1, 2, 3, etc. to the rows of each server. (Inline view aliased as t .)

From that result set, we filter out any rows with a row number > 24 and calculate the average of the votes column over the "latest" 24 rows for each server. (Inline view aliased as s .)

As a final step, we join this to the servers table to return the requested result set.
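The four steps above can be mirrored outside the database. The following is only a Python sketch of the same logic, not part of the original answer: the function name avg_last_n and the (server, time, votes) tuple layout are assumptions made for the illustration, and it says nothing about how MySQL actually executes the query.

```python
from collections import defaultdict


def avg_last_n(servers, stats, n=24):
    """servers: list of server ids; stats: list of (server, time, votes)."""
    # Inline view u: order by server DESC, time DESC.
    ordered = sorted(stats, key=lambda r: (r[0], r[1]), reverse=True)

    # Inline view t: number the rows 1, 2, 3, ... within each server.
    totals = defaultdict(lambda: [0, 0])  # server -> [sum of votes, count]
    last_server, i = None, 0
    for server, _time, votes in ordered:
        i = i + 1 if server == last_server else 1
        last_server = server
        # Inline view s: keep rows numbered <= n; like AVG(), skip NULL votes.
        if i <= n and votes is not None:
            totals[server][0] += votes
            totals[server][1] += 1

    # Final step: the LEFT JOIN to servers, with IFNULL(avg, 0).
    result = {}
    for sid in servers:
        total, count = totals[sid]
        result[sid] = total / count if count else 0
    return result
```

A server with fewer than 24 rows is averaged over whatever rows it has, and a server with no rows gets 0, which matches the LEFT JOIN plus IFNULL in the query.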


Note:

The execution plan for this query will be COSTLY for a large number of rows in the stats table.

To improve performance, we can take several approaches.

The simplest is to include a predicate in the query that EXCLUDES a significant number of rows from the stats table (for example, rows with time values older than 2 days, or older than 2 weeks). This would significantly reduce the number of rows that need to be sorted to determine the "latest" 24 rows.

In addition, with an index on stats(server,time) , it is also possible that MySQL could do a relatively efficient “reverse scan” of the index, avoiding the sort operation.

We could also consider an index on the stats table on (server,"reverse_time") . Since MySQL versions before 8.0 do not support descending indexes, the implementation would really be a regular (ascending) index on a derived rtime value: a "reverse time" expression that ascends as time descends, for example -1*UNIX_TIMESTAMP(my_timestamp) or -1*TIMESTAMPDIFF(SECOND,'1970-01-01',my_datetime) .
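To see why a negated time value works as a "reverse time" key, here is a small Python sketch. The names rtime and index_scan_order are hypothetical, and the sort only stands in for the order in which an ascending index on (server, rtime) would be read.

```python
def rtime(unix_seconds):
    """Derived 'reverse time': ascends as the real time descends."""
    return -1 * unix_seconds


def index_scan_order(rows):
    """rows: list of (server, unix_seconds).

    Sorting ascending on (server, rtime) stands in for a forward scan of a
    regular ascending index on those two columns.
    """
    return sorted(rows, key=lambda r: (r[0], rtime(r[1])))
```

Reading such an index forwards yields each server's rows newest first, so the first 24 entries per server are the "latest" 24 without a separate sort.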

Another approach to improving performance is to maintain a shadow table containing the latest 24 rows for each server. This is easier to implement if we can guarantee that the "latest rows" will never be deleted from the stats table. We could maintain the shadow table with a trigger. Basically, whenever a row is inserted into the stats table, we check whether the time of the new row is later than the earliest time stored for that server in the shadow table. If so, we replace the earliest row in the shadow table with the new row, taking care to keep no more than 24 rows per server in the shadow table.
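The bookkeeping such a trigger would do can be sketched in Python. This is an in-memory illustration only, not trigger code: the class name ShadowTable and its methods are hypothetical, and it assumes votes is never NULL and that rows are never deleted from stats.

```python
import heapq
from collections import defaultdict


class ShadowTable:
    """Keeps at most `keep` latest (time, votes) rows per server."""

    def __init__(self, keep=24):
        self.keep = keep
        # server -> min-heap ordered by time, so heap[0] is the earliest row.
        self.rows = defaultdict(list)

    def on_insert(self, server, time, votes):
        """What the AFTER INSERT trigger on stats would do."""
        heap = self.rows[server]
        if len(heap) < self.keep:
            heapq.heappush(heap, (time, votes))
        elif time > heap[0][0]:
            # New row is later than the earliest stored row: swap it in.
            heapq.heapreplace(heap, (time, votes))
        # Otherwise the new row is older than everything kept; ignore it.

    def average(self, server):
        votes = [v for _t, v in self.rows.get(server, [])]
        return sum(votes) / len(votes) if votes else 0
```

With this in place, the average for a server is computed from at most 24 stored rows, no matter how large the stats table grows.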

Yet another approach is to write a procedure or function that produces the result. The idea is to loop through the servers, run a separate query against the stats table for each one to get the average votes over its latest 24 rows, and gather all of those results together. (This approach is really more of a workaround to avoid sorting a huge temporary set, so that the result set gets returned at all, without necessarily returning it very quickly.)
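A rough sketch of that loop, written in Python against SQLite purely so that it is self-contained and runnable. The function name averages_per_server is hypothetical, and a MySQL stored procedure would look different; only the shape of the per-server query carries over.

```python
import sqlite3


def averages_per_server(conn, n=24):
    """Run one small query per server instead of one big sorted query."""
    results = {}
    server_ids = conn.execute("SELECT id FROM servers ORDER BY id").fetchall()
    for (server_id,) in server_ids:
        row = conn.execute(
            """
            SELECT AVG(votes)
            FROM (SELECT votes
                  FROM stats
                  WHERE server = ?
                  ORDER BY time DESC
                  LIMIT ?) AS last_rows
            """,
            (server_id, n),
        ).fetchone()
        # AVG() over zero rows is NULL; report 0, as IFNULL(..., 0) would.
        results[server_id] = row[0] if row[0] is not None else 0
    return results
```

Each inner query only touches one server's rows, so with an index on stats(server, time) it can stop after 24 index entries rather than sorting the whole table.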

The bottom line for the performance of this type of query on a LARGE table is to limit the number of rows the query considers and to avoid a sort operation on a large set. That is how we get a query like this to perform.


ADDENDUM

To get a "reverse index scan" operation (that is, to get the rows from stats in order using an index, WITHOUT a filesort operation), I had to specify DESC on both expressions in the ORDER BY clause. Previously, the query had ORDER BY server ASC, time DESC , and MySQL always wanted to do a filesort, even with the FORCE INDEX FOR ORDER BY (stats_ix1) hint.

If the requirement is to return an "average votes" value for a server only if there are at least 24 associated rows in the stats table, then we can write a more efficient query, even if it is a bit messier. (Most of the messiness in the nested IF() functions is there to deal with NULL values, which are not included in the average. It can be much less messy if we have a guarantee that votes is NOT NULL, or if we exclude any rows where votes is NULL.)

    SELECT r.*
         , IFNULL(s.avg_votes,0)
      FROM servers r
      LEFT
      JOIN ( SELECT t.server
                  , t.tot/NULLIF(t.cnt,0) AS avg_votes
               FROM ( SELECT IF(v.server = @last_server, @num := @num + 1, @num := 1) AS num
                           , @cnt := IF(v.server = @last_server,IF(@num <= 24, @cnt := @cnt + IF(v.votes IS NULL,0,1),@cnt := 0),@cnt := IF(v.votes IS NULL,0,1)) AS cnt
                           , @tot := IF(v.server = @last_server,IF(@num <= 24, @tot := @tot + IFNULL(v.votes,0) ,@tot := 0),@tot := IFNULL(v.votes,0) ) AS tot
                           , @last_server := v.server AS SERVER
                        -- , v.time
                        -- , v.votes
                        -- , @tot/NULLIF(@cnt,0) AS avg_sofar
                        FROM (SELECT @last_server := NULL, @num:= 0, @cnt := 0, @tot := 0) u
                        JOIN stats v FORCE INDEX FOR ORDER BY (stats_ix1)
                       ORDER BY v.server DESC, v.time DESC
                    ) t
              WHERE t.num = 24
           ) s
        ON s.server = r.id

With a covering index on stats(server,time,votes) , EXPLAIN showed that MySQL avoided a filesort operation, so it must have used a "reverse index scan" to return the rows in order. Without the covering index, and with an index on (server,time) , MySQL used the index if I included an index hint; with the FORCE INDEX FOR ORDER BY (stats_ix1) hint, MySQL avoided the filesort as well. (But since my table had fewer than 100 rows, I don't think MySQL put much emphasis on avoiding a filesort operation.)

The time, votes and avg_sofar expressions are commented out (in the inline view aliased as t ); they are not needed, but they are useful for debugging.

As the query stands, it needs at least 24 rows in stats for a server in order to return an average for it. (That may be acceptable.) But I was thinking that, in general, we could return a running total (tot) and a running count (cnt).

(If we replace WHERE t.num = 24 with WHERE t.num <= 24 , we can see the running average in action.)

To return an average when there are fewer than 24 rows in stats, it is really a matter of identifying the row (for each server) with the maximum value of num that is <= 24.
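The running total and running count, and the idea of keeping the row with the largest num that is <= 24, can be sketched in Python. The function name running_averages and the tuple layout are assumptions for the illustration; the variables num, cnt and tot play the role of the user variables in the query.

```python
def running_averages(stats, n=24):
    """stats: list of (server, time, votes); votes may be None (NULL).

    Returns {server: (num, running average at that row)}, where num is the
    largest row number <= n seen for that server.
    """
    ordered = sorted(stats, key=lambda r: (r[0], r[1]), reverse=True)
    best = {}
    last_server, num, cnt, tot = None, 0, 0, 0
    for server, _time, votes in ordered:
        if server != last_server:
            # First row of a new server: reset the counters.
            num, cnt, tot = 0, 0, 0
            last_server = server
        num += 1
        if num <= n:
            if votes is not None:  # NULL votes do not count toward the average
                cnt += 1
                tot += votes
            # Overwritten on every row, so the last write is the max num <= n.
            best[server] = (num, tot / cnt if cnt else None)
    return best
```

Filtering the result to entries where num equals 24 reproduces the stricter WHERE t.num = 24 behaviour; keeping all entries returns an average for servers with fewer rows as well.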

+1

Thanks to this great post.

    alter table stats add index(server, time);

    set @num:=0, @server:='';

    select servers.*, IFNULL(AVG(stats.votes), 0) AS avgvotes
    from servers
    left outer join (
        select server, time, votes,
               @num := if(@server = server, @num + 1, 1) as row_number,
               @server := server as dummy
        from stats force index(server)
        group by server, time
        having row_number < 25
    ) as stats on servers.id = stats.server
    group by servers.id;

Edit 1

I just noticed that the above query gives the oldest 24 entries for each group.

    set @num:=0, @server:='';

    select servers.*, IFNULL(AVG(stats.votes), 0) AS avgvotes
    from servers
    left outer join (
        select server, time, votes,
               @num := if(@server = server, @num + 1, 1) as row_number,
               @server := server as dummy
        from (select * from stats order by server, time desc) as t
        group by server, time
        having row_number < 25
    ) as stats on servers.id = stats.server
    group by servers.id;

which gives the average of the 24 newest rows for each group.

Edit 2

@DrAgonmoray you can first try the inner part of the query and see whether it returns the latest 24 rows for each group. In my MySQL 5.5 it works correctly.

    select server, time, votes,
           @num := if(@server = server, @num + 1, 1) as row_number,
           @server := server as dummy
    from (select * from stats order by server, time desc) as t
    group by server, time
    having row_number < 25
+2

Try this solution, which uses the top-n-per-group technique inside an INNER JOIN, credited to Bill Karwin and his post about it.

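    SELECT a.*, AVG(b.votes) AS avgvotes
    FROM servers a
    INNER JOIN (
        SELECT aa.server, aa.votes
        FROM stats aa
        LEFT JOIN stats bb
          ON aa.server = bb.server AND aa.time < bb.time
        -- one group per stats row, not one per distinct time across servers
        GROUP BY aa.server, aa.time, aa.votes
        -- count only matched newer rows; COUNT(*) would also count the
        -- NULL row the LEFT JOIN produces for each server's newest row
        HAVING COUNT(bb.time) < 24
    ) b ON a.id = b.server
    GROUP BY a.id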
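The idea behind the self-join is that a row belongs to a server's latest 24 if fewer than 24 rows for the same server have a later time. Below is a sketch of that counting technique, run through SQLite from Python so that it is self-contained. It is a variant rather than the answer's exact query: it groups by server, time and votes, counts only matched newer rows, and uses a LEFT JOIN from servers so that servers without stats still appear. The names TOP_N_SQL and top_n_averages are hypothetical, and (server, time) is assumed to be unique.

```python
import sqlite3

TOP_N_SQL = """
SELECT a.id, IFNULL(AVG(b.votes), 0) AS avgvotes
FROM servers a
LEFT JOIN (
    SELECT aa.server, aa.time, aa.votes
    FROM stats aa
    LEFT JOIN stats bb
      ON aa.server = bb.server AND aa.time < bb.time
    GROUP BY aa.server, aa.time, aa.votes
    -- number of strictly newer rows for the same server
    HAVING COUNT(bb.time) < ?
) b ON a.id = b.server
GROUP BY a.id
ORDER BY a.id
"""


def top_n_averages(conn, n=24):
    """Return {server id: average votes over its n newest stats rows}."""
    return dict(conn.execute(TOP_N_SQL, (n,)).fetchall())
```

The join compares every stats row with every newer row of the same server, so the work grows roughly with the square of the rows per server. That is fine for small tables but is worth keeping in mind alongside the performance notes in the first answer.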
0

Source: https://habr.com/ru/post/918507/

