I am developing a query for a table containing a bunch of points in a time series. The table can grow quite large, so I want the query to effectively reduce the output by averaging points over fixed time intervals. After writing the query, I was surprised by how SQL Server (2008) decided to execute it: the execution plan shows an unnecessary sort operation that becomes expensive as the time series grows. Here is the problem boiled down to a simple example:
CREATE TABLE [dbo].[Example] (
    [x] FLOAT NOT NULL,
    [y] FLOAT NOT NULL,
    PRIMARY KEY CLUSTERED ([x] ASC)
);

SELECT FLOOR([x]), AVG([y])
FROM [dbo].[Example]
GROUP BY FLOOR([x]);
Here I have (x, y) pairs that are already sorted by x (thanks to the clustered primary key), and I average y for each integer x (truncating with the FLOOR function). I would expect the data to already be in the right order for the aggregate, since FLOOR is a monotonic function. Unfortunately, SQL Server decides that the data needs to be re-sorted; here is the execution plan:

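For comparison, here is the baseline I would expect: grouping directly on [x] should be able to use a Stream Aggregate with no Sort, because the rows come out of the clustered index already ordered by [x]:

SELECT [x], AVG([y])
FROM [dbo].[Example]
GROUP BY [x];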
Shouldn't SQL Server be able to stream-aggregate over data grouped by a monotonic function of columns that are already sorted appropriately?
Is there a general way to rewrite such queries so that SQL Server sees that the order is preserved?
[Update] I found an article on this subject, Things SQL needs: sargability of monotonic functions, and, as the title implies, this seems to be an optimization that SQL Server does not yet perform (in most cases).
Here are even simpler queries against [dbo].[Example] that illustrate the point:
SELECT [x], [y] FROM [dbo].[Example] ORDER BY FLOOR([x])   -- sort performed in execution plan
SELECT [x], [y] FROM [dbo].[Example] ORDER BY 2*[x]        -- NO sort performed in execution plan
SELECT [x], [y] FROM [dbo].[Example] ORDER BY 2*[x]+1      -- sort performed in execution plan
For a single addition or multiplication, the query optimizer understands that the data already has the same order (this also shows up when grouping by such expressions). So the concept of monotonic functions does seem to be understood by the optimizer; it is just not applied as a general rule.
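To illustrate the parenthetical remark, here is a rough comparison on the same table: grouping by the single multiplication keeps the order-preserving plan, while grouping by FLOOR forces the sort:

SELECT 2*[x], AVG([y]) FROM [dbo].[Example] GROUP BY 2*[x]            -- NO sort performed in execution plan
SELECT FLOOR([x]), AVG([y]) FROM [dbo].[Example] GROUP BY FLOOR([x])  -- sort performed in execution plan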
I am currently testing the computed column / index workaround, but it looks like it will significantly increase the amount of stored data, since I will need several indexes to cover the range of possible interval widths.
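For reference, this is roughly what I mean by the computed column / index approach, sketched here for a single interval width of 1 (the column and index names are just placeholders); every additional interval width would need its own column and index, which is where the size overhead comes from:

-- Persisted computed column for one interval width. FLOOR is deterministic,
-- so the persisted column can be indexed even though [x] is FLOAT.
ALTER TABLE [dbo].[Example]
    ADD [x_floor] AS FLOOR([x]) PERSISTED;

-- Index that delivers rows already ordered by the bucket, with [y] included,
-- so the aggregate can stream without a Sort.
CREATE INDEX [IX_Example_x_floor]
    ON [dbo].[Example] ([x_floor])
    INCLUDE ([y]);

-- Group on the computed column instead of the FLOOR expression.
SELECT [x_floor], AVG([y])
FROM [dbo].[Example]
GROUP BY [x_floor];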