Optimize your query or suggest LINQ equivalent

Question

I have a table containing the columns date_trans, time_trans, and price. After running the select, I want to add a new column "Count" that holds the length of each run of consecutive equal values in the price column. Only the last row of each run should be kept; the earlier rows of the run are removed from the final result. See the expected result:

  date_trans  time_trans  price  Count
  2011-02-22  09:39:59    58.02  1
  2011-02-22  09:40:03    58.1   *ROW WILL BE REMOVED
  2011-02-22  09:40:07    58.1   *ROW WILL BE REMOVED
  2011-02-22  09:40:08    58.1   3
  2011-02-22  09:40:10    58.15  1
  2011-02-22  09:40:10    58.1   *ROW WILL BE REMOVED
  2011-02-22  09:40:14    58.1   2
  2011-02-22  09:40:24    58.15  1
  2011-02-22  09:40:24    58.18  *ROW WILL BE REMOVED
  2011-02-22  09:40:24    58.18  *ROW WILL BE REMOVED
  2011-02-22  09:40:24    58.18  3
  2011-02-22  09:40:24    58.15  1

Please suggest a SQL query or LINQ expression that produces this result directly from the table.

Currently, I can do this with a select query and iterate over all selected rows, but it takes several hours to select millions of rows.

My current code is:

  string query = @"SELECT date_trans, time_trans, price
                   FROM tbl_data
                   WHERE date_trans BETWEEN '2011-02-22' AND '2011-10-21'
                     AND time_trans BETWEEN '09:30:00' AND '16:00:00'";
  DataTable dt = oUtil.GetDataTable(query);
  DataColumn col = new DataColumn("Count", typeof(int));
  dt.Columns.Add(col);

  int priceCount = 1;
  for (int count = 0; count < dt.Rows.Count; count++)
  {
      double price = Convert.ToDouble(dt.Rows[count]["price"]);
      // The last row has no successor; 0 acts as a sentinel that never matches.
      double priceNext = (count == dt.Rows.Count - 1)
          ? 0
          : Convert.ToDouble(dt.Rows[count + 1]["price"]);

      if (price == priceNext)
      {
          // Same price as the next row: drop this row and extend the run.
          priceCount++;
          dt.Rows.RemoveAt(count);
          count--;
      }
      else
      {
          // Last row of the run: store the run length and start over.
          dt.Rows[count]["Count"] = priceCount;
          priceCount = 1;
      }
  }

sql mysql linq linq-to-objects linq-to-sql

Mainuddin Jul 15 '12 at 14:11

1 answer

Paweł sokołowski · Accepted Answer · 2012-07-22T20:38:22+0000

This is interesting. I think you will need something like this:

  SELECT MAX(date_trans), MAX(time_trans), MAX(price), COUNT(*)
  FROM (SELECT *,
               ROW_NUMBER() OVER(PARTITION BY price ORDER BY date_trans, time_trans)
               - ROW_NUMBER() OVER(ORDER BY date_trans, time_trans) AS grp
        FROM transactions) grps
  GROUP BY grp

Found a solution here: http://www.sqlmag.com/article/sql-server/solution-to-the-t-sql-puzzle-grouping-consecutive-rows-with-a-common-element
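
To see why the difference of the two row numbers identifies a run, here is a minimal C# sketch that computes the same grp value in memory. The names RowNumberDemo and Label are hypothetical and only serve the illustration. It assumes the prices are passed in the intended row order; note that the sample data contains rows with identical timestamps, where ORDER BY date_trans, time_trans alone does not pin down a single order.

```csharp
using System;
using System.Collections.Generic;

public static class RowNumberDemo
{
    // Hypothetical helper: returns one (Price, Grp) pair per input row, in input order.
    // Grp mirrors ROW_NUMBER() OVER(PARTITION BY price ...) - ROW_NUMBER() OVER(...).
    public static List<(decimal Price, int Grp)> Label(IReadOnlyList<decimal> prices)
    {
        var seenPerPrice = new Dictionary<decimal, int>();
        var labels = new List<(decimal Price, int Grp)>();

        for (int i = 0; i < prices.Count; i++)
        {
            decimal price = prices[i];

            // Row number within the rows that share this price.
            seenPerPrice.TryGetValue(price, out int seen); // seen is 0 when the price is new
            int rowNumberWithinPrice = seen + 1;
            seenPerPrice[price] = rowNumberWithinPrice;

            // Row number over all rows.
            int rowNumberOverall = i + 1;

            // Inside a run both numbers grow by 1, so the difference stays constant.
            // A row with another price in between bumps only the overall number.
            labels.Add((price, rowNumberWithinPrice - rowNumberOverall));
        }

        return labels;
    }
}
```

For the prices 58.02, 58.1, 58.1, 58.1, 58.15, 58.1, 58.1 the three rows of the first 58.1 run share one grp value, and the later 58.1 run gets a different one. The value is only guaranteed to differ between runs of the same price; two runs with different prices can land on the same number.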

UPDATE

The price should also be part of the grouping column; otherwise two runs with different prices can end up with the same grp value and be merged into one group. In addition, the date and time columns must be combined into a single datetime value, so that MAX returns the correct timestamp for a group that starts at the end of one day and ends at the beginning of the next. Here is the adjusted query:

  SELECT MAX(CAST(date_trans AS DATETIME) + CAST(time_trans AS DATETIME)),
         MAX(price),
         COUNT(*)
  FROM (SELECT *,
               CAST(ROW_NUMBER() OVER(PARTITION BY price ORDER BY date_trans, time_trans)
                    - ROW_NUMBER() OVER(ORDER BY date_trans, time_trans) AS NVARCHAR(255))
               + '-' + CAST(price AS NVARCHAR(255)) AS grp
        FROM transactions) grps
        -- no ORDER BY inside the derived table: T-SQL rejects it there without TOP
  GROUP BY grp

The query may perform better if the grp column is a byte array or bigint instead of nvarchar. You also mentioned the “volume” column, which you probably want to sum within each group.
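
Since the question also asks for a LINQ-style equivalent, a single forward pass on the client is another option. The following is only a sketch under assumptions: RunCollapser, Collapse, and the tuple field names are hypothetical, the rows must already arrive sorted by date and time, and price is read as decimal rather than double so that the equality test is exact.

```csharp
using System;
using System.Collections.Generic;

public static class RunCollapser
{
    // Hypothetical helper: collapses each run of consecutive equal prices into one
    // element carrying the timestamp of the last row in the run and the run length.
    // Assumes orderedTrades is already sorted by date and time.
    public static IEnumerable<(DateTime LastSeen, decimal Price, int Count)> Collapse(
        IEnumerable<(DateTime Timestamp, decimal Price)> orderedTrades)
    {
        bool inRun = false;
        DateTime lastSeen = default(DateTime);
        decimal price = 0m;
        int count = 0;

        foreach (var trade in orderedTrades)
        {
            // A price change closes the current run.
            if (inRun && trade.Price != price)
            {
                yield return (lastSeen, price, count);
                count = 0;
            }

            inRun = true;
            lastSeen = trade.Timestamp;
            price = trade.Price;
            count++;
        }

        // Emit the final run, if there was any input at all.
        if (inRun)
        {
            yield return (lastSeen, price, count);
        }
    }
}
```

Because the method is an iterator, it can be fed row by row (for example from a data reader) and never removes rows from a collection, which is likely where much of the time goes in the DataTable loop from the question.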
