Efficient query for only the first N rows for each unique identifier

This is a follow-up to an earlier question.

TL;DR:

Question:

I want to filter the query to keep only the first n rows for each unique identifier.

Answer:

query = query.GroupBy(q => q.ID).SelectMany(g => g.Take(n)); 

The problem with this answer is that for 80,000+ rows, evaluating the query takes at least twice as long as filtering by iteration (foreach). The SQL generated for this answer uses CROSS APPLY, most likely because of SelectMany().

This link describes what CROSS APPLY does:

The APPLY operator allows you to join two table expressions; the right table expression is processed once for each row from the left table expression.
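As a rough in-memory analogy only (this is not what SQL Server does internally), the behaviour described in that quote can be sketched as a nested loop. The class and method names here are hypothetical, chosen just for illustration:

```csharp
using System;
using System.Collections.Generic;

public static class ApplyAnalogy
{
    // For each left element, evaluate the right-hand expression and pair up
    // the results. Left elements whose right-hand side is empty produce no
    // output, which matches CROSS APPLY (as opposed to OUTER APPLY).
    public static List<(TLeft Left, TRight Right)> CrossApply<TLeft, TRight>(
        IEnumerable<TLeft> left,
        Func<TLeft, IEnumerable<TRight>> right)
    {
        var output = new List<(TLeft Left, TRight Right)>();
        foreach (var l in left)
        {
            // The right expression is re-evaluated once per left row.
            foreach (var r in right(l))
            {
                output.Add((l, r));
            }
        }
        return output;
    }
}
```

The point of the sketch is that the right-hand side runs once per left row, which is why the cost grows with the number of distinct IDs.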

In short, I'm looking for a filter query that efficiently collects the top N rows for each unique ID.

A LINQ solution with the generated SQL explained would be ideal.
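For reference, the foreach baseline mentioned above could look roughly like the following. This is only a sketch: the Row type and the FirstNPerId name are assumptions, and it presumes the rows have already been loaded into memory in the desired order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical row shape; only ID matters for the filter.
public class Row
{
    public int ID { get; set; }
    public DateTime Timestamp { get; set; }
}

public static class IterationFilter
{
    // Keeps the first n rows seen for each ID, preserving input order.
    public static List<Row> FirstNPerId(IEnumerable<Row> rows, int n)
    {
        var kept = new Dictionary<int, int>(); // ID -> number of rows kept so far
        var result = new List<Row>();
        foreach (var row in rows)
        {
            kept.TryGetValue(row.ID, out var count); // count is 0 for an unseen ID
            if (count < n)
            {
                result.Add(row);
                kept[row.ID] = count + 1;
            }
        }
        return result;
    }
}
```

This makes a single pass over the data, which may explain why it outperforms a query that re-evaluates a subquery per group.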

2 answers

I found the answer in SQL here (the SQL 2000 solution below) and managed to implement a Queryable/LINQ version:

query = tableQueryable
    .Where(a => tableQueryable
        .Where(b => b.ID == a.ID)
        .OrderByDescending(o => o.Timestamp)
        .Take(N)
        .Select(s => s.PK)
        .Contains(a.PK))
    .OrderByDescending(d => d.Timestamp);

This is a fairly standard "subquery" pattern, and it is much faster on a large table.
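To check what the pattern selects, here is the same expression run against an in-memory list. The Entry type is an assumption that just mirrors the PK, ID and Timestamp names used above. In memory this is quadratic, so it is only useful for verifying semantics on small data, not as a performance comparison:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape mirroring the property names used in the answer.
public class Entry
{
    public int PK { get; set; }
    public int ID { get; set; }
    public DateTime Timestamp { get; set; }
}

public static class SubqueryPattern
{
    // A row is kept when its PK is among the PKs of the n most recent rows
    // that share its ID; the final result is ordered newest first.
    public static List<Entry> LatestNPerId(IEnumerable<Entry> source, int n)
    {
        var table = source.ToList();
        return table
            .Where(a => table
                .Where(b => b.ID == a.ID)
                .OrderByDescending(o => o.Timestamp)
                .Take(n)
                .Select(s => s.PK)
                .Contains(a.PK))
            .OrderByDescending(d => d.Timestamp)
            .ToList();
    }
}
```

Note that because of OrderByDescending, "top N" here means the N rows with the latest Timestamp for each ID.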

LINQ to SQL (L2S) does not support row numbering (ROW_NUMBER), so Martin's trick cannot be used. I also ran into this problem, and as far as I know, this is the best L2S solution that does not resort to native SQL.

You can try pulling all the results into the application and doing the row numbering there. This can hurt or help performance; which one depends on the specific case.
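A minimal sketch of that client-side row numbering, assuming the rows have already been materialized (for example with ToList() or AsEnumerable()). The Reading type and the method name are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape; the property names follow the first answer.
public class Reading
{
    public int PK { get; set; }
    public int ID { get; set; }
    public DateTime Timestamp { get; set; }
}

public static class ClientSideRowNumber
{
    // Numbers the rows within each ID group (newest first, starting at 1)
    // and keeps those whose row number is at most n.
    public static List<Reading> LatestNPerId(IEnumerable<Reading> rows, int n)
    {
        return rows
            .GroupBy(r => r.ID)
            .SelectMany(g => g
                .OrderByDescending(r => r.Timestamp)
                .Select((r, index) => new { Row = r, RowNumber = index + 1 })
                .Where(x => x.RowNumber <= n)
                .Select(x => x.Row))
            .ToList();
    }
}
```

The trade-off is that every row crosses the wire before filtering, so this only pays off when the table is small enough or most rows would be kept anyway.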

Source: https://habr.com/ru/post/953594/

