Problem using ROW_NUMBER() OVER (PARTITION BY ...)

Question


I am using SQL Server 2008 R2. I have a table called EmployeeHistory with the following structure and sample data:

    EmployeeID  Date      DepartmentID  SupervisorID
    10001       20130101  001           10009
    10001       20130909  001           10019
    10001       20131201  002           10018
    10001       20140501  002           10017
    10001       20141001  001           10015
    10001       20141201  001           10014

Note that employee 10001 moves between two departments and has several supervisors over time. What I'm trying to do is list the start and end dates of this employee's time in each department, sorted by date. The result should look like this:

    EmployeeID  DateStart  DateEnd   DepartmentID
    10001       20130101   20131201  001
    10001       20131201   20141001  002
    10001       20141001   NULL      001

I planned to use partitioning with the following query, but it fails because the department changes from 001 to 002 and then back to 001, so both stints in department 001 land in the same partition. Obviously, I cannot simply partition by DepartmentID... I am sure I am overlooking something obvious. Any help? Thanks in advance.

    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY EmployeeID, DepartmentID ORDER BY [Date]) RN
    FROM EmployeeHistory


sql-server-2008 row-number

Thracian Nov 12 '13 at 5:10


2 answers

I would do something like this:

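    ;WITH x AS
    (
        -- Number each employee's rows in date order
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY [Date]) rn
        FROM EmployeeHistory
    )
    SELECT *
    FROM x x1
    LEFT OUTER JOIN x x2
        ON x1.EmployeeID = x2.EmployeeID  -- pair rows of the same employee only
       AND x1.rn = x2.rn + 1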

Or maybe it should be x2.rn - 1; you will have to check. In any case, you get the idea. Once the table is joined to itself, you can filter, group, sort, etc., to get what you need.
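As a rough illustration of the "filter" step, here is a minimal sketch, assuming the EmployeeHistory table and [Date] column from the question and a non-NULL DepartmentID. The aliases cur and prev are made up for this example. It keeps only the rows where the department differs from the previous row, which gives the start of each department stint (but not yet the end date):

```sql
WITH x AS
(
    -- Number each employee's rows in date order
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY [Date]) AS rn
    FROM EmployeeHistory
)
SELECT cur.EmployeeID,
       cur.[Date] AS DateStart,
       cur.DepartmentID
FROM x cur
LEFT OUTER JOIN x prev
    ON prev.EmployeeID = cur.EmployeeID
   AND prev.rn = cur.rn - 1               -- prev is the row just before cur
WHERE prev.rn IS NULL                     -- first row for the employee
   OR prev.DepartmentID <> cur.DepartmentID  -- department changed
ORDER BY cur.EmployeeID, cur.[Date];
```

With the sample data in the question, this should return the rows dated 20130101, 20131201 and 20141001.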


Trevor Nov 12 '13 at 5:19


Dominic P · Accepted Answer · 2013-11-12T05:53:13+0000

It's a little involved. The easiest thing would be to refer you to this SQL Fiddle I created for you, which produces the exact result. There are ways to improve it for performance or other considerations, but hopefully it is at least clearer than some of the alternatives.

The gist is that you first get a canonical ranking of your data, then use it to segment the data into groups, then find the end date for each group, and then eliminate all the intermediate rows. ROW_NUMBER() and CROSS APPLY help a lot with this.
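The same rank, segment, find-end-date idea can also be sketched in a single statement using the difference of two ROW_NUMBER() values, a common technique for this kind of grouping problem. This is not the script from this answer, only a hedged alternative. It assumes the EmployeeHistory table and [Date] column from the question, and the names Ranked, Islands, Numbered, Grp, cur and nxt are invented for the example. It avoids LAG/LEAD, which are not available before SQL Server 2012:

```sql
WITH Ranked AS
(
    -- Rank within the employee minus rank within employee + department:
    -- the difference stays constant across consecutive rows in one department
    SELECT EmployeeID, [Date], DepartmentID,
           ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY [Date])
         - ROW_NUMBER() OVER (PARTITION BY EmployeeID, DepartmentID ORDER BY [Date]) AS Grp
    FROM EmployeeHistory
),
Islands AS
(
    -- One row per uninterrupted stint in a department
    SELECT EmployeeID, DepartmentID, MIN([Date]) AS DateStart
    FROM Ranked
    GROUP BY EmployeeID, DepartmentID, Grp
),
Numbered AS
(
    -- Order the stints so each one can be joined to the next
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY DateStart) AS rn
    FROM Islands
)
SELECT cur.EmployeeID,
       cur.DateStart,
       nxt.DateStart AS DateEnd,   -- NULL for the current (last) stint
       cur.DepartmentID
FROM Numbered cur
LEFT OUTER JOIN Numbered nxt
    ON nxt.EmployeeID = cur.EmployeeID
   AND nxt.rn = cur.rn + 1
ORDER BY cur.EmployeeID, cur.DateStart;
```

For the sample data in the question, the two stints in department 001 get different Grp values (0 and 2), so they stay separate and the query should produce the three rows the question asks for.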

EDIT 2019:

The SQL Fiddle seems to be broken for some reason, but the problem appears to be on the SQL Fiddle site's end. Here is the full version, just tested on SQL Server 2016:

    CREATE TABLE Source
    (
        EmployeeID int,
        DateStarted date,
        DepartmentID int
    )

    INSERT INTO Source
    VALUES
        (10001,'2013-01-01',001),
        (10001,'2013-09-09',001),
        (10001,'2013-12-01',002),
        (10001,'2014-05-01',002),
        (10001,'2014-10-01',001),
        (10001,'2014-12-01',001)

    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY DateStarted) AS EntryRank,
        newid() as GroupKey,
        CAST(NULL AS date) AS EndDate
    INTO #RankedData
    FROM Source
    ;

    UPDATE #RankedData
    SET GroupKey = beginDate.GroupKey
    FROM #RankedData sup
        CROSS APPLY
        (
            SELECT TOP 1 GroupKey
            FROM #RankedData sub
            WHERE sub.EmployeeID = sup.EmployeeID AND
                sub.DepartmentID = sup.DepartmentID AND
                NOT EXISTS
                (
                    SELECT *
                    FROM #RankedData bot
                    WHERE bot.EmployeeID = sup.EmployeeID AND
                        bot.EntryRank BETWEEN sub.EntryRank AND sup.EntryRank AND
                        bot.DepartmentID <> sup.DepartmentID
                )
            ORDER BY DateStarted ASC
        ) beginDate (GroupKey);

    UPDATE #RankedData
    SET EndDate = nextGroup.DateStarted
    FROM #RankedData sup
        CROSS APPLY
        (
            SELECT TOP 1 DateStarted
            FROM #RankedData sub
            WHERE sub.EmployeeID = sup.EmployeeID AND
                sub.DepartmentID <> sup.DepartmentID AND
                sub.EntryRank > sup.EntryRank
            ORDER BY EntryRank ASC
        ) nextGroup (DateStarted);

    SELECT *
    FROM
    (
        SELECT *,
            ROW_NUMBER() OVER (PARTITION BY GroupKey ORDER BY EntryRank ASC) AS GroupRank
        FROM #RankedData
    ) FinalRanking
    WHERE GroupRank = 1
    ORDER BY EntryRank;

    DROP TABLE #RankedData
    DROP TABLE Source

