Creating groups with a uniform distribution in the aggregate

Question

Creating groups with a uniform distribution in the aggregate

This is probably a newbie question, but I want to split our server inventory into several groups with uniform size, based on the total size of the database, and I thought about how to group them. I think NTILE will work, maybe, but I just can't wrap my head around dividing the groups equally. My example below is just to arrange servers randomly. I would like the results to be 3 groups of fairly even size (obviously not accurate).

Using SQL Server 2012. Any help is appreciated. Thanks.

declare @Servers table (ServerName sysname, TotalSizeGB decimal (12,2)) insert into @Servers values ('Server1',123.45), ('Server2',234.56), ('Server3',345.67), ('Server4',456.78), ('Server5',567.89), ('Server6',678.90), ('Server7',789.01), ('Server8',890.12), ('Server9',901.23), ('Server10',1023.35) select GroupNumber, sum(TotalSizeGB) as TotalSizeGB from ( select ServerName, sum(TotalSizeGB) as TotalSizeGB, ntile(3) over (order by newid()) as GroupNumber from ( select ServerName, TotalSizeGB from @Servers ) x group by ServerName ) y group by GroupNumber

The expected result here will be three groups of 2000 GB each. I expect it to be not exact, but at least close. If the grouping is on the server, it might look like this:

 ServerName TotalSizeGB GroupNumber Server10 1023.35 1 Server1 123.45 1 Server5 567.89 1 Server3 345.67 1 Server4 456.78 2 Server7 789.01 2 Server6 678.90 2 Server2 234.56 3 Server9 901.23 3 Server8 890.12 3

If I took the amount for the group, it would look like this:

 GroupNumber TotalSizeGB 1 2060.36 2 1924.69 3 2025.91

+6

sql-server sql-server-2012

Filamentunities Jul 26 '13 at 19:50

source share

4 answers

This task is actually scientific ( the packaging problem or its type), and math.stackexchange may be better suited :)

My solution is implemented in two stages (there are so many problems with optimization) - find the initial solution and try to clarify it.

Original solution:

 ServerName GroupNo TotalSizeGB ---------- ----------- ----------- Server1 3 123.45 Server2 3 234.56 Server3 2 345.67 Server4 1 456.78 Server5 2 567.89 Server6 1 678.90 Server7 3 789.01 Server8 3 890.12 Server9 1 901.23 Server10 2 1023.35 GroupNo GroupSizeGb ----------- ----------- 1 2036.91 2 1936.91 3 2037.14

Optimized:

 ServerName GroupNo TotalSizeGB ---------- ----------- ----------- Server1 3 123.45 Server2 3 234.56 Server3 2 345.67 Server4 1 456.78 Server5 3 567.89 Server6 1 678.90 Server7 2 789.01 Server8 2 890.12 Server9 1 901.23 Server10 3 1023.35 GroupNo GroupSizeGb ----------- ----------- 1 2036.91 2 2024.80 3 1949.25

Unfortunately, I was not able to configure it to SQLFiddle, since explicit transactions are used.

 set nocount on -- Parameters declare @nGroups int, -- Number of groups to split servers to @tolerance float, -- let say 0.0 ... 0.1 (0.1 mean that (+/-)10% deviation allowed from target group size) @nTries int, -- refinement tries 100, 1000, 10000 or as much as you can wait if you are not satisfied with initial solution @mFactor float, -- refinement param 0.0 ... 1.0 @tolerance2 float -- let say 0.1 ... 0.3 set @nGroups = 3 set @tolerance = 0 set @nTries = 1000 set @mFactor = 0.3 set @tolerance2 = 0.3 -- Initial Data create table #Servers (ID int identity, ServerName sysname, TotalSizeGB decimal (12,2), primary key clustered(ID)) insert into #Servers (ServerName, TotalSizeGB) values ('Server1',123.45), ('Server2',234.56), ('Server3',345.67), ('Server4',456.78), ('Server5',567.89), ('Server6',678.90), ('Server7',789.01), ('Server8',890.12), ('Server9',901.23), ('Server10',1023.35) create table #Groups (GroupNo int not NULL, primary key clustered (GroupNo)) insert into #Groups (GroupNo) select N from (select row_number() over (order by @@spid) from sys.all_columns) S(N) where N <= @nGroups create table #ServerGroups (ServerID int not NULL, GroupNo int not NULL, primary key clustered(ServerID)) create index #IX_GroupServers_GroupNo on #ServerGroups (GroupNo) declare @srvCnt int, @grSize decimal (12,2), @grNo int, @grSz decimal (12,2), @srvID int select @srvCnt = count(1), @grSize = sum(TotalSizeGB) / @nGroups from #Servers select @grSize as [Target approx. group size] -- Find initial solution while (select count(1) from #ServerGroups) < @srvCnt begin select top 1 @grNo = g.GroupNo from #Groups g left join #ServerGroups sg on sg.GroupNo = g.GroupNo left join #Servers s on s.ID = sg.ServerID group by g.GroupNo order by sum(s.TotalSizeGB) select @grSz = IsNull(sum(s.TotalSizeGB), 0) from #Groups g left join #ServerGroups sg on sg.GroupNo = g.GroupNo left join #Servers s on s.ID = sg.ServerID where g.GroupNo = @grNo select top 1 @srvID = ID from #Servers s where not exists (select 1 from #ServerGroups where ServerID = s.ID) order by abs(@grSize - @grSz - s.TotalSizeGB) insert into #ServerGroups (ServerID, GroupNo) values (@srvID, @grNo) end select g.GroupNo, SUM(s.TotalSizeGB) GroupSizeGb from #Groups g join #ServerGroups sg on sg.GroupNo = g.GroupNo join #Servers s on s.ID = sg.ServerID group by g.GroupNo -- Refinement declare @fTarg float select @fTarg = sum(abs(case when abs(re) > @tolerance then re else 0 end)) from ( select g.GroupNo, SUM(s.TotalSizeGB) GroupSizeGb from #Groups g join #ServerGroups sg on sg.GroupNo = g.GroupNo join #Servers s on s.ID = sg.ServerID group by g.GroupNo ) t cross apply (select (GroupSizeGb - @grSize)/@grSize re) p print @fTarg if @fTarg > 0 begin create table #MServerGroups (ServerID int not NULL, GroupNo int not NULL, primary key clustered (ServerID)) insert into #MServerGroups select ServerID, GroupNo from #ServerGroups while @nTries > 0 begin set @nTries = @nTries - 1 begin transaction ;with MS as ( select top (100*@mFactor) percent ServerID, GroupNo from #MServerGroups order by checksum(newid()) ) update msg set msg.GroupNo = case when msg.ServerID = tt.ServerID1 then tt.NewNo1 else tt.NewNo2 end from #MServerGroups msg join ( select ServerID1, NewNo1, ServerID2, NewNo2 from ( select MS.ServerID as ServerID1, SS.GroupNo as NewNo1, SS.ServerID as ServerID2, MS.GroupNo as NewNo2, row_number() over (partition by SS.ServerID order by @@spid) as rn from MS join #Servers s on s.ID = MS.ServerID cross apply ( select top 1 * from #Servers s2 join #MServerGroups ms2 on ms2.ServerID = s2.ID where s2.ID != MS.ServerID and ms2.GroupNo != MS.GroupNo and abs(s2.TotalSizeGB - s.TotalSizeGB)/s.TotalSizeGB < @tolerance2 order by checksum(newid()) ) SS ) t where rn = 1 )tt on msg.ServerID in (tt.ServerID1, tt.ServerID2) if @@rowcount = 0 begin rollback transaction continue; end declare @fT float select @fT = sum(abs(case when abs(re) > @tolerance then re else 0 end)) from ( select g.GroupNo, SUM(s.TotalSizeGB) GroupSizeGb from #Groups g join #MServerGroups sg on sg.GroupNo = g.GroupNo join #Servers s on s.ID = sg.ServerID group by g.GroupNo ) t cross apply (select (GroupSizeGb - @grSize)/@grSize re) p if @fT < @fTarg begin set @fTarg = @ft print @fTarg -- the less this number, the better solution is commit transaction end else rollback transaction end update s set s.GroupNo = m.GroupNo from #MServerGroups m join #ServerGroups s on s.ServerID = m.ServerID select g.GroupNo, SUM(s.TotalSizeGB) GroupSizeGb from #Groups g join #ServerGroups sg on sg.GroupNo = g.GroupNo join #Servers s on s.ID = sg.ServerID group by g.GroupNo drop table #MServerGroups end else print 'No refinement needed' drop table #Groups drop table #ServerGroups drop table #Servers

I suggest starting with @nTries = 0 and a reasonable @tolerance (e.g. 0.1, 0.05).

+1

i-one Jul 26 '13 at 23:07

source share

Check it out, hope this helps. I'm not sure what you mean by "uniform groups." But what I did here first allocates an even size to the group, and then if any remaining one then allocates it to the group, where there are more elements than the total size of the group. Instead of ntile, I would recommend that you select group numbers (perhaps use sp) and allocate a size for each server. but below may be good for the problem described. and note that I have not tested all the scripts.

  declare @TotalSizeGB decimal; select @TotalSizeGB = sum(TotalSizeGB) from @Servers; declare @Count int; select @Count = count(TotalSizeGB) from @Servers; declare @GroupSize int; select @GroupSize = 3; declare @NoofGroups int; select @NoofGroups = 3; declare @UnitSizeGB decimal Set @UnitSizeGB =(@TotalSizeGB/@Count)*@NoofGroups; Declare @Remainder decimal; Set @Remainder = @TotalSizeGB-(@UnitSizeGB*@NoofGroups) Select GroupNumber, CASE WHEN gcount = @GroupSize THEN @UnitSizeGB WHEN gcount > @GroupSize THEN @ UnitSizeGB+@Remainder END From ( Select GroupNumber,count(ServerName) as gcount, @UnitSizeGB as UnitSizeGB from( Select ServerName,ntile(@GroupSize) over (order by newid()) as GroupNumber from ( select ServerName, TotalSizeGB from @Servers ) x group by ServerName ) as d group by GroupNumber ) as ff

It will give a way out

  GroupNumber Size 1 2405 2 1803 3 1803

0

aads Jul 26 '13 at 21:12

source share

Here's a solution that generates the same results as @ i-one code, but maybe a little easier to understand (at least for me). I use "chunk" instead of "group" to avoid keyword conflicts.

The premise is as follows. To create n pieces of uniform size:

Sort all records in descending order.
Assign the first n records to your pieces by line number
Scroll through the remainder, always assigning the smallest fragment

I loaded the code into SQLFiddle, but it does not look like a table variable. Here is the link anyway .

 -- Source data: DECLARE @Servers TABLE (ServerName SYSNAME, TotalSizeGB DECIMAL (12,2)) INSERT INTO @Servers VALUES ('Server1',123.45), ('Server2',234.56), ('Server3',345.67), ('Server4',456.78), ('Server5',567.89), ('Server6',678.90), ('Server7',789.01), ('Server8',890.12), ('Server9',901.23), ('Server10',1023.35) -- Solution start DECLARE @ServersChunked TABLE ( ServerName SYSNAME, TotalSizeGB DECIMAL (12,2), RowNum INT, ChunkNo INT ); DECLARE @ChunkCount INT = 3, @MinRowNum INT, @SmallestChunk INT; -- Copy table into variable (skip this if the original table can be amended to include the RowNum and ChunkNo fields) INSERT INTO @ServersChunked SELECT *, RowNum = ROW_NUMBER() OVER (ORDER BY TotalSizeGB DESC), ChunkNo = NULL FROM @Servers -- Assign the initial chunks to largest tables UPDATE @ServersChunked SET ChunkNo = RowNum WHERE RowNum <= @ChunkCount -- Assign chunks to remaining tables WHILE EXISTS (SELECT 1 FROM @ServersChunked WHERE ChunkNo IS NULL) BEGIN -- Find the next table (by descending row count) SELECT @MinRowNum = MIN(RowNum) FROM @ServersChunked WHERE ChunkNo IS NULL -- Find the smallest chunk SELECT TOP 1 @SmallestChunk = ChunkNo FROM @ServersChunked WHERE ChunkNo IS NOT NULL GROUP BY ChunkNo ORDER BY Sum(TotalSizeGB) ASC -- Assign the table to the chunk UPDATE @ServersChunked SET ChunkNo = @SmallestChunk WHERE RowNum = @MinRowNum END

Here are the results:

 ChunkNo SumTotalSizeGB 1 1936.91 2 2036.91 3 2037.14

0

maccaroo Oct 17 '17 at 3:45

source share

Bogdan sahlean · Accepted Answer · 2013-07-26T21:34:54+0000

 SELECT * FROM( SELECT y.TotalSizeGB, CASE WHEN y.AnotherGrp%2=0 AND y.PseudoGrpNumber=0 THEN 2 WHEN y.AnotherGrp%2=0 AND y.PseudoGrpNumber=1 THEN 1 WHEN y.AnotherGrp%2=0 AND y.PseudoGrpNumber=2 THEN 0 ELSE y.PseudoGrpNumber END GrpNumber FROM( SELECT x.ServerName, x.TotalSizeGB, (2+ROW_NUMBER() OVER(ORDER BY x.TotalSizeGB DESC))%3 PseudoGrpNumber, (2+ROW_NUMBER() OVER(ORDER BY x.TotalSizeGB DESC))/3 AnotherGrp, ROW_NUMBER() OVER(ORDER BY x.TotalSizeGB DESC) RowNum FROM @Servers x )y )z PIVOT( SUM(z.TotalSizeGB) FOR z.GrpNumber IN([0],[1],[2]) ) pvt;

Results:

 0 1 2 ------- ------- ------- 2048.02 1925.80 2037.14

Some explanations:

The idea is to sort the data descending on the TotalSizeGB column. Then every 3 consecutive lines are grouped together ( AnotherGrp column), first in DESC order, and then in ASC order ( PseudoGroNumber and GrpNumber ). If he executed the SELECT * FROM () y derivation table, the results would be as follows:

 ServerName TotalSizeGB PseudoGrpNumber AnotherGrp GrpNumber RowNum ---------- ------------ --------------- ---------- --------- ------ Server10 1023.35 0 1 0 1 Server9 901.23 1 1 1 2 Server8 890.12 2 1 2 3 Server7 789.01 0 2 2 4 Server6 678.90 1 2 1 5 Server5 567.89 2 2 0 6 Server4 456.78 0 3 0 7 Server3 345.67 1 3 1 8 Server2 234.56 2 3 2 9 Server1 123.45 0 4 2 10

Creating groups with a uniform distribution in the aggregate

More articles: