How to efficiently create logical subsets of data in many mapping tables?

Question

I have a many-to-many relationship between invoices and credit card transactions, and I am trying to match their amounts. The best way to think about the problem is to picture TransactionInvoiceMap as a bipartite graph. For each connected subgraph, find the total amount of all invoices and the total amount of all transactions within that subgraph. In my query, I want to return the values computed for each of these subgraphs, along with the IDs of the transactions they are associated with. The totals for related transactions should be identical.

More explicitly, consider the following transactions and invoices:

Table: TransactionInvoiceMap

    TransactionID  InvoiceID
    1              1
    2              2
    3              2
    3              3

Table: Transactions

    TransactionID  Amount
    1              $100
    2              $75
    3              $75

Table: Invoices

    InvoiceID  Amount
    1          $100
    2          $100
    3          $50

My desired result:

    TransactionID  TotalAsscTransactions  TotalAsscInvoiced
    1              $100                   $100
    2              $150                   $150
    3              $150                   $150

Note that invoices 2 and 3 and transactions 2 and 3 are part of a logical group.
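To experiment with the queries on this page, the sample data can be loaded with a script along these lines. This is only a sketch: the column types (int, money) and the key constraints are assumptions, since the question does not show the real schema.

```sql
-- Assumed schema: the question only gives column names and sample values.
create table Transactions (
    TransactionID int   not null primary key,
    Amount        money not null
);

create table Invoices (
    InvoiceID int   not null primary key,
    Amount    money not null
);

create table TransactionInvoiceMap (
    TransactionID int not null references Transactions (TransactionID),
    InvoiceID     int not null references Invoices (InvoiceID),
    primary key (TransactionID, InvoiceID)
);

-- Sample rows exactly as listed in the question.
insert into Transactions (TransactionID, Amount)
values (1, 100), (2, 75), (3, 75);

insert into Invoices (InvoiceID, Amount)
values (1, 100), (2, 100), (3, 50);

insert into TransactionInvoiceMap (TransactionID, InvoiceID)
values (1, 1), (2, 2), (3, 2), (3, 3);
```

The multi-row values syntax requires SQL Server 2008 or later, which matches the question's tag.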

Here's a solution (simplified, names changed) that seems to work but is very slow. I'm having a hard time figuring out how to optimize it, but I think it would involve eliminating the subqueries into TransactionInvoiceGrouping. Feel free to suggest something radically different.

    with TransactionInvoiceGrouping as (
        select
            -- Need an identifier for each logical group of transactions/invoices, use
            -- one of the transaction ids for this.
            m.TransactionID,
            m.InvoiceID,
            min(m.TransactionID) over (partition by m.InvoiceID) as GroupingID
        from TransactionInvoiceMap m
    )
    select distinct
        g.TransactionID,
        istat.InvoiceSum as TotalAsscInvoiced,
        tstat.TransactionSum as TotalAsscTransactions
    from TransactionInvoiceGrouping g
    cross apply (
        select sum(ii.Amount) as InvoiceSum
        from (select distinct InvoiceID, GroupingID from TransactionInvoiceGrouping) ig
        inner join Invoices ii on ig.InvoiceID = ii.InvoiceID
        where ig.GroupingID = g.GroupingID
    ) as istat
    cross apply (
        select sum(it.Amount) as TransactionSum
        from (select distinct TransactionID, GroupingID from TransactionInvoiceGrouping) ig
        left join Transactions it on ig.TransactionID = it.TransactionID
        where ig.GroupingID = g.GroupingID
        having sum(it.Amount) > 0
    ) as tstat

+6

tsql aggregate-functions sql-server-2008 many-to-many

Samantha branham Jul 13 '12 at 18:51

2 answers

If I understood the question correctly, you are trying to find the minimum transaction ID for each invoice. I used a ranking function to do the same.

    WITH TransactionInvoiceGrouping AS (
        SELECT
            -- Need an identifier for each logical group of transactions/invoices, use
            -- one of the transaction ids for this.
            m.TransactionID,
            m.InvoiceID,
            ROW_NUMBER() OVER (PARTITION BY m.InvoiceID ORDER BY m.TransactionID) AS recno
        FROM TransactionInvoiceMap m
    )
    SELECT
        g.TransactionID,
        istat.InvoiceSum AS TotalAsscInvoiced,
        tstat.TransactionSum AS TotalAsscTransactions
    FROM TransactionInvoiceGrouping g
    CROSS APPLY (
        SELECT SUM(ii.Amount) AS InvoiceSum
        FROM TransactionInvoiceGrouping ig
        INNER JOIN Invoices ii ON ig.InvoiceID = ii.InvoiceID
        WHERE ig.TransactionID = g.TransactionID
          AND ig.recno = 1
    ) AS istat
    CROSS APPLY (
        SELECT SUM(it.Amount) AS TransactionSum
        FROM TransactionInvoiceGrouping ig
        LEFT JOIN Transactions it ON ig.TransactionID = it.TransactionID
        WHERE ig.TransactionID = g.TransactionID
          AND ig.recno = 1
        HAVING SUM(it.Amount) > 0
    ) AS tstat
    WHERE g.recno = 1

0

Yoosaf abdulla Jul 16 '12 at 17:30

Tim lehner · Accepted Answer · 2012-07-16T21:26:07+0000

I implemented a solution using a recursive CTE:

    ;with TranGroup as (
        select TransactionID
            , InvoiceID as NextInvoice
            , TransactionID as RelatedTransaction
            , cast(TransactionID as varchar(8000)) as TransactionChain
        from TransactionInvoiceMap
        union all
        select g.TransactionID
            , m1.InvoiceID
            , m.TransactionID
            , g.TransactionChain + ',' + cast(m.TransactionID as varchar(11))
        from TranGroup g
            join TransactionInvoiceMap m on g.NextInvoice = m.InvoiceID
            join TransactionInvoiceMap m1 on m.TransactionID = m1.TransactionID
        where ',' + g.TransactionChain + ',' not like '%,' + cast(m.TransactionID as varchar(11)) + ',%'
    )
    , RelatedTrans as (
        select distinct TransactionID, RelatedTransaction
        from TranGroup
    )
    , RelatedInv as (
        select distinct TransactionID, NextInvoice as RelatedInvoice
        from TranGroup
    )
    select TransactionID
        , (
            select sum(Amount)
            from Transactions
            where TransactionID in (
                select RelatedTransaction
                from RelatedTrans
                where TransactionID = t.TransactionID
            )
        ) as TotalAsscTransactions
        , (
            select sum(Amount)
            from Invoices
            where InvoiceID in (
                select RelatedInvoice
                from RelatedInv
                where TransactionID = t.TransactionID
            )
        ) as TotalAsscInvoiced
    from Transactions t

There is probably room for optimization (including object naming on my part!), but I believe I at least have a correct solution, one that gathers all possible transaction-invoice relationships to include in the calculations.

I was not able to get the existing solutions on this page to produce the OP's desired output, and they got uglier as I added more test data. I am not sure the OP's posted "slow" solution is correct as stated. It is very possible that I am misinterpreting the question.
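One way to check this against the question's sample data is to look at the GroupingID that the min-per-invoice window function assigns to each mapping row. The sketch below inlines the four mapping rows with a values constructor so it runs on its own; the CTE name Map is only illustrative and is not part of the original schema.

```sql
-- Sample mapping rows from the question, inlined so no tables are required.
-- "Map" is an illustrative name, not an object from the original post.
with Map as (
    select v.TransactionID, v.InvoiceID
    from (values (1, 1), (2, 2), (3, 2), (3, 3)) as v (TransactionID, InvoiceID)
)
select
    TransactionID,
    InvoiceID,
    -- Same grouping expression as in the question's CTE.
    min(TransactionID) over (partition by InvoiceID) as GroupingID
from Map
order by TransactionID, InvoiceID;
```

On these rows, transaction 3 receives GroupingID 2 for invoice 2 but GroupingID 3 for invoice 3, so the connected group made of invoices 2 and 3 is split in two. A single pass of min per invoice does not follow the chain through a shared transaction, which is the gap the recursive CTE is meant to close.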

Additional Information:

I have often seen recursive queries be slow when working with large data sets. Perhaps that could be the subject of another question. If so, things to try on the SQL side include limiting the range (adding where clauses), indexing the base tables, selecting the CTE into a temp table first and indexing that temp table, and thinking of a better stop condition for the CTE... but profile first, of course.
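As a rough sketch of the temp-table suggestion, the recursive CTE can be run once, its distinct rows stored, and the sums computed from the indexed temp table. The temp table name #TranGroup and the index name IX_TranGroup are assumptions made for illustration, and whether this is actually faster depends on the data, so it should be profiled rather than assumed.

```sql
-- Step 1: run the recursive CTE once and keep only the distinct relations.
;with TranGroup as (
    select TransactionID
        , InvoiceID as NextInvoice
        , TransactionID as RelatedTransaction
        , cast(TransactionID as varchar(8000)) as TransactionChain
    from TransactionInvoiceMap
    union all
    select g.TransactionID
        , m1.InvoiceID
        , m.TransactionID
        , g.TransactionChain + ',' + cast(m.TransactionID as varchar(11))
    from TranGroup g
        join TransactionInvoiceMap m on g.NextInvoice = m.InvoiceID
        join TransactionInvoiceMap m1 on m.TransactionID = m1.TransactionID
    where ',' + g.TransactionChain + ',' not like '%,' + cast(m.TransactionID as varchar(11)) + ',%'
)
select distinct TransactionID, RelatedTransaction, NextInvoice as RelatedInvoice
into #TranGroup   -- hypothetical temp table name
from TranGroup;

-- Step 2: index the temp table on the column used for lookups.
create clustered index IX_TranGroup on #TranGroup (TransactionID);

-- Step 3: compute the totals from the temp table instead of re-running the CTE.
-- "in" is used so duplicate related ids in #TranGroup are not double counted.
select t.TransactionID
    , (
        select sum(x.Amount)
        from Transactions x
        where x.TransactionID in (
            select g.RelatedTransaction
            from #TranGroup g
            where g.TransactionID = t.TransactionID
        )
    ) as TotalAsscTransactions
    , (
        select sum(i.Amount)
        from Invoices i
        where i.InvoiceID in (
            select g.RelatedInvoice
            from #TranGroup g
            where g.TransactionID = t.TransactionID
        )
    ) as TotalAsscInvoiced
from Transactions t;

drop table #TranGroup;
```

The recursive part is unchanged from the answer above; only the materialization and the index are new. Note that recursive CTEs stop with an error after 100 recursion levels by default unless a maxrecursion hint is supplied, which may matter for long chains.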
