Get the top 10 products for each category.

Question


I have a query that looks something like this:

    SELECT t.category, tc.product, tc.subproduct, count(*) as sales
    FROM tg t, ttc tc
    WHERE t.value = tc.value
    GROUP BY t.category, tc.product, tc.subproduct;

Now I want my query to return the top 10 products for each category (ranked by sales), and for each of those products the top 5 subproducts (also ranked by sales).

You can consider the statement of the problem as follows:

Get the top 10 products by sales for each category, and for each of those products get the top 5 subproducts by sales.

Here the category may be:
The product may be a Harry Potter book.
The subproduct may be Harry Potter series 5.

Example input format

    category | product | subproduct | sales [count(*)]
    abc      | test1   | test11     | 120
    abc      | test1   | test11     | 100
    abc      | test1   | test11     | 10
    abc      | test1   | test11     | 10
    abc      | test1   | test11     | 10
    abc      | test1   | test11     | 10
    abc      | test1   | test12     | 10
    abc      | test1   | test13     | 8
    abc      | test1   | test14     | 6
    abc      | test1   | test15     | 5
    abc      | test2   | test21     | 80
    abc      | test2   | test22     | 60
    abc      | test3   | test31     | 50
    abc      | test3   | test32     | 40
    abc      | test4   | test41     | 30
    abc      | test4   | test42     | 20
    abc      | test5   | test51     | 10
    abc      | test5   | test52     | 5
    abc      | test6   | test61     | 5
             |         |            |
    bcd      | test2   | test22     | 10
    xyz      | test3   | test31     | 5
    xyz      | test3   | test32     | 3
    xyz      | test4   | test41     | 2

The output will be:

    top 5 rf for (abc) -> abc,test1(289), abc,test2(140), abc,test3(90), abc,test4(50), abc,test5(15)
    top 5 rfm for (abc,test1) -> test11(260), test12(10), test13(8), test14(6), test15(5)

and so on.

My query is not workable because the result set is really huge. I have read about Oracle analytic functions such as rank. Can someone help me modify this query to use analytic functions? Any other approach would also be fine.

I have been referring to http://www.orafaq.com/node/55, but failed to get the correct SQL query for this.

Any help would be appreciated. I kind of got stuck for 2 days on this :(


sql oracle mysql rank

Topcoder Feb 11 '11 at 8:19


2 answers

This is a guess, but you can probably start with something like this:

 drop table category_sales;

Some test data:

    create table category_sales (
      category   varchar2(14),
      product    varchar2(14),
      subproduct varchar2(14),
      sales      number
    );

    begin
      for cate in 1 .. 10 loop
        for prod in 1 .. 20 loop
          for subp in 1 .. 30 loop
            insert into category_sales values (
              'Cat ' || cate,
              'Prod ' || cate||prod,
              'Subp ' || cate||prod||subp,
              trunc(dbms_random.value(1, 30 + cate - prod + subp))
            );
          end loop;
        end loop;
      end loop;
    end;
    /

The actual query:

    select * from (
      select
        category,
        product,
        subproduct,
        sales,
        category_sales,
        product_sales,
        top_subproduct,
        -- Finding best products within category:
        dense_rank () over (
          partition by category
          order by product_sales desc
        ) top_product
      from (
        select
          -- Finding the best subproducts within
          -- category and product:
          dense_rank () over (
            partition by category, product
            order by sales desc
          ) top_subproduct,
          -- Finding the sum(sales) within a
          -- category and product
          sum(sales) over (
            partition by category, product
          ) product_sales,
          -- Finding the sum(sales) within
          -- category
          sum(sales) over (
            partition by category
          ) category_sales,
          category,
          product,
          subproduct,
          sales
        from
          category_sales
      )
    )
    where
      -- Only best 10 products
      top_product <= 10 and
      -- Only best 5 subproducts:
      top_subproduct <= 5
    -- "Best" categories first:
    order by
      category_sales desc,
      top_product desc,
      top_subproduct desc;

In this query, the category_sales column holds the total sales of the category that the row belongs to. This means that every row in the same category has the same category_sales value. This column is needed to order the result set with the best-selling categories first ( order by ... category_sales desc ).

Similarly, product_sales holds the total sales for a combination of category and product. This column is used to find the best n (here: 10) products in each category ( where top_product <= 10 ).

The top_product column is "created" with the analytic function dense_rank() over... For the best product in the category it is 1, for the second best it is 2, and so on (hence where top_product <= 10 ).

The top_subproduct column is evaluated in the same way as top_product (i.e. with dense_rank ).
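The effect of the analytic columns described above is easiest to see on a tiny data set. The following is only a sketch, not the code from this answer: it runs the SQL through Python's sqlite3 module (assuming SQLite 3.25 or later, which supports window functions) instead of Oracle so that it is self-contained, and the five rows are made up for illustration.

```python
import sqlite3

# Hypothetical miniature data set (not from the thread): one row per subproduct.
rows = [
    ("Cat 1", "Prod A", "Sub A1", 30),
    ("Cat 1", "Prod A", "Sub A2", 20),
    ("Cat 1", "Prod B", "Sub B1", 40),
    ("Cat 1", "Prod C", "Sub C1", 5),
    ("Cat 2", "Prod D", "Sub D1", 7),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table category_sales "
    "(category text, product text, subproduct text, sales integer)"
)
conn.executemany("insert into category_sales values (?, ?, ?, ?)", rows)

query = """
select category, product, subproduct, sales, product_sales,
       -- rank products within their category by total product sales
       dense_rank() over (
         partition by category order by product_sales desc
       ) as top_product
from (
  select category, product, subproduct, sales,
         -- the product total is repeated on every row of that product
         sum(sales) over (partition by category, product) as product_sales
  from category_sales
)
order by category, top_product, sales desc
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
# ('Cat 1', 'Prod A', 'Sub A1', 30, 50, 1)
# ('Cat 1', 'Prod A', 'Sub A2', 20, 50, 1)
# ('Cat 1', 'Prod B', 'Sub B1', 40, 40, 2)
# ('Cat 1', 'Prod C', 'Sub C1', 5, 5, 3)
# ('Cat 2', 'Prod D', 'Sub D1', 7, 7, 1)
```

Both rows of Prod A carry the same product_sales (50) and therefore the same rank 1, and Prod B still gets rank 2. With rank() instead of dense_rank(), Prod B would get rank 3 because two rows are ahead of it, so a filter like top_product <= 10 would count rows rather than products.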


René Nyffenegger Feb 11 '11 at 9:32


RichardTheKiwi · Accepted Answer · 2011-02-11T09:30:05+0000

There are probably reasons not to use analytic functions, but here is a solution that uses analytic functions alone:

    select am, rf, rfm, rownum_rf2, rownum_rfm
    from
    (
        -- the 3rd level takes the subproduct ranks, and for each equally ranked
        -- subproduct, it produces the product ranking (restarting per category)
        select am, rf, rfm, rownum_rfm,
          row_number() over (partition by am, rownum_rfm order by rownum_rf) rownum_rf2
        from
        (
            -- the 2nd level ranks (without ties) the products within
            -- categories, and subproducts within products simultaneously
            select am, rf, rfm,
              row_number() over (partition by am order by count_rf desc) rownum_rf,
              row_number() over (partition by am, rf order by count_rfm desc) rownum_rfm
            from
            (
                -- innermost query counts the records by subproduct
                -- using regular group-by. at the same time, it uses
                -- the analytical sum() over to get the counts by product
                select tg.am, ttc.rf, ttc.rfm,
                  count(*) count_rfm,
                  sum(count(*)) over (partition by tg.am, ttc.rf) count_rf
                from tg inner join ttc on tg.value = ttc.value
                group by tg.am, ttc.rf, ttc.rfm
            ) X
        ) Y
        -- at level 3, we drop all but the top 5 subproducts per product
        where rownum_rfm <= 5   -- top 5 subproducts
    ) Z
    -- the filter on the final query retains only the top 10 products
    where rownum_rf2 <= 10 -- top 10 products
    order by am, rownum_rf2, rownum_rfm;
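The innermost level of this query combines a regular GROUP BY with an analytic sum() over the grouped counts. As a minimal sketch of just that step, the following runs it through Python's sqlite3 module (assuming SQLite 3.25 or later) rather than Oracle. The table and column names tg, ttc, value, am, rf and rfm come from the query; the column types and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table tg  (value integer, am text);             -- am  = category
create table ttc (value integer, rf text, rfm text);   -- rf  = product, rfm = subproduct
""")

# Hypothetical rows: every ttc row stands for one sale.
conn.executemany("insert into tg values (?, ?)", [(1, "abc"), (2, "abc")])
conn.executemany(
    "insert into ttc values (?, ?, ?)",
    [
        (1, "test1", "test11"),
        (1, "test1", "test11"),
        (1, "test1", "test12"),
        (2, "test2", "test21"),
    ],
)

query = """
select tg.am, ttc.rf, ttc.rfm,
       count(*) as count_rfm,                      -- sales per subproduct
       sum(count(*)) over (
         partition by tg.am, ttc.rf
       ) as count_rf                               -- sales per product
from tg inner join ttc on tg.value = ttc.value
group by tg.am, ttc.rf, ttc.rfm
order by tg.am, ttc.rf, ttc.rfm
"""

counts = conn.execute(query).fetchall()
for row in counts:
    print(row)
# ('abc', 'test1', 'test11', 2, 3)
# ('abc', 'test1', 'test12', 1, 3)
# ('abc', 'test2', 'test21', 1, 1)
```

The GROUP BY runs first and yields one row per subproduct; the analytic sum() then adds up those per-subproduct counts within each product, so both totals are available from a single pass.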

I used row_number instead of rank so you never get ties; in other words, ties are broken arbitrarily. This also does not work if the data is not dense (fewer than 5 subproducts in any of the top 10 products: subproducts from some other products may be shown instead). But if the data is dense (a large established database), the query should work fine.
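To make the difference between the ranking functions concrete, here is a small sketch comparing row_number, rank and dense_rank on tied values. It uses Python's sqlite3 module (assuming SQLite 3.25 or later) instead of Oracle, and the table product_counts and its three rows are hypothetical. The extra sort key on product inside row_number is an addition of this sketch, there only to make the tie-break repeatable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table product_counts (product text, sales integer)")
# Hypothetical data: p1 and p2 are tied on sales.
conn.executemany(
    "insert into product_counts values (?, ?)",
    [("p1", 50), ("p2", 50), ("p3", 40)],
)

query = """
select product, sales,
       -- never produces ties; the second sort key decides between p1 and p2
       row_number() over (order by sales desc, product) as rn,
       -- tied rows share a rank and the next rank is skipped
       rank()       over (order by sales desc) as rnk,
       -- tied rows share a rank and no rank is skipped
       dense_rank() over (order by sales desc) as drnk
from product_counts
order by rn
"""

ranks = conn.execute(query).fetchall()
for row in ranks:
    print(row)
# ('p1', 50, 1, 1, 1)
# ('p2', 50, 2, 1, 1)
# ('p3', 40, 3, 3, 2)
```

A filter such as rn <= 2 keeps exactly two rows, while rnk <= 2 or drnk <= 2 can keep more rows than the limit suggests whenever values are tied.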

The query below makes two passes over the data, but returns the correct results in every case. Again, it is a query without ties.

    select am, rf, rfm, count_rf, count_rfm, rownum_rf, rownum_rfm
    from
    (
        -- next join the top 10 products to the data again to get
        -- the subproduct counts
        select tg.am, tg.rf, ttc.rfm, tg.count_rf, tg.rownum_rf,
          count(*) count_rfm,
          ROW_NUMBER() over (partition by tg.am, tg.rf order by count(*) desc) rownum_rfm
        from
        (
            -- first rank all the products within each category
            -- (an analytic ORDER BY needs the expression itself, not a column position)
            select tg.am, tg.value, ttc.rf,
              count(*) count_rf,
              ROW_NUMBER() over (partition by tg.am order by count(*) desc) rownum_rf
            from tg inner join ttc on tg.value = ttc.value
            group by tg.am, tg.value, ttc.rf
            order by count_rf desc
        ) tg
        inner join ttc on tg.value = ttc.value and tg.rf = ttc.rf
        -- filter the inner query for the top 10 products only
        where rownum_rf <= 10
        group by tg.am, tg.rf, ttc.rfm, tg.count_rf, tg.rownum_rf
    ) X
    -- filter where the subproduct rank is in top 5
    where rownum_rfm <= 5
    order by am, rownum_rf, rownum_rfm;

columns:

    count_rf   : count of sales by product
    count_rfm  : count of sales by subproduct
    rownum_rf  : product rank within category (row number, without ties)
    rownum_rfm : subproduct rank within product (without ties)
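As a sanity check, the sample input from the question can be loaded into an in-memory database and ranked. The sketch below is not either answer's query: it is a variation that aggregates first and then ranks, with dense_rank for products and row_number for subproducts, each given an explicit tie-breaker. It runs through Python's sqlite3 module (assuming SQLite 3.25 or later) rather than Oracle. The table name sample_sales and the limits of 5 (chosen to match the question's expected output) are assumptions of this sketch.

```python
import sqlite3

# Sample rows from the question: (category, product, subproduct, sales).
sample = [
    ("abc", "test1", "test11", 120), ("abc", "test1", "test11", 100),
    ("abc", "test1", "test11", 10), ("abc", "test1", "test11", 10),
    ("abc", "test1", "test11", 10), ("abc", "test1", "test11", 10),
    ("abc", "test1", "test12", 10), ("abc", "test1", "test13", 8),
    ("abc", "test1", "test14", 6), ("abc", "test1", "test15", 5),
    ("abc", "test2", "test21", 80), ("abc", "test2", "test22", 60),
    ("abc", "test3", "test31", 50), ("abc", "test3", "test32", 40),
    ("abc", "test4", "test41", 30), ("abc", "test4", "test42", 20),
    ("abc", "test5", "test51", 10), ("abc", "test5", "test52", 5),
    ("abc", "test6", "test61", 5),
    ("bcd", "test2", "test22", 10),
    ("xyz", "test3", "test31", 5), ("xyz", "test3", "test32", 3),
    ("xyz", "test4", "test41", 2),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table sample_sales "
    "(category text, product text, subproduct text, sales integer)"
)
conn.executemany("insert into sample_sales values (?, ?, ?, ?)", sample)

query = """
with sub as (
  -- one row per subproduct with its total sales
  select category, product, subproduct, sum(sales) as sub_sales
  from sample_sales
  group by category, product, subproduct
),
ranked as (
  select category, product, subproduct, sub_sales,
         sum(sub_sales) over (partition by category, product) as product_sales,
         row_number() over (
           partition by category, product
           order by sub_sales desc, subproduct
         ) as sub_rank
  from sub
),
final as (
  -- all rows of one product share (product_sales, product), hence one rank
  select ranked.*,
         dense_rank() over (
           partition by category
           order by product_sales desc, product
         ) as product_rank
  from ranked
)
select category, product, product_sales, subproduct, sub_sales
from final
where product_rank <= 5 and sub_rank <= 5
order by category, product_rank, sub_rank
"""

result = conn.execute(query).fetchall()

# Top products for category abc, in rank order.
abc_products = []
for category, product, product_sales, subproduct, sub_sales in result:
    if category == "abc" and (product, product_sales) not in abc_products:
        abc_products.append((product, product_sales))

# Top subproducts for (abc, test1), in rank order.
test1_subs = [(r[3], r[4]) for r in result if r[0] == "abc" and r[1] == "test1"]

print(abc_products)
# [('test1', 289), ('test2', 140), ('test3', 90), ('test4', 50), ('test5', 15)]
print(test1_subs)
# [('test11', 260), ('test12', 10), ('test13', 8), ('test14', 6), ('test15', 5)]
```

Both lists match the output given in the question. Because the product rank is computed over all rows of a product rather than per subproduct rank, this variation does not depend on every product having at least 5 subproducts.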
