Top-k pick / merge

Question

I have n sorted lists (5 < n < 300). These lists are quite long (300,000+ tuples). Selecting the top k of an individual list is, of course, trivial: they are at the head of the list.

Example for k = 2:

top2(L1: ['a':10, 'b':4, 'c':3]) = ['a':10, 'b':4]
top2(L2: ['c':5, 'b':2, 'a':0]) = ['c':5, 'b':2]

Where it gets interesting is when I want to combine the top k across all the sorted lists.

top2(L1+L2) = ['a':10, 'c':8]

Just combining the top k of each individual list will not necessarily give the correct result:

top2(top2(L1)+top2(L2)) = ['a':10, 'b':6]
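
To make the difference concrete, here is a minimal Python sketch that reproduces the two results above. The helper names `top_k` and `merged_top_k` are made up for illustration, and the sketch assumes that "+" means summing the values per key across lists, as in the example.

```python
from collections import Counter

def top_k(pairs, k):
    """Return the k (key, value) pairs with the largest values, largest first."""
    return sorted(pairs, key=lambda kv: kv[1], reverse=True)[:k]

def merged_top_k(lists, k):
    """Sum the values per key across all lists, then take the top k totals."""
    totals = Counter()
    for pairs in lists:
        for key, value in pairs:
            totals[key] += value
    return top_k(totals.items(), k)

L1 = [("a", 10), ("b", 4), ("c", 3)]
L2 = [("c", 5), ("b", 2), ("a", 0)]

# Exact answer: merge the full lists first.
exact = merged_top_k([L1, L2], 2)
# Naive answer: truncate each list to its own top 2, then merge.
naive = merged_top_k([top_k(L1, 2), top_k(L2, 2)], 2)

print(exact)  # [('a', 10), ('c', 8)]
print(naive)  # [('a', 10), ('b', 6)]
```

The naive version loses 'c':3 from L1 before the merge, so 'c' ends up with a total of 5 instead of 8 and drops behind 'b'.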

The goal is to reduce the required space and keep the sorted lists small.

top2(topX(L1)+topX(L2)) = ['a':10, 'c':8]

, k, , . : X, ?

. . .

top2(magic([L1,L2])) = ['a', 'c']

+3

algorithm database-design

tcurdt 20 '10 22:33

7

, - 10 , , . , 10 , ( , , 10 ).

, . 10 .

+1

Jerry Coffin 20 '10 22:41

n . , .
.
- .
.
-
4 .

, . n , 4 ( ). n .

0

swestrup 20 '10 22:43

, "a" , . , :

() :

(Re-) ( ). , . .
.
k . node ( ). node, k .
2 , ID.

. O (k) -. :

; . O (U) : total_count , U - .
O (n) , . U , U - . - . O (n) ( ).

. , . . . , , /. , /.

0

Leftium 22 '10 18:14

, .

, , . " " . .

, n = 100 , , 2, 200.

:

, .
L, .
, .
2, L k .

, k- , . , k, k ..

0

Leftium 01 . '10 7:26

mapreduce:

http://www.yourdailygeekery.com/2011/05/16/top-k-with-mapreduce.html

0

tcurdt 01 . '11 13:00

, , . :

['a':100, 'b':99, ...]
['c':90, 'd':89, ..., 'b':2]

k = 1 (.. ). "b" - , , , "b" "a".

Edit:

(, ), , , . k = 1, .

- , , . .

, , ( S). , - , , S . ( , , , - , ?)

If your hash map contains only one element, and its counter is not less than S, then you can stop processing the lists and return this element as the answer. If the distribution plays along, this early exit can actually be triggered, so you don't need to process all of the lists.
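
One possible reading of this early exit for k = 1, as a rough Python sketch rather than the answer's exact method. Everything here is an assumption: the function name `top1_early_exit` is hypothetical, S is taken to be the sum of the values most recently read from each list, all values are non-negative, each key appears at most once per list, and a key missing from a list counts as 0 there. Instead of deleting hopeless candidates from the hash map, the sketch keeps them and checks their upper bounds.

```python
def top1_early_exit(lists):
    """Return (key, rows_read): the key with the largest summed value and
    how many rows had to be read from each list before that was certain.

    Assumes every list holds (key, value) pairs sorted by value, largest
    first, with non-negative values, and that at least one list is non-empty.
    """
    n = len(lists)
    seen = {}  # key -> {list index: value found in that list}
    max_len = max(len(lst) for lst in lists)

    def lower(key):
        # Sum of the values found so far: a lower bound on the key's total.
        return sum(seen[key].values())

    for depth in range(max_len):
        frontier = [0] * n  # last value read per list; 0 once a list is exhausted
        for i, lst in enumerate(lists):
            if depth < len(lst):
                key, value = lst[depth]
                seen.setdefault(key, {})[i] = value
                frontier[i] = value
        s = sum(frontier)  # no key that is still unseen can total more than this

        def upper(key):
            # In a list where the key has not shown up yet, it can contribute
            # at most the value just read from that list.
            return lower(key) + sum(frontier[i] for i in range(n) if i not in seen[key])

        best = max(seen, key=lower)
        if lower(best) >= s and all(upper(k) <= lower(best) for k in seen if k != best):
            return best, depth + 1  # early exit: nothing can overtake best

    return max(seen, key=lower), max_len  # no early exit: everything was read
```

With the two lists from the question this stops after 2 of the 3 rows and returns 'a'. With lists like ['a':100, 'b':99, 'e':1] and ['c':90, 'd':89, 'b':2] (the 'e' entry is added here just to make the list finite) no early exit is possible, and every row is read before 'b' comes out on top.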

-1

Keith Randall May 22, '10 at 5:34

Leftium · Accepted Answer · 2010-05-24T04:57:54+0000

O (U), U - .. , , , , .

(: total_count). , , .
, . .
