Top-k pick / merge

I have n sorted lists (5 < n < 300). These lists are quite long (300,000+ tuples). Selecting the top k of an individual list is, of course, trivial: they are at the head of the list.

Example for k = 2:

top2(L1: ['a':10, 'b':4, 'c':3]) = ['a':10, 'b':4]
top2(L2: ['c':5, 'b':2, 'a':0]) = ['c':5, 'b':2]

Where it gets interesting is when I want to combine the top k across all the sorted lists.

top2(L1+L2) = ['a':10, 'c':8]

Just combining the top k of each individual list will not necessarily give the correct result:

top2(top2(L1)+top2(L2)) = ['a':10, 'b':6]
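
The two results above can be reproduced with a minimal Python sketch. The helper names `top_k` and `merge` are hypothetical and only serve this illustration; scores for the same key are assumed to be summed across lists, as in the example.

```python
from collections import Counter

def top_k(pairs, k):
    """Return the k highest-scoring (key, score) pairs, best first."""
    return sorted(pairs, key=lambda kv: kv[1], reverse=True)[:k]

def merge(*lists):
    """Sum the scores per key across all given lists."""
    totals = Counter()
    for lst in lists:
        for key, score in lst:
            totals[key] += score
    return list(totals.items())

L1 = [("a", 10), ("b", 4), ("c", 3)]
L2 = [("c", 5), ("b", 2), ("a", 0)]

# Exact answer: merge the full lists, then take the top 2.
exact = top_k(merge(L1, L2), 2)
# Naive answer: truncate each list to its own top 2 first.
naive = top_k(merge(top_k(L1, 2), top_k(L2, 2)), 2)

print(exact)  # [('a', 10), ('c', 8)]
print(naive)  # [('a', 10), ('b', 6)]
```

The naive variant loses `'c':3` from L1 before the merge, so 'c' ends up with 5 instead of 8 and is overtaken by 'b'.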

The goal is to reduce the required space and keep the sorted lists small.

top2(topX(L1)+topX(L2)) = ['a':10, 'c':8]
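
To make the meaning of a "safe" X concrete, here is a brute-force Python sketch that tries increasing cutoffs until the truncated merge agrees with the full merge. It needs the full lists to check the answer, so it only illustrates the definition and is not a solution to the space problem. The function names are hypothetical.

```python
from collections import Counter

def top_keys(lists, k):
    """Keys of the k largest summed scores across all lists, best first."""
    totals = Counter()
    for lst in lists:
        for key, score in lst:
            totals[key] += score
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [key for key, _ in ranked[:k]]

def smallest_safe_x(lists, k):
    """Smallest cutoff X whose truncated merge matches the exact top k."""
    exact = top_keys(lists, k)
    longest = max(len(lst) for lst in lists)
    for x in range(k, longest + 1):
        if top_keys([lst[:x] for lst in lists], k) == exact:
            return x
    return longest

L1 = [("a", 10), ("b", 4), ("c", 3)]
L2 = [("c", 5), ("b", 2), ("a", 0)]

print(smallest_safe_x([L1, L2], 2))  # 3
```

For the two example lists, X = 2 still yields ['a', 'b'], so the whole of each three-element list is needed before the result becomes ['a', 'c'].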

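The question is whether there is an algorithm that returns the combined top k in the correct order while cutting off the long tail of each list at some position. And if there is: how do I find the cutoff X up to which it is safe to truncate?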

Note: the exact counts are not important. Only the correct order matters.

top2(magic([L1,L2])) = ['a', 'c']
+3
7

O (U), U - .. , , , , .

  • (: total_count). , , .
  • , . .
+1

, - 10 , , . , 10 , ( , , 10 ).

, . 10 .

+1
  • n . , .
  • .
  • - .
  • .
  • -
  • 4 .

, . n , 4 ( ). n .

0

, "a" , . , :

() :

  • (Re-) ( ). , . .
  • .
  • k . node ( ). node, k .
  • 2 , ID.

. O (k) -. :

  • ; . O (U) : total_count , U - .
  • O (n) , . U , U - . - . O (n) ( ).

. , . . . , , /. , /.

0

, .

, , . " " . .

, n = 100 , , 2, 200.

:

  • , .
  • L, .
  • , .
  • 2, L k .

, k- , . , k, k ..

0

, , . :

['a':100, 'b':99, ...]
['c':90, 'd':89, ..., 'b':2]

k = 1 (.. ). "b" - , , , "b" "a".

Edit:

(, ), , , . k = 1, .

- , , . .

, , ( S). , - , , S . ( , , , - , ?)

If your hash map contains only one element, and its counter is not less than S, then you can stop processing the lists and return this element as the answer. If the distribution plays well, this early exit can actually be triggered, so you don't need to process all the lists to the end.
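
One possible reading of this early exit for k = 1 is sketched below in Python. Several things here are assumptions rather than part of the answer: S is taken to be the sum of the scores at the current read position of every list (an upper bound on the total of any key not seen yet), all scores are non-negative, each key occurs at most once per list, and the lists are read in lockstep, one row per list per step. "Only one element in the hash map" is modelled as "no other seen key can still overtake the leader". The function name is hypothetical.

```python
def top1_early_exit(lists):
    """Return (key, depth): the key with the largest summed score and the
    number of rows read from each list before the answer was certain."""
    n = len(lists)
    partial = {}   # key -> score accumulated so far
    seen_in = {}   # key -> indices of the lists in which the key was read
    longest = max(len(lst) for lst in lists)
    for depth in range(1, longest + 1):
        last = [0] * n  # a fully read list cannot contribute anything more
        for i, lst in enumerate(lists):
            if depth <= len(lst):
                key, score = lst[depth - 1]
                partial[key] = partial.get(key, 0) + score
                seen_in.setdefault(key, set()).add(i)
                # Unread rows of list i score at most `score`;
                # if this was the final row, nothing is left to read.
                last[i] = score if depth < len(lst) else 0
        S = sum(last)  # assumed meaning of S: bound for any unseen key
        best = max(partial, key=partial.get)

        def upper(key):
            # Best case: the key still shows up in every list it is missing from.
            return partial[key] + sum(
                last[i] for i in range(n) if i not in seen_in[key]
            )

        rivals = [key for key in partial
                  if key != best and upper(key) > partial[best]]
        if not rivals and partial[best] >= S:
            return best, depth
    raise ValueError("all lists are empty")

# Early exit after one row: 'a' leads both lists by a wide margin.
A = [("a", 100), ("b", 5), ("c", 4), ("d", 1)]
B = [("a", 90), ("c", 3), ("b", 2), ("d", 1)]
print(top1_early_exit([A, B]))  # ('a', 1)

# No early exit: 'b' wins only because of its entry at the very end of D.
C = [("a", 100), ("b", 99), ("x", 1)]
D = [("c", 90), ("d", 89), ("b", 2)]
print(top1_early_exit([C, D]))  # ('b', 3)
```

The second call shows the unfavourable case: when the deciding entry sits at the tail of a list, the scan has to run to the end before the answer is certain.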

-1

Source: https://habr.com/ru/post/1746482/

