What is the fastest algorithm for finding the k maximal elements of a sequence using STL containers?

I need the fastest algorithm for finding the k maximal elements of a sequence in C++, using any STL containers. My idea: put the elements into a list or vector, sort it, and take the first k elements. In that case the number of operations is n·log(n), where n is the number of elements. But I don't think this is the best approach.

+4
8 answers

An approach using std::partial_sort might be the best answer.

Also pay attention to std::nth_element, which simply puts the element at the nth position into its final sorted place (and partitions the sequence so that everything "less" comes before and everything "greater" comes after that nth element).

So, if you are really interested only in the first k elements (without any particular ordering among them), then nth_element definitely takes the cake.

+6

I think the best approach is to use a vector to hold the result and build a heap in it as you go through the input. Once the heap size reaches k, you stop growing it (and just keep bubbling new elements through it, starting at position k-1).

When the input is exhausted, the heap already contains the answer (assuming you were not asked to return the elements in order).

If, however, k > n/2, then it is probably better to keep the elements that get bubbled out of a heap of size n - k instead (this assumes, though, that you know the number of elements n in advance, not only k).

+1

Assuming random unsorted data, I think the quickest way is to maintain a sorted linked list: walk through the original container and, for each element, if it is greater than the lowest value in the result list, insert it (at the correct sorted position). If the list then contains more than k elements, remove the lowest value.

In the worst case (an already-sorted source container) this means O(k*n); the best case is O(n).

0

EDIT: if you don't need the maximum elements in order, you can use nth_element to partition the vector, as @sehe noted. This is O(n) on average.

Otherwise, if you care about the order:

Use std::partial_sort on the vector of your data to sort the first k elements. This runs in O(n log k).

Push your data into a heap one element at a time, keeping only k elements, and then pull out the k elements. This is still O(n log k), but I believe with higher constants.

If performance is a concern, benchmark both approaches and use whichever is faster on your data set.

0

Using QuickSelect, you can find them in O(n) even in the worst case, using the "clever" pivot selection (median of medians) described on the wiki page (unordered: they are the elements preceding the kth element in the final order induced by the algorithm).

You cannot beat O(n) (because you need to "touch" all the elements to be sure your chosen one is really the kth), so this is the best you can achieve.

0

Unfortunately, I cannot find the source code that I wrote for this, but check this:

http://en.wikipedia.org/wiki/Radix_sort

0

I would use std::make_heap to create a heap from your array or vector of values, which takes O(n) time. Then you can repeatedly take the top element of the heap and remove it, k times (using std::pop_heap), which takes O(k * log n) time.

The overall complexity is O(n + k * log n), which is no worse than O(n * log k) since k is at most n. As you asked, these algorithms are all available in <algorithm>, so the implementation is very simple.

0

This can be done in linear time using the selection algorithm, which takes O(n) in the worst case, followed by one pass through the vector taking exactly those elements that are at least as large as the (n-k)th order statistic (counting how many you have taken, so that you take exactly k and no more). However, cppreference says that std::nth_element takes linear time on average, not in the worst case. So I will explain how to do this a little more slowly, but probably more simply, using heaps. This solution takes O(max(n, k*log(k))) time in the worst case to extract the top k elements of a vector of size n.

You start by building a max-heap over all n elements, which takes O(n) time with std::make_heap.

Now we want to extract the top k elements from this heap, but we have to be clever about it. If we simply extract the maximum element k times, each extraction costs O(log(n)), so O(k*log(n)) overall, which does not meet our goal.

Instead, we leave the heap of size n untouched and create a separate max-heap, which I call the "waiting heap". The waiting heap starts with only the maximum element of the original heap, and to get the top k elements you repeat the following procedure k times: extract the top element from the waiting heap and add its two children (from the original heap) to it. The size of the waiting heap grows by at most one per step, so it is bounded by k. Since we perform k extractions and at most 2k insertions (assuming a binary heap), this costs no more than 3*k*log(k).

0

Source: https://habr.com/ru/post/1346805/

