Is there any array-like data structure that can grow at both ends?

I am working on a small project for a high-performance computing course, so efficiency is a key concern.

Let's say that I have a vector of N floats, and I want to remove the smallest n elements and the largest n elements. There are two easy ways to do this:

A

    sort in ascending order      // O(N log N)
    remove the last n elements   // O(1)
    invert the element order     // O(N)
    remove the last n elements   // O(1)

B

    sort in ascending order      // O(N log N)
    remove the last n elements   // O(1)
    remove the first n elements  // O(N)

In A, inverting the order of the elements requires swapping all of them, while in B, removing the first n elements requires moving all the remaining ones to fill the positions left empty. Using std::remove would give the same problem.

If I could remove the first n elements for free, solution B would be cheaper. This should be easy to achieve if, instead of a vector (i.e. an array with some spare space after vector::end()), I had a container with some free space before vector::begin() as well.
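For reference, a minimal sketch of approach B on a plain std::vector (the values and n are just an example); the second erase is the O(N) step I would like to avoid:

    #include <algorithm>
    #include <iostream>
    #include <vector>

    int main() {
        std::vector<float> v{2.0f, 1.2f, 3.2f, 0.3f, 5.9f, 6.0f, 4.3f};
        std::size_t n = 2;

        std::sort(v.begin(), v.end());        // O(N log N)
        v.erase(v.end() - n, v.end());        // O(1): nothing needs to shift
        v.erase(v.begin(), v.begin() + n);    // O(N): remaining elements shift left

        for (float x : v) std::cout << x << ' ';
        std::cout << '\n';
    }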

So, the question is: does some library (STL, Boost) already provide an array-like container (i.e. contiguous memory, not a linked list) that allows O(1) insertion/deletion at both ends?

If not, do you think there is a better solution than implementing such a data structure myself?

+6
4 answers

What do you think of using std::partition with a special functor, as shown below?

    #include <iostream>
    #include <vector>
    #include <algorithm>

    template<typename T>
    class greaterLess {
        T low;
        T up;
    public:
        greaterLess(T const &l, T const &u) : low(l), up(u) {}
        bool operator()(T const &e) { return !(e < low || e > up); }
    };

    int main() {
        std::vector<double> v{2.0, 1.2, 3.2, 0.3, 5.9, 6.0, 4.3};
        auto it = std::partition(v.begin(), v.end(), greaterLess<double>(2.0, 5.0));
        v.erase(it, v.end());
        for (auto i : v) std::cout << i << " ";
        std::cout << std::endl;
        return 0;
    }

This way you remove the elements from your vector in O(N) .

+5

Try boost::circular_buffer:

It supports random access iterators, constant-time insertion and removal at the beginning and end of the buffer, and compatibility with std algorithms.

Looking at the source, it seems (and is only logical) that the data is stored as a contiguous block of memory.

The only caveat is that the buffer has a fixed capacity, and once it is exhausted its oldest elements will be overwritten. You can either detect such cases yourself and resize the buffer manually, or use boost::circular_buffer_space_optimized with a large declared capacity, since it will not allocate that memory until it is needed.
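A minimal sketch of how it could be used for the problem above (the capacity and values are just an example):

    #include <algorithm>
    #include <iostream>
    #include <boost/circular_buffer.hpp>

    int main() {
        boost::circular_buffer<double> cb(7);    // capacity fixed up front
        for (double x : {2.0, 1.2, 3.2, 0.3, 5.9, 6.0, 4.3})
            cb.push_back(x);

        std::sort(cb.begin(), cb.end());         // O(N log N), random access iterators
        std::size_t n = 2;
        for (std::size_t i = 0; i < n; ++i) {
            cb.pop_front();                      // O(1) removal at the front
            cb.pop_back();                       // O(1) removal at the back
        }

        for (double x : cb) std::cout << x << ' ';
        std::cout << '\n';
    }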

+2

To shrink and grow the vector at both ends, you can use the idea of slices, reserving extra memory at the front and back to grow into if efficient growth is needed.

Just create a class that stores not only the length but also the indices of the first and last elements, together with a suitably sized vector, to form a data window onto the underlying block of stored floats. The C++ class can provide inline functions for things such as deleting elements, array access, finding the nth largest value, and shifting the slice values up or down to insert new elements while maintaining sorted order. If no spare elements are available, dynamically allocating a new, larger float store and copying the array allows continued growth; see the sketch below.
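A minimal sketch of such a window class (the class and member names are only illustrative, not from any library); removing elements at either end just moves an index, so it is O(1):

    #include <cstddef>
    #include <stdexcept>
    #include <vector>

    // Illustrative "window over a vector" class, as described above.
    template <typename T>
    class SliceVector {
        std::vector<T> data_;
        std::size_t first_;   // index of the first live element
        std::size_t last_;    // one past the last live element
    public:
        explicit SliceVector(std::vector<T> v)
            : data_(std::move(v)), first_(0), last_(data_.size()) {}

        std::size_t size() const { return last_ - first_; }
        T &operator[](std::size_t i) { return data_[first_ + i]; }

        void pop_front(std::size_t n = 1) {   // O(1): just advance the index
            if (n > size()) throw std::out_of_range("pop_front");
            first_ += n;
        }
        void pop_back(std::size_t n = 1) {    // O(1): just pull the index back
            if (n > size()) throw std::out_of_range("pop_back");
            last_ -= n;
        }
        T *begin() { return data_.data() + first_; }
        T *end()   { return data_.data() + last_; }
    };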

The circular buffer is designed as a FIFO, with new elements added at the end, removal at the front, and no way to insert in the middle; a self-defined class can also (trivially) support array index values other than 0..N-1.

Due to memory locality, the avoidance of excessive indirection through pointer chains, and the pipelining of subscript calculations on a modern processor, an array-based (or vector-based) solution is likely to be the most efficient, despite the copying of elements on insertion. A deque would be suitable, but it cannot guarantee contiguous storage.

As additional information, examining the existing slicing classes turns up some plausible alternatives to evaluate:

A) std::slice, which works on std::valarray (via std::slice_array)
B) the Boost Range classes
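A minimal sketch of option A (this copies the kept middle range out of a sorted valarray, so it is an O(N) copy rather than an O(1) trim; the values and n are only an example):

    #include <algorithm>
    #include <iostream>
    #include <valarray>

    int main() {
        std::valarray<double> v{2.0, 1.2, 3.2, 0.3, 5.9, 6.0, 4.3};
        std::size_t n = 2;

        std::sort(std::begin(v), std::end(v));   // O(N log N)
        // Keep the middle, dropping the n smallest and n largest values.
        std::valarray<double> middle = v[std::slice(n, v.size() - 2 * n, 1)];

        for (double x : middle) std::cout << x << ' ';
        std::cout << '\n';
    }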

I hope this is the kind of specific information you were hoping for; in general, a simpler solution is preferable to a complex one. I would expect slices and ranges over sorted data sets to be fairly common, for example when filtering experimental data where outliers are excluded as erroneous readings.

I think a good solution should be O(NlogN) + 2×O(1), with any binary searches being O(logN + 1) if you filter on outlying values rather than removing a fixed number of smallest and largest elements; it is important that the constant behind the O is small, as an O(1) algorithm can in practice be slower than an O(N) one for practical values of N.

+2

As a complement to the std::partition answer above: before partitioning the array, you need to find the partition bounds, which means finding the nth smallest and the nth largest numbers in the unsorted array. There is a discussion of this problem on SO: How to find the kth largest number in an unsorted array.

There are several algorithms for this problem. Some of them are deterministic O(N), one of which is a variation of median finding (median of medians). There are also non-deterministic algorithms with an O(N) average case. A good source to look these algorithms up is Introduction to Algorithms, among other textbooks.

This way your code will run in O(N) time.
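A minimal sketch combining this idea with the partition step from above (the combination is my illustration; std::nth_element finds the bounds in average O(N), and with duplicate values the kept range may contain more than N - 2n elements):

    #include <algorithm>
    #include <iostream>
    #include <vector>

    int main() {
        std::vector<double> v{2.0, 1.2, 3.2, 0.3, 5.9, 6.0, 4.3};
        std::size_t n = 2;   // drop the 2 smallest and the 2 largest

        // Find the bounds of the values to keep, average O(N) each.
        std::vector<double> tmp = v;
        std::nth_element(tmp.begin(), tmp.begin() + n, tmp.end());
        double low = tmp[n];                        // smallest value to keep
        std::nth_element(tmp.begin(), tmp.end() - n - 1, tmp.end());
        double high = tmp[tmp.size() - n - 1];      // largest value to keep

        // Move the kept values to the front and erase the rest, O(N).
        auto it = std::partition(v.begin(), v.end(),
                                 [low, high](double e) { return e >= low && e <= high; });
        v.erase(it, v.end());

        for (double x : v) std::cout << x << ' ';
        std::cout << '\n';
    }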

+1

Source: https://habr.com/ru/post/969742/

