How to store a continuous stream of data so that you can access the most common values?

The stream of stock updates comes in the form of a ticker, a buy / sell pair (for example, YHOO, 300) from the stock exchange (at a very high speed). We should always display the 5 best-selling stocks by volume at any given time. How would you store these values ​​in memory?

My idea is to use hashmap (Ticker-> Volume) to update our data store from the O (1) feed. Then create a volume-based Treemap to display the top O (1) stocks. But what I can't think of is an efficient way to synchronize Treemap data as values ​​in a hash map change (very often). Any suggestions?

+5
source share
4 answers

If the number of trades is an ever-increasing cumulative value (which seems to be), you can simply support:

  • Your hash file for updating the number of transactions in O (1).
  • A simple array of 5 elements representing the 5 top tickers along with their volume.

When a new update arrives, you will update your hash file in O (1), and if the new number of transactions is large enough to enter the top fifth, you will also update your array. Checking the array and updating it can be done in O (1). You can also use binary search if you do not want to compare the new volume with all 5 values.

You cannot do much better than checking the average log (5).


If the number of transactions is not more monotonous (which seems unlikely, but it can happen if you want to consider the number of transactions made in a limited period of time), you can also support:

  • Your hash (Ticker → Volume) for saving updates on ticker volumes.
  • For the top five you need to use a bunch sorted by decreasing volumes.

When an update arrives, you will:

  • Get the previous value (if any) of the volume for the ticker from the hash map in O (1)
  • Remove the old value (if any) from the heap in O (log n) by looking at the old volume
  • Add new value to heap in O (log n)
  • Update new value in the hash map in O (1)

The overall complexity is O (log n), where "n" is the number of tickers.

Even if you need the first 5 tickers, when the volume of one ticker falls, you need to find the 6th element, which will become the 5th, and the 7th element .. and so on. I do not think that you can go much lower than O (log n) in complexity if the number of transactions is not monotonous.

+1
source

We can use a heap data structure where the root is the most sold stock. His child is the second most sold. Indigenous great children will give 4th and 5th

0
source

These are three options that I can think of:

  • Use the bag to collect stock symbols and cycle through it to calculate frequency. After the final call every day, reset the bag. This will work, but I doubt that you really want statistics a day at a time.
  • I think you need a window for about a minute for this to really work. In this case, I would save the transactions in dequeue . Insert new trades in front, trim old ones from the back. Freeze this time to count the first five. This works, but if something heats up, it can be difficult to predict how large the queue size you may need.
  • An alternative approach is to use a kind of splay tree . But I'm not sure if this is better than just showing the last five deals.
0
source

Source: https://habr.com/ru/post/1210078/


All Articles