Let's say my system listens to user click events and stores them in archive storage. I know where each event comes from (userId, roughly a hundred distinct users) and which URL was clicked (url, effectively unbounded variations).
class ClickEvent { String userId; String url; }
If my system potentially receives thousands of events per second, I don't want to put massive load on the storage by calling it once for every single click. Suppose the storage is something like AWS S3, or any data store that is better suited to receiving fewer, larger files than to handling tens of thousands of requests per second.
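For reference, a batched write in that style might look like the minimal sketch below, assuming the AWS SDK for Java v2 and a hypothetical bucket name and key layout; it writes one object per accumulated batch instead of one request per click:

```java
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

import java.time.Instant;
import java.util.List;

public class ClickArchiver {

    private final S3Client s3 = S3Client.create();
    private final String bucket = "click-archive"; // hypothetical bucket name

    // Write one object containing a whole batch of clicks for a user,
    // rather than one PutObject request per click.
    public void writeBatch(String userId, List<String> urls) {
        String key = "clicks/" + userId + "/" + Instant.now().toEpochMilli() + ".txt";
        String body = String.join("\n", urls); // one URL per line; could also be JSON/CSV/etc.
        s3.putObject(
                PutObjectRequest.builder().bucket(bucket).key(key).build(),
                RequestBody.fromString(body));
    }
}
```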
My current approach is to use the Google Guava Cache library (or really any cache with expiration support).
Suppose the cache key is userId and the cache value is a List<url>.
- Cache miss -> add a new entry to the cache: (userId, [url1])
- Cache hit -> append the new URL to the existing list: (userId, [url1, url2, ...])
- An entry expires a configurable X minutes after it was first written, or once it has accumulated 10,000 URLs.
- When an entry expires, I write its data to the storage, ideally collapsing up to 10,000 small individual writes into 1 large write (see the sketch after this list).
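Here is a rough sketch of that buffering idea with Guava's Cache; the class and method names, the flush threshold, and the flushToStorage placeholder are my own assumptions, not an established pattern. One caveat: Guava evicts entries lazily as a side effect of other cache operations, so a periodic cleanUp() call is needed for the time-based flush to actually fire on schedule.

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.RemovalCause;
import com.google.common.cache.RemovalListener;

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ClickBuffer {

    private static final int MAX_URLS_PER_USER = 10_000; // hypothetical flush threshold

    private final Cache<String, List<String>> buffer;

    public ClickBuffer(long flushAfterMinutes) {
        // Flush a user's accumulated URLs whenever their entry leaves the cache,
        // either through time-based expiry or the explicit invalidation below.
        RemovalListener<String, List<String>> onRemoval = notification -> {
            if (notification.getCause() != RemovalCause.REPLACED) {
                flushToStorage(notification.getKey(), notification.getValue());
            }
        };

        this.buffer = CacheBuilder.newBuilder()
                .expireAfterWrite(flushAfterMinutes, TimeUnit.MINUTES)
                .removalListener(onRemoval)
                .build();

        // Guava performs eviction lazily during other cache calls, so trigger
        // maintenance periodically to make expiry-based flushes happen on time.
        ScheduledExecutorService cleaner = Executors.newSingleThreadScheduledExecutor();
        cleaner.scheduleAtFixedRate(buffer::cleanUp, 1, 1, TimeUnit.MINUTES);
    }

    public void onClick(ClickEvent event) throws ExecutionException {
        // Cache miss -> create an empty list; cache hit -> reuse the existing one.
        List<String> urls = buffer.get(event.userId, CopyOnWriteArrayList::new);
        urls.add(event.url);

        // Size threshold reached -> evict now, which triggers the removal listener.
        if (urls.size() >= MAX_URLS_PER_USER) {
            buffer.invalidate(event.userId);
        }
    }

    private void flushToStorage(String userId, List<String> urls) {
        // Placeholder: replace with one batched write, e.g. a single S3 PutObject
        // containing all of this user's buffered URLs.
        System.out.printf("Flushing %d clicks for user %s%n", urls.size(), userId);
    }
}
```

Note that under heavy concurrency a click could still be added to a list after it has been handed to the removal listener, so a production version would need a more careful handoff than this sketch shows.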
I'm not sure whether there is a "standard" or better way (or even a well-known library) to solve this problem, i.e. accumulating thousands of events per second and persisting them to the storage / file / data store in batches, instead of passing that high load straight through to downstream services. I feel this is one of the common use cases for a big-data system.