Distributed Storm Caching

How to store temporary data in Apache Storm?

In a Storm topology, a bolt needs access to previously processed data.

For example, suppose the bolt processes variable1 with a result of 20 at 10:00 AM.

If variable1 then arrives as 50 at 10:15 AM, the result should be 30 (50 - 20).

Later, if variable1 arrives as 70 at 10:30, the result should be 20 (70 - 50).

How can this be achieved?

0
4 answers

In short, you want to do micro-batching calculations over Storm's running tuples. First, define or find a key in the tuple. Use fields grouping (not shuffle grouping) between the bolts on that key. This guarantees that tuples with the same key are always sent to the same task of the downstream bolt. Then define a class-level collection (a List or Map) in the bolt to hold the old values, and add each new value to it for the calculation. You do not need to worry about thread safety, because each task of the bolt works on its own copy of the collection.
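A minimal sketch of the per-key state described above, written as plain Java so it runs without a Storm dependency. The class name DeltaTracker and the method update are hypothetical, and it assumes the first value seen for a key is reported as-is, since the question does not say what the first result should be. In a real topology the map would be a field of the bolt, created in prepare and updated in execute, with the bolt wired using fieldsGrouping on the key field.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper holding the state one bolt task would keep.
public class DeltaTracker {
    // Last value seen per key. A plain HashMap is enough here because,
    // with fields grouping, every tuple for a given key reaches the same
    // task, and each task updates only its own map.
    private final Map<String, Integer> lastValues = new HashMap<>();

    // Returns the change since the previous value for this key.
    // Assumption: the first observation of a key is returned unchanged.
    public int update(String key, int value) {
        Integer previous = lastValues.put(key, value); // put returns the old value, or null
        return previous == null ? value : value - previous;
    }

    public static void main(String[] args) {
        DeltaTracker tracker = new DeltaTracker();
        System.out.println(tracker.update("variable1", 20)); // 20 (first value)
        System.out.println(tracker.update("variable1", 50)); // 30 (50 - 20)
        System.out.println(tracker.update("variable1", 70)); // 20 (70 - 50)
    }
}
```

Note that this state lives only in memory, so it is lost if the worker restarts. If that matters, an external store such as those suggested in the other answers would be needed.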

+2

I am afraid that today there is no such built-in functionality. But you can use any type of distributed cache, for example memcached or Redis. These caching solutions are very easy to use.

+1

There are several approaches, but it depends on your system requirements, the skills of your team and your infrastructure.

You can use Apache Cassandra to store events and pass the row key in the tuple so that the next bolt can receive it.

If your data is time-series in nature, you might want to take a look at OpenTSDB or InfluxDB.

Of course, you could fall back on something like Software Transactional Memory, but I think that would require a good amount of work.

0

You can use Guava's CacheBuilder to store your data in a bolt that extends BaseRichBolt (put this in the prepare method):

    // Initialize the cache.
    this.cache = CacheBuilder.newBuilder()
        .maximumSize(maximumCacheSize)
        .expireAfterWrite(expireAfterWrite, TimeUnit.SECONDS)
        .build();

Then, in the execute method, you can check the cache to see whether you have already seen this key. From there you can add your business logic:

    // If we haven't seen this key before, emit the tuple.
    if (this.cache.getIfPresent(key) == null) {
        cache.put(key, nearlyEmptyList);
        this.collector.emit(input, input.getValues());
    }
    this.collector.ack(input);
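
If Guava is not on the classpath, the size limit alone can be approximated with the JDK. The following is a rough sketch, not a replacement for CacheBuilder: the class name BoundedCache is hypothetical, it evicts the least recently used entry once the limit is exceeded, and it has no time-based expiry like expireAfterWrite.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical size-bounded cache built on LinkedHashMap (JDK only).
public class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maximumSize;

    public BoundedCache(int maximumSize) {
        // accessOrder = true keeps entries ordered from least to most
        // recently accessed, so the "eldest" entry is the LRU one.
        super(16, 0.75f, true);
        this.maximumSize = maximumSize;
    }

    // Called by LinkedHashMap after each insertion; returning true
    // removes the eldest (least recently used) entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maximumSize;
    }

    public static void main(String[] args) {
        BoundedCache<String, Integer> cache = new BoundedCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // touch "a" so that "b" becomes the LRU entry
        cache.put("c", 3); // exceeds the limit and evicts "b"
        System.out.println(cache.keySet()); // [a, c]
    }
}
```

Like the Map in the first answer, this is not thread-safe, so it should be used only from within one bolt task.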
0

Source: https://habr.com/ru/post/986563/

