Druid: combining two queries that use different filters

I use Druid to monitor events on my website. The data looks like this:

    event_id | country | user_id | event_type
    ================================================
    1        | USA     | id1     | visit
    2        | USA     | id2     | visit
    1        | Canada  | id3     | visit
    3        | USA     | id1     | click
    1        | Canada  | id4     | visit
    3        | Canada  | id3     | click
    3        | USA     | id2     | click

I have also defined an aggregation that counts events. I query Druid so that the data for event_id = 3 can be presented as follows (note that the visit count does not depend on event_id):

    country | visits | clicks
    ===============================
    USA     | 4      | 2
    Canada  | 3      | 2

I am currently using two topN queries with two different filters:

  • event_type = visit → to count visits in each country, regardless of event_id.
  • event_id = 3 → to count the clicks for that event.

Of course, my real data set is much larger than this. The topN API requires a threshold parameter, which is the maximum number of results I want to get back.

The problem is that when my threshold is smaller than the actual number of results, the two queries may not return the same set of countries.

I currently intersect the two result sets on my server, but that drops some countries, so I end up showing fewer rows than my threshold even though more results exist.
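To make the effect concrete, here is a minimal sketch of what happens on my server. The countries and counts are made up for illustration and are not taken from my real data; `top_n` is a hypothetical helper that stands in for one topN query.

```python
from collections import Counter

# Hypothetical per-country counts, invented for this illustration.
visits = Counter({"USA": 40, "Canada": 30, "France": 20, "Spain": 10})
clicks = Counter({"Spain": 9, "USA": 8, "Germany": 7, "Canada": 1})

THRESHOLD = 2  # plays the role of the topN "threshold"


def top_n(counts, n):
    """Return the n countries with the highest count (stand-in for one topN query)."""
    return [country for country, _ in counts.most_common(n)]


top_visits = top_n(visits, THRESHOLD)
top_clicks = top_n(clicks, THRESHOLD)

# Intersecting the two lists keeps only countries that made it into both.
merged = [country for country in top_visits if country in top_clicks]
print(merged)
```

Each query ranks countries by its own metric, so the two top lists only partly overlap and the intersection ends up shorter than the threshold.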

How can I make sure that I always get the same countries for my threshold? (I tried passing the list of countries returned by the first query as a filter on the second one, but it was very slow.)

1 answer

A filtered aggregator lets you get everything in a single query.
A filtered aggregator aggregates only the values that match a dimension filter. If you add one filtered aggregator per metric, every metric is computed in the same query, so each returned country carries all of them.
A query like the following will do the trick in your case. After Druid groups the events by country (because the dimension is country), the filtered aggregator keeps only the events whose event_id is one of the listed values ("1" or "2" in this example) and runs the count aggregator on the filtered rows.

    {
      ...
      "dimension": "country",
      ...,
      "aggregations": [
        {
          "type": "filtered",
          "filter": {
            "type": "in",
            "dimension": "event_id",
            "values": ["1", "2"]
          },
          "aggregator": {
            "type": "count",
            "name": "count_countries"
          }
        }
      ]
    }
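For the visits and clicks from the question, the same idea can be sketched as one topN query with two filtered aggregators. The snippet below only builds and prints the JSON body. The data source name, interval, threshold value, and the `filtered_count` helper are assumptions made for illustration, and ranking by `visits` is just one possible choice of metric.

```python
import json


def filtered_count(name, dimension, value):
    """Build a count aggregator that only sees rows where dimension == value."""
    return {
        "type": "filtered",
        "filter": {"type": "selector", "dimension": dimension, "value": value},
        # The output column takes its name from the wrapped aggregator.
        "aggregator": {"type": "count", "name": name},
    }


query = {
    "queryType": "topN",
    "dataSource": "website_events",          # hypothetical data source name
    "intervals": ["2016-01-01/2016-02-01"],  # hypothetical interval
    "granularity": "all",
    "dimension": "country",
    "metric": "visits",                      # rank countries by visits
    "threshold": 10,                         # hypothetical threshold
    "aggregations": [
        filtered_count("visits", "event_type", "visit"),
        filtered_count("clicks", "event_id", "3"),
    ],
}

print(json.dumps(query, indent=2))
```

Because both counts come back in the same result row, there is nothing left to intersect on the server. Note that a `count` aggregator counts stored Druid rows; if the data source uses rollup, summing an ingested count metric may be the better fit.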

Take the following table:

    event_id | country | user_id | event_type
    ================================================
    1        | USA     | id1     | visit
    2        | USA     | id2     | visit
    1        | Canada  | id3     | visit
    3        | USA     | id1     | click
    1        | Canada  | id4     | visit
    3        | Canada  | id3     | click
    3        | USA     | id2     | click

Druid will group the rows by country:

    country | user_id | event_type | event_id
    ================================================
    USA     | id1     | visit      | 1
    USA     | id2     | visit      | 2
    USA     | id1     | click      | 1
    USA     | id2     | click      | 3
    Canada  | id3     | visit      | 1
    Canada  | id4     | visit      | 3
    Canada  | id3     | click      | 3

The filtered aggregator will then drop every row with event_id = 3, because our filter only accepts the event_id values "1" and "2":

    country | user_id | event_type | event_id
    ================================================
    USA     | id1     | visit      | 1
    USA     | id2     | visit      | 2
    USA     | id1     | click      | 1
    Canada  | id3     | visit      | 1

It will then return the following result (our aggregator is a simple count):

    country | count
    ===================
    USA     | 3
    Canada  | 1
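If you want to check the logic without a cluster, the filter-then-count semantics can be mimicked in plain Python. The sketch below uses the seven sample rows from the question and the two filters the question asks for (event_type = visit and event_id = 3). It only illustrates what a pair of filtered count aggregators computes, not how Druid executes the query; `aggregate` is a hypothetical helper.

```python
from collections import defaultdict

# (event_id, country, user_id, event_type) rows from the question's sample table.
rows = [
    ("1", "USA", "id1", "visit"),
    ("2", "USA", "id2", "visit"),
    ("1", "Canada", "id3", "visit"),
    ("3", "USA", "id1", "click"),
    ("1", "Canada", "id4", "visit"),
    ("3", "Canada", "id3", "click"),
    ("3", "USA", "id2", "click"),
]


def aggregate(rows):
    """Group by country and apply two independent filtered counts per group."""
    result = defaultdict(lambda: {"visits": 0, "clicks": 0})
    for event_id, country, _user_id, event_type in rows:
        bucket = result[country]
        if event_type == "visit":   # filter of the "visits" aggregator
            bucket["visits"] += 1
        if event_id == "3":         # filter of the "clicks" aggregator
            bucket["clicks"] += 1
    return dict(result)


totals = aggregate(rows)
for country, counts in totals.items():
    print(country, counts)
```

For these seven rows, USA gets 2 visits and 2 clicks and Canada gets 2 visits and 1 click. Every country appears once with both metrics, which is the property the two separate queries could not guarantee.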

Enjoy it!


Source: https://habr.com/ru/post/1013424/

