Dynamic query language using MapReduce on App Engine

We currently have an App Engine (Java) application with millions of entities. We do a lot of reporting using MapReduce jobs plus cron, feeding dashboards, etc.

However, we would like to be able to run ad-hoc queries across our entire dataset. The way we do it now is: write a mapreduce, deploy, run the mapreduce, look at the results. We would like to avoid the deployment step. That is, just go to some admin interface, specify our query and possibly some user code for post-processing, and then view the results. We would run a lot more one-off queries if we did not have to deploy them every time.

Has anyone done something like this? How did you approach it? Any good strategies?

1 answer

This is a Python example, but I'm sure you can do the same in Java. One solution, if you just want to filter entities: you can create a handler that processes filters from mapreduce.yaml.

    - name: Query on Actors
      mapper:
        handler: mapper_api.query_process
        input_reader: google.appengine.ext.mapreduce.input_readers.DatastoreInputReader
        params:
        - name: entity_kind
          value: common.models.Actor
        - name: filters
          value: age<27, name=toto

Then in your mapper_api.py you have to parse and apply each filter:

    from mapreduce import context
    from mapreduce import operation as op

    def query_process(entity):
        # Read the filter string passed in via mapreduce.yaml params.
        ctx = context.get()
        params = ctx.mapreduce_spec.mapper.params
        filters = params['filters']
        # match() is a helper you write yourself to evaluate the filters.
        if match(entity, filters):
            yield op.counters.Increment("matched")
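The answer leaves `match()` to the reader. A minimal sketch of such a helper, assuming filter strings in the `age<27, name=toto` form shown in the YAML above (the names `parse_filters` and `OPS` are my own, not part of the mapreduce library):

```python
import operator

# Comparison operators the filter syntax supports; extend as needed.
OPS = {
    "<": operator.lt,
    ">": operator.gt,
    "=": operator.eq,
}

def parse_filters(filter_string):
    """Split 'age<27, name=toto' into (field, op, value) triples."""
    triples = []
    for clause in filter_string.split(","):
        clause = clause.strip()
        for symbol in OPS:
            if symbol in clause:
                field, raw_value = clause.split(symbol, 1)
                triples.append((field.strip(), symbol, raw_value.strip()))
                break
    return triples

def match(entity, filter_string):
    """Return True if the entity satisfies every filter clause."""
    for field, symbol, raw_value in parse_filters(filter_string):
        actual = getattr(entity, field, None)
        if actual is None:
            return False
        # Coerce the filter value to the attribute's type (e.g. int for age),
        # since values arrive from the YAML as strings.
        try:
            expected = type(actual)(raw_value)
        except (TypeError, ValueError):
            expected = raw_value
        if not OPS[symbol](actual, expected):
            return False
    return True
```

This keeps all parsing in the mapper, so new queries only need a different `filters` string, not a redeploy.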

Now, in the /mapreduce admin UI, you can select the "Query on Actors" mapper and pass filters to it.


Source: https://habr.com/ru/post/1398719/

