You can use the DatastoreInputReader with a map function to find out whether property1 is really in the CSV. Reading the CSV on every call would be very slow; instead, load it once into a Datastore model of its own and use memcache to serve the lookups after that first read. To populate that model, I recommend using the property1 value as the key of each row, so the lookup is a direct key fetch rather than a query. Then update the Datastore only for the values that actually change, and yield the writes through the mutation pool (op.db.Put()) so they are batched efficiently. I leave you some pseudo-code (sorry, I only have it in Python) showing how the different parts would look. I also recommend reading this article about MapReduce on Google App Engine: http://sookocheff.com/posts/2014-04-15-app-engine-mapreduce-api-part-1-the-basics/
```python
# ndb gives the models their to_dict() method
from google.appengine.ext import ndb
from mapreduce import operation as op
from mapreduce.lib import pipeline
from mapreduce import mapreduce_pipeline


class TouchPipeline(pipeline.Pipeline):
    """Pipeline to update the field of entities that meet a certain condition."""

    def run(self, *args, **kwargs):
        mapper_params = {
            "entity_kind": "yourDatastoreKind",  # the kind whose entities you scan
        }
        yield mapreduce_pipeline.MapperPipeline(
            "Update entities that have certain condition",
            handler_spec="datastore_map",
            input_reader_spec="mapreduce.input_readers.DatastoreInputReader",
            params=mapper_params,
            shards=64)


class csvrow(ndb.Model):
    # The property1 value is used as the entity's key, so it is not stored
    # as a property. "substitutefield" is a placeholder name for whatever
    # other CSV column you need to keep alongside it.
    substitutefield = ndb.StringProperty()
```
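The pipeline above points at handler_spec="datastore_map" but the snippet never defines it. Here is a minimal sketch of how that handler could look, under my assumptions: substitutefield holds the value coming from the CSV, and property2 is a placeholder name for the field on your entity that should receive it. It checks memcache first, falls back to a key fetch on the csvrow model, and only yields a write when something actually changed:

```python
from google.appengine.api import memcache
from mapreduce import operation as op


def datastore_map(entity):
    """Called once per entity of yourDatastoreKind by the MapperPipeline."""
    lookup_key = entity.property1
    # Check memcache first so each CSV value is read from the Datastore
    # only once across the whole mapper run.
    new_value = memcache.get(key=lookup_key)
    if new_value is None:
        # csvrow entities are keyed by the property1 value, so this is
        # a direct key fetch, not a query.
        row = csvrow.get_by_id(lookup_key)
        if row is None:
            return  # property1 is not in the CSV; nothing to do
        new_value = row.substitutefield
        memcache.set(key=lookup_key, value=new_value)
    # Write back only when the value actually changes; yielding
    # op.db.Put hands the write to the mutation pool for batching.
    if entity.property2 != new_value:
        entity.property2 = new_value
        yield op.db.Put(entity)
```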
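For the one-time load of the CSV into the csvrow model, a sketch along these lines would work, assuming (my assumption, adjust to your file) that column 0 is property1 and column 1 is the value you want to copy onto the entities; in practice you might run this from an upload handler or use the bulkloader instead:

```python
import csv


def load_csv(csv_file):
    """One-time import: store each CSV row keyed by its property1 value."""
    for fields in csv.reader(csv_file):
        # Assumption: fields[0] is property1, fields[1] is the payload.
        # Keying by property1 makes every later lookup a cheap get_by_id.
        csvrow.get_or_insert(fields[0], substitutefield=fields[1])
```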