This question has three parts:
I have an application in which users create objects that other users can update within 5 minutes. After 5 minutes, the objects time out and become invalid. I store the objects as datastore entities. To handle the timeout, I have a cron job that runs once a minute to clear out the expired objects.
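For context, here is a minimal sketch of what such an entity might look like. The class name OpenObject and the fields created / active are my own assumptions; the question only says that objects become invalid five minutes after creation:

from datetime import datetime, timedelta

from google.appengine.ext import db

LIFETIME = timedelta(minutes=5)

class OpenObject(db.Model):
    # Hypothetical schema: a creation timestamp plus an "active" flag.
    created = db.DateTimeProperty(auto_now_add=True)
    active = db.BooleanProperty(default=True)

    def is_expired(self):
        # Expired once the object has been alive longer than LIFETIME.
        return datetime.utcnow() - self.created > LIFETIME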
Most of the time there are no active objects. In that case the mapreduce handler checks each entity it receives and, if it is not active, does nothing and writes nothing. However, the mapreduce calls alone burn through my free datastore write quota in about 7 hours. By my rough calculations, it seems that merely starting a mapreduce costs ~120 writes per call. (Rough math: 60 calls/hour * 7 hours = 420 calls; 50,000 free write ops / 420 calls ≈ 120 writes/call.)
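For reference, the map handler amounts to something like the sketch below. This is my reconstruction using the model assumed above, not the actual code; in the App Engine mapreduce library a map function yields op.db.Put / op.db.Delete operations rather than writing directly:

from mapreduce import operation as op

def maphandler(entity):
    # Only emit a datastore write when an active entity has actually
    # expired; inactive entities pass through without any write.
    if entity.active and entity.is_expired():
        entity.active = False
        # Marking inactive here; op.db.Delete(entity) would behave the
        # same way quota-wise if expired objects are deleted instead.
        yield op.db.Put(entity)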
Q1: Can anyone confirm that merely starting a mapreduce incurs ~120 datastore writes?
To get around this, I check the datastore before kicking off the mapreduce:
def cronhandler():
    # Only start a mapreduce if there is at least one entity to process,
    # sizing the shard count to roughly one shard per 100 entities.
    count = model.all(keys_only=True).count(limit=1000)
    if count:
        shards = (count / 100) + 1
        from mapreduce import control
        control.start_map("Timeout open objects",
                          "expire.maphandler",
                          "expire.OpenOrderInputReader",
                          {'entity_kind': 'model'},
                          shard_count=shards)
    return HttpResponse()
Q2: Is this the best way to avoid the extraneous datastore writes caused by mapreduce? Is there a better way to configure the mapreduce so it avoids them? I was thinking this might be possible with a better custom InputReader.
Q3: I assume that more shards mean more extraneous datastore writes from mapreduce bookkeeping. Is capping the shard count at the expected number of objects I actually need to write the appropriate way to handle this?