While loop to delete data in the datastore

I tried to clean up and adapt the code from the answer here to my needs: I want to delete records only from the Reservations model, up to the date passed to get as yy, mm, dd.

If I understand correctly, a request to cleanTable/2012/10/5, matched against the route ('/cleanTable/([\d]+)/([\d]+)/([\d]+)', CleanTable), will make my code delete no more than 50 (10 * nlimit) records.

By the way, the author of the original code (who is most likely no longer active on SO) claimed that the main trick that made this code work was to "include the redirect in the HTML instead of using self.redirect".

I am not familiar with raise Exception and the like, but my instinct would be to turn the for loop into a while loop and add a raise Exception or raise StopIteration to it. However, it is not clear to me whether raising StopIteration actually stops the iteration or whether something more is required. I also don't know how to restructure the code so that the HTML ends cleanly on an early exit.

    class CleanTable(BaseHandler):

        def get(self, yy,mm,dd):
            nlimit=5
            iyy=int(yy)
            imm=int(mm)
            idd=int(dd)
            param=date(iyy,imm,idd)
            q=Reservations.all(keys_only=True)
            q.filter("date < ", dt(iyy,imm,idd))
            results = q.fetch(nlimit)
            self.response.headers['Content-Type'] = 'text/plain'
            self.response.out.write("""
              <html>
              <meta HTTP-EQUIV="REFRESH" content="url=http://yourapp.appspot.com/cleanTable">
                <body>""")

            try:
                for i in range(10):
                    db.delete(results)
                    results = q.fetch(nlimit, len(results))
                    for r in results:
                        logging.info("r.name: %s" % r.name)
                    self.response.out.write("<p> "+str(nlimit)+" removed</p>")
                self.response.out.write("""
                </body>
              </html>""")

            except Exception, inst:
                logging.info("inst: %s" % inst)
                self.response.out.write(str(inst))
2 answers

This is not the best approach to cleaning up your models. A better approach would be to fetch the keys of all your entities and create task queue tasks. Each task receives a batch of keys for the entities that need to be changed.
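A minimal sketch of the batching step only, in plain Python with no App Engine calls. The helper name chunk_keys and the batch size of 100 are hypothetical, and handing each batch to a task queue is left out because it depends on how the app is set up.

```python
def chunk_keys(keys, batch_size):
    """Split a list of keys into consecutive batches of at most batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [keys[i:i + batch_size] for i in range(0, len(keys), batch_size)]


# Stand-in for the keys returned by a keys-only query.
keys = ["key-%d" % n for n in range(250)]
batches = chunk_keys(keys, 100)
print([len(b) for b in batches])  # prints [100, 100, 50]
```

Each batch would then become the payload of one task, so a failure only affects the keys in that batch.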

Another approach would be to create a cron job that queries a fixed number of the least recently modified entities, fixes them, and then saves them back.
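One run of such a job, applied to the delete case from the question, could look roughly like the sketch below. It is plain Python: a dict stands in for the datastore, and delete_older_than, batch_size and max_batches are hypothetical names. It also shows that a plain break is enough to leave a while loop early; no exception has to be raised.

```python
import datetime


def delete_older_than(store, cutoff, batch_size, max_batches):
    """Delete records dated before cutoff in batches; return how many were removed.

    store is a plain dict of key -> date standing in for the datastore.
    """
    removed = 0
    batches = 0
    while batches < max_batches:
        # Re-run the query on each pass: deleted records no longer match.
        old_keys = sorted(k for k, d in store.items() if d < cutoff)[:batch_size]
        if not old_keys:
            break  # nothing left to delete, so leave the loop early
        for k in old_keys:
            del store[k]
        removed += len(old_keys)
        batches += 1
    return removed


# Ten records dated 2012-10-01 through 2012-10-10.
store = {n: datetime.date(2012, 10, 1) + datetime.timedelta(days=n) for n in range(10)}
removed = delete_older_than(store, datetime.date(2012, 10, 5), batch_size=3, max_batches=10)
print(removed, len(store))  # prints: 4 6
```

The max_batches cap keeps a single run bounded, and the next scheduled run picks up whatever is left.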

Finally, if the number of entities is very large, you can also consider using Backends.

Hope this helps.


Here is my update routine; it converts 500,000 entities. Be sure to run it on a backend instance (you can target the queue at a backend instance). Note that I am using a cursor, which is the only way to iterate over the data sequentially (never use an offset!).

    Queue queue = QueueFactory.getQueue("grinderQueue");
    queue.add(TaskOptions.Builder.withPayload(new DeferredTask() { //lets generate
        private static final long serialVersionUID = 1L;

        @Override
        public void run() {
            String cursor = null;
            boolean done = false;
            Date now = new Date(1346763868L * 1000L); // 09/04/2012
            while(!done) {
                DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
                Query query = new Query("Venue");
                query.setFilter(new FilterPredicate("timeOfLastUpdate", Query.FilterOperator.LESS_THAN,now));
                PreparedQuery pq = datastore.prepare(query);
                FetchOptions fetchOptions = FetchOptions.Builder.withLimit(1000);
                if(cursor != null)
                    fetchOptions.startCursor(Cursor.fromWebSafeString(cursor));

                QueryResultList<Entity> results = pq.asQueryResultList(fetchOptions);

                List<Entity> updates = new ArrayList<Entity>();
                List<Entity> oldVenueUpdates = new ArrayList<Entity>();
                int tuples = 0;
                for(Entity en : results) {
                    tuples++;
                    try {
                        if(en.getProperty(Venue.VENUE_KEY) == null)
                            continue;
                        Entity newVenue = new Entity("CPVenue",(String)en.getProperty(Venue.VENUE_KEY));
                        newVenue.setPropertiesFrom(en);
                        newVenue.removeProperty("timeOfLastVenueScoreCalculation");
                        newVenue.removeProperty("actionsSinceLastVenueScoreCalculation");
                        newVenue.removeProperty("venueImageUrl");
                        newVenue.removeProperty("foursquareId");
                        newVenue.setProperty("geoCell", GeoCellCalculator.calcCellId(Double.valueOf((String)en.getProperty("lng")), Double.valueOf((String)en.getProperty("lat")),8));
                        newVenue.setProperty(Venue.TIME_SINCE_LAST_UPDATE, new Date());
                        updates.add(newVenue);
                        Venue v = new Venue(newVenue);
                        //Set timestamp on Venue
                        en.setProperty("timeOfLastUpdate", now);
                        oldVenueUpdates.add(en);
                    }catch(Exception e) {
                        logger.log(Level.WARNING,"",e);
                    }
                }
                done = tuples == 0;
                tuples = 0;
                if(results.getCursor() != null)
                    cursor = results.getCursor().toWebSafeString();
                else
                    done = true;
                System.out.println("Venue Conversion LOOP updates.. " + updates.size() + " cursor " + cursor);
                datastore.put(updates);
                datastore.put(oldVenueUpdates);
            }
            System.out.println("Venue Conversion DONE");
        }}));
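The cursor idea in the routine above can be illustrated in plain Python, the language of the question. This is only a sketch under simplifying assumptions: fetch_page and iterate_all are hypothetical helpers, the rows are a sorted in-memory list, and the cursor is simply the last key returned, whereas a real datastore cursor is an opaque token. The point is that each page resumes where the previous one ended instead of counting rows from the start.

```python
def fetch_page(rows, limit, cursor=None):
    """Return (page, next_cursor) from rows sorted by key.

    The cursor here is just the last key returned, a stand-in for the
    opaque cursor a real datastore hands back.
    """
    remaining = [r for r in rows if cursor is None or r > cursor]
    page = remaining[:limit]
    next_cursor = page[-1] if page else None
    return page, next_cursor


def iterate_all(rows, limit):
    """Walk every row exactly once, one page at a time."""
    seen = []
    cursor = None
    while True:
        page, next_cursor = fetch_page(rows, limit, cursor)
        if not page:
            break  # an empty page means the walk is finished
        seen.extend(page)
        cursor = next_cursor
    return seen


rows = list(range(1, 11))
print(iterate_all(rows, 4))  # prints [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```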

Source: https://habr.com/ru/post/1438277/

