Delete all data for a kind in Google App Engine

I would like to delete all data for a specific kind in Google App Engine. What is the best way to do this? I wrote a delete script (a hack), but there is so much data that it times out after a few hundred records.

+44
python google-app-engine
Sep 20 '08 at 17:34
19 answers

The official answer from Google is that you have to delete in chunks spread over multiple requests. You can use AJAX, a meta refresh, or request your URL from a script until there are no entities left.
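A minimal sketch of the "request your URL from a script until there are no entities left" loop, in Python. `drive_deletion` and `delete_one_chunk` are hypothetical names; `delete_one_chunk` stands in for one request to your own delete handler and is assumed to return how many entities that request removed.

```python
from typing import Callable


def drive_deletion(delete_one_chunk: Callable[[], int], max_requests: int = 10_000) -> int:
    """Call the delete handler repeatedly until one call deletes nothing.

    delete_one_chunk is a hypothetical stand-in for one request to your own
    delete URL; it must return how many entities that request removed.
    max_requests caps the loop so a misbehaving handler cannot spin forever.
    Returns the total number of entities deleted.
    """
    total = 0
    for _ in range(max_requests):
        deleted = delete_one_chunk()
        if deleted == 0:
            break  # nothing left of this kind
        total += deleted
    return total


if __name__ == "__main__":
    # Simulated kind with 1,050 entities; each "request" deletes at most 200.
    remaining = [1050]

    def fake_request() -> int:
        n = min(200, remaining[0])
        remaining[0] -= n
        return n

    print(drive_deletion(fake_request))  # 1050
```

In real use, `delete_one_chunk` might request your delete URL with `urllib.request.urlopen` and parse a count out of the response body; that response format is up to your handler. The `max_requests` cap is there so a handler that never reports zero cannot loop forever.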

+6
Sep 23 '08 at 2:44

I am currently deleting entities by their keys, and it seems to be faster.

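    import time

    from google.appengine.ext import db
    from google.appengine.ext import webapp

    class bulkdelete(webapp.RequestHandler):
        def get(self):
            self.response.headers['Content-Type'] = 'text/plain'
            try:
                while True:
                    # Keys-only query: cheaper than loading whole entities.
                    q = db.GqlQuery("SELECT __key__ FROM MyModel")
                    keys = q.fetch(200)
                    if not keys:
                        break  # nothing left to delete
                    db.delete(keys)
                    time.sleep(0.5)
            except Exception, e:
                # Report why the loop stopped; call the URL again to continue.
                self.response.out.write(repr(e) + '\n')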

From the terminal, I run: curl -N http://...

+27
Jun 21 '09 at 11:41
+23
Sep 18 '11 at 9:17 a.m.

If I were a paranoid person, I would say that Google App Engine (GAE) does not make it easy for us to delete data when we want to. I'm going to skip the discussion of index sizes and how they turn 6 GB of data into 35 GB of (billed) storage. That is another story, and there are ways around it, such as limiting the number of properties that get indexed (automatically created indexes), etc.

I decided to write this post because I needed to "destroy" all my kinds in the sandbox. I read up on it and finally came up with this code:

    package com.intillium.formshnuker;

    import java.io.IOException;
    import java.util.ArrayList;

    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    import com.google.appengine.api.datastore.DatastoreService;
    import com.google.appengine.api.datastore.DatastoreServiceFactory;
    import com.google.appengine.api.datastore.Entity;
    import com.google.appengine.api.datastore.FetchOptions;
    import com.google.appengine.api.datastore.Key;
    import com.google.appengine.api.datastore.Query;
    import com.google.appengine.api.labs.taskqueue.QueueFactory;
    import com.google.appengine.api.labs.taskqueue.TaskOptions.Method;

    import static com.google.appengine.api.labs.taskqueue.TaskOptions.Builder.url;

    @SuppressWarnings("serial")
    public class FormsnukerServlet extends HttpServlet {

        public void doGet(final HttpServletRequest request, final HttpServletResponse response)
                throws IOException {
            response.setContentType("text/plain");

            final String kind = request.getParameter("kind");
            final String passcode = request.getParameter("passcode");
            if (kind == null) {
                throw new NullPointerException();
            }
            if (passcode == null) {
                throw new NullPointerException();
            }
            if (!passcode.equals("LONGSECRETCODE")) {
                response.getWriter().println("BAD PASSCODE!");
                return;
            }

            System.err.println("*** deleting entities from " + kind);
            final long start = System.currentTimeMillis();
            int deleted_count = 0;
            boolean is_finished = false;
            final DatastoreService dss = DatastoreServiceFactory.getDatastoreService();

            // Work for at most ~16 s in this request, then re-queue the rest.
            while (System.currentTimeMillis() - start < 16384) {
                // Keys-only query for the next batch of up to 128 entities.
                final Query query = new Query(kind);
                query.setKeysOnly();
                final ArrayList<Key> keys = new ArrayList<Key>();
                for (final Entity entity : dss.prepare(query).asIterable(FetchOptions.Builder.withLimit(128))) {
                    keys.add(entity.getKey());
                }
                keys.trimToSize();
                if (keys.size() == 0) {
                    is_finished = true;
                    break;
                }
                // Retry the batch delete until it succeeds or the time budget runs out.
                while (System.currentTimeMillis() - start < 16384) {
                    try {
                        dss.delete(keys);
                        deleted_count += keys.size();
                        break;
                    } catch (Throwable ignore) {
                        continue;
                    }
                }
            }

            System.err.println("*** deleted " + deleted_count + " entities from " + kind);
            if (is_finished) {
                System.err.println("*** deletion job for " + kind + " is completed.");
            } else {
                final int taskcount;
                final String tcs = request.getParameter("taskcount");
                if (tcs == null) {
                    taskcount = 0;
                } else {
                    taskcount = Integer.parseInt(tcs) + 1;
                }
                QueueFactory.getDefaultQueue().add(
                    url("/formsnuker?kind=" + kind + "&passcode=LONGSECRETCODE&taskcount=" + taskcount)
                        .method(Method.GET));
                System.err.println("*** deletion task # " + taskcount + " for " + kind + " is queued.");
            }
            response.getWriter().println("OK");
        }
    }
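The servlet's core idea (delete key batches until a fixed time budget is spent, then queue a follow-up task for the rest) can be sketched in Python as below. The list standing in for the kind, the deque standing in for the task queue, and the injectable `clock` are all assumptions made so the logic runs outside App Engine; `delete_with_budget` is a hypothetical name, and the servlet's retry-on-failure loop is left out.

```python
import itertools
import time
from collections import deque

BATCH_SIZE = 128        # same batch size as the servlet
TIME_BUDGET_S = 16.384  # the servlet's 16384 ms budget


def delete_with_budget(store, queue, task_count=0,
                       budget_s=TIME_BUDGET_S, clock=time.monotonic):
    """Delete keys from `store` in batches until it is empty or time runs out.

    store: list of keys (hypothetical stand-in for one Datastore kind).
    queue: deque of pending task numbers (stand-in for the task queue).
    If the budget runs out first, a follow-up task numbered task_count + 1
    is queued, similar to the servlet's getDefaultQueue().add(...) call.
    Returns the number of keys deleted by this invocation.
    """
    start = clock()
    deleted = 0
    while clock() - start < budget_s:
        batch = store[:BATCH_SIZE]   # keys-only "query" for the next batch
        if not batch:
            return deleted           # kind is empty: the job is finished
        del store[:len(batch)]       # the batch delete itself
        deleted += len(batch)
    queue.append(task_count + 1)     # out of time: continue in a new task
    return deleted


if __name__ == "__main__":
    store = list(range(1000))
    queue = deque()
    # Fake clock that advances one "second" per call, so only four batches
    # fit into a 5 s budget.
    fake_clock = itertools.count().__next__
    print(delete_with_budget(store, queue, budget_s=5, clock=fake_clock),
          len(store), list(queue))  # 512 488 [1]
    # Drain the queue the way App Engine would run the queued tasks.
    while queue:
        delete_with_budget(store, queue, task_count=queue.popleft(),
                           budget_s=5, clock=fake_clock)
    print(len(store))  # 0
```

Passing the clock in is only a testing convenience; in a real handler you would use the wall clock, as the servlet does.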

I have over 6 million entities. That's a lot. I have no idea what it will cost to delete them (perhaps it is cheaper not to delete them at all). Another alternative would be to request deletion of the entire application (sandbox), but that is not realistic in most cases.

I decided to go with smaller batches of entities per request. I know I can go up to 500 entities, but then I started getting very high failure rates from the delete call.

My request to the GAE team: add a function to delete all entities of a kind in one transaction.

+10
Dec 10 '09 at 17:41

Presumably your hack was something like this:

    # Deleting all messages older than "earliest_date"
    q = db.GqlQuery("SELECT * FROM Message WHERE create_date < :1", earliest_date)
    results = q.fetch(1000)
    while results:
        db.delete(results)
        # Fetch from the start again: the deleted entities no longer match,
        # so an offset here would skip over live ones.
        results = q.fetch(1000)

As you say, if there is enough data, you will hit the request timeout before it gets through all the records. You have to re-invoke this request several times from outside to make sure all the data is erased; easy enough to do, but hardly ideal.
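A loop that re-fetches with q.fetch(1000, len(results)) after each delete can also leave entities behind: assuming the query is re-run against the entities that remain, the offset jumps over live ones. The toy model below shows the effect; a plain list stands in for the query results, and `fetch`, `delete_with_offset` and `delete_without_offset` are hypothetical helpers.

```python
def fetch(store, limit, offset=0):
    """Hypothetical stand-in for Query.fetch(limit, offset) on what is left."""
    return store[offset:offset + limit]


def delete_with_offset(store, batch):
    """The risky loop: re-fetch with offset = number just deleted."""
    results = fetch(store, batch)
    while results:
        for item in results:
            store.remove(item)
        results = fetch(store, batch, len(results))
    return store  # whatever survived


def delete_without_offset(store, batch):
    """The fix: always fetch from the start of what remains."""
    results = fetch(store, batch)
    while results:
        for item in results:
            store.remove(item)
        results = fetch(store, batch)
    return store


if __name__ == "__main__":
    print(delete_with_offset(list(range(10)), 3))     # [3]
    print(delete_without_offset(list(range(10)), 3))  # []
```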

The admin console does not seem to offer any help either: in my experience, it only lets you list entities of a given kind and delete them page by page.

When testing, I have had to purge my database on startup to get rid of existing data.

I would infer that Google operates on the principle that disk is cheap, so data is typically orphaned (its indexes replaced) rather than deleted. Given that each application currently has a fixed amount of storage (0.5 GB), that is not much help to App Engine users outside Google.

+9
Sep 20 '08 at 18:34

Try using the App Engine Console; then you don't even have to deploy any special code.

+9
Nov 14 '08 at 23:58

I tried db.delete(results) and the App Engine Console, and neither seemed to work for me. Manually deleting entries from the Data Viewer (with the limit raised to 200) was not practical either, since I had uploaded more than 10,000 entries. I ended up writing this script:

    from google.appengine.ext import db
    from google.appengine.ext import webapp
    from google.appengine.ext.webapp.util import run_wsgi_app
    from mainPage import YourData  # replace this with your data class

    class CleanTable(webapp.RequestHandler):
        def get(self, param):
            txt = self.request.get('table')
            q = db.GqlQuery("SELECT * FROM " + txt)
            results = q.fetch(10)
            # Serve the page as HTML so the browser honors the meta refresh.
            self.response.headers['Content-Type'] = 'text/html'
            # Replace yourapp and YourData with your app info below.
            self.response.out.write("""
    <html>
    <meta HTTP-EQUIV="REFRESH" content="5; url=http://yourapp.appspot.com/cleanTable?table=YourData">
    <body>""")
            try:
                for i in range(10):
                    db.delete(results)
                    # No offset: the deleted entities no longer match the query.
                    results = q.fetch(10)
                    self.response.out.write("<p>10 removed</p>")
                self.response.out.write("""
    </body>
    </html>""")
            except Exception, inst:
                self.response.out.write(str(inst))

    application = webapp.WSGIApplication([
        ('/cleanTable(.*)', CleanTable),
    ])

    def main():
        run_wsgi_app(application)

    if __name__ == '__main__':
        main()

The trick was to do the redirect in HTML (a meta refresh) instead of using self.redirect. I was prepared to wait all night to get rid of all the data in my table. Hopefully the GAE team will make it easier to drop tables in the future.

+7
Nov 27 '08 at 6:00

The fastest and most efficient way to handle bulk deletion in the Datastore is to use the new Mapper API announced at the latest Google I/O.

If your language of choice is Python, you just need to register your mapper in a mapreduce.yaml file and define a function like this:

    from mapreduce import operation as op

    def process(entity):
        # Emit a delete operation for every entity the mapper visits.
        yield op.db.Delete(entity)

For Java, take a look at this article, which suggests a function like this:

    @Override
    public void map(Key key, Entity value, Context context) {
        log.info("Adding key to deletion pool: " + key);
        DatastoreMutationPool mutationPool = this.getAppEngineContext(context)
            .getMutationPool();
        mutationPool.delete(value.getKey());
    }
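Both versions leave the batching of the actual writes to the framework: the map function only hands each key to a mutation pool. A rough Python model of that batching idea follows. `MutationPool` and `run_delete_mapper` here are hypothetical stand-ins, not the mapreduce library's classes, and the batch size of 100 is an assumption.

```python
from typing import Callable, Iterable, List


class MutationPool:
    """Hypothetical stand-in for a mutation pool: buffers deletes, flushes in batches."""

    def __init__(self, flush_fn: Callable[[List[str]], None], batch_size: int = 100):
        self.flush_fn = flush_fn      # e.g. one batch delete call per flush
        self.batch_size = batch_size  # assumed batch size
        self.pending: List[str] = []

    def delete(self, key: str) -> None:
        self.pending.append(key)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self.pending:
            self.flush_fn(self.pending)
            self.pending = []


def run_delete_mapper(keys: Iterable[str], pool: MutationPool) -> None:
    """Call the 'map' step once per key, then flush the final partial batch."""
    for key in keys:
        pool.delete(key)  # the map function only queues the delete
    pool.flush()


if __name__ == "__main__":
    batches = []
    run_delete_mapper((f"key-{i}" for i in range(250)), MutationPool(batches.append))
    print([len(b) for b in batches])  # [100, 100, 50]
```

Buffering like this turns one write per entity into one write per batch, which is presumably why the mapper approach is fast.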
+5
Sep 08 2018-10-10T00:

One tip: I suggest you look into remote_api for these kinds of uses (bulk deletes, changes, etc.). But even with remote_api, the batch size may be limited to a few hundred at a time.

+4
Sep 09 '09 at 15:47

Unfortunately, there is no easy way to do a bulk delete. Your best bet is to write a script that deletes a reasonable number of entities per invocation, and then call it repeatedly: for example, have your delete script return a 302 redirect whenever there is more data to delete, and then fetch it with "wget --max-redirect=10000" (or some other large number).
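On the server side, the "302 until done" idea might look like the stdlib WSGI sketch below. `STORE`, `BATCH` and `delete_app` are hypothetical; a real handler would delete Datastore entities instead of list items.

```python
from wsgiref.util import request_uri, setup_testing_defaults

STORE = list(range(1234))  # hypothetical stand-in for one kind's entities
BATCH = 500                # entities deleted per request (assumed value)


def delete_app(environ, start_response):
    """WSGI app: delete one batch, then 302 back to itself if anything is left."""
    batch = STORE[:BATCH]
    del STORE[:len(batch)]
    if STORE:
        # More to do: point the client at the same URL again.
        start_response("302 Found", [("Location", request_uri(environ)),
                                     ("Content-Type", "text/plain")])
        return [b"deleted %d, more to go\n" % len(batch)]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"done\n"]


if __name__ == "__main__":
    # Mimic a client that follows redirects up to a cap, as
    # wget --max-redirect=10000 would.
    statuses = []
    for _ in range(10_000):
        environ = {}
        setup_testing_defaults(environ)
        delete_app(environ, lambda status, headers: statuses.append(status))
        if not statuses[-1].startswith("302"):
            break
    print(statuses)  # ['302 Found', '302 Found', '200 OK']
```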

+3
Sep 20 '08 at 19:03

With Django, set up the URL:

 url(r'^Model/bdelete/$', v.bulk_delete_models, {'model':'ModelKind'}), 

Set up the view:

    import time

    from django.http import HttpResponse
    from google.appengine.ext import db

    def bulk_delete_models(request, model):
        # Number of entities to delete per request; defaults to 200.
        limit = int(request.GET.get('limit', 200))
        start = time.time()
        keys = db.GqlQuery("SELECT __key__ FROM %s" % model).fetch(limit)
        count = len(keys)
        db.delete(keys)
        return HttpResponse("Deleted %s %s in %s" % (count, model, time.time() - start))

Then run this in PowerShell:

    $client = new-object System.Net.WebClient
    $client.DownloadString("http://your-app.com/Model/bdelete/?limit=400")
+1
Mar 24 '10 at 21:12

If you are using Java/JPA, you can do something like this:

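    import javax.persistence.EntityManager;
    import javax.persistence.Query;
    import org.springframework.orm.jpa.EntityManagerFactoryUtils;

    // "Table" is the name of your entity class.
    EntityManager em = EntityManagerFactoryUtils.getTransactionalEntityManager(entityManagerFactory);
    Query q = em.createQuery("delete from Table t");
    int number = q.executeUpdate(); // number of entities deleted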

Information for Java/JDO can be found here: http://code.google.com/appengine/docs/java/datastore/queriesandindexes.html#Delete_By_Query

+1
Jan 05 '11 at 3:12

Yes, you can: go to Datastore Admin, select the entity kind you want to delete, and click "Delete". MapReduce will take care of the deletion!

+1
Dec 09 '11 at 11:42

On the dev server, you can cd into your application's directory and run it like this:

 dev_appserver.py --clear_datastore=yes . 

This launches the application and clears the datastore. If another instance is already running, the application will not be able to bind to the required IP address, so it will not start... and will not clear the datastore.

+1
Nov 27 '15 at 20:11

You can use task queues to delete in chunks of 100 entities. Deleting entities in GAE shows how limited the admin tools in GAE are. You have to work with batches of 1,000 entities or fewer. You can use the bulkloader tool, which works with CSV files, but its documentation does not cover Java. I use GAE for Java, and my deletion strategy involves two servlets: one that does the actual deleting and another that loads the task queues. When I want to delete, I run the queue-loading servlet; it loads the queues, and then GAE works through all the tasks in the queue.

How to do it: create a servlet that deletes a small number of entities. Add the servlet to the task queue. Go home or work on something else ;) Check the datastore every so often...
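The loader/worker split described above could be modeled like this in Python. `load_queue`, `delete_worker` and `run_queue` are hypothetical names, and the dict and deque stand in for the datastore and the task queue; on App Engine, the queue would invoke your worker servlet instead.

```python
import itertools
from collections import deque

CHUNK = 100  # entities per task, as in the answer


def load_queue(queue, total_entities, chunk=CHUNK):
    """'Loader' servlet: enqueue one delete task per chunk. Returns task count."""
    tasks = -(-total_entities // chunk)  # ceiling division
    for _ in range(tasks):
        queue.append(chunk)
    return tasks


def delete_worker(store, chunk):
    """'Worker' servlet: delete up to `chunk` entities from whatever is left.

    It does not target a specific key range, so a task that finds nothing
    simply returns 0.
    """
    keys = list(itertools.islice(store, chunk))
    for key in keys:
        del store[key]
    return len(keys)


def run_queue(queue, store):
    """Stand-in for App Engine working through the queued tasks."""
    while queue:
        delete_worker(store, queue.popleft())


if __name__ == "__main__":
    store = {key: None for key in range(5000)}  # hypothetical entities
    queue = deque()
    print(load_queue(queue, len(store)))  # 50
    run_queue(queue, store)
    print(len(store))  # 0
```

Because each worker deletes "whatever is left" rather than a fixed range, the order in which tasks run does not matter in this model.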

I have a datastore with about 5,000 entities that I clear out every week; it takes about 6 hours, so I run the task on Friday night. I use the same technique to bulk-load my data, which is about 5,000 entities with about a dozen properties each.

0
Feb 11 '10 at

This worked for me:

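    from google.appengine.ext import db
    from google.appengine.ext import webapp

    class ClearHandler(webapp.RequestHandler):
        def get(self):
            self.response.headers['Content-Type'] = 'text/plain'
            q = db.GqlQuery("SELECT * FROM SomeModel")
            self.response.out.write("deleting...")
            # Hands the query's results straight to db.delete; fine for small
            # kinds, but it may time out on large ones.
            db.delete(q)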
0
Jul 31 2018-10-10T00:

Thanks, everyone, I got what I needed :D
This can be useful if you have many db models to delete: you can run it from your terminal, and you can manage the list of models to delete in DB_MODEL_LIST yourself.
Delete DB_1:

 python bulkdel.py 10 DB_1 

Delete all models in DB_MODEL_LIST:

 python bulkdel.py 11 

Here is the bulkdel.py file:

    import os
    import sys

    URL = 'http://localhost:8080'
    DB_MODEL_LIST = ['DB_1', 'DB_2', 'DB_3']

    # Delete one model: python bulkdel.py 10 <ModelName>
    if sys.argv[1] == '10':
        command = 'curl %s/clear_db?model=%s' % (URL, sys.argv[2])
        os.system(command)

    # Delete all models in DB_MODEL_LIST: python bulkdel.py 11
    if sys.argv[1] == '11':
        for model in DB_MODEL_LIST:
            command = 'curl %s/clear_db?model=%s' % (URL, model)
            os.system(command)

And here is a modified version of alexandre fiori's code.

    import time

    from google.appengine.ext import db
    from google.appengine.ext import webapp

    class DBDelete(webapp.RequestHandler):
        def get(self):
            self.response.headers['Content-Type'] = 'text/plain'
            db_model = self.request.get('model')
            sql = 'SELECT __key__ FROM %s' % db_model
            try:
                while True:
                    q = db.GqlQuery(sql)
                    keys = q.fetch(200)
                    if not keys:
                        break  # no entities of this kind left
                    db.delete(keys)
                    time.sleep(0.5)
            except Exception, e:
                # Report why the loop stopped; call the URL again to continue.
                self.response.out.write(repr(e) + '\n')

And, of course, you must map the URL to the handler in your app file (e.g. main.py in GAE) ;)
In case anyone like me needs the details, here is the relevant part of main.py:

    from google.appengine.ext import webapp

    import utility  # DBDelete is defined in utility.py
    import views    # MainPage is defined in views.py

    application = webapp.WSGIApplication(
        [('/clear_db', utility.DBDelete),
         ('/', views.MainPage)],
        debug=True)
0
Sep 03 '11 at 7:27

To delete all entities of a given kind in Google App Engine, you just need to do the following:

    from google.cloud import datastore

    client = datastore.Client()
    query = client.query(kind="YourKind")  # replace YourKind with your entity kind
    query.keys_only()  # only the keys are needed to delete
    for entity in query.fetch():
        client.delete(entity.key)
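For large kinds, one delete_multi call per batch of keys should be faster than one delete call per entity. The sketch below assumes the google-cloud-datastore client API (`Client.query`, `Query.keys_only`, `Query.fetch`, `Client.delete_multi`) and a limit of 500 mutations per commit; `delete_all_of_kind` is a hypothetical helper, and `client` is passed in so it can be tried against a fake.

```python
from itertools import islice

MAX_PER_COMMIT = 500  # assumed per-commit mutation limit


def delete_all_of_kind(client, kind, batch_size=MAX_PER_COMMIT):
    """Delete every entity of `kind` using batched delete_multi calls.

    `client` is expected to behave like google.cloud.datastore.Client;
    it is a parameter so the function can be exercised with a fake.
    Returns the number of keys deleted.
    """
    query = client.query(kind=kind)
    query.keys_only()  # fetch keys only, not whole entities
    keys = (entity.key for entity in query.fetch())
    deleted = 0
    while True:
        batch = list(islice(keys, batch_size))
        if not batch:
            return deleted
        client.delete_multi(batch)  # one request per batch
        deleted += len(batch)


# Real use (requires the google-cloud-datastore package and credentials):
#   from google.cloud import datastore
#   print(delete_all_of_kind(datastore.Client(), "YourKind"))
```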
0
Feb 06 '19 at 12:15

In JavaScript, the following will delete the entities listed on the page:

    document.getElementById("allkeys").checked = true;
    checkAllEntities();
    document.getElementById("delete_button").setAttribute("onclick", "");
    document.getElementById("delete_button").click();

assuming you are on the admin page (.../_ah/admin) listing the entities you want to delete.

-2
Nov 29


