Migrating data across multiple versions of an application

When updating a GAE app, what is the best way to update the data model?

The application version number lets you run multiple versions side by side, but all of these versions share the same datastore (according to "How do I change the application after deployment in Google App Engine?"). So what happens when I deploy a version of the application with a different data model? (I'm using Python here, but the question should apply to Java as well.) I expect this to be harmless if the changes only add fields that default to a null value plus some new classes, so that the existing model is extended without breaking anything. But what if the data model change goes deeper? Do I really lose existing data if it becomes incompatible with the new data model?

The only option I see at the moment is to put the datastore into read-only maintenance mode, convert the data offline and then deploy everything again.


There are several ways to deal with this, and they are not mutually exclusive:

  • Make additive, non-breaking changes to the datastore and work around the problems they create. Adding new fields to existing model classes, switching fields from required to optional, adding new models, etc. will not break compatibility with any existing entities. But since those entities do not magically change to match the new model (remember that the datastore is a schemaless database), you may need legacy code that partially supports the old model. For example, if you added a new field, you need to access it through getattr(entity, "field_name", default_value) rather than entity.field_name, so that it does not raise an AttributeError for old entities.
  • Gradually convert entities to the new format. This is fairly simple: whenever you come across an entity that still uses the old model, make the appropriate changes. In the example above, this means putting the entity back with the new field added:

     if not hasattr(entity, "field_name"):
         entity.field_name = default_value
         entity.put()
     val = entity.field_name  # no getattr'ing needed now

    Ideally, all your entities will be processed this way and at some point you can delete the conversion code. In practice, there will always be some leftovers that need to be converted manually, which brings us to option number three...

  • Batch-convert your entities to the new format. How complex the logistics are depends largely on the number of entities to process, how active your site is, the resources you can devote to the process, etc. Just note that a simple MapReduce may not be the best idea, especially if you have been using the gradual converter described above. This is because MapReduce processes (fetches) every entity of a given kind, while only a tiny percentage may still need converting. It may therefore pay off to hand-code the conversion: explicitly write the query for old entities and use a library such as ndb, for example.
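A minimal sketch of the first approach, written in plain Python rather than against the App Engine API: types.SimpleNamespace objects stand in for entities loaded from the schemaless datastore, and priority is a hypothetical field added by the new model version. Entities written by the old version simply lack the attribute, so reads go through getattr with a default.

```python
from types import SimpleNamespace

# Hypothetical default for a field that only the new model version defines.
PRIORITY_DEFAULT = 0


def get_priority(entity):
    """Read the new field, tolerating entities stored under the old model."""
    # Old entities have no 'priority' attribute at all; getattr falls back
    # to the default instead of raising AttributeError.
    return getattr(entity, "priority", PRIORITY_DEFAULT)


old_entity = SimpleNamespace(title="written by v1")              # no 'priority'
new_entity = SimpleNamespace(title="written by v2", priority=5)

print(get_priority(old_entity))  # 0
print(get_priority(new_entity))  # 5
```

Note that the old entity is left untouched: it keeps lacking the field, which is exactly why the fallback has to stay in the code until every entity is converted.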
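The snippet from the second approach can be wrapped in a function and exercised without App Engine. FakeEntity is a hypothetical stand-in whose put() only counts writes (a real entity's put() would persist it), and priority is again a hypothetical new field. The point to notice is that an old entity is written back at most once, the first time it is read.

```python
class FakeEntity:
    """Hypothetical stand-in for a datastore entity; put() just counts saves."""

    def __init__(self, **fields):
        self.__dict__.update(fields)
        self.puts = 0

    def put(self):
        self.puts += 1


def read_priority(entity, default_value=0):
    """Upgrade an old-format entity on first access, then read the field."""
    if not hasattr(entity, "priority"):
        entity.priority = default_value
        entity.put()  # write the converted entity back once
    return entity.priority  # no getattr'ing needed now


old = FakeEntity(title="written by v1")
print(read_priority(old), old.puts)  # 0 1
print(read_priority(old), old.puts)  # 0 1  (already converted, no second put)
```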
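A sketch of the batch approach, again with hypothetical names and an in-memory list of dicts standing in for the datastore. It assumes a deeper, non-additive change (renaming name to title) and a hypothetical schema_version field that marks converted entities; entities without it are treated as old. The filtering is done in Python here, whereas in the real application it would be the explicit query for old entities, so that only those are fetched and rewritten instead of every entity of the kind.

```python
from itertools import islice

CURRENT_VERSION = 2  # hypothetical version number of the new model


def old_entities(store):
    """Stand-in for an explicit query that returns only unconverted entities."""
    # Entities written before the migration have no schema_version; treat them as v1.
    return (e for e in store if e.get("schema_version", 1) < CURRENT_VERSION)


def upgrade(entity):
    """Hypothetical v1 -> v2 conversion: rename 'name' to 'title', add 'priority'."""
    entity["title"] = entity.pop("name")
    entity.setdefault("priority", 0)
    entity["schema_version"] = CURRENT_VERSION


def batch_convert(store, batch_size=100):
    """Convert old entities in fixed-size batches; return how many were converted."""
    converted = 0
    pending = old_entities(store)
    while True:
        batch = list(islice(pending, batch_size))
        if not batch:
            return converted
        for entity in batch:
            upgrade(entity)
        # Against a real datastore, the converted batch would typically be
        # written back here in one batched write rather than one per entity.
        converted += len(batch)


store = [
    {"name": "a"},                                       # old (v1) entity
    {"title": "b", "priority": 3, "schema_version": 2},  # already converted
    {"name": "c"},                                       # old (v1) entity
]
print(batch_convert(store, batch_size=1))  # 2
print(store[0])  # {'title': 'a', 'priority': 0, 'schema_version': 2}
print(batch_convert(store))  # 0
```

Because converted entities are skipped by old_entities, the job can be rerun safely after an interruption, and a run that reports 0 means nothing is left to convert.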

Source: https://habr.com/ru/post/898735/

