Inconsistent fetch from Google App Engine datastore

I have an application deployed to Google App Engine. I get inconsistent data when I fetch an object by ID immediately after updating that object. I am using JDO 3.0 to access the App Engine datastore.

I have an Employee entity:

```java
@PersistenceCapable(detachable = "true")
public class Employee implements Serializable {

    private static final long serialVersionUID = -8319851654750418424L;

    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY, defaultFetchGroup = "true")
    @Extension(vendorName = "datanucleus", key = "gae.encoded-pk", value = "true")
    private String id;

    @Persistent(defaultFetchGroup = "true")
    private String name;

    @Persistent(defaultFetchGroup = "true")
    private String designation;

    @Persistent(defaultFetchGroup = "true")
    private Date dateOfJoin;

    @Persistent(defaultFetchGroup = "true")
    private String email;

    @Persistent(defaultFetchGroup = "true")
    private Integer age;

    @Persistent(defaultFetchGroup = "true")
    private Double salary;

    @Persistent(defaultFetchGroup = "true")
    private HashMap<String, String> experience;

    @Persistent(defaultFetchGroup = "true")
    private List<Address> address;

    // Setters and getters, toString()
}
```

Initially, when I create an employee, I do not set the salary and email fields.

I update the Employee object later to add the salary and email. The update works fine, and the data is saved in the datastore. However, when I immediately try to fetch the same employee object by ID, I sometimes get outdated data, where salary and email are still empty. The code I use to create, fetch, and update the employee object is shown below.

```java
public Employee create(Employee object) {
    Employee persObj = null;
    PersistenceManager pm = PMF.get().getPersistenceManager();
    Transaction tx = null;
    try {
        tx = pm.currentTransaction();
        tx.begin();
        persObj = pm.makePersistent(object);
        tx.commit();
    } finally {
        if ((tx != null) && tx.isActive()) {
            tx.rollback();
        }
        pm.close();
    }
    return persObj;
}

public Employee findById(Serializable id) {
    PersistenceManager pm = PMF.get().getPersistenceManager();
    try {
        Employee e = pm.getObjectById(Employee.class, id);
        System.out.println("INSIDE EMPLOYEE DAO : " + e.toString());
        return e;
    } finally {
        pm.close();
    }
}

public void update(Employee object) {
    PersistenceManager pm = PMF.get().getPersistenceManager();
    Transaction tx = null;
    try {
        tx = pm.currentTransaction();
        tx.begin();
        Employee e = pm.getObjectById(object.getClass(), object.getId());
        e.setName(object.getName());
        e.setDesignation(object.getDesignation());
        e.setDateOfJoin(object.getDateOfJoin());
        e.setEmail(object.getEmail());
        e.setAge(object.getAge());
        e.setSalary(object.getSalary());
        tx.commit();
    } finally {
        if (tx != null && tx.isActive()) {
            tx.rollback();
        }
        pm.close();
    }
}
```

I set the number of idle instances to 5, and about 8 instances end up running at the same time. When I checked the logs of the different instances, I could see which instance served each request (the original post included a screenshot of the instance logs).

Why do I get stale data when a request is served by certain instances? I can confirm that if the fetch request is handled by the same instance that originally processed the update request, I always get up-to-date data. But when other instances handle the fetch request, stale data may be returned. I have explicitly set the datastore read consistency to STRONG in my jdoconfig.xml:

```xml
<?xml version="1.0" encoding="utf-8"?>
<jdoconfig xmlns="http://java.sun.com/xml/ns/jdo/jdoconfig_3_0.xsd"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://java.sun.com/xml/ns/jdo/jdoconfig http://java.sun.com/xml/ns/jdo/jdoconfig_3_0.xsd">
    <persistence-manager-factory name="transactions-optional">
        <property name="javax.jdo.PersistenceManagerFactoryClass"
                  value="org.datanucleus.api.jdo.JDOPersistenceManagerFactory"/>
        <property name="javax.jdo.option.ConnectionURL" value="appengine"/>
        <property name="javax.jdo.option.NontransactionalRead" value="true"/>
        <property name="javax.jdo.option.NontransactionalWrite" value="true"/>
        <property name="javax.jdo.option.RetainValues" value="true"/>
        <property name="datanucleus.appengine.autoCreateDatastoreTxns" value="true"/>
        <property name="datanucleus.appengine.singletonPMFForName" value="true"/>
        <property name="datanucleus.appengine.datastoreEnableXGTransactions" value="true"/>
        <property name="datanucleus.query.jdoql.allowAll" value="true"/>
        <property name="datanucleus.appengine.datastoreReadConsistency" value="STRONG"/>
    </persistence-manager-factory>
</jdoconfig>
```
3 answers

I have a suggestion, but you won't like it: use the low-level Datastore API exclusively and forget about JDO / JPA when working with GAE.

As @asp notes, a get by key must be strongly consistent, but the GAE JDO plugin does not seem to obey that in my case. Unfortunately, switching to JPA did not help me either (more details here: JDO transactions + lots of GAE instances = data overwriting). Also, if I annotate any class with @PersistenceAware, Eclipse goes crazy and enhances classes in an infinite loop. I also had a lot of problems using a @PersistenceCapable class with an embedded class and caching (without caching it worked fine).

The thing is, I think you will be faster with the low-level API: you know exactly what is happening, and it seems to work as intended. You can think of an Entity as a map, and with a bit of your own wrapper code it is a pretty workable alternative. I ran some tests against the low-level API and hit none of these problems, while I could not get the same reliability with JDO / JPA. I am in the middle of porting my entire application from JDO to the low-level API. It takes a lot of time, but less than waiting endlessly for some magical solution or fix from the GAE team.
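To illustrate the "Entity as a map" idea, here is a minimal sketch of what a low-level equivalent of the DAO could look like. This is not the asker's code; the entity kind "Employee" and the property names are assumptions taken from the model in the question:

```java
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.KeyFactory;

public class EmployeeLowLevelDao {

    private final DatastoreService ds = DatastoreServiceFactory.getDatastoreService();

    // A get() by Key is strongly consistent in the High Replication
    // Datastore, unlike non-ancestor queries.
    public Entity findById(long id) throws EntityNotFoundException {
        Key key = KeyFactory.createKey("Employee", id);
        return ds.get(key);
    }

    // An Entity behaves like a map of property name -> value.
    public void updateSalary(long id, double salary) throws EntityNotFoundException {
        Entity e = findById(id);
        e.setProperty("salary", salary);
        ds.put(e); // synchronous write; readable by key immediately afterwards
    }
}
```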

Also, while working with GAE JDO, I felt ... alone. If you have a problem with Java, or even with Android, thousands of other people have already hit it, asked about it on Stack Overflow, and received plenty of valid solutions. Here you are all on your own, so stay as close to the low-level API as possible and you will know exactly what is happening. Even though the migration looks awful, hellish, and time-consuming, I think you will spend less time moving to the low-level API than fighting GAE JDO / JPA. I am not writing this to needle the team developing the GAE JDO / JPA plugin or to insult them; I am sure they are doing their best. But:

  • There are not many people using GAE compared to, say, Android or Java in general.

  • Using GAE JDO / JPA with multiple server instances is not as simple and easy as it looks. A developer like me wants to get his work done as soon as possible: look at an example, read some documentation (not all of it in detail, just the quick-start guide), and when he hits a problem, share it on Stack Overflow and get help quickly. It is easy to get help if you are doing something wrong on Android, regardless of how complex or simple it is. It is not so easy with GAE JDO / JPA. I spent far more time on GAE JDO articles, tutorials, and documentation than I would have liked, and I still could not do what I wanted, even though it seemed pretty simple. If I had just used the low-level API instead of trying to take a shortcut with JDO (yes, I thought JDO would save me time), it would have been much faster.

  • Google focuses on the Python GAE runtime much more than on Java. Many articles aimed at all audiences have only Python code and tips, for example: http://googlecloudplatform.blogspot.com/2013/12/best-practices-for-app-engine-memcache.html or https://cloud.google.com/developers/articles/balancing-strong-and-eventual-consistency-with-google-cloud-datastore/ . I noticed that even before starting development, but I wanted to share some code with my Android client, so I chose Java. Despite my solid Java background, and even with the shared code, if I could go back in time and choose again, I would pick Python.

That is why I believe it is best to stick to the most basic methods of accessing and managing your data.

Good luck, I wish you all the best.

---

If you use the High Replication Datastore, setting a read policy does not guarantee that all reads are strongly consistent; it only applies to ancestor queries. From the documentation:

The API also allows you to explicitly set a strong consistency policy, but this will have no practical effect, because non-ancestor queries are always eventually consistent regardless of the policy.

https://cloud.google.com/appengine/docs/java/datastore/queries#Java_Data_consistency
https://cloud.google.com/appengine/docs/java/datastore/jdo/overview-dn2#Setting_the_Datastore_Read_Policy_and_Call_Deadline
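The quoted behavior can be sketched with the low-level API. The contrast below between a plain query and an ancestor query is illustrative only; the "Company" parent kind and its key are hypothetical, not from the question:

```java
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.KeyFactory;
import com.google.appengine.api.datastore.PreparedQuery;
import com.google.appengine.api.datastore.Query;

public class ConsistencyContrast {

    public static void main(String[] args) {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();

        // Eventually consistent: a non-ancestor query may miss a write that
        // just committed, no matter which read policy is configured.
        Query global = new Query("Employee").setFilter(
                new Query.FilterPredicate("designation",
                        Query.FilterOperator.EQUAL, "Engineer"));
        PreparedQuery eventuallyConsistent = ds.prepare(global);

        // Strongly consistent: the same query restricted to one entity group
        // via an ancestor key always sees the latest committed data in that
        // group (requires the Employee entities to share this parent).
        Key companyKey = KeyFactory.createKey("Company", "acme"); // hypothetical parent
        Query scoped = new Query("Employee").setAncestor(companyKey);
        PreparedQuery stronglyConsistent = ds.prepare(scoped);
    }
}
```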

Please see the Structuring Data for Strong Consistency documentation, and the recommended approach of putting a cache layer in front of the Datastore for serving data.

I noticed that you are using get by ID. I am not sure, but a get by key should be strongly consistent even in the High Replication Datastore (link), so can you try changing this to a key-based lookup? A key is built from the entity kind, the ID, and the ancestor path (if any).
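A sketch of what such a key-based lookup could look like with the asker's JDO setup. This is a suggestion, not tested against their app; note that since the model uses an encoded-pk String ID, that string already encodes the full key:

```java
import javax.jdo.PersistenceManager;

import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.KeyFactory;

public class EmployeeKeyLookup {

    // Fetching by Key (kind + numeric ID) goes through a strongly
    // consistent get() rather than an eventually consistent query.
    public Employee findByKey(PersistenceManager pm, long id) {
        Key key = KeyFactory.createKey("Employee", id);
        return pm.getObjectById(Employee.class, key);
    }

    // With a gae.encoded-pk String primary key, the string can be
    // converted back into a Key with KeyFactory.stringToKey().
    public Employee findByEncodedId(PersistenceManager pm, String encodedId) {
        Key key = KeyFactory.stringToKey(encodedId);
        return pm.getObjectById(Employee.class, key);
    }
}
```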

---

Add @Cacheable(value = "false") to the entity class. This resolves the problem.

The problem above is mainly caused by the JDO cache, so if we disable the cache for this class, JDO will always read the data from the datastore.
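Concretely, applied to the Employee class from the question, this would look roughly like the following (fields elided; the annotation is javax.jdo.annotations.Cacheable from JDO 3.0):

```java
import java.io.Serializable;

import javax.jdo.annotations.Cacheable;
import javax.jdo.annotations.PersistenceCapable;

// "false" opts this class out of the JDO level-2 cache, so every
// getObjectById() hits the datastore instead of a possibly stale
// per-instance cached copy.
@Cacheable("false")
@PersistenceCapable(detachable = "true")
public class Employee implements Serializable {
    // ... fields as in the question ...
}
```

Alternatively, if I am not mistaken, the level-2 cache can be disabled globally with the DataNucleus property `datanucleus.cache.level2.type` set to `none` in jdoconfig.xml.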

---

Source: https://habr.com/ru/post/976764/

