tl;dr: Before settling on any solution, think about how you expect the update to behave. In general, an update can be either a full replacement or a partial replacement (work out the difference first, then replace only those parts). Both full and partial replacements have their pros and cons, and the technical implementations of the two can differ greatly.
If I understand correctly, you have an original dataset that ships with your application, and you want to change that dataset in a new version of the application. The data may or may not have been changed by the user.
As I see it, a good solution to this problem depends on your application and how the default data is currently stored and used in your application.
User objects and initial objects (possibly modified) together
If the application ships with the default dataset installed, and the user can later change records, delete them, and add their own, and there is no way to tell whether a given record is part of the original dataset or a user-created or user-modified record, then updating the default dataset is much harder, but also more interesting.
Since you ask “What if the data is modified?”, I assume this is the case you are interested in.
What is your expected behavior?
Personally, I would try to pin down the exact behavior you expect before getting into the technical solution. Some cases are very simple, for example:
- "if a record in the default dataset should be deleted and the user has already deleted it, then do nothing"
- "if a new record should be added to the default dataset, then add it"
- "if a record is not part of the default dataset, then do nothing"
However, there are many subtle variations, such as
- "if a record in the default dataset should be deleted and the user has changed it, then ...".
In this case you should probably treat the record as user data and leave it alone, but you may have good reasons to update it anyway. Once you start writing these cases down, you will see very clearly that there are many cases you may not have thought about before. In addition, by writing them down you document your decisions so that you can come back and review them later.
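One way to write the cases down is as a small decision table in code. The sketch below is only an illustration: the type names and the function `action(for:userState:)` are hypothetical, and the two branches marked "assumption" are exactly the kind of decision you have to make for your own application. Records that were never part of the default dataset never reach this function, which matches the "do nothing" case above.

```swift
// Hypothetical names throughout; this is a sketch, not a prescribed policy.

/// What the new default dataset wants to do with a record.
enum DefaultChange { case insert, delete, modify }

/// What the user has done to the shipped copy of that record.
enum UserState { case untouched, modified, deleted }

/// What the updater should do for this record.
enum Action: Equatable { case apply, skip, keepAsUserData }

func action(for change: DefaultChange, userState: UserState) -> Action {
    switch (change, userState) {
    case (.insert, _):
        return .apply            // new default record: just add it
    case (.delete, .deleted):
        return .skip             // already gone: nothing to do
    case (.delete, .untouched), (.modify, .untouched):
        return .apply            // user never touched it: safe to update
    case (.delete, .modified), (.modify, .modified):
        return .keepAsUserData   // assumption: user edits win
    case (.modify, .deleted):
        return .skip             // assumption: do not bring back deleted records
    }
}
```

Because the `switch` has to be exhaustive, the compiler points out any combination you have not decided on yet.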
What do you plan for the future?
Once you have decided in detail what the data update should achieve, you can think about how to implement a solution that meets those goals. This is also a good time to start thinking about the future. If you need to update your default dataset now, you will most likely want to update it again later. Maybe now is the right time to think about how to make such updates easier in the future. Perhaps now is also the right time to update the schema. But maybe not: some solutions to the update problem do not require a schema change.
Designing Future Data Updates
If, while thinking about how to update this data, you caught yourself thinking "if only XYZ", then you probably have a starting point for designing your future update mechanism. Without knowing more about the complexity of your data, its size, or the approximate ratio of insertions, deletions, and unchanged records in an update, it is very hard to give specific advice on how to design a good update solution. I will, however, try to point out things to consider.
At a very high level of abstraction, there are two main ways to update a dataset: replace everything, or calculate the difference and replace only what has changed.
1. Replace everything
Design
If the amount of data is very small, that is, small enough not to need a more sophisticated update mechanism, you can simply replace the entire default dataset with each update. To be able to replace the default data without touching user data, you must either separate the default data from the user data (or already have a solution where they are separate), or at least be able to determine whether a record is part of the original dataset or not.
Splitting default data from user data
To be able to simply “replace all old default data with new default data”, all old data must be identified and deleted. There are several ways to do this. If it is possible to heuristically identify whether the record is part of the default dataset or not, perhaps through the timestamp when it was created, or something like that, then no major changes are required. All of these entries can be identified as default data. If not, the first update will be harder.
As stated above, you should design for future updates. Therefore, if you cannot determine which data is part of the default dataset and which is user data, you should probably change your model so that you can tell them apart. Adding a simple Boolean attribute is a very minor modification.
It is worth noting that large deletes in Core Data can be very slow, since Core Data does a lot of work behind the scenes, following relationships and applying the delete rule for each relationship. If you delete the entire default dataset, it will most likely be faster to keep the default data in its own store, i.e. its own SQLite file on disk. You can then delete the entire SQLite file, since all objects in it are to be deleted anyway. This does, however, increase the complexity of the solution, so measure the deletion time before making any decisions based on performance.
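As a rough illustration of the Boolean approach, here is a minimal sketch that uses a plain in-memory array instead of Core Data. The `Record` type, its `isDefault` field, and `replacingDefaults(in:with:)` are hypothetical names chosen for this example.

```swift
// Hypothetical in-memory stand-in for a stored record.
struct Record: Equatable {
    let id: String
    var title: String
    var isDefault: Bool   // the Boolean flag discussed above
}

/// Drops every record flagged as default and adds the new default set.
/// User records (isDefault == false) pass through unchanged.
func replacingDefaults(in store: [Record], with newDefaults: [Record]) -> [Record] {
    let userRecords = store.filter { !$0.isDefault }
    let flagged = newDefaults.map { record -> Record in
        var copy = record
        copy.isDefault = true   // make sure the new records can be found next time
        return copy
    }
    return userRecords + flagged
}
```

With Core Data the same idea would presumably become a fetch with a predicate on the flag, followed by deletes and inserts, but the principle is the same: the flag is what lets the next update find the old default data.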
What about modified records?
As mentioned above, there are several things you could do with a modified object during an update. If modified objects should be treated as user objects, they must be marked so that they look like user objects to the update mechanism (i.e. the one that removes all default objects).
(Note: is a default object that has been modified and then changed back to its original value considered a user object or a default object? How would you track such changes?)
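The question in the note has at least two possible answers, sketched below with hypothetical names (`Item`, `edit`, `isEffectivelyDefault`, `shippedTitles`). Neither is "the" right one; they only show that the choice changes what you need to store.

```swift
// Hypothetical in-memory stand-in for a stored record.
struct Item {
    let id: String
    var title: String
    var isDefault: Bool
}

/// Option A: any edit permanently turns the record into user data,
/// even if the user later types the original value back in.
func edit(_ item: inout Item, newTitle: String) {
    item.title = newTitle
    item.isDefault = false
}

/// Option B: ignore the flag and compare with the shipped value instead.
/// This needs the original values to be available (here: id -> title).
func isEffectivelyDefault(_ item: Item, shippedTitles: [String: String]) -> Bool {
    return shippedTitles[item.id] == item.title
}
```

Option A needs no extra storage but never "forgives" an edit. Option B treats a reverted record as default again, at the cost of keeping the shipped values around.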
Update procedure
Depending on whether the default data is already stored separately, and on whether you want to separate it, you may need a migration for the first update. A migration may also be required if it is currently impossible to determine which data is part of the default dataset. After that migration, if one was needed, the default data can be updated on its own, without further migrations, in future data updates.
Depending on the exact solution, you could do the updates in the background using a parent/child context setup. This is described further in solution 2 ("diff").
Pros:
- Less code to write
- Can handle data of any complexity, since everything is replaced
Cons:
- Migration may be required to separate default data from user data.
- May cause default records that the user deleted or changed to come back when the data is updated.
- Possibly a more complex data model
- Will have poor performance on large datasets
2. Replace only the difference
Design
Depending on how complex your data is and on the ratio of changed to unchanged records in an update, one design that suits a small number of changed records is to store just the changes. This requires, however, that every update can be described as a set of changes. If you know the difference between the old default dataset and the new default dataset, every update can be described as deletes, inserts, and modifications.
(This is similar to how a version control system works: instead of copying the entire file for every version, only the modifications (the "diff") are stored. For an update, you do not keep the outdated data; you replace it. The advantages are similar: the update time becomes proportional to the size of the update, not to the total size of the data.)
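To make the idea concrete, here is a minimal sketch of computing such a difference. It assumes, purely for illustration, that each version of the default dataset can be modeled as a dictionary from a stable record id to a value; `ChangeSet` and `diff(old:new:)` are hypothetical names.

```swift
/// The three kinds of change an update can contain.
struct ChangeSet {
    var inserts: [String: String] = [:]    // id -> value of a new record
    var deletes: Set<String> = []          // ids of records to remove
    var modifies: [String: String] = [:]   // id -> updated value
}

/// Computes the difference between two versions of the default dataset.
/// Both versions are modeled as id -> value dictionaries (an assumption).
func diff(old: [String: String], new: [String: String]) -> ChangeSet {
    var changes = ChangeSet()
    for (id, value) in new {
        if let oldValue = old[id] {
            // Present in both versions: a modification only if the value differs.
            if oldValue != value { changes.modifies[id] = value }
        } else {
            // Only in the new version: an insert.
            changes.inserts[id] = value
        }
    }
    // Only in the old version: a delete.
    for id in old.keys where new[id] == nil {
        changes.deletes.insert(id)
    }
    return changes
}
```

You would typically compute this once, when preparing the update, and ship only the resulting change set.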
Inserts
Inserts are probably the simplest. If all the new records that need to be inserted are kept separately, you can iterate over them and add them to the existing data.
Deletes
Deletes are equally easy, as long as the records can be identified unambiguously and have not been modified in any way. By storing the information needed to uniquely identify an object and to verify that it has not been modified, these records can be fetched and deleted from Core Data.
Modifies
Modifications can be very complex, depending on the complexity of the changes. Simple, unambiguous modifications are close to trivial, but modifications to relationships open up many new questions that should be answered (as with the cases listed above) before moving on.
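The three kinds of change could be applied along these lines. This is a sketch under several assumptions: the store is a plain dictionary rather than Core Data, every record carries a hypothetical `userModified` flag, modified records are left alone, and only simple value modifications are covered (relationships are not).

```swift
// Hypothetical in-memory stand-ins for a stored record and an update.
struct StoredRecord: Equatable {
    var value: String
    var userModified: Bool
}

struct Update {
    var inserts: [String: String]    // id -> value of a new record
    var deletes: [String]            // ids of records to remove
    var modifies: [String: String]   // id -> updated value
}

func apply(_ update: Update, to store: inout [String: StoredRecord]) {
    // Inserts: add the record unless something with that id already exists.
    for (id, value) in update.inserts where store[id] == nil {
        store[id] = StoredRecord(value: value, userModified: false)
    }
    // Deletes: only remove records that still exist and were never modified.
    for id in update.deletes {
        if let record = store[id], !record.userModified {
            store[id] = nil
        }
    }
    // Modifies: only overwrite records the user has not touched.
    for (id, value) in update.modifies {
        if let record = store[id], !record.userModified {
            store[id] = StoredRecord(value: value, userModified: false)
        }
    }
}
```

Records the user has already deleted are simply not found, so deletes and modifications for them fall through without any special handling.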
How to store the updates?
You may have noticed that I have not said how these updates are stored. That is because it, too, depends on your needs and resources. A simple solution would be to include them in the updated application as pre-populated data. However, the updates do not have to ship with the application. If the total size of all updates is small enough, they can be placed on a server and downloaded to the device in the background. Storing the updates on a server has the big advantage of letting you publish new data updates without updating the application itself.
In any case, downloaded or not, once the updates are on the device they must be stored in some way. They can be stored in a separate Core Data model in your application, in which case you do not have to migrate your main model, because the updates are objects in another model. Storing the updates in flat files, or in any other non-Core Data format, has the same advantage.
The decision on how to store the updates is similar to the decision on how to store any kind of data in the first place. It should resemble the process by which you chose Core Data for your main data.
Update procedure
When the user launches your updated application, you do not need to block the user interface for a long migration, since the model itself does not need to change. Assuming that the updates have somehow reached the device and are stored somewhere, they can be applied in the background. If you only target iOS 5 and later, you can use a parent/child context setup to do the update in the background. A good resource on background importing in Core Data is Core Data for Mac, iPhone, and iPad Update from iDeveloper.tv. There is also, of course, the “What's New in Core Data” WWDC video, which covers the parent/child context setup as well.
If you go with this solution, you can create a background context and make all the changes to it on a low-priority queue. Depending on the amount of data being updated, I would save the changes to the "real" Core Data context in batches and, at the same time, delete the already applied entries from the update tables. That way the whole update process can resume where it left off if the update takes a very long time and the user quits, or if the application crashes halfway through.
As a general rule, no matter how you insert or delete a large amount of data, you want to save in batches and somehow record which data has already been processed, so that the application can resume the import or deletion. You should not save after every single record. If the application crashes and some records were not saved, it is still a huge win if it can resume just before the unsaved records and process them again. By marking data as processed only after it has been saved, the import knows where to resume without losing any data.
When you keep lists of the data to insert, delete, or modify: by removing entries from these lists only after the corresponding changes have been saved to Core Data, the update mechanism can keep track of the inserts, modifications, and deletes that have not yet been processed.
Once all updates have been saved to the “real” context, you are left with an empty list of updates.
NOTE: In a parent/child context setup, you need to save the "main" context at some point, because it is the only one that actually writes data to disk. Saving any other context only pushes its changes to its parent, in memory.
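The resume logic described above can be sketched independently of Core Data. Everything here is hypothetical: `PendingUpdates` stands in for the stored list of updates, `applyOne` for whatever applies a single change to the background context, and `saveBatch` for saving that context.

```swift
/// Hypothetical stand-in for the stored list of updates still to be applied.
struct PendingUpdates {
    var remaining: [String]
}

/// Processes pending updates in batches. An entry is removed from the pending
/// list only after the batch containing it has been saved successfully.
func runImport(
    pending: inout PendingUpdates,
    batchSize: Int,
    applyOne: (String) -> Void,
    saveBatch: () throws -> Void
) throws {
    precondition(batchSize > 0, "batchSize must be positive")
    while !pending.remaining.isEmpty {
        let batch = Array(pending.remaining.prefix(batchSize))
        batch.forEach(applyOne)
        try saveBatch()                              // if this throws, the batch stays pending
        pending.remaining.removeFirst(batch.count)   // mark as processed only after the save
    }
}
```

If a save fails, or the application dies before it completes, the entries of that batch are still in the list and are simply processed again on the next run. This assumes that applying the same change twice is harmless, or that unsaved changes are discarded before the retry.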
Pros:
- More efficient for small updates or for large amounts of unchanged data.
- Possibly less data to transfer and store for each update
- (when downloading updates) Default data can be updated without updating the application
Cons:
- More code to write
- The “update” model must be migrated whenever the “normal” model is migrated, to keep the two models in sync.
I realize this answer got a lot longer than I intended. I also realize that I have kept my solutions very general, and that this may not be specific enough for your situation. If you like, you can comment on my answer with more details about your problem and the constraints you have. That way I can tailor the answer to your needs.