Strange data ending up in a saved user XML file

My application uses XML to save user data to a file. I recently received 2 reports from users who see completely unexpected data in their file. Instead of XML, it looks like this:

({"windows":[{"tabs":[{"entries":[{"url":"https://mail.google.com/a/cast... 

And a little more from the middle of the file, which is almost 30 KB in size:

 {\"n\":\"bc\",\"a\":[null]},{\"n\":\"p\",\"a\":[\"ghead\",\"\",0]},{\"n\":\"ph\",\"a\":[{\"gb_1\":\"http://www.google.com/ 

Can someone tell me what this data is, or how it ends up in my users' data file? Both users reported having to hold the power button to shut down their machines. In one case the reason for the forced shutdown was Firefox freezing, and in the other it was a mouse problem. One user actually experienced a kernel panic.

I doubt that this is a memory management problem, since my user base is more than 100,000 people and I have received only 2 reports. I think this is something narrower and rarer.

This is part of the code that I use to write data to a file:

    NSString *xmlString = [[self convertContextToXmlString:context] retain];
    NSError *e = nil;
    [[xmlString dataUsingEncoding:NSUTF8StringEncoding] writeToFile:location
                                                             options:NSDataWritingAtomic
                                                               error:&e];
    [xmlString release];
    if (e) {
        NSLog(@"An error occurred saving: %@", [e description]);
    }
    return e;

Saving never happens on a background thread, always on the main (user interface) thread. I also pass the NSDataWritingAtomic option when writing the data to the file.

Edit: The second user's file has nearly identical data. So both sets of erroneous content come from the same place, but where? I will add a 200-point bounty to this question as soon as I can.

 AV/////wEAAAAAAAAAAAABAAA="}]}]},{"url":"http://googleads.g.doubleclick.net/pagead/ads?client=ca-pu 

Edit 2: I got a third report from a user who also experienced data corruption after holding the power button to shut down. His file had a lot of random junk at the beginning, and then the correct data at the end:

 (garbage)rred="1"><rest of it was normal xml...> 
+4
3 answers

I got a great response from one of the Apple engineers. I will be migrating my existing model to Core Data over the next few weeks. (StackOverflow mangled some of the lists and formatting, but it is still mostly readable.)

I will begin my answer with a quick word about HFS Plus journalling. Since journalling was introduced in Mac OS X 10.2.x, the correctness guarantee of the Mac OS X file system has been that, regardless of kernel panics, power failures, and so on, a file system operation will lead to one of two results:

o either the operation will be rolled forward by the journal, in which case it is as if the operation completed successfully

o or the operation will be rolled back, in which case it is as if the operation was never performed

This guarantee has two critical limitations:

o It applies to individual file system operations (creates, deletes, moves, etc.), and not to groups of operations.

o It applies only to the logical structure of the file system, and not to the data inside the files.

In short, the purpose of the journal is to prevent overall corruption of the file system, not corruption of a specific file.


With this in mind, let's look at the behavior of -[NSData writeToFile:options:error:]. Its behavior can be quite complex, but in the typical case it is fairly simple. One way to investigate this is to write some code and watch what it does to the file system. For example, here is some test code:

    - (IBAction)testAction:(id)sender
    {
        BOOL success;
        NSData *d;
        struct stat sb;

        d = [@"Hello Cruel World!" dataUsingEncoding:NSUTF8StringEncoding];
        assert(d != nil);

        // Marker before the write; /foo is not expected to exist.
        (void) stat("/foo", &sb);
        success = [d writeToFile:@"/tmp/WriteTest.txt"
                         options:NSDataWritingAtomic
                           error:NULL];
        // Marker after the write.
        (void) stat("/foo", &sb);
        assert(success);
    }

The two stat calls are just markers; they make it easy to see which file system operations are generated by -writeToFile:options:error:.

You can see the behavior of the file system using:

$ sudo fs_usage -f filesys -w WriteTest

where "WriteTest" is the name of my test program.

Here is an excerpt from the result of fs_usage:

    14:33:10.317  stat     [2]  /foo
    14:33:10.317  lstat64       private/tmp/WriteTest.txt
    14:33:10.317  open     F=5  (RWC__E)  private/tmp/.dat2f56.000
    14:33:10.317  write    F=5  B=0x12
    14:33:10.317  fsync    F=5
    14:33:10.317  close    F=5
    14:33:10.318  rename        private/tmp/.dat2f56.000
    14:33:10.318  chmod         private/tmp/WriteTest.txt
    14:33:10.318  stat     [2]  /foo

You can clearly see the "stat" calls that surround the -writeToFile:options:error: call, which means that everything between them is generated by -writeToFile:options:error:.

So what is it doing? Well, it is actually quite simple:

  • It creates, writes to, fsyncs and closes a temporary file containing the data.

  • It renames the temporary file on top of the file you are writing.

  • It resets the permissions of the destination file.
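
The three steps above can be sketched in plain C, which also compiles inside Objective-C sources. This is only a minimal sketch and not Foundation's actual implementation: the helper name `safe_save`, the `.XXXXXX` temporary-name suffix, and the `0644` permissions are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path through a temporary file.
 * Returns 0 on success, -1 on failure. */
static int safe_save(const char *path, const void *data, size_t len)
{
    assert(path != NULL && data != NULL);

    /* Step 1: create a temporary file next to the destination. */
    char tmp[1024];
    int n = snprintf(tmp, sizeof tmp, "%s.XXXXXX", path);
    if (n < 0 || (size_t)n >= sizeof tmp) return -1;
    int fd = mkstemp(tmp);
    if (fd < 0) return -1;

    /* Write all of the data, allowing for short writes. */
    const char *p = data;
    size_t left = len;
    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w < 0) { close(fd); unlink(tmp); return -1; }
        p += w;
        left -= (size_t)w;
    }

    /* fsync, then close the temporary file. */
    if (fsync(fd) != 0) { close(fd); unlink(tmp); return -1; }
    if (close(fd) != 0) { unlink(tmp); return -1; }

    /* Step 2: rename the temporary file on top of the destination. */
    if (rename(tmp, path) != 0) { unlink(tmp); return -1; }

    /* Step 3: reset permissions (mkstemp creates the file as 0600). */
    return chmod(path, 0644);
}
```

A call such as `safe_save("/tmp/WriteTest.txt", "Hello Cruel World!", 18)` should produce roughly the same open, write, fsync, close, rename, chmod sequence seen in the fs_usage trace.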


All in all, this is a pretty standard UNIX-style safe save. But the question is, how does this affect data integrity? The main thing to note is that fsync does not guarantee that all data has reached the disk before it returns. This problem has a long and complex history, but the summary is that fsync is called too often, in too many performance-sensitive places, for it to make that guarantee. This means that all of the file corruption issues you are seeing are possible, as described below:

o "iProcrastinate_Bad_2.ipr" and "iProcrastinate_Bad_3.ipr" simply contain incorrect data. This can happen as follows:

  1. The application creates a temporary file.

  2. The application writes to the temporary file. In response, the kernel:

     a. allocates a set of blocks on disk
     b. adds them to the file
     c. extends the length of the file
     d. copies the written data into the buffer cache

  3. The application fsyncs and closes the file. The kernel responds by scheduling the data blocks to be written as soon as possible.

  4. The application renames the temporary file on top of the real file.

  5. The system kernel panics.

  6. When the system reboots, the changes from steps 1, 2a..2c, 3, and 4 are recovered from the journal, which means that you have a valid file containing invalid data.

o "iProcrastinate_Bad_1.ipr" is just a small variation on the above. If you open the file with a hex editor, you will find that it looks fine, except for the range of data at offset 0x6000..0x61ff, which appears to contain data that is completely unrelated to your application. Notably, the length of this data, 0x200 bytes, is exactly one disk block. So it seems that the kernel managed to write all of the user data to disk, with the exception of this one block.


So where does this leave you? It is unlikely that -[NSData writeToFile:options:error:] will ever become more reliable than it is; as I mentioned earlier, such changes tend to hurt overall system performance. This means that your application will have to deal with this problem itself.

In this regard, there are three common ways to harden your application:

a. F_FULLFSYNC. You can force a file to be flushed to persistent storage by calling fcntl with the F_FULLFSYNC selector. You can use this in your application by replacing -[NSData writeToFile:options:error:] with your own code, which calls F_FULLFSYNC instead of fsync.

The most obvious drawback of this approach is that F_FULLFSYNC is very slow.

b. Journalling. Another option is to adopt a more robust file format, one that supports journalling itself. A good example of this is SQLite, which can be used directly or through Core Data.

c. Safe save. Finally, you can implement a safer save mechanism with a backup file. Before calling -[NSData writeToFile:options:error:] to write the file, you can rename the previous file to another name and keep it around just in case. If, when opening the main file, you find that it is corrupt, you automatically fall back to the backup.
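
For the F_FULLFSYNC option, the flush step might look like the following minimal C sketch. The helper name `full_sync` is hypothetical. F_FULLFSYNC is only defined on Apple platforms, so the sketch assumes it is acceptable to fall back to plain fsync elsewhere, or when the underlying file system rejects the request.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: flush fd to stable storage as firmly as the
 * platform allows. Returns 0 on success, -1 on failure. */
static int full_sync(int fd)
{
    assert(fd >= 0);
#ifdef F_FULLFSYNC
    /* macOS: also ask the drive to flush its own buffers. If the file
     * system does not support this, fall through to fsync below. */
    if (fcntl(fd, F_FULLFSYNC, 0) == 0) return 0;
#endif
    return fsync(fd);
}
```

A hand-written safe save would call `full_sync(fd)` on the temporary file where it would otherwise call `fsync(fd)`, before closing and renaming it.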

Of these approaches, my preference is for b, and especially for b with Core Data, because Core Data offers many benefits beyond data integrity. However, for a quick fix, option c is probably the best choice.
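
The backup-file approach could be sketched as follows, again in plain C. Everything named here is hypothetical: `looks_valid` stands in for whatever integrity check the application can afford (a real one would parse the XML or verify a checksum rather than test the first byte), and plain stdio stands in for the Foundation write call. Renaming the main file to the backup only when it still looks valid is a design choice of this sketch, so that a corrupt file never replaces a good backup.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical validity check: the file must start with '<'. */
static int looks_valid(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) return 0;
    int c = fgetc(f);
    fclose(f);
    return c == '<';
}

/* Keep the previous file as a backup, then write the new contents.
 * Returns 0 on success, -1 on failure. */
static int save_with_backup(const char *path, const char *backup,
                            const char *text)
{
    assert(path != NULL && backup != NULL && text != NULL);

    /* Only a main file that still looks valid may replace the backup. */
    if (looks_valid(path) && rename(path, backup) != 0) return -1;

    FILE *f = fopen(path, "wb");
    if (f == NULL) return -1;
    size_t len = strlen(text);
    int ok = fwrite(text, 1, len, f) == len;
    if (fclose(f) != 0) ok = 0;
    return ok ? 0 : -1;
}

/* Pick the file to open: the main file if it looks valid, otherwise
 * the backup, otherwise NULL. */
static const char *choose_file(const char *path, const char *backup)
{
    if (looks_valid(path)) return path;
    if (looks_valid(backup)) return backup;
    return NULL;
}
```

On launch the application would open whatever `choose_file` returns, so a main file filled with unrelated data causes a silent fallback to the last good save.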

Let me know if you have any questions about this.

+3

Logically, I see only two reasons for this:

  • [self convertContextToXmlString:context] returned this string. In that case, we cannot debug further without knowing how this method works. You could add an assertion to make sure the return value looks like XML.

  • Some other process, application, or piece of code is writing to the same location. Your application does not work with JSON, as you say, which seems to rule out your own code. What is this location?
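
The assertion suggested in the first point could be as simple as the following C sketch, run on the bytes just before they are written. The function name `looks_like_xml` is hypothetical, and the test is deliberately crude: it only checks that the first non-whitespace byte, after an optional UTF-8 byte order mark, is `<`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sanity check: returns 1 if the buffer plausibly holds
 * XML, 0 otherwise. */
static int looks_like_xml(const char *buf, size_t len)
{
    assert(buf != NULL);
    size_t i = 0;

    /* Skip an optional UTF-8 byte order mark. */
    if (len >= 3 && memcmp(buf, "\xEF\xBB\xBF", 3) == 0) i = 3;

    /* Skip leading whitespace. */
    while (i < len && isspace((unsigned char)buf[i])) i++;

    return i < len && buf[i] == '<';
}
```

A check like this would reject the JSON-like data from the question, which starts with `(`, while accepting a normal XML document.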

+1

This looks like JSON.

Where does the data come from? If it comes from a web service that is not under your control, has the provider changed the default response format while you are not requesting XML explicitly?

0

Source: https://habr.com/ru/post/1342692/
