Overwriting a file without risk of file corruption

My applications often need to save files so they can be loaded again later. Having recently been bitten by a crash during a save, I want to write the operation in such a way that I am guaranteed to end up with either the new data or the original data, but never a corrupted mess.

My first idea was to do something along these lines (to save a file called example.dat):

  • Come up with a unique file name in the target directory, e.g. example.dat.tmp
  • Create this file and write my data to it.
  • Delete the original file (example.dat)
  • Rename ("move") the temporary file to where the original was (example.dat.tmp → example.dat).

Then, at startup, the application can follow these rules:

  • If there is neither "example.dat" nor "example.dat.tmp", this is a first run / new project, so load the defaults / create a new file.
  • If "example.dat" exists and "example.dat.tmp" does not, load example.dat (the normal load case).
  • If "example.dat.tmp" exists, offer the user the option to potentially recover data from it. If "example.dat" also exists, do not overwrite it without explicit user consent.

However, after doing a little research, I found that in addition to OS caching, which I can override with file flush methods, some disk drives also cache internally and can even lie to the OS by saying they are done. So step 4 may complete while the data has not actually been written, and if the system goes down, I have lost my data...

I'm not sure whether the disk problem is actually solvable by the application, but are the general rules above the right thing to do? Should I keep the old copy of the file around longer for recovery, just to be sure? What rules apply to such things (for example, acceptable disk usage, whether the user chooses where to put such files, etc.)?

Potential conflicts with the user and other programs over "example.dat.tmp" should also be avoided. I remember sometimes seeing "example.dat" from some other software; is that a better convention?

+4
2 answers

If the disk drive reports to the OS that the data is physically on the disk when it is not, there is not much you can do about it. Many disks do cache a certain number of writes and report them as done, but such disks should have a battery backup and complete the physical writes no matter what (and they will not lose data in the event of a system crash, because they will not even see it).

For the rest, you say that you have done some research, so you no doubt know that you cannot use std::ofstream (or FILE*) for this; you have to do the actual writes at the system level and open the files with special attributes to ensure full synchronization. Otherwise, the operations can sit in the OS buffers for a while. And as far as I know, there is no way to ensure such synchronization for a rename. (But I'm not sure that it is necessary if you always keep two versions: my usual convention in such cases is to write to a file "example.dat.new", then, when I have finished writing, delete any file named "example.dat.bak", rename "example.dat" to "example.dat.bak", and then rename "example.dat.new" to "example.dat". Given this, you should be able to figure out what did or did not happen and find the correct file, interactively if necessary, or by inserting an initial line with a timestamp.)

+2

You should lock the actual data file while you write its replacement if there is any possibility that another process may be going through the same protocol you describe.

You can use flock to lock the file.

As for your temp file name, you could make the process ID part of it, for example "example.dat.3124"; no other simultaneously running process will generate the same name.
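
Both suggestions can be sketched as follows, assuming a POSIX system where flock and getpid are available. The names temp_name_for and FileLock are hypothetical helpers written for this illustration.

```cpp
#include <string>
#include <fcntl.h>     // open
#include <sys/file.h>  // flock
#include <unistd.h>    // getpid, close

// Hypothetical helper: per-process temp name such as "example.dat.3124".
std::string temp_name_for(const std::string& target) {
    return target + "." + std::to_string(static_cast<long>(::getpid()));
}

// Hypothetical RAII wrapper: holds an exclusive advisory lock on a file
// for as long as the object lives. The constructor blocks until the
// lock can be taken.
class FileLock {
public:
    explicit FileLock(const std::string& path)
        : fd_(::open(path.c_str(), O_RDWR | O_CREAT, 0644)) {
        if (fd_ >= 0 && ::flock(fd_, LOCK_EX) != 0) {
            ::close(fd_);
            fd_ = -1;
        }
    }
    ~FileLock() {
        if (fd_ >= 0) {
            ::flock(fd_, LOCK_UN);
            ::close(fd_);
        }
    }
    FileLock(const FileLock&) = delete;
    FileLock& operator=(const FileLock&) = delete;

    // True if the file was opened and the lock was acquired.
    bool held() const { return fd_ >= 0; }

private:
    int fd_;
};
```

A save routine would construct a FileLock on the data file, check held(), write the temp file named by temp_name_for, and do its renames before the lock goes out of scope. Note that flock is advisory: it only protects against other processes that also take the lock. The lock is also tied to the opened file rather than to its name, so it does not carry over to a new file renamed into place.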

0

Source: https://habr.com/ru/post/1493693/

