Pickup directories: how to avoid picking up files that are still being written?

I have a Python script that scans a pickup directory, processes every file it finds, and then deletes them.

How can I make sure I don't pick up a file that is still being written by the process that drops files into this directory?

My test case is pretty simple: I copy 300 MB of files into the pickup directory, and the script often grabs a file that is still being written. It processes only the partial file and then deletes it, which causes an OS-level error for the writing process, since the file it was writing to has disappeared.

  • I tried acquiring a file lock (using the FileLock module) before opening, processing, and deleting each file, but it did not help.

  • I considered checking the file's modification time and skipping anything modified within the last X seconds, but that approach seems awkward.

My tests are on OS X, but I'm trying to find a solution that will work across the major platforms.

I found a similar question here (How to check if a file is saved?), but there was no clear solution.

Thanks.

+6
5 answers

As a workaround, you can listen for file-modified events (the watchdog library is cross-platform). A modified event (at least on OS X) does not fire for every write; it only fires when the file is closed. So when you detect a modified event, you can assume that all writes are complete.

Of course, if a file is written in chunks and closed after each chunk, this will not work.
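For reference, a minimal sketch of that approach using watchdog (pip install watchdog); PICKUP_DIR and process_file() are hypothetical placeholders, not part of the library:

    import time

    from watchdog.events import FileSystemEventHandler
    from watchdog.observers import Observer

    PICKUP_DIR = "/path/to/pickup"  # hypothetical path

    class PickupHandler(FileSystemEventHandler):
        def on_modified(self, event):
            # On OS X this tends to fire once the writer closes the file,
            # not on every individual write.
            if not event.is_directory:
                process_file(event.src_path)  # your process-and-delete logic

    def process_file(path):
        print("processing", path)

    observer = Observer()
    observer.schedule(PickupHandler(), PICKUP_DIR, recursive=False)
    observer.start()
    try:
        while True:
            time.sleep(1)
    finally:
        observer.stop()
        observer.join()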

+2

One solution would be to change the program that writes the files so that it first writes to a temporary file and then moves that temporary file to the destination once it is done. On most operating systems the move is atomic, as long as the source and destination are on the same file system.
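To illustrate, a minimal sketch of the writer side (the function name is made up; it creates the temporary file in the destination directory so the rename never crosses file systems):

    import os
    import tempfile

    def write_atomically(data, dest_path):
        dest_dir = os.path.dirname(dest_path)
        # Create the temp file next to the destination so the final
        # rename stays within one file system and is therefore atomic.
        fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
        try:
            with os.fdopen(fd, "wb") as fp:
                fp.write(data)
            os.rename(tmp_path, dest_path)  # atomic on POSIX within one fs
        except Exception:
            os.unlink(tmp_path)
            raise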

+1

If you do not have control over the writing side, all you can do is watch the file yourself and, once it stops growing for a certain amount of time, call it good. I have to use this method myself, and I found that 40 seconds is safe for my conditions; see the sketch below.
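Something like this rough sketch (the names are hypothetical, and the 40-second window should be tuned to your workload):

    import os
    import time

    def wait_until_stable(path, quiet_seconds=40, poll_interval=1.0):
        # Block until the file has not grown for quiet_seconds.
        last_size = -1
        stable_since = time.time()
        while True:
            size = os.path.getsize(path)
            if size != last_size:
                last_size = size
                stable_since = time.time()
            elif time.time() - stable_since >= quiet_seconds:
                return  # no growth for quiet_seconds; assume the writer is done
            time.sleep(poll_interval)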

+1

Each OS will have a different solution, since file locking mechanisms are not portable.

  • On Windows, you can rely on the OS lock: trying to open a file that another process still has open for writing will typically fail with a sharing violation.
  • On Linux, you can inspect open file handles (similar to how lsof works) and, if the file is open, leave it alone; a rough sketch of that check follows this list.
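The lsof-style check could look roughly like this. It uses the third-party psutil library (pip install psutil), which is my suggestion rather than part of the answer above; it can be slow with many processes and may miss processes you lack permission to inspect:

    import os

    import psutil

    def is_open_by_any_process(path):
        path = os.path.abspath(path)  # open_files reports absolute paths
        # Walk every process we can see and check its open file handles.
        for proc in psutil.process_iter(['open_files']):
            for f in (proc.info['open_files'] or []):
                if f.path == path:
                    return True
        return False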
+1

Have you tried opening the file before dealing with it? If the file is still in use, open() should throw an exception.

    try:
        with open(filename, "rb") as fp:
            pass  # the open succeeded: copy / process the file
    except IOError:
        pass  # the file is still in use: don't copy it yet
0

Source: https://habr.com/ru/post/898250/

