Is this the best way to get a unique version of a filename with Python?

I'm just "diving into" Python, and I want to make sure I'm not missing anything. I wrote a script that extracts files from several zip files and saves the extracted files together in one directory. To prevent files with duplicate names from being overwritten, I wrote this little function, and I'm wondering whether there is a better way to do this. Thanks!

    import os

    def unique_filename(file_name):
        counter = 1
        file_name_parts = os.path.splitext(file_name)  # returns ('/path/file', '.ext')
        while os.path.isfile(file_name):
            file_name = file_name_parts[0] + '_' + str(counter) + file_name_parts[1]
            counter += 1
        return file_name

I really do need the files to be in the same directory, and numbering the duplicates is definitely acceptable in my case, so I'm not looking for a more robust method (though I suppose any pointers are welcome), just making sure that this is all done right.
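For context, here is a rough sketch of how the asker's function might slot into the zip-extraction script described in the question. The helper name `extract_all_flat`, the choice to flatten each archive member to its base name, and the skipping of directory entries are assumptions for illustration, not part of the original script:

```python
import os
import shutil
import zipfile


def unique_filename(file_name):
    # The asker's approach: append _1, _2, ... until no regular file has the name.
    counter = 1
    file_name_parts = os.path.splitext(file_name)  # returns ('/path/file', '.ext')
    while os.path.isfile(file_name):
        file_name = file_name_parts[0] + '_' + str(counter) + file_name_parts[1]
        counter += 1
    return file_name


def extract_all_flat(zip_paths, dest_dir):
    """Hypothetical helper: extract every file from several zips into dest_dir."""
    written = []
    for zip_path in zip_paths:
        with zipfile.ZipFile(zip_path) as zf:
            for info in zf.infolist():
                if info.filename.endswith('/'):
                    continue  # skip directory entries
                # Drop any folder structure inside the archive, keep only the base name.
                target = unique_filename(
                    os.path.join(dest_dir, os.path.basename(info.filename)))
                with zf.open(info) as src, open(target, 'wb') as dst:
                    shutil.copyfileobj(src, dst)
                written.append(target)
    return written
```

If both `one.zip` and `two.zip` contain a member named `a.txt`, calling `extract_all_flat(['one.zip', 'two.zip'], 'out')` would write `out/a.txt` and `out/a_1.txt`.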

+14
python filenames
Oct 08 '08 at 15:50
6 answers

One problem is that there is a race condition in your original code, since there is a gap between testing for the file's existence and creating it. This can have security implications: think of someone maliciously planting a symbolic link that points to a sensitive file they cannot overwrite themselves, while your program runs with higher privileges. Attacks like these are why functions like os.tempnam() are deprecated.

To get around this, it is best to create the file in a way that raises an exception if the name is already taken and, on success, gives you an actually open file object. You can do this with the lower-level os.open function by passing the os.O_CREAT and os.O_EXCL flags. Once the file is open, return the actual file object (and possibly its name). For example, here is your code changed to use this approach (returning a (file, file_name) tuple):

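    import errno
    import os

    def unique_file(file_name):
        counter = 1
        file_name_parts = os.path.splitext(file_name)  # returns ('/path/file', '.ext')
        while True:
            try:
                # O_EXCL makes os.open fail if the name already exists, so the
                # existence check and the creation happen in one atomic step.
                fd = os.open(file_name, os.O_CREAT | os.O_EXCL | os.O_RDWR)
                return os.fdopen(fd, 'w+b'), file_name
            except OSError as e:
                if e.errno != errno.EEXIST:
                    raise  # a real error (e.g. permission denied), not a name clash
            file_name = file_name_parts[0] + '_' + str(counter) + file_name_parts[1]
            counter += 1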
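On Python 3.3 and later, the same exclusive-create behaviour is available directly from the built-in open() through mode 'x', which raises FileExistsError if the path already exists. A minimal sketch of the same loop under that assumption (the name `unique_file_x` is hypothetical):

```python
import os


def unique_file_x(file_name):
    """Create and open a new file, appending _1, _2, ... until a free name is found.

    Mode 'x' (Python 3.3+) corresponds to O_CREAT | O_EXCL, so the existence
    check and the creation are a single atomic step.
    """
    base, ext = os.path.splitext(file_name)
    candidate = file_name
    counter = 1
    while True:
        try:
            return open(candidate, 'xb'), candidate
        except FileExistsError:
            # Name already taken (file, directory or symlink): try the next one.
            candidate = '%s_%d%s' % (base, counter, ext)
            counter += 1
```

Other OSError subclasses, such as PermissionError, are deliberately not caught, so the loop cannot spin forever on a directory you are not allowed to write to.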

[Edit] Actually, the best way to handle the above problems for you is probably the tempfile module, although you lose some control over the naming. Here is an example of using it (keeping a similar interface):

    import os
    import tempfile

    def unique_file(file_name):
        dirname, filename = os.path.split(file_name)
        prefix, suffix = os.path.splitext(filename)
        fd, filename = tempfile.mkstemp(suffix, prefix + "_", dirname)
        return os.fdopen(fd, 'w+b'), filename

    >>> f, filename = unique_file('/home/some_dir/foo.txt')
    >>> print(filename)
    /home/some_dir/foo_z8f_2Z.txt

The only drawback of this approach is that you always get a file name with some random characters in it, since it doesn't first try to create the unmodified name (/home/some_dir/foo.txt). You may also want to look at tempfile.TemporaryFile and NamedTemporaryFile, which do this and also delete the file from disk automatically when it is closed.
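If you like NamedTemporaryFile's interface but need the extracted file to survive after closing, passing delete=False turns off the automatic deletion. A minimal sketch, assuming the same prefix/suffix scheme as the mkstemp example (the name `unique_named_file` is hypothetical):

```python
import os
import tempfile


def unique_named_file(file_name):
    """Open a uniquely named file next to file_name and keep it after closing.

    delete=False stops NamedTemporaryFile from removing the file on close,
    which is what you want for extracted files that must persist.
    """
    dirname, basename = os.path.split(file_name)
    prefix, suffix = os.path.splitext(basename)
    return tempfile.NamedTemporaryFile(
        mode='w+b', prefix=prefix + '_', suffix=suffix,
        dir=dirname or os.curdir,  # a bare name means "current directory"
        delete=False)
```

The returned object's .name attribute holds the generated path, e.g. something like foo_<random>.txt in the target directory.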

+22
Oct 08 '08 at 16:13

Yes, this is a good strategy for readable but unique file names.

One important change: you should replace os.path.isfile with os.path.lexists! As it is written right now, if there is a directory named /foo/bar.baz, your program will try to overwrite it with a new file (which won't work), because isfile only returns True for regular files, not directories. lexists returns True for directories, symbolic links (even broken ones), and so on; basically, for any reason why the file name could not be created.
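To see the difference, here is a small check run in a throwaway temporary directory. The names bar.baz and dangling.txt are made up for illustration, and the symlink part assumes a POSIX system (creating symlinks on Windows may require extra privileges):

```python
import os
import tempfile

# Work in a scratch directory so nothing real is touched.
scratch = tempfile.mkdtemp()

# A *directory* whose name looks like a file name.
dir_path = os.path.join(scratch, 'bar.baz')
os.mkdir(dir_path)
print(os.path.isfile(dir_path))   # False: it is not a regular file
print(os.path.lexists(dir_path))  # True: but the name is taken

# A dangling symlink: its target does not exist.
link_path = os.path.join(scratch, 'dangling.txt')
os.symlink(os.path.join(scratch, 'missing'), link_path)
print(os.path.isfile(link_path))   # False: isfile follows the link to nothing
print(os.path.lexists(link_path))  # True: the link itself occupies the name
```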

EDIT: @Brian gave a better answer, safer and more reliable in terms of race conditions.

+6
Oct 08 '08 at 16:02

Two small changes ...

    base_name, ext = os.path.splitext(file_name)

You get two results with different meanings; give them separate names.

    file_name = "%s_%d%s" % (base_name, counter, ext)

It isn't faster or much shorter. But when you want to change the file name pattern, the pattern is in one place, which makes it a little easier to work with.
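Applied to the question's function, the two suggestions might look like this. It is a sketch that keeps the asker's os.path.isfile check unchanged, so only the unpacking and the format string differ:

```python
import os


def unique_filename(file_name):
    counter = 1
    # Unpack into two descriptive names instead of indexing a tuple.
    base_name, ext = os.path.splitext(file_name)
    while os.path.isfile(file_name):
        # The naming pattern lives in one place: change it here only.
        file_name = "%s_%d%s" % (base_name, counter, ext)
        counter += 1
    return file_name
```

Note that counter is passed as an int, since the %d conversion raises TypeError when given a string.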

+2
Oct 08 '08 at 16:00

If you need readable names, this looks like a good solution. There are routines that return unique file names, for example for temporary files, but they produce long random names.

+1
Oct 08 '08 at 15:52

If you don't care about readability, uuid.uuid4() is your friend.

    import uuid

    def unique_filename(prefix=None, suffix=None):
        fn = []
        if prefix:
            fn.extend([prefix, '-'])
        fn.append(str(uuid.uuid4()))
        if suffix:
            fn.extend(['.', suffix.lstrip('.')])
        return ''.join(fn)

+1
Oct 09 '08 at 1:07

What about

    import os
    from time import time

    def ensure_unique_filename(orig_file_path):
        if os.path.lexists(orig_file_path):
            name, ext = os.path.splitext(orig_file_path)
            orig_file_path = name + str(time()).replace('.', '') + ext
        return orig_file_path

time() returns the current time in seconds as a floating-point number with sub-second precision. Combined with the original file name, it is quite likely to be unique, even in complex multi-threaded cases.

0
Mar 27 '09 at 18:47


