Retrieving files from a Directory argument, sorting by size

I am trying to write a program that takes a command line argument, looks at the directory tree provided by the argument, and creates a list of all the files in the directory, and then sorts by the length of the files.

I'm not really a scripting guy, but this is what I have, and it doesn't work:

    import sys
    import os
    from os.path import getsize

    file_list = []

    #Get dirpath
    dirpath = os.path.abspath(sys.argv[0])
    if os.path.isdir(dirpath):
        #Get all entries in the directory
        for root, dirs, files in os.walk(dirpath):
            for name in files:
                file_list.append(name)
        file_list = sorted(file_list, key=getsize)
        for item in file_list:
            sys.stdout.write(str(file) + '\n')
    else:
        print "not found"

Can someone point me in the right direction?

3 answers

Hope this function helps you (I am using Python 2.7):

    import os

    def get_files_by_file_size(dirname, reverse=False):
        """ Return list of file paths in directory sorted by file size """

        # Get list of files
        filepaths = []
        for basename in os.listdir(dirname):
            filename = os.path.join(dirname, basename)
            if os.path.isfile(filename):
                filepaths.append(filename)

        # Re-populate list with filename, size tuples
        for i in xrange(len(filepaths)):
            filepaths[i] = (filepaths[i], os.path.getsize(filepaths[i]))

        # Sort list by file size
        # If reverse=True sort from largest to smallest
        # If reverse=False sort from smallest to largest
        filepaths.sort(key=lambda filename: filename[1], reverse=reverse)

        # Re-populate list with just filenames
        for i in xrange(len(filepaths)):
            filepaths[i] = filepaths[i][0]

        return filepaths
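Note that `xrange` does not exist in Python 3. As a rough sketch (not the answerer's code), the same function can be written more compactly so that it runs on both Python 2.7 and Python 3, letting `sorted()` do the work with `os.path.getsize` as the key. Like the version above, it only looks at files directly inside `dirname` and does not descend into subdirectories:

```python
import os


def get_files_by_file_size(dirname, reverse=False):
    """Return paths of regular files directly inside dirname, sorted by size."""
    candidates = [
        os.path.join(dirname, basename)
        for basename in os.listdir(dirname)
    ]
    # Keep regular files only; subdirectories are skipped, not descended into.
    filepaths = [path for path in candidates if os.path.isfile(path)]
    # sorted() calls the key function once per element.
    return sorted(filepaths, key=os.path.getsize, reverse=reverse)
```

Calling `get_files_by_file_size(".", reverse=True)` would list the files in the current directory from largest to smallest.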

Here is an approach using generators, which should be faster for a large number of files.

Both examples below start with this:

    import os, operator, sys

    dirpath = os.path.abspath(sys.argv[1])  # argv[0] is the script itself

    # make a generator for all file paths within dirpath
    all_files = (
        os.path.join(basedir, filename)
        for basedir, dirs, files in os.walk(dirpath)
        for filename in files
    )

If you just need a list of files without size, you can use this:

    sorted_files = sorted(all_files, key=os.path.getsize)

But if you need both the paths and the sizes in the list, you can use this:

    # make a generator for tuples of file path and size: ('/Path/to/the.file', 1024)
    files_and_sizes = ((path, os.path.getsize(path)) for path in all_files)

    sorted_files_with_size = sorted(files_and_sizes, key=operator.itemgetter(1))
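One caveat: a generator such as `all_files` can be consumed only once, so after the first `sorted()` call it is exhausted and a second pass would see no files. A minimal sketch of one way around that is to wrap the walk in a function, so every call gets a fresh generator. The names `iter_file_paths` and `paths_with_sizes` are made up for illustration:

```python
import operator
import os


def iter_file_paths(dirpath):
    """Yield the full path of every file below dirpath (hypothetical helper)."""
    for basedir, dirs, files in os.walk(dirpath):
        for filename in files:
            yield os.path.join(basedir, filename)


def paths_with_sizes(dirpath):
    """Return (path, size) tuples for all files below dirpath, smallest first."""
    files_and_sizes = (
        (path, os.path.getsize(path)) for path in iter_file_paths(dirpath)
    )
    return sorted(files_and_sizes, key=operator.itemgetter(1))
```

Because each call to `iter_file_paths` starts a new walk, `paths_with_sizes(dirpath)` can be called as often as needed.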

With argv[0] you are retrieving the command itself, not the first argument; use argv[1] instead:

    dirpath = sys.argv[1]  # argv[0] contains the command itself.
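If the script is started without an argument, `sys.argv[1]` raises an `IndexError`, so it may be worth checking the length first. A small sketch of that check, where `dirpath_from_args` is a hypothetical helper name and not part of any library:

```python
import os
import sys


def dirpath_from_args(argv):
    """Return the absolute directory path from argv[1], or None if unusable."""
    if len(argv) < 2:
        # Only the command itself was given, no directory argument.
        return None
    dirpath = os.path.abspath(argv[1])
    # Reject paths that do not exist or are not directories.
    return dirpath if os.path.isdir(dirpath) else None
```

In the script you would call `dirpath_from_args(sys.argv)` and print an error message when it returns `None`.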

For performance reasons, I suggest you fetch the file sizes once, up front, and keep them alongside the paths, so that the OS is never asked for the size of the same file more than once (as Koffein suggests, os.walk is the way to go):

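    import os
    from os.path import getsize

    files_list = []
    for path, dirs, files in os.walk(dirpath):
        # Store (full path, size) tuples so each size is looked up only once
        files_list.extend([(os.path.join(path, file), getsize(os.path.join(path, file))) for file in files])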

Assuming you don't need the unsorted list, we can use the in-place sort() method:

    import operator

    files_list.sort(key=operator.itemgetter(1))
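One thing the snippets above do not handle: `os.path.getsize` raises `OSError` for entries it cannot stat, for example a broken symlink or a file deleted while the walk is running. A rough sketch of the same idea with that case skipped, plus a simple formatter for printing; `collect_sizes` and `format_report` are illustrative names only:

```python
import operator
import os


def collect_sizes(dirpath):
    """Return (path, size) tuples for every file below dirpath, smallest first."""
    files_list = []
    for path, dirs, files in os.walk(dirpath):
        for name in files:
            full_path = os.path.join(path, name)
            try:
                size = os.path.getsize(full_path)
            except OSError:
                # e.g. a broken symlink or a file removed during the walk
                continue
            files_list.append((full_path, size))
    # Sort in place by the size stored in each tuple.
    files_list.sort(key=operator.itemgetter(1))
    return files_list


def format_report(files_list):
    """Render one 'size<TAB>path' line per file."""
    return "\n".join("%d\t%s" % (size, path) for path, size in files_list)
```

Printing `format_report(collect_sizes(dirpath))` then gives one line per file, smallest first.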

Source: https://habr.com/ru/post/958959/

