Is Python 'sys.argv' limited in the maximum number of arguments?

I have a Python script that needs to process a large number of files. To get around Linux's relatively small limit on the number of arguments that can be passed to a command, I use find -print0 together with xargs -0.

I know that another option would be to use the Python glob module, but that will not help when I have a more advanced find looking for modification times, etc.

When I run my script on a large number of files, Python only receives a subset of the arguments. At first I thought the limitation was in argparse, but it seems to be in sys.argv. I cannot find any documentation on this. Is this a bug?

Here's an example Python script illustrating the point:

    import argparse
    import sys
    import os

    parser = argparse.ArgumentParser()
    parser.add_argument('input_files', nargs='+')
    args = parser.parse_args(sys.argv[1:])

    print 'pid:', os.getpid(), 'argv files', len(sys.argv[1:]), 'argparse files:', len(args.input_files)

I have many files to run this:

    $ find ~/ -name "*" -print0 | xargs -0 ls > filelist
    748709 filelist

But it looks like either xargs or Python is chunking my large list of files and processing it with several separate Python runs:

    $ find ~/ -name "*" -print0 | xargs -0 python test.py
    pid: 4216 argv files 1819 number of files: 1819
    pid: 4217 argv files 1845 number of files: 1845
    pid: 4218 argv files 1845 number of files: 1845
    pid: 4219 argv files 1845 number of files: 1845
    pid: 4220 argv files 1845 number of files: 1845
    pid: 4221 argv files 1845 number of files: 1845
    ...

Why are multiple processes being created to process the list? Why is it being chunked at all? I don't think there are newlines in the file names, and shouldn't -print0 and -0 take care of that problem anyway? If there were newlines, I would expect sed -n '1810,1830p' filelist to show some weirdness for the above example. What gives?

I almost forgot:

    $ python -V
    Python 2.7.2+
+3
4 answers

xargs chunks your arguments by default. Have a look at the --max-args and --max-chars options. Its man page also explains the limits (under --max-chars).
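To make the chunking concrete, here is a simplified model of the idea: arguments are packed into a batch until a character budget is used up, and each batch becomes one run of the command. This is only a sketch, not the actual xargs algorithm (which also accounts for the command itself and the environment); the function name chunk_by_size and the budget value used below are made up for illustration.

```python
def chunk_by_size(args, max_chars):
    """Group args into batches whose total size stays within max_chars.

    Simplified model: each argument costs its encoded length plus one
    byte for the terminating NUL. A single oversized argument still
    gets a batch of its own.
    """
    batch, used = [], 0
    for arg in args:
        cost = len(arg.encode()) + 1
        if batch and used + cost > max_chars:
            yield batch          # budget exhausted: start a new "process"
            batch, used = [], 0
        batch.append(arg)
        used += cost
    if batch:
        yield batch


# Hypothetical file names and an artificially tiny budget.
for number, batch in enumerate(chunk_by_size(["aa", "bb", "cc"], 6), start=1):
    print("run", number, "gets", len(batch), "files")
```

With a fixed budget and file names of similar length, every batch ends up roughly the same size, which matches the near-constant count per pid in the question.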

+7

Everything you want from find is available in os.walk .

Do not use find and the shell for any of this.

Use os.walk and write all your rules and filters in Python.
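As a minimal sketch of that approach, assuming the filter you need is "modified within the last N seconds" (the function name files_modified_within and its parameters are invented for this example):

```python
import os
import time


def files_modified_within(root, max_age_seconds, now=None):
    """Yield paths under root whose mtime is newer than the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable: skip it
            if mtime >= cutoff:
                yield path


# Example: count files under the current directory changed in the last day.
recent = sum(1 for _ in files_modified_within(".", 24 * 60 * 60))
print("recently modified files:", recent)
```

Because the walk happens inside a single Python process, no argument list is ever built, so neither the xargs chunking nor the operating system limit comes into play.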

"search for modification time" means that you will use os.stat or a similar library function.

+3

Python itself does not seem to limit the number of arguments; the only limit is the one imposed by the operating system.

Have a look here for a more complete discussion.
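If you want to see the operating system limit from Python, a small sketch like the following should work on Linux and other POSIX systems, where SC_ARG_MAX reports the maximum combined size of the arguments and environment for a new process. The helper name arg_max is made up here, and the value may be unavailable on some platforms, which is why the function can return None.

```python
import os


def arg_max():
    """Return the OS limit (in bytes) for argv plus environment, or None."""
    try:
        return os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing (e.g. Windows) or the name is not supported.
        return None


print("ARG_MAX:", arg_max())
```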

+2

xargs will pass as many arguments as it can in one go, but there is still a limit. For instance,

 find ~/ -name "*" -print0 | xargs -0 wc -l | grep total 

will give you several lines of output, because each wc run that xargs starts prints its own total.

You probably want your script to take a file containing a list of file names, or accept the file names on your stdin.
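A rough sketch of the stdin variant, assuming Python 3 and NUL-delimited input as produced by find -print0 (the function name read_nul_delimited is invented for this example):

```python
import os


def read_nul_delimited(stream, chunk_size=65536):
    """Yield file names from a binary stream of NUL-terminated paths."""
    pending = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last NUL is complete; keep the remainder.
        *complete, pending = pending.split(b"\0")
        for raw in complete:
            if raw:
                yield os.fsdecode(raw)
    if pending:
        # Input did not end with a NUL: treat the tail as a final name.
        yield os.fsdecode(pending)


# In the real script you would iterate over standard input, for example:
#     import sys
#     for name in read_nul_delimited(sys.stdin.buffer):
#         process(name)   # process() is a placeholder for your own logic
```

The script would then be invoked as find ~/ -name "*" -print0 | python test.py, so all names arrive in a single process regardless of how many there are.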

+1

Source: https://habr.com/ru/post/909397/

