I have several Python scripts, each of which makes heavy use of sorting, uniquing, counting, gzipping and gunzipping, and grepping. As a first pass, I wrote the code using subprocess.call with shell=True (yes, I know about the security risks, which is why I said this is a first pass). I have a little helper function:
from datetime import datetime
from subprocess import call
import sys

def do(command):
    # Run one shell pipeline, timing it and aborting the script on failure.
    start = datetime.now()
    return_code = call(command, shell=True)
    print('Completed in', datetime.now() - start, '- return code =', return_code)
    if return_code != 0:
        print('Failure: aborting with return code %d' % return_code)
        sys.exit(return_code)
My scripts use this helper, as in the following snippets:
do('gunzip -c %s | %s | sort -u | %s > %s' % (input, parse, flatten, output))
do("gunzip -c %s | grep 'en$' | cut -f1,2,4 -d\| | %s > %s" % (input, parse, output))
do('cat %s | %s | gzip -c > %s' % (input, dedupe, output))
do("awk -F ' ' '{print $%d,$%d}' %s | sort -u | %s | gzip -c > %s" % params)
do('gunzip -c %s | %s | gzip -c > %s' % (input, parse, output))
do('gunzip -c %s | %s > %s' % (input, parse, collection))
do('%s < %s >> %s' % (parse, supplement, collection))
do('cat %s %s | sort -k 2 | %s | gzip -c > %s' % (source, other_source, match, output))
And there are many more, some with even longer pipelines.
One problem I have noticed is that when a command at an early stage of a pipeline fails, the whole command still succeeds with exit status 0. In bash I would fix this with
set -o pipefail
but I don't know how to do the same from Python. I could explicitly invoke bash for every command, but that feels clumsy. Is there a better way?
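To make the failure mode concrete, here is a minimal reproduction (on a POSIX system):

from subprocess import call

# Under plain sh, a pipeline's exit status is the status of its LAST
# stage, so the failure of 'false' is silently swallowed by 'sort'.
print(call('false | sort', shell=True))  # prints 0

call() happily reports success even though the first stage of the pipeline failed.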
I know I could stop using shell=True and build each pipeline in Python myself, wiring Popen objects together with stdout=PIPE and checking every stage's return code, but that turns each "one-liner" into a page of Python, and I have a lot of them!
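To illustrate, here is roughly what the very first one-liner above turns into when written that way. This is only a sketch: do_first_pipeline is a made-up name, and it assumes parse and flatten are simple commands that can be split on whitespace (no shell quoting):

from subprocess import Popen, PIPE
import sys

def do_first_pipeline(input, parse, flatten, output):
    # Pure-Python version of:
    #   gunzip -c input | parse | sort -u | flatten > output
    with open(output, 'wb') as sink:
        p1 = Popen(['gunzip', '-c', input], stdout=PIPE)
        p2 = Popen(parse.split(), stdin=p1.stdout, stdout=PIPE)
        p3 = Popen(['sort', '-u'], stdin=p2.stdout, stdout=PIPE)
        p4 = Popen(flatten.split(), stdin=p3.stdout, stdout=sink)
        # Close our copies of the intermediate pipes so earlier stages
        # get SIGPIPE if a later stage exits first.
        for p in (p1, p2, p3):
            p.stdout.close()
        # Unlike the shell, this catches a failure in ANY stage.
        for p in (p1, p2, p3, p4):
            p.wait()
            if p.returncode != 0:
                sys.exit(p.returncode)

One line of shell becomes about twenty lines of Python, multiplied by every snippet above.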
To summarize: I like the convenience and readability of the shell=True one-liners, but I need a failure anywhere in a pipeline to abort the script. Is there a way to get pipefail-style behaviour while keeping shell=True, or a comparably concise way to do this in Python?
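For reference, the closest thing to a fix I have found so far is to drop shell=True and hand each command string to bash with pipefail switched on; only the call line in the helper changes (this assumes bash is installed and on the PATH):

from datetime import datetime
from subprocess import call
import sys

def do(command):
    start = datetime.now()
    # bash -o pipefail -c '<command>' exits with the status of the
    # first failing stage instead of only the last one.
    return_code = call(['bash', '-o', 'pipefail', '-c', command])
    print('Completed in', datetime.now() - start, '- return code =', return_code)
    if return_code != 0:
        print('Failure: aborting with return code %d' % return_code)
        sys.exit(return_code)

I have not tested this beyond a few commands, and it silently swaps sh for bash everywhere, so I would still like to know if there is a more idiomatic answer.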