Python subprocess communicate() returns None when a list of numbers is expected

When I run the following code

 from subprocess import call, check_output, Popen, PIPE
 gr = Popen(["grep", "'^>'", myfile], stdout=PIPE)
 sd = Popen(["sed", "s/.*len=//"], stdin=gr.stdout)
 gr.stdout.close()
 out = sd.communicate()[0]
 print out

where myfile looks like this:

 >name len=345 sometexthere
 >name2 len=4523 someothertexthere
 ...
 ...

I get

 None 

When the expected output is a list of numbers:

 345
 4523
 ...
 ...

The corresponding command that I run in the terminal is

 grep "^>" myfile | sed "s/.*len=//" > outfile 

So far I have tried varying the escaping and quoting, for example removing the slashes in the sed expression or adding extra quotes for grep, but there are too many possible combinations to try them all.

I also considered simply reading the file and writing Python equivalents of grep and sed, but the file is very large (although I could read it line by line), the script will only ever run on UNIX systems, and I am still curious where I went wrong.

Could it be that

 sd.communicate()[0] 

returns some kind of object (instead of a list of integers) whose type is None?

I know that I can capture the output using check_output in simple cases:

 sam = check_output(["samn", "stats", myfile]) 

but I am not sure how to make it work in more complex situations, where the output of one process is piped into another.

What are some productive approaches for getting expected results with a subprocess?

+5
4 answers

As suggested, you need to pass stdout=PIPE to the second process and remove the single quotes from "'^>'":

 gr = Popen(["grep", "^>", myfile], stdout=PIPE)
 sd = Popen(["sed", "s/.*len=//"], stdin=gr.stdout, stdout=PIPE)
 ......
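For completeness, here is a minimal sketch of the whole corrected pipeline that also turns the output into a list of integers. The function name `lengths_via_pipeline` and the sample data are hypothetical, and the sketch assumes Python 3, that grep and sed are on the PATH, and that the number is the first whitespace-separated token left on each line after sed runs.

```python
import os
import tempfile
from subprocess import PIPE, Popen


def lengths_via_pipeline(path):
    """Run the equivalent of: grep '^>' path | sed 's/.*len=//' and parse the result."""
    # No shell is involved, so the pattern is passed without extra quotes.
    gr = Popen(["grep", "^>", path], stdout=PIPE)
    # stdout=PIPE is what lets communicate() hand the output back to Python.
    sd = Popen(["sed", "s/.*len=//"], stdin=gr.stdout, stdout=PIPE)
    gr.stdout.close()  # so grep gets SIGPIPE if sed exits first
    out = sd.communicate()[0]
    gr.wait()  # reap the grep process
    # Assumption: the number is the first token remaining on each line.
    return [int(line.split()[0]) for line in out.decode().splitlines() if line.strip()]


# Hypothetical sample data mirroring the layout shown in the question.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(">name len=345 sometexthere\nnot a header line\n"
              ">name2 len=4523 someothertexthere\n")

print(lengths_via_pipeline(tmp.name))  # [345, 4523]
os.remove(tmp.name)
```

Note that communicate() returns bytes in Python 3, which is why the output is decoded before it is split into lines.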

But this can be done simply in pure Python with re:

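 import re

 # Raw string: ">" needs no escaping; the group captures everything after "len="
 r = re.compile(r"^>.*len=(.*)$")
 with open("test.txt") as f:
     for line in f:  # iterates lazily, so a large file is never loaded whole
         m = r.search(line)
         if m:
             print(m.group(1))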

This outputs:

 345
 4523

If the lines starting with > always contain a number, and that number always comes right after len= at the end of the line, you do not need a regular expression at all:

 with open("test.txt") as f:
     for line in f:
         if line.startswith(">"):
             print(line.rsplit("len=", 1)[1])
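Since the question asks for a list of numbers rather than printed text, the pure-Python approach could be wrapped in a generator along these lines. This is only a sketch: `iter_lengths` and `LEN_RE` are hypothetical names, and it assumes the length is a run of digits immediately after "len=".

```python
import re

# Assumption: the length is a run of digits immediately after "len=".
LEN_RE = re.compile(r"^>.*len=(\d+)")


def iter_lengths(lines):
    """Yield the len= value of every header line as an int."""
    for line in lines:
        m = LEN_RE.search(line)
        if m:
            yield int(m.group(1))


# Hypothetical sample lines in the layout shown in the question.
sample = [
    ">name len=345 sometexthere\n",
    "not a header line\n",
    ">name2 len=4523 someothertexthere\n",
]
print(list(iter_lengths(sample)))  # [345, 4523]
```

Passing an open file object instead of a list streams the file line by line, so a very large file never has to fit in memory.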
+4
  • Do not put single quotes around ^> in the grep string. This is not bash, so all arguments are passed to the underlying program literally.
  • You need to redirect sd stdout to PIPE.
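The first point can be checked with a small sketch that uses a child Python process in place of grep and prints the argument exactly as it was received. The names `ECHO_ARG` and `received_argument` are made up for this illustration.

```python
import sys
from subprocess import check_output

# A tiny child program that prints its first argument exactly as received.
ECHO_ARG = [sys.executable, "-c", "import sys; print(sys.argv[1])"]


def received_argument(arg):
    """Return the argument as the child process actually saw it."""
    return check_output(ECHO_ARG + [arg]).decode().strip()


print(received_argument("'^>'"))  # '^>'  (the quotes are part of the pattern)
print(received_argument("^>"))    # ^>    (what grep should actually receive)
```

With the quotes included, grep would look for lines that begin with a literal single quote, which is why nothing matched.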
+4

You need to redirect stdout in your second call to Popen; otherwise the output just goes to the parent process's stdout and communicate() returns (None, None), so its first element is None.

 sd = Popen(["sed", "s/.*len=//"], stdin=gr.stdout, stdout=PIPE) 
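The difference is easy to see in isolation. The following sketch uses a child Python process as a stand-in for sed (the `CMD` and `captured` names are hypothetical) and compares what communicate() returns with and without stdout=PIPE, assuming Python 3.

```python
import sys
from subprocess import PIPE, Popen

# Child command that writes one line to its stdout (a stand-in for sed).
CMD = [sys.executable, "-c", "print('345')"]


def captured(pipe_stdout):
    """Run CMD and return the (stdout, stderr) tuple from communicate()."""
    proc = Popen(CMD, stdout=PIPE if pipe_stdout else None)
    return proc.communicate()


# Without PIPE the child writes straight to the parent's stdout,
# so there is nothing for communicate() to return.
print(captured(False))  # (None, None)

# With PIPE the first element holds the captured bytes; stderr was not
# redirected, so the second element is still None.
print(captured(True))
```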
+2

Padraic Cunningham's answer is valid. If you want single quotes to be handled the way they are on the command line, use shlex:

 import shlex
 from subprocess import Popen, PIPE

 # shlex.split strips the single quotes the way a POSIX shell would
 gr = Popen(shlex.split("grep '^>' my_file"), stdout=PIPE)
 sd = Popen(["sed", "s/.*len=//"], stdin=gr.stdout, stdout=PIPE)
 gr.stdout.close()
 out = sd.communicate()[0]
 print(out)
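To see what shlex does with the quoting, it can help to print the argument lists it produces. This is a small sketch; `to_argv` is a hypothetical helper name and `my_file` is a placeholder filename.

```python
import shlex


def to_argv(command_line):
    """Split a command line into an argument list using POSIX shell quoting rules."""
    return shlex.split(command_line)


print(to_argv("grep '^>' my_file"))  # ['grep', '^>', 'my_file']
print(to_argv('sed "s/.*len=//"'))   # ['sed', 's/.*len=//']
```

shlex.split only handles quoting. A `|` in the string comes back as an ordinary token, so each stage of a pipeline still needs its own Popen call.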
+1

Source: https://habr.com/ru/post/1239168/

