Saturday, May 14, 2011

Process pipelines in Python

It took me long enough to figure this out that I figure somebody else might benefit from my effort.

When you're working in Python and you wish to launch an external program and capture the output, there's an easy solution: you use Popen from the subprocess module. All well and good. But suppose you need to fire up a pipeline of external programs, with the output of the first program being piped to the input of the second, and so on. Something like this:

$ cat /etc/passwd | grep -E '^root:' | tr "a-z" "A-Z"
ROOT:*:0:0:SYSTEM ADMINISTRATOR:/VAR/ROOT:/BIN/SH
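
In Python terms, each stage of such a pipeline is a separate process whose stdin is the previous stage's stdout. For a fixed number of stages this can be wired up by hand. The following is only a minimal sketch, assuming Python 3 on a Unix-like system; it uses printf with made-up input in place of cat /etc/passwd so the result does not depend on the machine it runs on.

```python
import subprocess

# Stage 1: emit two lines of sample input (a stand-in for `cat /etc/passwd`).
p1 = subprocess.Popen(["printf", "root:x:0\nuser:y:1\n"],
                      stdout=subprocess.PIPE)

# Stage 2: grep reads stage 1's stdout as its stdin.
p2 = subprocess.Popen(["grep", "-E", "^root:"],
                      stdin=p1.stdout, stdout=subprocess.PIPE)
# Close the parent's copy of the pipe so p1 gets SIGPIPE if p2 exits early.
p1.stdout.close()

# Stage 3: tr reads stage 2's stdout as its stdin.
p3 = subprocess.Popen(["tr", "a-z", "A-Z"],
                      stdin=p2.stdout, stdout=subprocess.PIPE)
p2.stdout.close()

# communicate() blocks until the last stage finishes and returns its output
# (bytes on Python 3).
output, _ = p3.communicate()

# Reap the earlier stages so they do not linger as zombies.
p1.wait()
p2.wait()

print(output.decode())
```

This version blocks until the whole pipeline has finished, which is fine when you only want the final output and have nothing else to do in the meantime.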

The answer is still Popen, but things get a little complicated if you want to solve the general problem of hooking up n processes into a pipeline. Here's my solution.

def launchProcessPipeline(cmdList):
    import fcntl, os, subprocess

    if not cmdList:
        return None

    procs = []

    for cmd in cmdList:
        # the first process inherits our stdin; every later one reads from
        # the stdout of the process before it
        currPipe = procs[-1].stdout if procs else None

        procs.append(subprocess.Popen(cmd, stdout=subprocess.PIPE, stdin=currPipe))

        if currPipe is not None:
            # close our copy of the intermediate pipe so the upstream process
            # gets SIGPIPE if the downstream one exits early
            currPipe.close()

    # set only the final stdout file descriptor to nonblocking; the
    # intermediate pipes are read by the child processes, which expect
    # ordinary blocking reads
    fd = procs[-1].stdout.fileno()
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

    return procs[-1]


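def pollProcess(proc):
    import errno, os, select
    output = None
    fd = proc.stdout.fileno()

    # wait 1 millisecond and check whether proc has written anything to stdout
    readReady, _, _ = select.select([fd], [], [], 0.001)

    if readReady:
        while True:
            try:
                chunk = os.read(fd, 4096)
            except OSError as e:
                # EAGAIN on a nonblocking pipe just means it has been drained
                # for now; anything else is a real error
                if e.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
                    break
                raise

            if not chunk:
                # empty read: the process closed its end of the pipe
                break

            output = chunk if output is None else output + chunk

    # None means nothing was available yet (str on Python 2, bytes on Python 3)
    return output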

And here's how you call it:

cmdlist = [
    ['cat', '/etc/passwd'],
    ['grep', '-E', '^root:'],
    ['tr', 'a-z', 'A-Z']
]
proc = launchProcessPipeline(cmdlist)
print pollProcess(proc)


>>> ROOT:*:0:0:SYSTEM ADMINISTRATOR:/VAR/ROOT:/BIN/SH
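
One caveat with a single call like the one above: pollProcess waits only a millisecond, so it can return None if the pipeline has not written anything yet. A caller that wants all of the output would typically poll repeatedly until the pipe reaches end-of-file. Here is a rough, self-contained sketch of that idea, assuming Python 3.5+ on a Unix-like system. The helper name drain_until_eof and its timeout parameter are hypothetical, and a lone echo stands in for the pipeline; the process object returned by launchProcessPipeline could be passed in the same way.

```python
import os
import select
import subprocess

def drain_until_eof(proc, timeout=0.1):
    """Collect everything proc writes to stdout until the pipe is closed."""
    fd = proc.stdout.fileno()
    # Python 3.5+ shortcut for setting O_NONBLOCK with fcntl.
    os.set_blocking(fd, False)

    chunks = []
    while True:
        # Wait at most `timeout` seconds for data or end-of-file.
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            continue  # nothing yet; a real program could do other work here

        try:
            chunk = os.read(fd, 4096)
        except BlockingIOError:
            continue  # pipe was not actually readable; try again

        if not chunk:
            break  # empty read: every writer has closed the pipe

        chunks.append(chunk)

    proc.wait()  # reap the process and populate proc.returncode
    return b"".join(chunks)

# Hypothetical usage with a single stand-in command.
proc = subprocess.Popen(["echo", "hello pipeline"], stdout=subprocess.PIPE)
result = drain_until_eof(proc)
print(result.decode())
```

The loop reads straight from the file descriptor with os.read rather than through proc.stdout, so buffered and unbuffered reads are never mixed on the same pipe.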
