Actual QA: Process pipelines in Python

It took me long enough to figure this out that I figure somebody else might benefit from my effort.

When you're working in Python and you wish to launch an external program and capture its output, there's an easy solution: you use Popen from the subprocess module. All well and good. But suppose you need to fire up a pipeline of external programs, with the output of the first program being piped to the input of the second, and so on. Something like this:

$ cat /etc/passwd | grep -E '^root:' | tr "a-z" "A-Z"

ROOT:*:0:0:SYSTEM ADMINISTRATOR:/VAR/ROOT:/BIN/SH
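For reference, the easy single-program case might look something like the sketch below. It assumes Python 3, and it uses the current interpreter as a stand-in child process so that it doesn't depend on any particular external program being installed; swap in whatever command you actually need.

```python
import subprocess
import sys

# Stand-in command (an assumption for illustration): run the current
# Python interpreter and have it print one line.
cmd = [sys.executable, "-c", "print('hello from the child')"]

# stdout=subprocess.PIPE asks Popen to capture the child's output.
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)

# communicate() waits for the child to exit and returns (stdout, stderr).
# stderr is None here because it was not redirected.
out, _ = proc.communicate()

text = out.decode()
print(text)
```

Note that communicate() blocks until the child exits, which is fine for one short-lived program but is exactly what the polling approach further down tries to avoid.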

The answer is still Popen, but things get a little complicated if you want to solve the general problem of hooking up n processes into a pipeline. Here's my solution.

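import os
import subprocess

def launchProcessPipeline(cmdList):
    import fcntl
    totalCmds = len(cmdList)
    if totalCmds == 0:
        return None
    procs = []
    for i in range(totalCmds):
        currPipe = None
        if i > 0:
            # feed the previous process's stdout to this process's stdin
            currPipe = procs[i-1].stdout
        procs.append(subprocess.Popen(cmdList[i], stdout=subprocess.PIPE, stdin=currPipe))
        if currPipe is not None:
            # close our copy so the upstream process sees SIGPIPE if this one exits early
            currPipe.close()
    # set only the last stdout to nonblocking: the flag is shared with any
    # process that inherits the descriptor, so setting it on an intermediate
    # pipe would also make the next command's stdin nonblocking
    fd = procs[-1].stdout.fileno()
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, (flags | os.O_NDELAY | os.O_NONBLOCK))
    # any exception from Popen or fcntl propagates to the caller
    return procs[-1]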

def pollProcess(proc):
    import select
    output = None
    # wait up to 1 millisecond for proc to write something to stdout
    readReady, _, _ = select.select([proc.stdout.fileno()], [], [], 0.001)
    if readReady:
        try:
            # readline() returns "" at end of file, which stops the iteration
            for line in iter(proc.stdout.readline, ""):
                if output is None:
                    output = ''
                output += line
        except IOError:
            # The pipe is nonblocking, so a read with no data ready raises
            # IOError (typically EAGAIN). Treat it as "nothing more for now".
            pass
    return output

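And here's how you call it (pollProcess returns None if the pipeline hasn't written anything yet, so in real code you'd call it in a loop):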

cmdlist = [
    ['cat', '/etc/passwd'],
    ['grep', '-E', '^root:'],
    ['tr', 'a-z', 'A-Z']
]

proc = launchProcessPipeline(cmdlist)
print(pollProcess(proc))

>>> ROOT:*:0:0:SYSTEM ADMINISTRATOR:/VAR/ROOT:/BIN/SH
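If you're on Python 3, the same launch-then-poll idea might be sketched roughly as follows. This is not the code from the post: launchPipeline and readAvailable are hypothetical names, the three stage commands are stand-ins that run the current interpreter instead of cat, grep and tr, and the sketch reads the raw descriptor with os.read after select reports it ready, rather than setting the pipe nonblocking. Like the original, it assumes a POSIX system where select works on pipes.

```python
import os
import select
import subprocess
import sys

def launchPipeline(cmdList):
    """Start cmdList as a pipeline and return every Popen object, last command last."""
    procs = []
    for cmd in cmdList:
        prevOut = procs[-1].stdout if procs else None
        procs.append(subprocess.Popen(cmd, stdin=prevOut, stdout=subprocess.PIPE))
        if prevOut is not None:
            prevOut.close()  # drop the parent's copy of the intermediate pipe
    return procs

def readAvailable(fd, timeout=0.001):
    """Return bytes ready on fd, b"" at end of file, or None if the wait timed out."""
    ready, _, _ = select.select([fd], [], [], timeout)
    return os.read(fd, 4096) if ready else None

# Stand-in stages (assumptions for illustration): emit two lines,
# keep the one starting with "root:", then uppercase it.
py = sys.executable
cmds = [
    [py, "-c", "print('root:x:0'); print('daemon:x:1')"],
    [py, "-c", "import sys; sys.stdout.writelines(l for l in sys.stdin if l.startswith('root:'))"],
    [py, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"],
]

procs = launchPipeline(cmds)
fd = procs[-1].stdout.fileno()
chunks = []
while True:
    data = readAvailable(fd, timeout=0.1)
    if data is None:
        continue  # nothing yet; other work could happen here
    if not data:
        break     # end of file: the last command closed its stdout
    chunks.append(data)

procs[-1].stdout.close()
for p in procs:
    p.wait()  # reap every stage so none is left as a zombie

result = b"".join(chunks).decode()
print(result)
```

Returning every Popen object instead of just the last one is a design choice in this sketch: it lets the caller wait on each stage and check its return code once the output has been drained.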


Saturday, May 14, 2011
