Prev         Up         Next

Introduction

Threads, processes and the GIL

To run more than one piece of code at the same time on the same computer one has the choice of either using multiple processes or multiple threads.

Although a program can be made up of multiple processes, these processes are in effect completely independent of one another: different processes are not able to cooperate with one another unless one sets up some means of communication between them (such as by using sockets). If a lot of data must be transferred between processes then this can be inefficient.

On the other hand, multiple threads within a single process are intimately connected: they share their data but often can interfere badly with one another. It is often argued that the only way to make multithreaded programming "easy" is to avoid relying on any shared state and for the threads to only communicate by passing messages to each other.

CPython has a Global Interpreter Lock (GIL) which in many ways makes threading easier than it is in most languages by making sure that only one thread can manipulate the interpreter's objects at a time. As a result, it is often safe to let multiple threads access data without using any additional locking as one would need to in a language such as C.

One downside of the GIL is that on multi-processor (or multi-core) systems a multithreaded Python program can only make use of one processor at a time. This is a problem that can be overcome by using multiple processes instead.

Python gives little direct support for writing programs using multiple process. This package allows one to write multi-process programs using much the same API that one uses for writing threaded programs.

Forking and spawning

There are two ways of creating a new process in Python:

The processing package uses os.fork() if it is available since it makes life a lot simpler. Forking the process is also more efficient in terms of memory usage and the time needed to create the new process.

The Process class

In the processing package processes are spawned by creating a Process object and then calling its start() method. processing.Process follows the API of threading.Thread. A trivial example of a multiprocess program is

from processing import Process

def f(name):
    print 'hello', name

if __name__ == '__main__':
    p = Process(target=f, args=['bob'])
    p.start()
    p.join()

Here the function f is run in a child process.

For an explanation of why (on Windows) the if __name__ == '__main__' part is necessary see Programming guidelines.

Exchanging objects between processes

processing supports two types of communication channel between processes:

Queues

The function Queue() returns a near clone of Queue.Queue -- see the Python standard documentation. For example

from processing import Process, Queue

def f(q):
    q.put([42, None, 'hello'])

if __name__ == '__main__':
    q = Queue()
    p = Process(target=f, args=[q])
    p.start()
    print q.get()    # prints "[42, None, 'hello']"
    p.join()

Queues are thread and process safe.

Note that if you need to guarantee that the put() method will succeed without blocking -- which is sometimes very useful when ensuring that a deadlock cannot occur -- then you can use BufferedQueue() instead of Queue(). See Queues.

Pipes

The Pipe() function returns a pair of connection objects connected by a duplex (two-way) pipe, for example

from processing import Process, Pipe

def f(conn):
    conn.send([42, None, 'hello'])

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=f, args=[child_conn])
    p.start()
    print parent_conn.recv()   # prints "[42, None, 'hello']"
    p.join()

The two connection objects returned by Pipe() represent the two ends of the pipe. Each connection object has send() and recv() methods (among others). Note that data in a pipe may become corrupted if two processes (or threads) try to read/write to the same end of the pipe at the same time. Of course there is no risk of corruption from processes using different ends of the pipe at the same time. See Pipes.

Synchronization between processes

processing contains equivalents of all the synchronization primitives from threading. For instance one can use a lock to ensure that only one process prints to standard output at a time:

from processing import Process, Lock

def f(l, i):
    l.acquire()
    print 'hello world', i
    l.release()

if __name__ == '__main__':
    lock = Lock()

    for num in range(10):
        Process(target=f, args=[lock, num]).start()

Without using the lock output from the different processes is liable to get all mixed up.

Sharing state between processes

As mentioned above, when doing threaded programming it is usually best to avoid using shared state as far as possible. This is particularly true when using multiple processes.

However, if you really do need to use some shared data then you can create by using a manager. Managers can store their data in one of two ways:

Shared memory:

Data can be stored in a shared memory map managed by an instance of LocalManager. Only those data types supported by the struct and array modules are supported. For example the following code

from processing import Process, LocalManager

def f(n, s, a):
    n.value = 42
    s.value = (0.75, 'hello')
    for i in range(len(a)):
        a[i] = -a[i]

if __name__ == '__main__':
    manager = LocalManager()

    number = manager.SharedValue('i', 0)
    struct = manager.SharedStruct('d256p', (0.0, ''))
    array = manager.SharedArray('i', range(10))

    p = Process(target=f, args=[number, struct, array])
    p.start()
    p.join()

    print number
    print struct
    print array

will print

SharedValue('i', 42)
SharedStruct('d256p', (0.75, 'hello'))
SharedArray('i', [0, -1, -2, -3, -4, -5, -6, -7, -8, -9])

Note that SharedValue, SharedStruct and SharedArray objects are thread and process safe. The 'i' and 'd256p' arguments used when creating num, struct and array are format strings of the type used by the struct and array modules: 'i' indicates a signed integer, 'd' indicates a double precision float and '256p' indicates a string of length less than 256.

Server process:

A manager object returned by processing.Manager() represents a server process which holds python objects and allows other processes to manipulate them using proxies.

In addition to SharedValue, SharedStruct and SharedArray a manager returned by processing.Manager() will support types list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Queue. For example:

from processing import Process, Manager

def f(d, l):
    d[1] = '1'
    d['2'] = 2
    d[0.25] = None
    l.reverse()

if __name__ == '__main__':
    manager = Manager()

    d = manager.dict()
    l = manager.list(range(10))

    p = Process(target=f, args=[d, l])
    p.start()
    p.join()

    print d
    print l

will print

{0.25: None, 1: '1', '2': 2}
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

Creating managers which support other types is not hard --- see Customized managers.

Proxy objects are picklable and can be transferred between processes.

Server process managers are more flexible than shared memory managers because they support more data types (and can be made to support arbitrary types). Also, a single server process manager can be shared by different computers over a network. They are, however, slower than using shared memory.

Using a pool of workers

The Pool() function returns an object representing a pool of worker processes. It has methods which allows tasks to be offloaded to the worker processes in a few different ways.

For example:

from processing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)              # start 4 worker processes
    result = pool.apply_async(f, [10])    # evaluate "f(10)" asynchronously
    print result.get(timeout=1)           # prints "100" unless your computer is *very* slow
    print pool.map(f, range(10))          # prints "[0, 1, 4,..., 81]"

See Process pools.

Speed

Passing small objects between processes using processing.Queue is not significantly slower than passing the same objects between threads using Queue.Queue. (In fact on Linux it tends to be faster.) One can also use processing.Pipe which at least on Windows is quite a bit faster than processing.Queue.

The following benchmarks were performed on a single core Pentium 4, 2.5Ghz laptop running Windows XP and Ubuntu Linux 6.10 --- see test_speed.py.

Number of 256 byte strings passed between processes/threads per sec:

Connection type Windows Linux
Queue.Queue 49,000 17,000-50,000 [1]
processing.PosixQueue n/a 62,000 [2]
processing.PipeQueue 27,000 37,000 [2]
Queue managed by server 6,900 6,500
processing.Pipe 52,000 57,000
[1]For some reason the performance of Queue.Queue is very variable on Linux.
[2](1, 2) processing.Queue is an alias for processing.PosixQueue if available or processing.PipeQueue.

Number of acquires/releases of a lock per sec:

Lock type Windows Linux
threading.Lock 850,000 560,000
processing.Lock 430,000 510,000
Lock managed by server 10,000 8,400
threading.RLock 93,000 76,000
processing.RLock 430,000 500,000
RLock managed by server 8,800 7,400

Number of interleaved waits/notifies per sec on a condition variable by two processes:

Condition type Windows Linux
threading.Condition 26,000 28,000
processing.Condition 22,000 18,000
Condition managed by server 6,600 6,000

Number of integers retrieved from a sequence per sec:

Sequence type Windows Linux
list 6,000,000 4,800,000
shared memory array 120,000 110,000
list managed by server 20,000 17,000