1. Introduction
   What generator_tools is about.
2. copy_generator
   Explains the API and implementation of copying generators.
3. The picklegenerators module
   A natural enhancement of making generators copyable is to make them
   persistent as well. The picklegenerators module serves as a
   replacement for the pickle module.
4. Caveats
   Troublesome issues you need to care about when using generator_tools.
5. Acknowledgements
   To those who deserve them.
6. History
   Change history.
1. Introduction
Generators are a fundamental abstraction of the Python language. They
were first introduced in Python 2.2 together with the yield keyword,
which turns a function into a generator. In subsequent releases Python
became ever more generator friendly, with generator expressions and
finally with turning yield statements into yield expressions and
generators into proper coroutines. A few limitations remained, however.
Most notably, generators are not copyable and can't be pickled, which
is an obstacle to using generators to implement lightweight concurrency
a la Erlang and to distribute them transparently across different OS
level processes. These problems have been solved by Stackless Python
(SP), a dialect of Python which addresses coroutines and cooperative
multitasking specifically. While SP preceded CPython's development of
coroutines by years, the arrival of Python 2.5 means a significant
feature overlap between SP tasklets and generators.
As long as no native implementation for copying generators is
available, the generator_tools package can be used to close the gap.
The package is implemented in pure Python and is available for Python
2.5 as well as Python 3.0.
The basic facilities are the copy_generator and pickle_generator
functions. They are described in sections 2 and 3.
2. copy_generator
The implementation of copy_generator is based on a deep bytecode hack
or, expressed in a less pejorative manner: it uses runtime
introspection and Python bytecode assembler synthesis. For those who
are impatient and not interested in detailed design, the API is treated
first. Section 2.2 then illuminates the implementation strategy.
2.1 copy_generator usage

copy_generator(f_gen [, copy_filter])

    Takes a generator object and returns a generatorcopy object. A
    generatorcopy object provides the same public interface and
    behaviour as a generator object but stores additional data used for
    re-copying. Passing a generatorcopy object is permitted as well.

    New in version 0.2: you can pass an optional copy_filter parameter,
    which is a boolean function of one argument. The argument passed on
    each call of copy_filter is a local variable of the copied
    generator. The default behaviour is that all locals (and therefore
    the generator) are deepcopied. An exception is made for objects
    that have an attribute sharable with value sharable = True. Locals
    marked as sharable are passed into the generatorcopy uncopied, i.e.
    without creating new object identities (see 2.1.2 for examples).
Example 1:

def f(start):   # generator
    i = start
    while i < start+10:
        yield i
        i += 1

>>> f_gen = f(3)    # create a generator object f_gen
>>> f_gen.next()    # do some action
3
>>> g_gen = copy_generator(f_gen)
>>> g_gen.next()
4
>>> f_gen.next()
4
>>> list(f_gen) == list(g_gen)
True
2.1.1 copy_generator and for-loops
Copying generators containing while-loops, even nested while-loops,
doesn't cause known problems. Interaction with try...except statements
is not troublesome either. However, for-loops don't work out of the
box. We elaborate on this in the next subsection; here only the
workaround that makes for-loops copyable is presented. Note that this
workaround is enforced by the current implementation: a
CopyGeneratorException is raised when it is not detected.
Syntactically a for-loop has three user-defined parts:

for_stmt: 'for' exprlist 'in' testlist ':' suite

We are only interested in the testlist part, shown here in the form
that causes trouble:

for k in range(20):   # in this form copy_generator can't transfer state
    yield k
This testlist part has to be factored out. But this alone is not
sufficient; it also has to be wrapped using the for_iter function:

it = for_iter(range(20))   # required boilerplate
for k in it:
    yield k
for_iter(iterable)

    Takes an iterable and returns an object of type IteratorView.
    IteratorViews can be saved and copied. Different IteratorViews can
    share a single iterator while each maintains its own state of
    iteration.

IteratorViews exposed as local variables enable the copy_generator
function to deal with nested for-loops and generator sequences:
Example 2 (continues Example 1):

def g(x, y):
    r = for_iter(range(x, y))
    for i in r:
        yield i

def f():
    G = for_iter([g(0,7), g(7,14), g(14,21)])
    for h in G:
        H = for_iter(h)
        for item in H:
            yield item

>>> gen_f = f()
>>> gen_f.next()
0
>>> gen_f.next()
1
>>> gen_g = copy_generator(gen_f)
>>> gen_g.next()
2
>>> gen_f.next()
2
>>> list(gen_f) == list(gen_g)
True
For additional examples take a look at the unit tests implemented in
test/test_copygenerators.py.
2.1.2 Modifying the copying behaviour (new in version 0.2)
Basically, the copy_generator function produces deepcopies of generator
objects. However, this can be regulated. The second, optional parameter
is a function with signature object -> bool, called copy_filter. The
copy_generator function has a default copy_filter defined as:

lambda loc: hasattr(loc, "sharable") and loc.sharable

The idea is that only those objects that pass the filter get shared,
i.e. passed unmodified into the generator copy. All others are
deepcopied, following a shared-nothing policy.
One other relevant copy_filter is simple:

lambda loc: True

Here everything is shared. This copy_filter is used when reconstructing
a generator object from a pickle. The use case is motivated as follows.
One cannot pickle a generator object gen_f directly but has to map it
onto a GeneratorSnapshot object GS_f first, which can be pickled. On
unpickling, GS_f is reconstructed, and to get gen_f back one applies
copy_generator(GS_f) once. Here we share each local of GS_f with the
new gen_f and throw GS_f away in the next step. Note that this is fully
automated; you usually don't have to care unless you want to implement
your own pickling routines.
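As an illustration, here is a minimal sketch of the sharable mechanism;
the class SharedLog is made up for this example, and the import path is
inferred from the package layout:

from generator_tools.copygenerators import copy_generator

class SharedLog(object):
    sharable = True              # passes the default copy_filter
    def __init__(self):
        self.items = []

def f(log):
    i = 0
    while i < 3:
        log.items.append(i)      # side effect on the shared object
        yield i
        i += 1

log = SharedLog()
f_gen = f(log)
f_gen.next()                     # the original generator appends 0
g_gen = copy_generator(f_gen)    # log is shared, other locals are deepcopied
g_gen.next()                     # the copy appends to the *same* list

Passing lambda loc: True as the second argument would instead share
every local, as described above.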
2.2 copy_generator implementation
The implementation of copy_generator is based on two ideas:
- Modify the generator's assembly and generate JUMP_ABSOLUTE commands
  at certain jump addresses. That's easy on a first try but hard to get
  right such that no stack is corrupted.
- Read all local variables of an existing generator object f_gen using
  runtime introspection and pass them as parameters into a newly
  created generator function g. These parameters constitute the local
  environment for executing the generator object g_gen created from g.
2.2.1 Making a JUMP_ABSOLUTE
Let's consider the following simple Python generator containing two
nested while-loops, together with the related disassembly:
def f(x, y):
    i = 0
    while i < x:
        j = 0
        yield i
        while j < y:
            yield (i,j)
            j += 1
>>> import dis
>>> dis.dis(f)
  2           0 LOAD_CONST               1 (0)
              3 STORE_FAST               2 (i)

  3           6 SETUP_LOOP              71 (to 80)
        >>    9 LOAD_FAST                2 (i)
             12 LOAD_FAST                0 (x)
             15 COMPARE_OP               0 (<)
             18 JUMP_IF_FALSE           57 (to 78)
             21 POP_TOP

  4          22 LOAD_CONST               1 (0)
             25 STORE_FAST               3 (j)

  5          28 LOAD_FAST                2 (i)
             31 YIELD_VALUE
             32 POP_TOP

  6          33 SETUP_LOOP              39 (to 75)
        >>   36 LOAD_FAST                3 (j)
             39 LOAD_FAST                1 (y)
             42 COMPARE_OP               0 (<)
             45 JUMP_IF_FALSE           25 (to 73)
             48 POP_TOP

  7          49 LOAD_FAST                2 (i)
             52 LOAD_FAST                3 (j)
             55 BUILD_TUPLE              2
             58 YIELD_VALUE
             59 POP_TOP

  8          60 LOAD_FAST                3 (j)
             63 LOAD_CONST               2 (1)
             66 INPLACE_ADD
             67 STORE_FAST               3 (j)
             70 JUMP_ABSOLUTE           36
        >>   73 POP_TOP
             74 POP_BLOCK
        >>   75 JUMP_ABSOLUTE            9
        >>   78 POP_TOP
             79 POP_BLOCK
        >>   80 LOAD_CONST               0 (None)
             83 RETURN_VALUE
When the generator is suspended after yielding its first value (here it
is i), the last executed instruction has offset 31, counting from the
start of the bytecode sequence. We can read this value from a generator
object in a state of execution with a non-empty stack frame:

>>> f_gen = f(3,4)
>>> f_gen.gi_frame.f_lasti
-1
>>> f_gen.next()
0
>>> f_gen.gi_frame.f_lasti
31
Actually we can read even more information from f_gen, namely the
current local context of evaluation:

>>> f_gen.gi_frame.f_locals
{'i': 0, 'y': 4, 'j': 0, 'x': 3}
Now suppose we have a function g that takes all local variables of f at
jump address 31 as parameters:

def g(x, y, i, j):
    i = 0
    while i < x:
        j = 0
        yield i
        while j < y:
            yield (i,j)
            j += 1
But instead of starting evaluation with a LOAD_CONST bytecode, we aim
to jump directly to offset 31, i.e. exactly to the place where
evaluation of f_gen has stopped. The bytecode interpreter actually
needs to jump a little further, because evaluating POP_TOP is
meaningful only when a preceding stack operation was performed. So we
increment the offset by 2 and express this decision by prepending a
JUMP_ABSOLUTE bytecode with 33 as its argument:
  2           0 JUMP_ABSOLUTE           33
              3 LOAD_CONST               1 (0)
              6 STORE_FAST               2 (i)

  3           9 SETUP_LOOP              74 (to 83)
        >>   12 LOAD_FAST                2 (i)
             15 LOAD_FAST                0 (x)
             ....
With this modified bytecode we can create the function g, which resumes
execution where f_gen was stopped.
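To make the mechanism concrete, here is a hedged sketch of how such a
jump prefix could be spliced into a code object in Python 2.5. It
illustrates the technique only, is not the actual generator_tools
source, and omits the jump-address fixups discussed in 2.2.2:

import types
from opcode import opmap

def prepend_jump(func, target):
    # Build a 3-byte "JUMP_ABSOLUTE target" instruction. The prefix
    # shifts every following instruction by 3 bytes, so the target
    # itself must be shifted by the same amount.
    c = func.func_code
    shifted = target + 3
    jump = chr(opmap['JUMP_ABSOLUTE']) + chr(shifted & 0xFF) + chr(shifted >> 8)
    code = types.CodeType(
        c.co_argcount, c.co_nlocals, c.co_stacksize, c.co_flags,
        jump + c.co_code, c.co_consts, c.co_names, c.co_varnames,
        c.co_filename, c.co_name, c.co_firstlineno, c.co_lnotab)
    return types.FunctionType(code, func.func_globals)

In the real implementation the arguments of all other absolute jumps
must be corrected by the same shift, and the block stack must be set up
as described in the following sections; without that, the resulting
function corrupts the interpreter stack.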
2.2.2 A crash with many causes
A first naive solution was suggested as a recipe in the ActiveState
Python Cookbook, and it failed, as Klaus Müller from the SimPy project
pointed out. The cause of the error is not easy to see. There are three
factors:
- The first implementation didn't correct the jump addresses of all
  other JUMP_ABSOLUTE calls. This caused the error message Klaus
  reported: the algorithm jumped to an invalid address.
- With SETUP_LOOP and other SETUP_XXX commands, a block is pushed onto
  a block stack. The complementary action is POP_BLOCK. While an
  underflow (too many POP_BLOCK calls) leads to an immediate crash,
  there is also an upper limit on the number of blocks (currently 20),
  and exceeding it raises an overflow exception. You can check this out
  by creating 20 nested while-loops, for example.
- Klaus' example contained a for-loop. For-loops are more complex to
  handle than while-loops. On the bytecode level they are distinguished
  by the additional bytecodes GET_ITER and FOR_ITER. GET_ITER creates
  an internal iterator that is not exposed to user-level code and
  pushes it onto the stack. Without being exposed it can't be copied.
  What's worse, GET_ITER must be executed to enable subsequent calls of
  FOR_ITER, yet the jump target for each next iteration is FOR_ITER,
  not GET_ITER.
2.2.3 Chains of jumps and blocks
As a solution to problems 2) and 3), chains of JUMP_ABSOLUTE bytecodes
are used to connect the different SETUP_XXX bytecodes preceding the
critical offset. So we hop from one SETUP to the next and initialize
the block stack. In case of a SETUP_LOOP belonging to a for-loop, we
process all bytecodes up to the first one after FOR_ITER.
However, generating jumps alone is not sufficient, because each jump
shall be processed exactly once: all jumps must be deactivated once the
target offset is reached. This cannot be done dynamically or by another
bytecode transformation. Instead we use a local variable called
jump_forward_on as a switch and prepare a bytecode block carefully such
that the JUMP_ABSOLUTE command is executed when jump_forward_on is True
and skipped otherwise.
The final block is a modified version of the block shown in the
diagram. When jump_forward_on is True, a second variable called
jump_forward_off is loaded and its value is assigned to
jump_forward_on. Since this value is False, the JUMP_IF_FALSE command
causes JUMP_ABSOLUTE to be skipped on the next pass:
  0 LOAD_FAST          n   (jump_forward_on)
  3 JUMP_IF_FALSE     10
  6 LOAD_FAST        n+1   (jump_forward_off)
  9 STORE_FAST         n
 12 POP_TOP
 13 JUMP_ABSOLUTE    setup_offset
 16 POP_TOP
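In Python terms, this dispatch block behaves roughly like the following
one-shot switch (the jump is written as a comment, since Python has no
goto):

if jump_forward_on:
    jump_forward_on = jump_forward_off   # jump_forward_off is False: disarm the switch
    # jump to setup_offset (the JUMP_ABSOLUTE above)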
Finally, we summarize the changes by showing the fully transformed
bytecode of our original generator:
>>> f_gen = f(3,4)
>>> f_gen.next()
0
>>> f_gen.next()
1
>>> g_gen = copy_generator(f_gen)
>>> import dis
>>> dis.dis(g_gen._g)
352           0 JUMP_ABSOLUTE            9
              3 LOAD_CONST               1 (0)

353           6 STORE_FAST               2 (i)
        >>    9 SETUP_LOOP              99 (to 111)
             12 LOAD_FAST                4 (jump_forward_on)
             15 JUMP_IF_FALSE            4 (to 22)
             18 POP_TOP
             19 JUMP_ABSOLUTE           47

354     >>   22 POP_TOP
        >>   23 LOAD_FAST                2 (i)
             26 LOAD_FAST                0 (x)
             29 COMPARE_OP               0 (<)
             32 JUMP_IF_FALSE           74 (to 109)
             35 POP_TOP
             36 LOAD_CONST               1 (0)
             39 STORE_FAST               3 (j)
             42 LOAD_FAST                2 (i)
             45 YIELD_VALUE
             46 POP_TOP
        >>   47 SETUP_LOOP              56 (to 106)
             50 LOAD_FAST                4 (jump_forward_on)
             53 JUMP_IF_FALSE           10 (to 66)
             56 LOAD_FAST                5 (jump_forward_off)
             59 STORE_FAST               4 (jump_forward_on)
             62 POP_TOP
             63 JUMP_ABSOLUTE           91
        >>   66 POP_TOP
        >>   67 LOAD_FAST                3 (j)
             70 LOAD_FAST                1 (y)
             73 COMPARE_OP               0 (<)
             76 JUMP_IF_FALSE           25 (to 104)
             79 POP_TOP
             80 LOAD_FAST                2 (i)
             83 LOAD_FAST                3 (j)
             86 BUILD_TUPLE              2
             89 YIELD_VALUE
             90 POP_TOP
        >>   91 LOAD_FAST                3 (j)
             94 LOAD_CONST               2 (1)
             97 INPLACE_ADD
             98 STORE_FAST               3 (j)
            101 JUMP_ABSOLUTE           67
        >>  104 POP_TOP
            105 POP_BLOCK
        >>  106 JUMP_ABSOLUTE           23
        >>  109 POP_TOP
            110 POP_BLOCK
        >>  111 LOAD_CONST               0 (None)
            114 RETURN_VALUE
2.3 How to clone a running for-loop?
How can one clone an iterator? Hasn't this question been answered
already with regard to cloning generators? Or else: if we invent a
different cloning algorithm for iterators, why not apply it to
generators as well, as something simpler than bytecode hackery? The
basic answer for cloning iterators is very simple: turn them into lists
and clone those. But this is inefficient and a serialization strategy
at best - something that matters when we pickle iterators.
The answer to copying iterators is to provide IteratorViews. This means
we do not copy iterators at all but provide copies of IteratorView
objects that refer to one unique _IterWrapper object, which
encapsulates an iterator and caches its values. The real iterator is
therefore shared among different IteratorViews. An IteratorView just
stores the state of its own iteration. This state is an index into the
list of cached iterator values. When an IteratorView is copied, the
index is copied as well and is then incremented separately in
subsequent calls to next().
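The following sketch models this design. It mirrors the names visible
in the library's tracebacks (itwrap, offset) but is a simplified
illustration with its own index convention, not the actual
copygenerators source:

class _IterWrapper(object):
    # Wraps the real iterator and records every value it has produced,
    # so that views lagging behind can replay values from the cache.
    def __init__(self, iterable):
        self.it = iter(iterable)
        self.recorded = []
    def next(self, view):
        if view.offset < len(self.recorded):
            item = self.recorded[view.offset]   # replay a cached value
        else:
            item = self.it.next()               # advance the shared iterator
            self.recorded.append(item)
        view.offset += 1
        return item

class IteratorView(object):
    # Each view keeps only an index into the shared wrapper's cache.
    def __init__(self, itwrap, offset=0):
        self.itwrap = itwrap
        self.offset = offset
    def __iter__(self):
        return self
    def next(self):
        return self.itwrap.next(self)
    def __itercopy__(self):
        return IteratorView(self.itwrap, self.offset)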
However, the truth about IteratorViews is actually a little bit more
complicated, as shown below:
>>> r = range(4)
>>> iv = for_iter(r)
>>> iv
<__main__.IteratorView object at 0x0134B330>
>>> iv.next()   # iv.offset -> 0, recorded -> [0]
0
>>> iv.next()   # iv.offset -> 1, recorded -> [0, 1]
1
>>> iv2 = iv.__gencopy__()   # copy iv with offset = 1
>>> iv2.next()
1
# ??? Bug?
# Didn't we all expect the offset being 1 and the next value being 2?
The reason for this admittedly strange behaviour is the fact that
for_iter is not primarily used for copying iterators as such but for
copying the iterators of for-loops. To understand why we decrement the
offset when copying, we need to disassemble a generator containing a
for-loop:
352           0 JUMP_ABSOLUTE           21
              3 LOAD_GLOBAL              0 (for_iter)
              6 LOAD_GLOBAL              1 (range)
              9 LOAD_CONST               1 (10)
             12 CALL_FUNCTION            1
             15 CALL_FUNCTION            1

353          18 STORE_FAST               0 (it)
        >>   21 SETUP_LOOP              36 (to 60)
             24 LOAD_FAST                0 (it)
             27 GET_ITER
        >>   28 FOR_ITER                28 (to 59)

354          31 STORE_FAST               1 (i)
             34 LOAD_FAST                2 (jump_forward_on)
             37 JUMP_IF_FALSE           10 (to 50)
             40 LOAD_FAST                3 (jump_forward_off)
             43 STORE_FAST               2 (jump_forward_on)
             46 POP_TOP
             47 JUMP_ABSOLUTE           56
        >>   50 POP_TOP
             51 LOAD_FAST                1 (i)
             54 YIELD_VALUE
             55 POP_TOP
        >>   56 JUMP_ABSOLUTE           28
        >>   59 POP_BLOCK
        >>   60 LOAD_CONST               0 (None)
             63 RETURN_VALUE
The critical section is the loop entry at offsets 27-28 together with
the jump chain. What happens here is that the iterator is advanced once
with FOR_ITER before the JUMP_ABSOLUTE command at offset 56 is reached,
which continues iteration with another jump to FOR_ITER. So FOR_ITER is
called twice! By decrementing the index in the IteratorView we ensure
that the correct value is used at each iteration.
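Continuing the illustrative classes above (again a sketch, not the
library source), the compensation can be expressed as one extra method
on IteratorView:

    def __gencopy__(self):
        # The resumed for-loop executes FOR_ITER once more before the
        # loop body runs, so the copy starts one position early.
        return IteratorView(self.itwrap, self.offset - 1)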
Maybe you have noticed that the copy function is called __gencopy__; it
should be applied for internal copying purposes only. Another function,
called __itercopy__, provides unsurprising behaviour, as shown in the
next example:
>>> iv = for_iter(range(4))
>>> iv.next()
0
>>> iv.next()
1
>>> iv2 = iv.__itercopy__()
>>> iv2.next()
2
>>> iv2.next()
3
>>> iv.next()
2
>>> list(iv)
[3]
>>> iv2.next()
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
  File "c:\lang\python25\lib\site-packages\generator_tools\copygenerators.py", line 64, in next
    item = self.itwrap.next(self)
  File "c:\lang\python25\lib\site-packages\generator_tools\copygenerators.py", line 28, in next
    item = self.it.next()
StopIteration
You can also use copy.deepcopy to achieve the same effect.
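For instance (a fresh view; assuming deepcopy delegates to the view
copying machinery, as the statement above implies):

>>> import copy
>>> iv = for_iter(range(4))
>>> iv.next()
0
>>> iv2 = copy.deepcopy(iv)
>>> iv2.next()
1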
3. The picklegenerators module
Changed in version 0.3
The picklegenerators module serves as a full replacement for the pickle
module. You can use convenient API functions like dump, dumps or load,
or the classes Pickler and Unpickler. The only notable difference is
that generator objects (as well as some derived objects) can now be
pickled and unpickled. Classes like GeneratorPickler and functions like
unpickle_generator are marked as deprecated, and a warning is issued
when they are called. They will be removed entirely in version 0.5 of
generator_tools.
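A minimal sketch of the intended round trip; the StringIO buffer is an
assumption made for this example, and the call signatures are assumed
to mirror the stdlib pickle module, as stated above:

import StringIO
from generator_tools import picklegenerators

def f():
    i = 0
    while i < 10:
        yield i
        i += 1

f_gen = f()
f_gen.next()                          # run the generator to its first yield

buf = StringIO.StringIO()
picklegenerators.dump(f_gen, buf)     # pickle the running generator
buf.seek(0)
g_gen = picklegenerators.load(buf)    # reconstruct it; the GeneratorSnapshot
                                      # handling of 2.1.2 happens automatically
print g_gen.next()                    # resumes where f_gen paused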
4. Caveats
Some additional advice for users of generator_tools.
4.1 Avoid yield expressions in finally clauses
On a few occasions, yield expressions in finally clauses crashed my
Python interpreter. This behaviour even led me to drop those tests from
the test suite and leave this note here instead. Even where copying
does not crash, there are at least indications of incorrect behaviour,
so such constructs shall be avoided.
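For clarity, this is the kind of construct to avoid (a minimal
illustration):

def risky():
    try:
        yield 1
    finally:
        yield 2   # yield inside a finally clause: unreliable when copied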
4.2 Python 3.0 is not dry yet
I've done some wrapping of datatypes after unpickling. I believe
pickling/unpickling is broken in Python 3.0 at this stage of
development, but I expected to struggle with type conversions anyway.
New in version 0.2: pickling is troublesome using Python 3.0a2. I am
not sure whether it's a bug in generator_tools or in Python 3.0. I will
suspend additional releases supporting 3.0 and wait until the first
beta.
5. Acknowledgements
Thanks to Klaus Müller for important reinforcements and Lorenz Quack
for providing a redesign of the picklegenerators module.
6. History

Date        Version   Change
2008-26-1   0.3       Redesign of the picklegenerators module
                      (by Lorenz Quack)
2008-22-1   0.2       - Implement copy_filter function as second
                        optional parameter
                      - Modify unpickling s.t. GeneratorSnapshot is not
                        deepcopied anymore
                      - Adapt the interface of GeneratorPickler to be
                        compliant with the interface of pickle.Pickler
                        and declare the old interface deprecated
                      - Delay a 0.2 release for Python 3.0
2007-10-5   0.1       Initial release