In the following we will be using R’s “restarts” feature to implement the state machine that drives generators in languages such as Python. Generators allow lazily generating values on demand: a consumer invokes a generator, and consumes values as they are produced. A new value is only produced once the previous one has been consumed.
Decoupling consumer and producer in this fashion is a powerful technique, because a consumer often has a better idea of how many values are needed than the generator. This means that a consumer can interrupt the generation of more values.
One consequence of this is that generators can be infinite; for instance, here is a potential implementation of a Fibonacci generator in Python:
def fib():
a, b = 0, 1
while True:
yield b
a, b = b, a + b
Now if we want to print the first ten Fibonacci numbers we can use the generator as follows:
i = 0
for n in fib():
i += 1
if i > 10: break
print n
Neat! So what’s stopping us from doing this in R? Well, a first answer is:
the lack of yield
, which yields execution back to the caller but then
resumes the function execution until the function either reaches its end,
or the caller aborts it.
A second answer is: nothing!
In fact, we can implement yield
in less than ten lines of code if we’re
willing to cheat (heck, are we!).
yield = function (item) {
do_yield = function (item, expr, index_name, parent) {
item = setNames(list(item), index_name)
expr = do.call(substitute, list(expr, item))
eval(expr, parent)
}
yield_condition = structure(list(value = item), class = c('yield', 'condition'))
withRestarts(stop(yield_condition), do_yield = do_yield)
}
?!?
This function essentially needs to be read bottom-to-top: it stop
s
execution with a specific condition (now’s a good time to read up on
condition handling). However, this happens within a withRestarts
.
Don’t bother reading its documentation; it’s
incomprehensible. In a nutshell, this function allows us to specify
“restarts” — that is, get-out-of-jail-free cards that are used higher up by
the user to resume execution after a condition was signalled. Such a
restart is defined here as do_yield
(for lack of a better name …).
What does this restart-function do, and where is it called from? Glad you
asked …. But first we need to define a bit of syntactic sugar. The end goal
is to use yield
just as we would use it in Python; for example, in a
for
loop. We’ll keep it simpler for now and define our own loop-like
construct:
foreach (i, generator(), message('Hello ', i))
It’s this foreach
which will have to ensure that execution is restarted
after the yield
condition was signalled. Without further ado:
foreach = function (index, generator, expr) {
# Don’t evaluate anything (just yet):
index_name = as.character(substitute(index))
generator = substitute(generator)
expr = substitute(expr)
# And remember where we’re calling from:
parent = parent.frame()
# Here the magic happens.
yielder = function (e)
invokeRestart('do_yield', e$value, expr, index_name, parent)
withCallingHandlers(eval(generator, parent), yield = yielder)
}
And let’s test it real quick:
generator = function () {
for (i in 1 : 5)
yield(i)
}
foreach (i, generator(), cat('Hello', i, '\n'))
## Hello 1
## Hello 2
## Hello 3
## Hello 4
## Hello 5
It works!
So: what’s going on here? foreach
essentially evaluates the generator
expression (which, in our example, is a call to the generator
function). But rather than evaluating it directly, it does so “with
calling handlers”:
“Here’s an expression that we’d like to execute,” we tell R. “And just in
case this raises a yield
condition,” we continue, “please execute our
yielder
function.” A lot of jumping around ensues: the generator
function gets called, which in turn calls yield
. yield
signals a
yield
condition by calling stop
. And foreach
reacts to this yield
condition by dispatching to the yielder
. The yielder
now invokes the
restart that we had previously defined just before calling stop
(remember?). That restart, called, do_yield
, gets passed a bunch of
parameters:
- the current value, which we have stored inside the condition;
- the expression (the “loop body”) to execute for each value;
- the name of the index variable in our loop; and finally,
- the caller’s environment, in which to execute the loop body.
We have now come full circle: do_yield
takes the loop body expression and
substitutes the name of the index variable with its current value. In our
example, it takes the unevaluated expression cat('Hello', i, '\n')
and
transforms it into cat('Hello', 1, '\n')
(obviously that’s for the first
iteration; subsequently the values 2, … are substituted). And finally, it
evaluates the so formed expression in the caller’s environment. The end.
There remains one thing to do: make this work with R’s own for
loop.
for
, like almost everything else in R, is simply a function. And we can
totally redefine it. Oh yes.
`for` = function (var, seq, expr) {
tryCatch(eval.parent(substitute(foreach(var, seq, expr),
list(var = substitute(var),
seq = substitute(seq),
expr = substitute(expr)))),
`break` = function (e) invisible())
invisible()
}
# Another small hack to make `break` work as expected.
makeActiveBinding('break',
function () stop(structure(list(), class = c('break', 'condition'))),
environment())
for (i in generator())
cat('Hello for', i, '\n')
## Hello for 1
## Hello for 2
## Hello for 3
## Hello for 4
## Hello for 5
(Granted, I cheated again: redefining for
as above makes it unusable in
the conventional way, so I had to redefine the generator
function to use
lapply
instead of for
. It’s still yield
ing though.)
And finally, here’s the fib
example we started with — an infinite
generator of Fibonacci numbers:
fib = function () {
x = c(0, 1)
repeat {
yield(x[2])
x = c(x[2], sum(x))
}
}
i = 0
for (n in fib()) {
if ((i = i + 1) > 10) break
cat(n, '\n')
}
## 1
## 1
## 2
## 3
## 5
## 8
## 13
## 21
## 34
## 55
I mentioned initially that our implementation is cheating. In fact, a Python generator can be invoked and stored in a variable for later consumption, like so:
x = fib()
Alas, attempting this with our iterator won’t work:
x = fib()
## Error in stop(yield_condition): bad error message
As we haven’t installed a handler for yield
here, the code breaks. Fixing
this is beyond the scope of this article (it requires a different
approach).
To understand more about how restarts in R work, head over to the Advanced R chapter Beyond exception handling: conditions and restarts.
And if you find the code above entertaining, here are some more projects:
- Look at the source code of
base::callCC
and work through that. It’s short — less than ouryield
— put packs a punch. - List comprehension syntax like in Haskell
- A concise lambda syntax:
x -> 2 * x
defines a function equivalent tofunction (x) 2 * x
. - Function decorators like in Python (still work in progress).
This is an awesome gist.