Enumeration is Ruby's term for iteration, and there are significant differences in how it is handled within Ruby's existing conventions when compared to other programming languages. Let us first establish some terms:
Enumerable
, a built-in mixin. Other languages usually call this "iterable".Enumerator
, a built-in class. Other languages usually call this "iterator".
The most basic way to iterate over something in Ruby is to call the #each
method on an object, with a block:
[1, 2, 3].each { |x| puts x }
The #each
method forms the backbone of enumeration in Ruby. For example, it is
very common to make something enumerable by including the mixin Enumerable
in
a class, and the only pre-requisite for using this mixin is implementing the
#each
method. This allows the class to get a lot of methods "for free", e.g.
#any?
and #all?
can all be implemented in terms of #each
.
Let's say we want to make an object that produces 1
, 2
, and 3
. The easy
way is to write (1..3)
, but let's do this from scratch:
class Podium
include Enumerable
def each
yield 1
yield 2
yield 3
end
end
Here, we are on the other side of the fence: rather than using #each
, we are
now implementing #each
. The yield
keyword is of interest: it calls the
block that is passed to the method. In other words, the following code is
equivalent to the code above:
class Podium
include Enumerable
def each(&blk)
blk.call(1)
blk.call(2)
blk.call(3)
end
end
The &
sigil in &blk
binds the block — provided by the caller — to the name
blk
. If the caller failed to provide a block, then blk
becomes nil
. The
following two are equivalent:
yield v if block_given?
blk.call(v) unless blk.nil?
Remember that a block by itself is not an object. Consult "What is the difference between a block, a proc, and a lambda in Ruby?" for more information on this topic.
Warning: if you are coming from Python, the
yield
keyword in Ruby does not turn a function into a generator. The strict Ruby equivalent of a Python generator is called aFiber
, and fibers are usually not used for iteration, unlike their Python counterparts. Hang tight for a straightforward Ruby analog to Pythonic iterative generators! We will explore them near the end.
The #each
method effectively divorces how to iterate over a container from
what to do with each iterated element. The method handles the how, and the
given block handles the what. In other words, the method is responsible for
materializing every element inside the container.
What if we only want the very first element from #each
, and we do not have
access to other methods, such as #[]
? That is where an enumerator comes in.
Starting from Ruby 1.9, the convention is that if we call #each
without a
block, then it returns an
enumerator object. We can
then call the #next
method on that enumerator to produce just one element. We
can see this behavior in a built-in array:
irb(main):001:0> a = [3, 5, 7]
=> [3, 5, 7]
irb(main):002:0> e = a.each
=> #<Enumerator: [3, 5, 7]:each>
irb(main):003:0> e.next
=> 3
Our Podium
class doesn't have this behavior yet, but Ruby provides the
Object#to_enum
method on all objects to make it easy to support this behavior:
class Podium
include Enumerable
def each
return to_enum unless block_given?
yield 1
yield 2
yield 3
end
end
It works exactly the same way:
irb(main):011:0> p = Podium.new
=> #<Podium:0x00007fe189942358>
irb(main):012:0> e = p.each
=> #<Enumerator: #<Podium:0x00007fe189942358>:each>
irb(main):013:0> e.next
=> 1
A deeper question is: why would we want to materialize one element at a time? The value lies in space efficiency: by supporting one-at-a-time materialization, we can operate in a low-memory environment without fear of running out of memory when given an enormous data source.
In Ruby, the #to_proc
method is the convention for converting something into a
callable. For example, the built-in type Symbol
implements #to_proc
, so that
we can abbreviate { |x| x.odd? }
into :odd?.to_proc
. This process is so
common that the unary operator &
does the exact same thing, allowing us to
further abbreviate :odd?.to_proc
as &:odd
.
The choice of the punctuation &
visually connects to writing &blk
in an
arguments list, where the meaning of &
similarly means "capture a block as a
proc".
For a deeper dive, consult "How Does Symbol#to_proc Work?".
A common idiom in Ruby is to chain multiple enumerable methods in order to produce a list of results. For example, if we want to take the numbers from 0 to 99 (inclusive), add one to each of them, pick only the odd ones, and sum them up, we could write:
(0..99)
.map(&:succ)
.select(&:odd?)
.reduce(0, &:+)
The (0..99)
is syntax sugar for constructing a
Range
object. The desugared form
is Range.new(0, 99)
. This chain of methods looks great until we look at what
each step of the process does:
irb(main):001:0> (0..99).map(&:succ)
=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]
It turns out that each call produces an entirely new array! If our data source had a million elements, we would be creating an additional, intermediate million-long array for each additional method we add to the chain.
In the case of arrays, we can cheat this cost away by using mutative methods:
(0..99)
.to_a
.map!(&:succ)
.select!(&:odd?)
.reduce(0, &:+)
Here, we first create a new array from (0..99)
using #to_a
, which guarantees
us access to those mutative methods. But this forces us to remember when
mutative methods are available, and which ones are possible. The cleaner
solution should be that every method call avoids immediately doing any work
until we actually want the result!
In Rails, when dealing with data models, ActiveRecord actually does avoid doing any work until you want the result. Every method you call gives back a relation object, which records the SQL query it has built up so far. It's not until you try to use a model object that the query gets executed!
It turns out that there is a method called Enumerator#lazy
that gives us the
ability to eliminate intermediates for free:
(0..99)
.each
.lazy
.map(&:succ)
.select(&:odd?)
.reduce(0, &:+)
The call above does not build any intermediate arrays. This is so common, in
fact, that another convention is that if an object supports returning an
enumerator from #each
, it also supports calling #lazy
as a shorthand for
.each.lazy
:
(0..99)
.lazy
.map(&:succ)
.select(&:odd?)
.reduce(0, &:+)
We can now apply the same behavior to podium:
class Podium
include Enumerable
def each
return to_enum unless block_given?
yield 1
yield 2
yield 3
end
def lazy
each.lazy
end
end
We can verify that it's not until we actually try to use each value that those values are calculated:
irb(main):015:0> p = Podium.new
=> #<Podium:0x00007fb64cf13ce0>
irb(main):016:0> p.lazy.map(&:succ).reject(&:odd?)
=> #<Enumerator::Lazy: #<Enumerator::Lazy: #<Enumerator::Lazy: #<Enumerator: #<Podium:0x00007fb64cf13ce0>:each>>:map>:reject>
irb(main):017:0> p.lazy.map(&:succ).reject(&:odd?).each { |i| puts i }
2
4
=> nil
The Enumerator::Lazy
class, used to implement Enumerator#lazy
, also supports
a #force
method that lets use force an eager evaluation all at once:
irb(main):018:0> p.lazy.map(&:succ).reject(&:odd?).force
=> [2, 4]
Its behavior is the same as calling #to_a
.
So far we've built an enumerator by having an object wrap itself in an
enumerator. It turns out there is a second way to do this: by passing a block to
Enumerator.new
:
def podium
Enumerator.new do |yielder|
yielder.yield 1
yielder.yield 2
yielder.yield 3
end
end
It can be used in exactly the same way:
irb(main):008:0> p = podium
=> #<Enumerator: #<Enumerator::Generator:0x00007fdaa15e9d08>:each>
irb(main):009:0> p.lazy.map(&:succ).reject(&:odd?)
=> #<Enumerator::Lazy: #<Enumerator::Lazy: #<Enumerator::Lazy: #<Enumerator: #<Enumerator::Generator:0x00007fdaa15e9d08>:each>>:map>:reject>
irb(main):010:0> p.lazy.map(&:succ).reject(&:odd?).each { |i| puts i }
2
4
=> nil
Making an enumerator from scratch is very handy if we find ourselves not needing a class.
If you're coming from Python, the block form of
Enumerator.new
is the closest Ruby analog to Python generators. In fact, the low-level way to use an enumerator works almost exactly the same way:irb(main):008:0> p = podium => #<Enumerator: #<Enumerator::Generator:0x00007fa2021f2b78>:each> irb(main):009:0> p.next => 1 irb(main):010:0> p.next => 2 irb(main):011:0> p.next => 3 irb(main):012:0> p.next Traceback (most recent call last): 2: from (irb):12 1: from (irb):12:in `next' StopIteration (iteration reached an end)
Enumerators in Ruby are also powered by fibers, which brings us to the question: when should one be used over the other? If the intent is to make something iterable, then enumerators are a better choice, because they include the
Enumerable
mixin by default. Otherwise, if the purpose is to use coroutines as a way to perform flow control, then fibers provide a much more direct API.
to_proc
could use some background material. (that could possibly be as little as once again referring to the block/proc/lambda article)"in" is spelled with two letters
Make this section a block quote?
does
.to_a
return a new array, thus allowing the mutative methods to act on an intermediate instead of the original(0..99)
?