Posted on 14th May 2015
One of the fun things about C++11 is that minor changes allow you to write somewhat more expressive (aka Pythonic) code.
A common pattern is to iterate over an array while also knowing the index you are at. Python does this with the enumerate
keyword:
for index, x in enumerate(container):
print("At {} is {}".format(index, x))
In C++ the idiomatic way to do this is perhaps:
std::vector<int> container;
...
for (int i = 0; i < container.size(); ++i) {
std::cout << "At " << i << " is " << container[i] << "\n";
}
This has its flaws though (and is tedious to type). Using g++, if you have warnings on, -Wall
, then you'll see complaint, as the type of container.size()
is, typically, of size std::size_t
which on my machine is unsigned long long
(aka an unsigned 64-bit number). Warnings aside, as int
is on my machine a 32-bit unsigned number, we could potentially have problems if the size of container is more than around 2 billion (unlikely, but possible these days).
We could write:
for (std::size_t i = 0; i < container.size(); ++i) {
or more tediously, but more showy:
for (decltype(container.size()) i = 0; i < container.size(); ++i) {
Surely we can do better? So I sat down and wrote a little template function enumerate(T container)
which returns a class implementing the methods begin()
and end()
, these in turn returning custom iterators which count up from 0
to container.size() - 1
. You can thus write:
foreach(auto index : enumerate(container)) {
std::cout << "At " << index << " is " << container[index] << "\n";
}
What makes C++ interesting, and frustrating, is certain subtle issues. Our iterator returns a number, for which we implement operator*
:
SizeType operator*() { return index; }
That's great, but what if the user tries to capture the index by reference:
foreach (auto& i : eneumerate(container))
We get the cryptic error "error: invalid initialization of non-const reference of type 'long long unsigned int&' from an rvalue of type 'long long unsigned int'" (Explanation from StackOverflow. Basically, when we return index;
we are not returning index
, but rather a temporary copy (aka r-value) and, as you might hope, you're not allowed to take the reference of a temporary copy...) We can "fix" this by:
SizeType& operator*() { return index; }
However, we've now giving the user access to the index, so this
foreach (auto& i : eneumerate(container)) {
std::cout << container[i] << "\n"; ++i; }
prints every second term of container. Is that what we want? If not, then we can use a const
:
const SizeType& operator*() { return index; }
The command ++i;
now gives an error "error: increment of read-only reference 'i'" which is, to my mind, (a) not cryptic; and (b) makes it clear to the user what the issue is.
Let's get ambitious, and use enumerate
for something else:
struct range {
int size()const { return 8; }
};
...
for (auto n : enumerate(range{}))) {
std::cout << n << " ";
}
This seems to work, so try something more ambitious:
class range {
private:
int limit;
public:
range(const int l) : limit{l} {}
int size()const { return limit; }
};
...
for (auto n : enumerate(range(8)))) {
std::cout << n << " ";
}
This again seems to work, but it may not... Suppose we add:
~range() { std::cout << "~range() called." << std::endl }
Then the output is:
range::~range() called.
0 1 2 3 4 5 6 7
Erm... Well, a range
object is created with initial value 8
, this is passed to the enumerate
function which takes a const reference of the range object which is passed to a class which implements begin()
and end()
; the class also holds a reference. Then the range
object ends its life, and the destructor is called. This occurs before the for
loop, and with more checking, actually before the call to end
which uses the size
method. Why? Because range(8)
creates an object, whose lifespan is just during the call to the function enumerate
. That it works at all is a fluke...
To fix this, let's examine some design assumptions:
for (type i : container)
expression calls both container.begin
and container.end
at the start, and caches the two resulting iterators. It's equivalent to:
auto it = container.begin();
auto itend = container.end();
for (; it != itend; ++it) { ... }
for (auto it = container.begin(); it != conatiner.end(); ++it) { ... }
end
iterator could "know" that is was the end iterator, and each time operator!=
or operator==
was called on the iterators, we could query the container.)container.size()
at the start? This will simplify things, and fixed our bug.Of course, we've written quite a lot of code! What does this do?
for (auto i : enumerate(container)) { test_func(conatiner[i]); }
It calls test_func
on each entry of container
. With -O1
this compiles to this inner loop:
.L4:
movq (%rsi), %rax
movl (%rax,%rbx,4), %ecx
call _Z9test_funci
addq $1, %rbx
cmpq %rdi, %rbx
jne .L4
As we might hope, the compiler works its magic, and we pay no overhead for our mucking around.
I did some playing around with the idea of "generators" in C++ some time ago (you can't do this properly, except maybe with some fooling with lambda expressions, but the idea was to use range based for loops to process data being delivered by a class.)