Enumerate in C++

Posted on 14th May 2015


One of the fun things about C++11 is that minor changes allow you to write somewhat more expressive (aka Pythonic) code.

A common pattern is to iterate over an array while also knowing the index you are at. Python does this with the enumerate keyword:

for index, x in enumerate(container):
   print("At {} is {}".format(index, x))

In C++ the idiomatic way to do this is perhaps:

std::vector<int> container;
...
for (int i = 0; i < container.size(); ++i) {
   std::cout << "At " << i << " is " << container[i] << "\n";
}

This has its flaws though (and is tedious to type). Using g++, if you have warnings on, -Wall, then you'll see complaint, as the type of container.size() is, typically, of size std::size_t which on my machine is unsigned long long (aka an unsigned 64-bit number). Warnings aside, as int is on my machine a 32-bit unsigned number, we could potentially have problems if the size of container is more than around 2 billion (unlikely, but possible these days).

We could write:

for (std::size_t i = 0; i < container.size(); ++i) {

or more tediously, but more showy:

for (decltype(container.size()) i = 0; i < container.size(); ++i) {

Surely we can do better? So I sat down and wrote a little template function enumerate(T container) which returns a class implementing the methods begin() and end(), these in turn returning custom iterators which count up from 0 to container.size() - 1. You can thus write:

foreach(auto index : enumerate(container)) {
   std::cout << "At " << index << " is " << container[index] << "\n";
}

Subtle issues 1

What makes C++ interesting, and frustrating, is certain subtle issues. Our iterator returns a number, for which we implement operator*:

	SizeType operator*() { return index; }

That's great, but what if the user tries to capture the index by reference:

foreach (auto& i : eneumerate(container))

We get the cryptic error "error: invalid initialization of non-const reference of type 'long long unsigned int&' from an rvalue of type 'long long unsigned int'" (Explanation from StackOverflow. Basically, when we return index; we are not returning index, but rather a temporary copy (aka r-value) and, as you might hope, you're not allowed to take the reference of a temporary copy...) We can "fix" this by:

	SizeType& operator*() { return index; }

However, we've now giving the user access to the index, so this

foreach (auto& i : eneumerate(container)) {
   std::cout << container[i] << "\n"; ++i; }

prints every second term of container. Is that what we want? If not, then we can use a const:

	const SizeType& operator*() { return index; }

The command ++i; now gives an error "error: increment of read-only reference 'i'" which is, to my mind, (a) not cryptic; and (b) makes it clear to the user what the issue is.

Subtle issues 2

Let's get ambitious, and use enumerate for something else:

struct range {
   int size()const { return 8; }
};
...
for (auto n : enumerate(range{}))) {
   std::cout << n << " ";
}

This seems to work, so try something more ambitious:

class range {
private:
   int limit;
public:
   range(const int l) : limit{l} {}
   int size()const { return limit; }
};
...
for (auto n : enumerate(range(8)))) {
   std::cout << n << " ";
}

This again seems to work, but it may not... Suppose we add:

   ~range() { std::cout << "~range() called." << std::endl }

Then the output is:

range::~range() called.
0 1 2 3 4 5 6 7

Erm... Well, a range object is created with initial value 8, this is passed to the enumerate function which takes a const reference of the range object which is passed to a class which implements begin() and end(); the class also holds a reference. Then the range object ends its life, and the destructor is called. This occurs before the for loop, and with more checking, actually before the call to end which uses the size method. Why? Because range(8) creates an object, whose lifespan is just during the call to the function enumerate. That it works at all is a fluke...

To fix this, let's examine some design assumptions:

  • All we need to know about our "container" is its size.
  • If you explore things, then the for (type i : container) expression calls both container.begin and container.end at the start, and caches the two resulting iterators. It's equivalent to:
    auto it = container.begin();
    auto itend = container.end();
    for (; it != itend; ++it) { ... }
    
    and not, as perhaps you might think, to:
    for (auto it = container.begin(); it != conatiner.end(); ++it) { ... }
    
  • Thus, even if we wanted to we cannot support the container length changing in the loop (not quite true: the end iterator could "know" that is was the end iterator, and each time operator!= or operator== was called on the iterators, we could query the container.)
  • So why not just read container.size() at the start? This will simplify things, and fixed our bug.

Complexity

Of course, we've written quite a lot of code! What does this do?

for (auto i : enumerate(container)) { test_func(conatiner[i]); }

It calls test_func on each entry of container. With -O1 this compiles to this inner loop:

.L4:
	movq	(%rsi), %rax
	movl	(%rax,%rbx,4), %ecx
	call	_Z9test_funci
	addq	$1, %rbx
	cmpq	%rdi, %rbx
	jne	.L4

As we might hope, the compiler works its magic, and we pay no overhead for our mucking around.

Links

I did some playing around with the idea of "generators" in C++ some time ago (you can't do this properly, except maybe with some fooling with lambda expressions, but the idea was to use range based for loops to process data being delivered by a class.)


Categories
Recent posts