On memory management

Posted on 5th July 2017


I have only ever been a hobbyist C++ programmer, while I have been paid to write Java and Python. But a common complaint I've read about C++ is that you have to manage memory manually, and worry about it. Now, I'd slightly dispute this with C++11, but perhaps I don't really have enough experience to comment.

However, I think there's a strong case that with Garbage Collected languages, you can't really forget about memory, or the difference between copy by reference and copy, but the language rather allows you to pretend that you can cease to worry. In my experience, this is only true 99% of the time, and the 1% of time it bites you, you've quite forgotten that it's a possibility, which makes debugging a real pain (the classic "unknown unknown").

A stupid example which wasted some of my time today is:

import numpy as np
...
indexes = np.argsort(times)
coords[0] = coords[0][indexes]
coords[1] = coords[1][indexes]

With hindsight this obviously mutates the data underpinning coords and hence mutates anything which is an alias of coords. Cue two tests failing, and the first one was silently mutating the data the second test tried to use. But this is really hard to spot-- both tests failed, so I spend a while looking at the base class because that's the only common code involved. Unit testing doesn't really help, as I'd never think to test that I'm not accidentally mutating some data reference (because I'd never be that stupid, right...)

What I meant to do was:

coords = coords[:,indexes]

This generates a new array instance and assigns the reference to coords. But this is quite subtle. To even express it, I have to use language which I learnt from C/C++. I only finally noticed when I wrote some test code in a notebook, and noticed that there was some period 2 behaviour going on. "Oh, I must be mutating something... Oh, right..."

The problem with Python, and Java, is that you get out of the habit of even thinking in this way. I used to write a lot of immutable code in Java, precisely to avoid such problems. That seems to make massive sense in a corporate environment. But for numpy, and trying to squeeze performance out of an interpretted language, you sometimes need mutability. Which means you need to think. (And regularly makes me wish I could just use C++, but that's nothing story...)


Categories
Recent posts