Working with numpy again

Posted on 23th March 2017


In my new job, I find myself working with numpy (after a break of a couple of years, and now professionally, and not as a hobby.) Numpy is great, but it doesn't half require a little thinking upon occasion.

Suppose we have an array of 10 points in the plane. Should this be represented as a numpy array of shape (2,10) or (10,2)?

Argument from kernels

Suppose I want to write a function which represents a two dimensional kernel, for example:

def ker(x,y):
   return x + 2 * y

This doesn't vectorise at all: I cannot do

p = np.array([1,2])
ker(p)

So instead we might write:

def ker(p):
   return p[0] + 2 * p[1]

Then, to vectorise across my array of 10 points, evaluated the kernel at each point, I need my array to have shape (2,10).

This is exactly the convention chosen by the kernel object which Scipy stats Gaussian KDE will produce, for example.

Argument from broadcasting

Given my 10 points, I should be able to translate them all by the same amount, by just adding a point.

my_array + point

For this to work, I need to work with the Numpy broadcasting rules which imply shape (10,2):

my_array = np.random.random(20).reshape((10,2))
point = np.array([2,3])
my_array + point

So if you're working with things your thinking of as spatial "vectors", the other convention seems most natural. This is exactly the convention chosen by e.g. Scipy Spatial KDTree.

What to do?

I guess you pick a convention, stick with it, and translate (e.g. using .T) as necessary.

A further hint is to read the broadcasting rules very closely and to know that indexing with None is the same as adding a new axis:

my_array = np.random.random(20).reshape((10,2))
new_array = my_array[:,None,:]
print(new_array.shape) # (10,1,2)

Categories
Recent posts