Random sampling

This post, very tangentially, relates to a quiz we set job candidates. If you are applying for a job at my current company, and somehow work out I work there, and find this, then you probably half deserve a job anyway.

Suppose you have a population \( P \) and some test as to whether a member of the population is good or bad. We want to find a “random good member”. There are two methods that come to mind:

  • Random sampling: pick a member at random, test it, if it passes return it, otherwise try again.
  • Randomly order \(P\) and work through the whole set, returning the first good member.
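
To make the comparison concrete, here is a minimal sketch of both methods in C++ (the container type, the predicate, and the function names are illustrative only, not from any real code):

    #include <algorithm>
    #include <optional>
    #include <random>
    #include <vector>

    std::mt19937 rng{std::random_device{}()};

    // Method 1: sample with replacement until we hit a good member.
    // May loop for a very long time if good members are rare.
    template <typename T, typename Pred>
    T sampleUntilGood(const std::vector<T>& pop, Pred good) {
        std::uniform_int_distribution<std::size_t> pick(0, pop.size() - 1);
        for (;;) {
            const T& candidate = pop[pick(rng)];
            if (good(candidate)) return candidate;
        }
    }

    // Method 2: randomly order a copy of P, then scan; at most |P| tests,
    // but we pay for the copy and the shuffle.
    template <typename T, typename Pred>
    std::optional<T> shuffleThenScan(std::vector<T> pop, Pred good) {
        std::shuffle(pop.begin(), pop.end(), rng);
        for (const T& candidate : pop)
            if (good(candidate)) return candidate;
        return std::nullopt;  // no good member exists at all
    }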

The first method has the virtue of being simple. The second method uses a lot of memory if \(P\) is large. But on closer thought, what if the proportion of “good” members is rather small? The second method is guaranteed to find a good member in at most \(\vert P\vert\) steps. How slow can the first method be?

Let \(B\subseteq P\) be the set of bad members. The first method fails to find a good member in \(n\) trials with probability \[ \left( \frac{\vert B\vert }{\vert P\vert} \right)^n \] (the chance of repeatedly choosing a bad member).

  • So, if half your members are bad, then you need just 7 goes to be 99% sure of finding a good member.
  • If only 1% of your population is good then you need 459 trials to be 99% sure of finding a good member.

For the 2nd method, by “random ordering” I mean picking a member of the symmetric group on \(\vert P\vert\) elements uniformly at random and applying it to the set. We can do this in time proportional to \(\vert P\vert\). The algorithm is simple: choose an unpicked member of \(P\) at random and take it as the next member of the random ordering, and then repeat. How long does it take to find a good member? The chance of choosing only bad members for the first \(n\) goes is \[ \frac{\vert B \vert (\vert B\vert -1) (\vert B\vert-2) \cdots (\vert B\vert - n+1)} {\vert P \vert (\vert P\vert -1) (\vert P\vert-2) \cdots (\vert P\vert - n+1)} \]

  • So this will always be quicker than method one.
  • But as \(\vert P\vert\) becomes large, the limit is the same.
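
Incidentally, the random-ordering procedure described above is the classic Fisher–Yates shuffle (std::shuffle implements the same idea); a minimal sketch:

    #include <random>
    #include <utility>
    #include <vector>

    template <typename T>
    void fisherYates(std::vector<T>& p, std::mt19937& rng) {
        for (std::size_t i = 0; i + 1 < p.size(); ++i) {
            // Positions [0, i) already hold the random ordering chosen so far;
            // pick the next member uniformly from the unpicked tail [i, n).
            std::uniform_int_distribution<std::size_t> pick(i, p.size() - 1);
            std::swap(p[i], p[pick(rng)]);
        }
    }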

I’m not sure I’ve come to any conclusion. Method 1 is simple and fast if the good population is not too small. Method 2 needs some storage space, but is more predictable if \(P\) is not too large and the proportion of good members is very small. If \(P\) is very large and the proportion of good members very small, you probably need a better idea than simple sampling.

A more mathematical question presents itself. Suppose we do away with the good and bad members and simply ask:

Given a population and sampling at random (“with replacement”), what’s the expected number of samples I need to see 50% (or any fixed proportion) of the population?
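
For what it’s worth, the standard coupon-collector computation answers this: once \(k\) distinct members of a population of size \(N\) have been seen, each new sample is unseen with probability \((N-k)/N\), so the wait for the next new member is geometric with mean \(N/(N-k)\). Summing, \[ \mathbb{E}[\text{samples to see } m \text{ of } N] = \sum_{k=0}^{m-1} \frac{N}{N-k} = N\,(H_N - H_{N-m}) \approx N \ln\frac{N}{N-m}, \] so seeing half the population (\(m = N/2\)) takes about \(N \ln 2 \approx 0.69 N\) samples on average.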

Graphs in Windsor

While spending a long weekend in Windsor, we found the following:

Maze problem

The instructions read, in part:

The puzzle challenge is to solve this unique one-way maze by travelling from the entrance Pawn to the central Castle – always going forwards.

It’s not quite clear to me what this means, but there is a good quote from here:

The maze should be played as if you are a runaway train - always moving smoothly through the points forwards, and never able to reverse. This maze-movement idea was inspired by the nearby Windsor Railway Station, which brought the Royal Family to Windsor from Victorian times onwards. The maze paths sometimes run together in pairs, like a pair of railway tracks; sometimes they even run through “railway cuttings”, where the grass surface rises and falls around them.

I wonder if there is an elegant way to convert this to a directed graph problem? I couldn’t see how to do this cleanly, as in this puzzle you must pass straight through a “node”, never reversing direction.

Anyway, here is a solution. From the pawn you initially get yourself into an outer “loop”, from which only the King is accessible. As we can only visit the King once, we don’t want to get back onto the outer loop in the same direction.

  • So, we take the loop which the King is on, and reverse direction. Two choices here.
  • We can then either go to the Bishop:
    • If we pass out by the “Raised Banks” then we’re back on the outer loop, which is no good.
    • So we must reverse direction again, and head in the direction of the Knight.
    • We can skip past the Knight back onto the outer loop, but that’s wrong.
    • So visit the Knight.
    • Then take the branch down to the Queen, and then to the Castle. Task complete.
  • Or we loop back down towards the Pawn, and either:
    • Head towards the Knight, but we can’t visit the Knight, so we end up back up near the Bishop, and we’ve just taken a loop.
    • Take the earlier branch, and head up towards the Castle, but following the track, we’re forced to visit the Queen, and then head to the Castle. But we’ve missed pieces.

So, the only solution is King, Bishop, Knight, Queen, Castle. And once we’re heading to the Knight, we only have a choice of which direction to pass through the Knight “station” in; everything else is determined. Before that we can loop about a bit if we wish.

RAII in C#/Java?

More a note to a future self than anything else… In C++ we have the notion of Resource Acquisition Is Initialization (RAII), which I think I have internalised as the following:

Any “resource” (memory, file, handle etc.) should be “owned” by an object. An object’s constructor should allocate the resource, and the destructor should free the resource (with some appropriate choices then made for what copy/move constructors/assignments should do). By use of C++11 unique_ptr and shared_ptr this model can be extended to pointers.

Notice I didn’t mention “exception”, though one major additional use of RAII is exception safety: if an exception is thrown, then all objects currently in scope will have their destructors run, and so any resources will be safely released.

In C#/Java we have garbage collection. This takes care of memory, but not any other “resource”. In C# there is the IDisposable interface, and I was for a while confused by what this did. I am not alone, judging from Stack Overflow.

For me, the following is a good way to think:

  • Imagine going back to programming in C. You allocate memory with malloc and then free it later with free. Similarly, if you open a file, you need to later close it.
  • A managed language (let’s say C#) removes the need to free anything. But it does nothing for the file open/close issue.
  • So we must manually close() a file, say.
  • How then to handle exceptions? Use the try ... catch ... finally idiom!
  • Then the IDisposable interface in C# should be thought of as enabling the syntactic sugar of the using command. Similarly the AutoCloseable interface in Java 7.

What of destructors? Don’t use them: they are only of use if you need to free “unmanaged resources”, and if you need to do that, you’ll know it.

So why the “IDisposable pattern”? This seems really to exist merely to support finalizers (aka destructors). However, if your class is not sealed, then maybe a derived class will need a destructor. The pattern then exists to ensure unmanaged resources are always freed, but from the destructor no managed resource is freed (as they might have already been freed by the garbage collector).

All in all, that seems about as much work as writing a destructor in C++!

BF interpreter in C++

BF is an amusingly named, impractical, Turing complete programming language. It is very like an actual Turing machine: you have an infinite (originally, one-sided, 30K long) array of (8-bit) values, with commands to increase/decrease (+/-) the current cell, or move left/right (</>). Input (,) and output (.) are handled. Loops are given by [ and ] and implement a “while cell is non-zero” loop. So a trivial mapping to C is possible (see the Wikipedia article).
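
For reference, here is the flavour of that mapping, with the machine state as a cell array and a pointer (this is the standard translation, not code from my interpreter):

    #include <cstdio>

    unsigned char cells[30000];   // the (originally 30,000-cell) tape of 8-bit values
    unsigned char* ptr = cells;   // the tape head

    // Each BF command becomes one C fragment:
    //   >   ++ptr;
    //   <   --ptr;
    //   +   ++*ptr;
    //   -   --*ptr;
    //   .   putchar(*ptr);
    //   ,   *ptr = getchar();
    //   [   while (*ptr) {
    //   ]   }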

I thought it would be fun to give an object oriented implementation, for which I used C++ for a change. My implementation decouples the parsing of the programme from running it. The parse class reads the input in a single pass, using a stack for dealing with nested loops. It translates the input into a list of commands, represented by an abstract base class with an execute method, overloaded for the various different commands. This is the “command” pattern (closely related to the “strategy” pattern).

I then have a separate class to represent the machine which can run the commands. This is again an abstract base class, which leaves input and output to be implemented in a derived class. The commands act on the machine (I think you could consider this an example of “dependency inversion principle”: both the commands and the machine act on abstractions, so there is loose coupling between them).
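
Here is a minimal sketch of that shape; the class and method names are illustrative, not the repository’s actual ones:

    #include <memory>
    #include <vector>

    class Machine {                          // abstract: I/O left to derived classes
    public:
        virtual ~Machine() = default;
        virtual void increment(int by) = 0;  // +/- on the current cell
        virtual void move(int by) = 0;       // </> movement of the tape head
        virtual unsigned char current() const = 0;
    };

    class Command {                          // the "command" pattern
    public:
        virtual ~Command() = default;
        virtual void execute(Machine& m) const = 0;
    };

    class Increment : public Command {       // e.g. the coalesced "+ times n"
        int n;
    public:
        explicit Increment(int n) : n(n) {}
        void execute(Machine& m) const override { m.increment(n); }
    };

    class Loop : public Command {            // [...]: run body while cell non-zero
        std::vector<std::unique_ptr<Command>> body;
    public:
        explicit Loop(std::vector<std::unique_ptr<Command>> b) : body(std::move(b)) {}
        void execute(Machine& m) const override {
            while (m.current() != 0)
                for (const auto& cmd : body) cmd->execute(m);
        }
    };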

Currently I just have one concrete machine, which uses vectors to store input and output. As an example of the loose coupling, there are two parsers and associated command classes, which can work with the same machine. The first parser does no translation (and, internally, I use unique_ptrs a lot to avoid memory management). The second parser coalesces multiple commands (so, for example, “++++” becomes “+ times 4” internally) and uses RAII to handle memory management.

The parser and the machine need to communicate the list of instructions. This is mediated by an abstract base class which functions like a specialised vector. The concrete implementation actually uses a vector of (smart) pointers but one could also implement this as a raw interpreter with no parse stage, I guess. To avoid copying the list, I used shared_ptr aka reference counting. A subtle point here is that in the constructor, we shouldn’t have a naked pointer because we may throw an exception if the input is not well-formed. The unique_ptr release() method comes in handy here to pass off ownership to a shared_ptr.
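
A sketch of that hand-off, with hypothetical names (CommandList and parseInto stand in for the real classes): the list lives behind a unique_ptr while parsing, so a throw cannot leak it, and only on success does release() pass ownership to the shared_ptr:

    #include <memory>
    #include <stdexcept>
    #include <string>

    struct CommandList { /* the specialised-vector interface */ };

    void parseInto(CommandList&, const std::string& src) {
        if (src.empty()) throw std::runtime_error("not well-formed");  // may throw
    }

    class Parser {
        std::shared_ptr<CommandList> commands;
    public:
        explicit Parser(const std::string& source) {
            auto list = std::make_unique<CommandList>();  // sole owner while parsing
            parseInto(*list, source);                     // if this throws, list cleans up
            commands = std::shared_ptr<CommandList>(list.release());  // hand off
        }
    };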

On GitHub

Update: To be correct, in the 2nd case, we need to delete the copy constructor and assignment operators (as the object “owns” the allocated memory, a copy should not be allowed, unless we make a deep copy, which we have no need for). I actually rather like the philosophy of the “Rule of Zero”: use automatic containers and unique_ptr or shared_ptr to manage all members. These then automatically provide copy/move operators, or disallow them (in the case of unique_ptr). (As ever, try this with g++, and you get an incomprehensible error message though!)
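
A minimal sketch of the Rule of Zero (member types illustrative): the members manage their own lifetimes, so the class declares no destructor, copy, or move operations at all:

    #include <memory>
    #include <vector>

    struct Cell { unsigned char value = 0; };

    class Holder {
        std::vector<Cell> cells;        // copies and moves itself correctly
        std::unique_ptr<Cell> extra;    // move-only member: copying Holder is
                                        // automatically disallowed (ill-formed)
    };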

The article goes on to talk about how we can in fact avoid the need for virtual destructors if we consistently use shared_ptr: because a shared_ptr stores its deleter at construction, shared_ptr<base> p = make_shared<derived>() means that when the reference count of p becomes 0, the destructor of derived is (correctly) called even though the destructor is not virtual. A bit of work with Google reveals the following standards suggestion: n3974.pdf, which would allow something similar with make_unique.
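
This is easy to demonstrate: shared_ptr captures a deleter for the concrete type at construction, so the full object is destroyed despite the non-virtual destructor:

    #include <cstdio>
    #include <memory>

    struct Base { ~Base() { std::puts("~Base"); } };               // note: NOT virtual
    struct Derived : Base { ~Derived() { std::puts("~Derived"); } };

    int main() {
        std::shared_ptr<Base> p = std::make_shared<Derived>();
        p.reset();  // prints "~Derived" then "~Base": the deleter knows the real type
    }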

CodeJam 2015 Round 2

Round 2 came and went (and Round 3 and the “distributed” round). In an effort to practice with C#, I solved these problems in C# (definitely slower than using Python, and actually, the end running time is not that much quicker, for these sorts of algorithmic / numerical problems).

There’s a definite step-up in difficulty here! Maybe I kid myself I could have done problem A and some “small” problems quickly… that people did them all in 150 minutes is quite scary. The Official Contest Analysis is rather brief (from G+, it seems the organisers were busy with the distributed round; or maybe, if we’ve come this far, shorter answers are all we need ;-> ) I think my background in pure Maths shows here: I found problem D quite easy, as I found analysing the special cases off-line (pen and paper) to be not so hard. Problems B and C (in “large” form) were hard, I thought. (I missed the trick with problem B; and using an off-the-shelf Graph package is obviously the fast way to do Problem C.)

See the code on GitHub.

Problem A, Pegman: We have a grid and each cell can contain one of four directions (UP, DOWN, LEFT, RIGHT) or nothing. Pegman starts in a cell: if it contains no direction he stays there. Otherwise he moves in the given direction, continuing in that direction until he exits the grid, or hits another direction symbol, in which case he changes direction. Can you alter (but not add or remove) some of the direction signs so that Pegman never exits the grid, regardless of starting point?

This is quite easy, if you see the correct way to attack it. Pegman will exit the grid if there is a direction sign pointing directly out of the grid (e.g. a LEFT sign where all the cells to its left are empty: “.<”, “<” or “…<” etc.). We must change all of these directions, and we have to change them so that they don’t point out of the grid, i.e. so that they do point towards another direction sign. If we can do this, then there are no cells pointing out of the grid, and so Pegman can’t exit.

So check each direction sign:

  • Does it point out of the grid? If so, count it, and…
  • Check if it can be made to point at another direction sign. If not, report “IMPOSSIBLE”; otherwise continue.
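
A sketch of this check in C++ (a reconstruction of the idea; my contest code is the C# on GitHub):

    #include <string>
    #include <vector>

    // Does the sign at (r, c) hit another sign when following direction (dr, dc)?
    bool hitsSign(const std::vector<std::string>& g, int r, int c, int dr, int dc) {
        for (r += dr, c += dc;
             r >= 0 && r < (int)g.size() && c >= 0 && c < (int)g[r].size();
             r += dr, c += dc)
            if (g[r][c] != '.') return true;
        return false;
    }

    // Returns the number of signs we must change, or -1 for "IMPOSSIBLE".
    int solve(const std::vector<std::string>& g) {
        const std::string arrows = "^v<>";
        const int dr[] = {-1, 1, 0, 0}, dc[] = {0, 0, -1, 1};
        int changes = 0;
        for (int r = 0; r < (int)g.size(); ++r)
            for (int c = 0; c < (int)g[r].size(); ++c) {
                std::size_t k = arrows.find(g[r][c]);
                if (k == std::string::npos) continue;           // empty cell
                if (hitsSign(g, r, c, dr[k], dc[k])) continue;  // already points at a sign
                ++changes;                                      // must repoint this one...
                bool ok = false;
                for (int d = 0; d < 4 && !ok; ++d)
                    ok = hitsSign(g, r, c, dr[d], dc[d]);
                if (!ok) return -1;                             // ...but nowhere to point it
            }
        return changes;
    }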

Problem B, Kiddie Pool: Given input pipes of flow rate \(R_i\) and temperature \(C_i\), you want to run these for an amount of time (different, maybe, for each \(i\)) to make volume \(V\) at temperature \(X\). If you can do this, what’s the least amount of time you need (the pipes run in parallel)? So we want to find the smallest \(T>0\) such that we can find \(0 \leq t_i \leq T\) with \( \sum_i t_i R_i = V\) and (a little maths shows) \( \sum_i t_i R_i C_i = VX \).

  • If we only have one or two pipes (the “small” case) then some basic linear algebra, and a little care with floating point arithmetic, solves this.
  • How to deal with the “large” case, where we have up to 100 pipes?
  • This looks similar to a Linear Programming problem, but we don’t care about the \(t_i\)! We simply want to know what is the smallest \(T\) for which the problem is feasible. You can recast it in the following way: \[ \sum_i t_i R_i = V, \quad \sum_i t_i R_i C_i = VX, \quad t_i - T \leq 0 \ \text{for all }i. \] This is a standard linear programming problem, with minimisation target \(T\). The latest version of scipy.optimize contains an implementation of the simplex method to solve this.
  • However, even in the “small” case, the problem turns out to be horribly numerically unstable, and the scipy implementation has terrible problems. I ended up picking magic values to make it work, which is a silly way to proceed. I think this may be a problem with scipy, to be honest:
     fun: 1.6614342185826514
 message: 'Optimization terminated successfully.'
     nit: 5
 success: True
   slack: array([ 0.        ,  0.        ,  1.66143422,  1.66143422])
       x: array([ nan,  nan,  nan])
  status: 0

Maybe that’s unfair. Anyway, I couldn’t get it to work.

  • The simplex method is all about convex geometry, and we can solve this problem from that viewpoint.
  • The state space of the problem is a high-dimensional cube \( \{ (t_i) \in \mathbb R^N : 0 \leq t_i \leq T \} \). We then project this cube down onto the plane by the map \( (t_i) \mapsto \left( \sum_i t_iR_i, \sum_i t_iR_iC_i \right) \). As this is a linear map, we end up with a convex set (in the upper right half-plane).
  • We then want to know if the point \((V,VX)\) is in the set. Or, rather, we project the unit cube (i.e. with \(T=1\) above) and then want the minimal scaling we need so that \((V,VX)\) is in our convex set. Equivalently, intersect the ray from the origin through \((V,VX)\) with the convex set and find whether it intersects only at the origin (the problem is impossible) or where the line first exits the set.
  • To specify a convex set, we can specify the extreme points. But our cube has \(2^N\) extreme points! However, we’ll make use of the fact that once we’ve projected onto the plane, very few of these points will remain extreme points.
  • The extreme points of the unit cube are where the coordinates are all 0 or 1. We can iteratively construct these by taking the set where only the first \(k\) coordinates are non-zero, and then adding new extreme points by adding on the vector \((0,\cdots,0,1,0,\cdots,0)\), where the 1 is in the \((k+1)\)th position.
  • Again, by linearity, we can do this in the plane, as a 2D problem. So we take our extreme points, add in this new direction each time, take the convex hull of the resulting set of points, and then run an algorithm to find the extreme points of this new convex set. What makes this work is that each step doesn’t introduce too many new extreme points.
  • I implemented the Graham Scan algorithm (a sketch follows below). See the IPython Notebook (well, now Jupyter notebook, I guess).
  • There were still some numerical instability issues, which I solved by converting everything to integers; this then overflows 64-bit integers, so use the BigInteger class in C#.
  • In the end, it works!
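
For reference, here is a compact integer-arithmetic scan (Andrew’s monotone-chain form of the Graham idea, in C++ here rather than the C#/BigInteger of my actual solution):

    #include <algorithm>
    #include <vector>

    struct Pt { long long x, y; };

    long long cross(const Pt& o, const Pt& a, const Pt& b) {
        return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x);
    }

    // Extreme points of the convex hull, counter-clockwise (collinear points dropped).
    std::vector<Pt> convexHull(std::vector<Pt> pts) {
        std::sort(pts.begin(), pts.end(), [](const Pt& a, const Pt& b) {
            return a.x < b.x || (a.x == b.x && a.y < b.y);
        });
        pts.erase(std::unique(pts.begin(), pts.end(), [](const Pt& a, const Pt& b) {
            return a.x == b.x && a.y == b.y;
        }), pts.end());
        if (pts.size() < 3) return pts;
        std::vector<Pt> h;
        for (int pass = 0; pass < 2; ++pass) {        // lower hull, then upper hull
            std::size_t base = h.size();
            for (const Pt& p : pts) {
                while (h.size() >= base + 2 && cross(h[h.size() - 2], h.back(), p) <= 0)
                    h.pop_back();                     // pop points that are not extreme
                h.push_back(p);
            }
            h.pop_back();                             // endpoint repeats in the next pass
            std::reverse(pts.begin(), pts.end());
        }
        return h;
    }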

The official solution is much nicer! Let’s see how this works, in our notation. Choose \( t_i = T R_i' / R_i \), where \( 0 \leq R_i' \leq R_i \) is a “notional flow”. Then we need \[ \sum_i t_i R_i = V \Leftrightarrow T \sum_i R_i' = V \]\[ \sum_i t_i R_i C_i = VX \Leftrightarrow T \sum_i R_i' C_i = VX = TX\sum_i R_i' \Leftrightarrow \sum_i R_i'C_i = X\sum_i R_i'. \] So if we can maximise \(\sum_i R_i'\) subject to \(\sum_i R_i'(C_i-X) = 0\), then the time taken is \( T = V / \sum_i R_i' \).

The trick now is how to do this maximisation.

  • This is now, again, a linear programming problem. Again, I couldn’t get scipy.optimize.linprog to work! Ah, it’s a known bug! Once patched with the GitHub version, off we go!

It’s also easy to attack this directly. Start with \(R_i' = R_i\) for all \(i\) and then consider trying to make the constraint \(\sum_i R_i'(C_i-X)=0\) hold.

  • If the sum is positive, we need to decrease it, and the optimal way to do this is clearly to decrease the \(R_j'\) with \(C_j-X\) maximal.
  • Or, if the sum is negative, decrease the \(R_j'\) with \(C_j-X\) minimal.
  • This will always work, but may lead to the zero solution, in which case the original problem was “IMPOSSIBLE”.
  • This works, but again I ran into numerical issues, so I switched back to using integers. The algorithm does work fine with 64-bit integers; a sketch follows this list.
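
Here is that greedy in C++ (rather than my contest C#), assuming the flows and the weights \(w_i = C_i - X\) have been pre-scaled to integers small enough that the products fit in 64 bits:

    #include <cstdint>
    #include <cstdlib>
    #include <utility>
    #include <vector>

    // Greedily maximise sum(r) subject to sum(r[i]*w[i]) == 0, 0 <= r[i] <= R[i].
    // Start from r = R and shrink the pipe that moves the weighted sum toward 0
    // fastest. Returns the maximum as an exact fraction {num, den}; a zero
    // numerator means only the zero solution exists, i.e. IMPOSSIBLE.
    std::pair<int64_t, int64_t> maximiseFlow(std::vector<int64_t> r,
                                             const std::vector<int64_t>& w) {
        for (;;) {
            int64_t total = 0, s = 0;
            for (std::size_t i = 0; i < r.size(); ++i) { total += r[i]; s += r[i] * w[i]; }
            if (s == 0) return {total, 1};        // constraint already holds exactly
            std::size_t j = r.size();             // open pipe with the extreme weight
            for (std::size_t i = 0; i < r.size(); ++i)
                if (r[i] > 0 && (s > 0 ? w[i] > 0 : w[i] < 0) &&
                    (j == r.size() || std::llabs(w[i]) > std::llabs(w[j])))
                    j = i;
            if (j == r.size()) return {0, 1};     // nothing left to adjust
            if (std::llabs(s) <= std::llabs(r[j] * w[j]))
                return {total * w[j] - s, w[j]};  // partially close pipe j: total - s/w[j]
            r[j] = 0;                             // close pipe j entirely and repeat
        }
    }

Dividing \(V\) by the returned fraction (after normalising signs) gives the minimal time, matching \( T = V / \sum_i R_i' \) above.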

You can use Lagrangian methods to prove that this is optimal. Here Wikipedia fails me, but there is a concise reference here: Richard Weber’s lecture notes, PDF. To put the problem in standard form, let us \[ \text{minimise } -\sum_i x_i \quad\text{subject to}\quad \sum_i x_iy_i=0, \quad 0 \leq x_i \leq x_i^{\max}. \]By replacing \(y=(y_i)\) with \(-y\), and reordering, we may suppose that \(\sum_i x_i^{\max}y_i \geq 0\) and \( y_1 \geq y_2 \geq \cdots \geq y_n \).

I claim that the optimum is given by our algorithm: start with \(x_i=x_i^{\max}\), then decrease \(x_1\) to make the constraint hold, or to 0; then use \(x_2\), and so on. This gives a solution of the form \( x^{(0)} = (0,\cdots,0,x_{i_0},x^{\max}_{i_0+1},\cdots)\) with \(i_0\) minimal.

To prove this, consider the Lagrangian, \[ L(x,\lambda) = -\sum_i x_i - \lambda \sum_i x_iy_i = \sum_i x_i(-1 - \lambda y_i). \]Choose \( \lambda_0 \) with \( -1-\lambda_0y_{i_0} = 0 \), that is, \( \lambda_0 = -1/y_{i_0} \). Then \( L(x,\lambda_0) = \sum_i x_i(y_i/y_{i_0} - 1) \) attains its minimum at \(x=(x_i)\) given by \[ x_i = \begin{cases} 0 &: y_i/y_{i_0} - 1 > 0 \Leftrightarrow y_i > y_{i_0} \\ x_i^{\max} &: y_i/y_{i_0} - 1 < 0 \Leftrightarrow y_i < y_{i_0} \\ \text{arbitrary} &: y_i = y_{i_0}. \end{cases} \] Notice that \(x^{(0)}\) satisfies these conditions, and so \(L(x^{(0)},\lambda_0) \leq L(x,\lambda_0) \) for all allowed \(x\); hence, by the Lagrangian Sufficiency Theorem, \(x^{(0)}\) is an optimal solution.

Problem C, Bilingual: Given N lines of text each containing “words”. We’re told that the first line is in “English” and the second is in “French”. The remainder are either English or French. Our task is to decide how to assign the remaining lines such that we minimise the number of words which occur in both languages. (Scare quotes as the “words” of course don’t have to be real English or French words).

Turn this into a (bipartite) graph: on the left are vertices for each word, on the right are vertices for each line, and we have an edge from a word to a line if that word occurs in that line. We want to “colour” the lines English or French, with the constraint that we already know the first two lines are E and F, respectively. We want to maximise the number of words which are linked to only one of E or F.

  • Suppose we have coloured all the lines, and that we then delete any word which is linked to both E and F. Then the 1st and 2nd lines are not connected in the resulting graph.
  • Conversely, if we start with our graph and delete words until the 1st and 2nd lines are not connected, then we can consistently colour any line connected to the 1st line as E, and any line connected to the 2nd line as F.
  • So our task is to find the minimum number of words to disconnect the 1st and 2nd lines.
  • It may be that there are some words and lines in a different component to the 1st and 2nd lines. If this occurs, then we can consistently assign either E or F to all of these words (and lines). Thus we can actually just ignore such words. Similarly, if the 1st and 2nd lines are already not connected, then there are a minimum of 0 words which occur in both languages.
  • We don’t care about the actual assignments, so to simplify things, we can delete the lines from our graph: we do this by forming a new graph with vertices the words, and an edge linking two words if there is one line which contains both words.
  • We then want to find the minimum number of words to remove to disconnect the cliques formed from the 1st and 2nd lines. In a related post I describe an algorithm (based on the Ford–Fulkerson algorithm) which can find the minimum number of vertices to remove to disconnect two vertices.
  • Ahead of time, we don’t know which vertices from the first two lines should definitely be “English” and “French”. We get around this by adding two new vertices: one is the “English” vertex, linked to all the words in line 1, and the other is the “French” vertex, linked to all the words in line 2. Then we wish to disconnect these two new vertices; a sketch of the flow computation follows.
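
Here is that flow computation sketched in C++ (Edmonds–Karp-style BFS augmentation; a reconstruction of the idea, not my C# code). Split each word \(v\) into \(v_{\text{in}} \to v_{\text{out}}\) with a capacity-1 edge, give every other edge a capacity larger than the number of words, and the max flow from the “English” vertex to the “French” vertex equals the minimum number of words to delete:

    #include <queue>
    #include <vector>

    struct MaxFlow {
        struct Edge { int to, cap; };
        std::vector<Edge> edges;                    // edges i and i^1 form a pair
        std::vector<std::vector<int>> adj;
        explicit MaxFlow(int n) : adj(n) {}
        void add(int u, int v, int cap) {           // directed edge plus reverse edge
            adj[u].push_back(edges.size()); edges.push_back({v, cap});
            adj[v].push_back(edges.size()); edges.push_back({u, 0});
        }
        int run(int s, int t) {
            int flow = 0;
            for (;;) {
                std::vector<int> pre(adj.size(), -1);   // edge used to reach each vertex
                std::queue<int> q; q.push(s);
                while (!q.empty() && pre[t] < 0) {
                    int u = q.front(); q.pop();
                    for (int e : adj[u]) {
                        int v = edges[e].to;
                        if (edges[e].cap > 0 && v != s && pre[v] < 0) { pre[v] = e; q.push(v); }
                    }
                }
                if (pre[t] < 0) return flow;            // no augmenting path left: done
                for (int v = t; v != s; v = edges[pre[v] ^ 1].to) {
                    --edges[pre[v]].cap; ++edges[pre[v] ^ 1].cap;   // push one unit back
                }
                ++flow;
            }
        }
    };

Since only the in→out word edges have capacity 1, any minimum cut consists entirely of word vertices, which is exactly what we want to count.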

My implementation in C# is still embarrassingly slow…

Problem D, Drum Decorator: We have a drum (so a grid of R rows and C columns, with the far left column considered next to the far right column). You want to fill the boxes with numbers 1, 2, 3, 4, … such that each box containing K is next to exactly K boxes containing the same number. Here “next to” means directly up, down, left or right.

  • Consider a box with 4 in it. Then all four neighbours must also contain a 4; in particular, the box directly above contains a 4. By induction, all the boxes above contain 4, but then we run into a problem at the top of the drum. So we can’t use 4.
  • Consider a connected region of boxes with 3 in them. Consider the topmost row of this region, so we have

     xxx
      3
    

    where “x” means the top of the drum, or other non-3 numbers. Then the “3” we have is surrounded by exactly 3 other boxes, so these must all contain 3 as well. Inductively moving left and right, we find that the entire row must consist of 3s, and the row below must also consist of 3s:

     xxxxxx
     333333
     333333
    

    But now the row below cannot contain any 3s. Conclude: 3s can only occur in bands of height 2, and above and below must be other numbers.

  • Now consider 2. Such a box must have exactly 2 neighbours containing 2 as well, and this means we end up with a “path” of 2s. For example, we can have

     xxxxxxx
     2222222
     xxxxxxx
    

    Again, “x” means the top/bottom of the drum, or non-2 boxes.

  • A 1 must neighbour exactly one other 1. Suppose in a row we have “11”; then to the left and right must be 2s, i.e. we get “2112”.
  • Above could be the top, or 3s. As the 2s must have 2 neighbours, we must have

     xxxxxxxx   Then below the 1s =>   xxxxxxxx
      221122                            221122
       2  2                              2222
    

    Then the bottom 2s are happy, so we can fill in with 1s

     xxxxxxxx
     ?221122?
     ?122221?
    

    Note that we can just repeat this forever. Furthermore, this is forced on us, as you can verify we’re forced into expanding the pattern thus:

     xxxxxxxx     xxxxxxxxxx     xxxxxxxxxxxx
     22211222 ==> 2222112222 ==> 122221122221 And so on
     11222211     2112222112     221122221122
    

    Conclude: If the number of columns is a multiple of 6, we can fill 2 rows with this pattern; above and below must be 3s.

  • Below could be the bottom, or 3s. This is just the flipped version of the above, and notice that these give the same pattern, up to rotation.
  • Now consider the case of a 1 stacked vertically above another 1, again in the top row (with 3s, or the top of the drum, above). Filling in 2s around, we start with:

     212
     212
      x
    

    The x is either a 2 or a 3 (and if the latter, we have a row of 3s of course). Then from the top row we must have further 2s:

     22122
      212
    

    If the x is a 3 then we must also have

     22122
     22122
    

    and then we find the pattern:

     221221221...
     221221221...
    

    which can be repeated forever, so long as the number of columns is a multiple of 3.

  • If not then the x is a 2, and at least one box to the left or right must also be a 2, say to the left, so we get

      22122
      1212
      122
    

    We now see a possible pattern:

      22122212..
      12121212..
      12221222..
    

    Conclude: If the number of columns is a multiple of 4, then we can get a section of height 3 rows.

  • If not, then we must have

      22122      222122     1222122     21222122
      1212  ==>  21212  ==> 121212  ==> 2121212
      1221       21221       21221       221221
                  212         212         1212
    

    Which (eventually!) is a contradiction as the bottom left 2 cannot have 2 neighbours which are also 2.

  • We have now looked at all possibilities for what the “top” row can be if it consists of 1s and 2s:
    • A row of 2s
    • Height of 2 rows, a repeating pattern of width 3 (call this Type 1)
    • Height of 2 rows, a repeating pattern of width 6 (call this Type 2)
    • Height of 3 rows, a repeating pattern of width 4 (call this Type 3)
  • We now need to count possibilities. We must interleave rows of 3s with rows of 2+1s, the options for the latter varying with the number of columns we have (whether divisible by 3, 4, 6 or 12). Overall we are free to ignore rotations, but if two “blocks” can have offset rotations, then we need to count these as separate.
  • For example, if we have 3 columns and 6 rows, then we can start with a block of 2s and 1s, then have a block of 3s, then a second block of 2s and 1s. There are then 3 choices for the rotation of the 2nd block of 2s and 1s.
  • A (slightly complex) dynamic programming approach can find the number of possible ways to cover the drum using \( a_i \) blocks of type \( i\in \{1,2,3\} \). For each possibility, we then need to multiply by the total number of distinct rotations. For \(a\) blocks of type 1 we have \(3^{a-1}\) choices of rotation (the first one can be fixed to be not rotated, and then we have a free choice for the others). Similarly for types 2 and 3. Finally we have to decide on the number of ways to rotate all the type 1s against all the type 2s, etc.
  • This is equivalent to the general problem of finding all the combinations \( (a_i \mod N_i)_{i=1}^k \) where we treat \( (a_i \mod N_i)_{i=1}^k \) and \( (a_i + j \mod N_i)_{i=1}^k \) as equivalent for all \(j\). Now, \( a_i +j \equiv a_i \mod N_i \) for all \(i\) if and only if \( N_i | j \) for all \(i\), if and only if \( L = \operatorname{lcm}(N_i) \) divides \(j\). So we get a free action of \(\mathbb Z_L \) and so by a cute application of the Orbit-Stabilizer theorem we count the equivalence classes as \( \prod_i N_i / L \).
  • In our case, we have either (3,4,6) -> 6 or (3,4) -> 1 or (3,6) -> 3 or (4,6) -> 2.
  • Put all this together and magically it works!