Blog of Matthew Daws

BF interpreter in C++

BF is an amusingly named, impractical, Turing complete programming language. It is very like an actual Turing machine: you have an infinite (originally, one-sided, 30K long) array of (8-bit) values, with commands to increase/decrease (+/-) the cell, or move left/right (). Input (,) and output (.) are handled. Loops are given by [] and implement a "while cell is non-zero" loop. So a trivial mapping to C is possible (see the Wikipedia article).

I thought it would be fun to give an object oriented implementation, for which I used C++ for a change. My implementation decouples the parsing of the programme from running it. The parse class reads the input in a single pass, using a stack for dealing with nested loops. It translates the input into a list of commands, represented by an abstract base class with an execute method, overloaded for the various different commands. This is the "command" pattern (closely related to the "strategy" pattern).

Read More →

CodeJam 2015 Round 2

Round 2 came and went (and Round 3 and the "distributed" round). As an effort to practice with C#, I solved these problems in C# (definitely slower than using Python, and actually, the end running time is not that much quicker, for these sort of algorithmic / numerical problems).

There's a definite step-up in difficulty here! Maybe I kid myself I could have done problem A and some "small" problems quickly... that people did them all in 150 minutes is quite scary. The Official Contest Analysis is rather brief (from G+, it seems the organisers were busy with the distributed round; or maybe, if we've come this far, shorter answers are all we need ;-> ) I think my background in pure Maths shows here: I found problem D quite easy, as I found analysing the special cases off-line (pen and paper) to be not so hard. Problems B and C (in "large" form) were hard, I thought. (I missed the trick with problem B; and using an off-the-shelf Graph package is obviously the fast way to do Problem C.)

See the code on GitHub.

Read More →

Thoughts about C#

On the basis that knowing a modern OO language would be good, and on the basis that everyone and their granny knows JAVA, I thought I'd look at C#. Getting going with C# was a bit tedious: I end up looking things up endlessly, and puzzling over things for a while before they "click" and seem obvious (which is always annoying, as then I'm left wondering why it wasn't obvious from the start). You wait for muscle memory to build...

Example: A small mistake lead to some musings on how sort algorithms are implemented: StackOverFlow Question (the actual problem came from thinking about orderings in Python and C++ too much, where you specify a strict weak ordering, basically "less than", and then porting that to a CompareTo method and missing a corner-case. This came up in a CodeJam problem.)

Read More →

Disjoint Paths; Implementation Issues

Mostly as an excuse to learn C# I implemented the path finding algorithms. Running some tests (random graphs, and then computing cuts from found disjoint paths, as to verify a "cut" is easy!) caught some implementation issues. All obvious with hindsight.

Notes on the implementation

See the file on GitHub. I wanted to use interfaces, and extension methods, as they are features of C# which differ from e.g. C++. In hindsight, I think an abstract base case Graph would have been more sensible, with DirectedGraph and UndirectedGraph inheriting from this.

Read More →

Disjoint Paths

A few notes on paths in graphs: finding the maximum number of edge/vertex disjoint paths, and an application to finding disconnections.

Edge disjoint paths

Consider a graph, for the moment undirected. We wish to find the maximum number of edge disjoint paths from a fixed vertex \(s\) to a fixed vertex \(t\). We can do this with the Max-Flow Min-Cut theorem and the Ford-Fulkerson algorithm, at least in the directed case, but let's specialise this to our case.

Read More →

AdaBoost and classification

Motivated by another nice post from Jeremy Kun I've had a play with AdaBoost. Like Jermey I used the "Adult Census Dataset" and then also the "Titanic" dataset. The latter requires some data munging, and some decisions about how to process the data.

Once you understand what AdaBoost does, and how Decision Stumps/Trees work, it seems reasonable to now use scikit-learn (which can often make you seem like a trained monkey, plugging things into your data). It's worth noting that an out the box decision tree works almost perfectly on the "Adult" dataset, but much less well on the Titanic Dataset (where AdaBoost and decision stumps now don't look so bad!)

As ever, Python feels a bit slow for this sort of thing, but also nice because it's interactive, and well aimed at handling data.

GitHub Repository

Read More →

Ising model and image denoising

As a final application of MCMC methods, you can't escape the Ising Model, and it's cute application to denoising images (of a text-like nature).

Denoising

Ipython notebook

Read More →

Removing outliers

So, linear regression isn't so exciting. What about removing outliers? I do find this, just a little, magical:

Outliers

We define a model where a data point \( y_i \) can either be from a normal linear regression model, or just a random (normally distributed) number. We then build into our model indicators \( o_i \in \{0,1\} \) to indicate if this is an outlier or not. Then sample the posterior distribution with an MCMC sampler, integrate out the \(o_i\) and you have your line of best fit. Fix \(i\) and integerate out everything but \(o_i\) and you get an estimate of probability the point is an outlier. The above plot shows those points estimated above 90%.

Rest is in an Ipython notebook on GitHub. This also gave me a chance to play with my own modified Gibbs sampler, which works nicely, if I say so myself.

Read More →
Profile image; rendered glass discs
Categories
Recent posts