AdaBoost and classification

Posted on 2nd June 2015


Motivated by another nice post from Jeremy Kun, I've had a play with AdaBoost. Like Jeremy, I used the "Adult Census Dataset" and then also the "Titanic" dataset. The latter requires some data munging, and some decisions about how to process the data (see the sketch below).
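As a rough sketch of the sort of munging involved, assuming the usual Kaggle column names (Survived, Pclass, Sex, Age, Fare, Embarked) and an illustrative file path; the exact choices (which columns to keep, how to fill missing values) are the "decisions" mentioned above:

```python
import pandas as pd

df = pd.read_csv("titanic.csv")  # illustrative path

# Keep a handful of columns and deal with missing values.
df = df[["Survived", "Pclass", "Sex", "Age", "Fare", "Embarked"]]
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# Turn the categorical columns into dummy variables.
df = pd.get_dummies(df, columns=["Sex", "Embarked"], drop_first=True)

X = df.drop(columns="Survived").values
y = df["Survived"].values
```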

Once you understand what AdaBoost does, and how decision stumps/trees work, it seems reasonable to move on to scikit-learn (which can often make you seem like a trained monkey, plugging things into your data). It's worth noting that an out-of-the-box decision tree works almost perfectly on the "Adult" dataset, but much less well on the "Titanic" dataset (where AdaBoost and decision stumps don't look so bad!).
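A minimal sketch of that comparison in scikit-learn: AdaBoost, whose default weak learner is a depth-1 decision tree (a stump), against a single unconstrained decision tree. A synthetic dataset stands in here for the Adult/Titanic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Placeholder data; swap in the Adult or Titanic features/labels.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A plain, out-of-the-box decision tree.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# AdaBoost over decision stumps (the default base learner).
boost = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print("Decision tree accuracy:   ", tree.score(X_test, y_test))
print("AdaBoost (stumps) accuracy:", boost.score(X_test, y_test))
```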

As ever, Python feels a bit slow for this sort of thing, but also nice because it's interactive, and well aimed at handling data.

GitHub Repository
