Blog of Matthew Daws

Expectations of brilliance underlie gender distributions across academic disciplines

I blogged previously about statistical programming in Python. Here I want to say something about the data I used, which is from the paper:

Sarah-Jane Leslie, Andrei Cimpian, Meredith Meyer, Edward Freeland "Expectations of brilliance underlie gender distributions across academic disciplines" Science 347 (2015) 262–265. DOI: 10.1126/science.1261375

The abstract explains the results of the survey and data analysis the authors performed:

Read More →

Probabilistic programming in Python

Later in the week I will give a talk to the Centre for Spatial Analysis & Policy group in Geography at the University of Leeds. See the GitHub repo for details.

I had a few aims:

Read More →

PyMC3

I'm finally doing some work which requires some genuine Bayesian analysis, and so have returned to playing with emcee. I've also been looking at PyMC3, which is an impressive piece of work, but also requires a bit of a change in thinking compared to emcee.

Some notebooks can be found on GitHub.
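The emcee style of working is: write a function returning the log-probability of a parameter value, then hand it to a sampler (PyMC3 instead has you declare the model itself). To illustrate that workflow without depending on either library, here is a minimal sketch of a random-walk Metropolis sampler. This is deliberately a simpler algorithm than emcee's affine-invariant ensemble sampler, and the toy model and data below are made up purely for illustration.

```python
import math
import random


def log_prob(theta, data):
    """Log-posterior, up to an additive constant, for the mean `theta` of
    normally distributed data with known unit variance and a flat prior.
    (A made-up toy model, purely for illustration.)"""
    return -0.5 * sum((x - theta) ** 2 for x in data)


def metropolis(log_prob_fn, start, n_steps, step_size=0.5, seed=0, args=()):
    """Random-walk Metropolis sampler for a one-dimensional parameter.

    Like emcee, it only needs a function returning the log-probability of a
    parameter value; unlike emcee, it uses a single walker and Gaussian steps.
    """
    rng = random.Random(seed)
    current = start
    current_lp = log_prob_fn(current, *args)
    chain = []
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step_size)
        proposal_lp = log_prob_fn(proposal, *args)
        # Accept with probability min(1, p(proposal) / p(current)), in log space.
        # 1.0 - rng.random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain.append(current)
    return chain


data = [4.8, 5.1, 5.3, 4.9, 5.0]   # made-up observations
chain = metropolis(log_prob, start=0.0, n_steps=5000, args=(data,))
samples = chain[1000:]              # discard burn-in
print(sum(samples) / len(samples))  # should be close to the sample mean, 5.02
```

With a flat prior the posterior mean is just the sample mean, so this gives an easy sanity check; with PyMC3 you would instead declare the prior and likelihood and let the library choose the sampler.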

Read More →

Code style, testing, etc.

I've written some stuff for the University of Leeds Python Discussion Group on code style, tooling, testing, and all that jazz.

I'll try to do some live coding and get some debate going tomorrow. But the above can be read as a bit of a blog post as well.

Read More →

More formal working

I am a big fan of Jupyter notebooks and similar (e.g. R Markdown) systems which allow you to mix code and documentation, preferably in a browser (which allows sharing).

However, I've found that it's easy to fall into a "hacking" work pattern: developing quite a lot of code and mixing it up with substantial data processing. This leads to a number of anti-patterns:

  • The code begins to completely dominate the documentation and the big-picture overview.
  • I fall into the habit of restarting the notebook, wasting time on reloading data, and then making small changes to an analysis.
  • Constant minor editing and then "shift-return"ing through a load of cells.
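One small mitigation for the "restart, then wait for the data to reload" problem (not necessarily what the full post recommends) is to cache the result of the slow loading step on disk. Here is a minimal sketch using only the standard library; the decorator name `disk_cached`, the function `load_and_process` and the file name are all hypothetical.

```python
import functools
import os
import pickle


def disk_cached(path):
    """Decorator: call the wrapped function once, pickle its result to `path`,
    and on later calls (even after a kernel restart) load it from disk instead.

    Caveats: the function's arguments are ignored when looking up the cache,
    and you must delete `path` yourself if the underlying data changes.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if os.path.exists(path):
                with open(path, "rb") as f:
                    return pickle.load(f)
            result = func(*args, **kwargs)
            with open(path, "wb") as f:
                pickle.dump(result, f)
            return result
        return wrapper
    return decorator


@disk_cached("processed_data.pickle")   # hypothetical cache file name
def load_and_process():
    # Stand-in for the slow step: reading raw files, cleaning, joining and so on.
    return {"rows": list(range(5))}
```

In a notebook the first cell then just calls `data = load_and_process()`, and a restart costs a quick unpickle rather than a full reload. Moving a helper like this out of the notebook and into a module also helps with the first anti-pattern.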
Read More →

TileMapBase

I've published my first Python package on PyPI (see also the new PyPI, which seems to have finally synced).

Get it here: TileMapBase, or TileMapBase on the new PyPI:

Uses OpenStreetMap tiles, or other tile servers, to produce "basemaps" for use with matplotlib. Uses a SQLite database to cache the tiles, so you can experiment with map production without re-downloading the same tiles. Supports Open Data tiles from the UK Ordnance Survey.

My original aim was to produce a simple, high-level way to use OpenStreetMap-style tiles as a "basemap" with matplotlib in Jupyter notebooks. Since then, I've also been working on TileWindow, which uses this library to cache tiles and provides a tkinter widget which displays a map, sort of like Google Maps but in Python. This is ultimately for use in my current job: PredictCode.
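The two ingredients described above, working out which tiles cover a location and caching them in SQLite, can be sketched in a few lines. This is not TileMapBase's actual code or API: it assumes the standard OpenStreetMap "slippy map" numbering (Web Mercator, with 2^zoom tiles along each axis), and `fetch` is a hypothetical stand-in for whatever downloads a tile from a tile server.

```python
import math
import sqlite3


def deg_to_tile(lon, lat, zoom):
    """Convert longitude/latitude in degrees to slippy-map tile indices (x, y)."""
    n = 2 ** zoom
    lat_rad = math.radians(lat)
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


class TileCache:
    """Cache tile images (raw bytes) in SQLite, keyed by (zoom, x, y).

    `fetch` is a hypothetical callable (zoom, x, y) -> bytes which would
    download the tile; it is only called on a cache miss.
    """

    def __init__(self, fetch, db_path=":memory:"):
        self._fetch = fetch
        self._conn = sqlite3.connect(db_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS tiles ("
            " zoom INTEGER, x INTEGER, y INTEGER, image BLOB,"
            " PRIMARY KEY (zoom, x, y))"
        )

    def get(self, zoom, x, y):
        row = self._conn.execute(
            "SELECT image FROM tiles WHERE zoom=? AND x=? AND y=?", (zoom, x, y)
        ).fetchone()
        if row is not None:
            return row[0]          # cache hit: no download needed
        image = self._fetch(zoom, x, y)
        with self._conn:           # commits the insert on success
            self._conn.execute(
                "INSERT INTO tiles (zoom, x, y, image) VALUES (?, ?, ?, ?)",
                (zoom, x, y, image),
            )
        return image
```

Passing a file path instead of `":memory:"` keeps the cache between sessions, which is what makes it possible to experiment with map production without re-downloading the same tiles.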

Read More →

PyPI and use of reStructuredText

I'm in the process of putting together my first proper Python package to be uploaded to PyPI / PyPI Old. Much of the documentation around doing this is not great, but the official docs are pretty good:

Read More →

On memory management

I have only ever been a hobbyist C++ programmer, while I have been paid to write Java and Python. But a common complaint I've read about C++ is that you have to manage memory manually, and worry about it. Now, I'd slightly dispute this for C++11 and later, but perhaps I don't really have enough experience to comment.

However, I think there's a strong case that with garbage-collected languages you can't really forget about memory, or about the difference between copying a reference and copying an object; rather, the language lets you pretend that you can stop worrying. In my experience, this is only true 99% of the time, and in the 1% of cases where it bites you, you've quite forgotten that it's a possibility, which makes debugging a real pain (the classic "unknown unknown").
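In Python, the "copy a reference versus copy the object" distinction shows up in a couple of classic ways. Here is a small sketch (my own illustrative examples, not taken from the full post): plain assignment versus `copy.copy` versus `copy.deepcopy`, and the well-known mutable default argument.

```python
import copy


def aliasing_demo():
    """Show that assignment copies a reference, not the object."""
    grid = [[0, 0], [0, 0]]
    alias = grid                 # same list object, just a second name for it
    shallow = copy.copy(grid)    # new outer list, but the inner lists are shared
    deep = copy.deepcopy(grid)   # completely independent copy
    grid[0][0] = 99
    return alias[0][0], shallow[0][0], deep[0][0]


def append_item(item, items=[]):
    """Classic bug: the default list is created once, when the function is
    defined, so every call that omits `items` mutates the same list."""
    items.append(item)
    return items


def append_item_fixed(item, items=None):
    """The usual fix: use None as a sentinel and build a fresh list per call."""
    if items is None:
        items = []
    items.append(item)
    return items


print(aliasing_demo())  # (99, 99, 0)
# append_item(1) followed by append_item(2) returns the *same* list, now [1, 2];
# append_item_fixed(1) and append_item_fixed(2) return [1] and [2].
```

The shallow copy is the sneaky case: it looks like a copy, and mostly behaves like one, right up until you mutate something nested inside it.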

Read More →