# Blog of Matthew Daws

## New site

I have refreshed my website, which is now built as a purely static site on top of Bootstrap (instead of using Jekyll). To keep the blog going, I have quickly written a Python script which re-creates the parts of Jekyll I need. It seems to be working, which is quite pleasing.
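For a flavour of what such a script involves, here is a minimal sketch using only the standard library. It is not the actual script: the names `PAGE`, `parse_post`, `render_post` and `build_site` are hypothetical, the template is a placeholder rather than a real Bootstrap layout, and paragraphs are split on blank lines instead of being run through a proper Markdown renderer.

```python
import html
from pathlib import Path
from string import Template

# Hypothetical page template; a real site would pull in Bootstrap's CSS here.
PAGE = Template("""<!DOCTYPE html>
<html>
<head><title>$title</title></head>
<body>
<h1>$title</h1>
$body
</body>
</html>
""")


def parse_post(text):
    """Split a Jekyll-style post into (front matter dict, body text)."""
    lines = text.splitlines()
    meta = {}
    if lines and lines[0].strip() == "---":
        # Find the line that closes the front matter block.
        for end in range(1, len(lines)):
            if lines[end].strip() == "---":
                break
        else:
            raise ValueError("front matter is not closed")
        for line in lines[1:end]:
            key, sep, value = line.partition(":")
            if sep:
                meta[key.strip()] = value.strip()
        lines = lines[end + 1:]
    return meta, "\n".join(lines).strip()


def render_post(text):
    """Render one post to a full HTML page (assumption: plain paragraphs only)."""
    meta, body = parse_post(text)
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
    body_html = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    title = html.escape(meta.get("title", "Untitled"))
    return PAGE.substitute(title=title, body=body_html)


def build_site(src_dir, out_dir):
    """Render every *.md file in src_dir to an .html file in out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for post in sorted(Path(src_dir).glob("*.md")):
        page = render_post(post.read_text(encoding="utf-8"))
        (out / (post.stem + ".html")).write_text(page, encoding="utf-8")
```

Calling `build_site("_posts", "site")` (both directory names are assumptions) would then write one HTML page per post.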

## PDF munging with LaTeX

An aide-memoire for myself:

\documentclass[a4paper]{article}
\usepackage{graphicx,forloop}

\begin{document}
\pagestyle{empty}

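% Loop over pages 1 to 114 of mtms.pdf (the loop stops once the counter reaches 115)
\newcounter{pdfpagenumber}
\forloop{pdfpagenumber}{1}{\value{pdfpagenumber} < 115}{
% Scale the page to 12in wide, centre it in a 90ex box, lower it by 225ex,
% and tell LaTeX it has zero height and depth so it always fits on the page
\raisebox{-225ex}[0ex][0ex]{\makebox[90ex]{\includegraphics[width=12in,page=\arabic{pdfpagenumber}]{mtms.pdf}}}
\newpage
}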

\end{document}


## OneDrive for Business, or pleasure.

My new job came with a surprise: I get a Surface Pro with docking station as my work PC. This is actually very nice (I tend normally towards the "good enough" school of technology ownership). An Office365 subscription also comes with the job, and with it 1TB (yes, a few years ago, that was a good hard disk) of cloud storage from OneDrive for Business.

Hmm, but... The Surface Pro only has GBs of free storage (thanks to a smallish SSD) and that's to be shared with applications I might want to install. But, surely, I can just sync the folders I want, and keep more in the cloud (swapping things about, perhaps, if needs be). Right? A bit of Internet searching suggests that, sure, that's an option. For normal consumer OneDrive. But not, it seems, for OneDrive for Business. Until maybe mid-2018 when a new client comes out. YMMV of course.

## Expectations of brilliance underlie gender distributions across academic disciplines

I blogged previously about statistical programming in Python. Here I want to say something about the data I used, which is from the paper:

Sarah-Jane Leslie, Andrei Cimpian, Meredith Meyer, Edward Freeland, "Expectations of brilliance underlie gender distributions across academic disciplines", Science 347 (2015) 262--265. DOI: 10.1126/science.1261375

The abstract explains the results of the survey and data analysis the authors performed:

## Probabilistic programming in Python

Later in the week I will give a talk to the Centre for Spatial Analysis & Policy group in Geography, at Leeds Uni. See the GitHub Repo for details.

## PyMC3

I'm finally doing some work which requires some genuine Bayesian analysis, and so have returned to playing with emcee. I've also been looking at PyMC3 which is an impressive piece of work, but also requires a bit of change of thinking from emcee.
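To illustrate the change of thinking: with emcee you write the log-probability function yourself and hand it to the sampler, whereas PyMC3 has you declare the model's random variables and works out the log-probability for you. Below is a rough sketch of the first style only. It uses a plain random-walk Metropolis sampler written with the standard library as a stand-in, not emcee's ensemble sampler, and the names `log_prob` and `metropolis`, the toy data and the tuning values are all assumptions made for illustration.

```python
import math
import random


def log_prob(mu, data, sigma=1.0):
    """Log-posterior of a normal mean `mu` under a flat prior, up to a constant."""
    return -0.5 * sum((x - mu) ** 2 for x in data) / sigma ** 2


def metropolis(log_prob_fn, start, n_steps, step_size, rng, args=()):
    """Random-walk Metropolis sampler for a one-dimensional parameter."""
    current = start
    current_lp = log_prob_fn(current, *args)
    samples = []
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step_size)
        proposal_lp = log_prob_fn(proposal, *args)
        # Accept with probability min(1, exp(proposal_lp - current_lp)).
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


# Toy data: 50 draws from a normal distribution with mean 2 and known sigma 1.
rng = random.Random(0)
data = [rng.gauss(2.0, 1.0) for _ in range(50)]

samples = metropolis(log_prob, start=0.0, n_steps=5000, step_size=0.5,
                     rng=rng, args=(data,))
kept = samples[1000:]  # discard the first 1000 steps as burn-in
posterior_mean = sum(kept) / len(kept)
```

With a flat prior and known sigma, the posterior mean should sit close to the sample mean of `data`, which gives a quick sanity check on the sampler.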

## Code style, testing, etc.

I've written some stuff for the University of Leeds Python Discussion Group on Code style, tooling, testing, all that Jazz.

I'll try and do some live coding, and get some debate going tomorrow. But the above can be read as a bit of a blog post as well.

## More formal working

I am a big fan of Jupyter notebooks and similar systems (e.g. R Markdown) which allow you to mix code and documentation, preferably in a browser (which allows sharing).

However, I've found that it's quite easy to fall into a "hacking" work pattern of developing quite a lot of code, and mixing it up with substantial data processing. This leads to a number of anti-patterns:

• The code begins to completely dominate the documentation and the big-picture overview.
• I fall into the habit of restarting the notebook, wasting time on reloading data, and then making small changes to an analysis.
• Constant minor editing and then "shift-return"ing through a load of cells.
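One possible way to take the sting out of the second point is to cache the result of an expensive load on disk, so that a restarted notebook reads the cached copy instead of redoing the work. This is only a sketch under assumptions: the helper name `cached` and the file name in the usage note are hypothetical, and pickling is just one convenient storage choice.

```python
import pickle
from pathlib import Path


def cached(path, loader):
    """Return loader()'s result, reusing a pickled copy at `path` if one exists."""
    path = Path(path)
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    # No cached copy yet: do the slow work once and store the result.
    result = loader()
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result
```

In a notebook this might be used as `data = cached("survey.pkl", load_survey)`, where `load_survey` is whatever slow loading function you already have. Deleting the file forces a fresh load.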