PDF munging with LaTeX

An aide-memoire for myself:



    \forloop{pdfpagenumber}{1}{\value{pdfpagenumber} < 115}{


OneDrive for Business, or pleasure.

My new job came with a surprise: I get a Surface Pro with docking station as my work PC. This is actually very nice (I tend normally towards the “good enough” school of technology ownership). An Office365 subscription also comes with the job, and so 1TB (yes, a few years ago, a good hard-disk) of cloud storage from OneDrive for business.

Hmm, but… The Surface Pro only has GBs of free storage (thanks to a smallish SSD) and that’s to be shared with applications I might want to install. But, surely, I can just sync the folders I want, and keep more in the cloud (swapping things about, perhaps, if need be). Right? A bit of Internet searching suggests that, sure, that’s an option. For normal consumer OneDrive. But not, it seems, for OneDrive for Business. Until maybe mid-2018 when a new client comes out. YMMV of course.

I did find some instructions and downloads which can turn on hidden features in the consumer version of OneDrive (which you’ll have automatically, if you are on Windows 10). Am I going to run random registry hacks on my new work PC? No… But it turns out the edits you need to make are small, easily reversible, and safe-seeming. So here’s a guide.

The default view

If you fire up OneDrive, it’ll look a bit like this:


But you won’t be able to input a work email.

Edit the registry

Hit the Windows key, type regedit, and press enter to run the “Registry Editor”. You’ll need some form of admin rights, I’m afraid. In the tree view on the left, navigate to, successively, “HKEY_CURRENT_USER” then “Software” then “Microsoft” and then “OneDrive”. Now:

  • Right-click in the right-hand window, select “new” and then “DWORD (32-bit) value”.
  • Type the new name as “EnableAddAccounts”.
  • Double-click on the new entry, and change the “Value data” to 1.
  • Do exactly the same with a new entry with name “DefaultToBusinessFRE” (and value 1).

Close the registry editor
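If you would rather script the same edit than click through regedit, something like the following should do it. This is only a sketch, and I am assuming a few things: that the “OneDrive” key already exists under HKEY_CURRENT_USER (as it did for me), and that Python is installed on the machine. The names ONEDRIVE_KEY, ONEDRIVE_VALUES and set_onedrive_values are my own inventions for illustration; winreg is the standard library module, available on Windows only.

```python
# Sketch only: sets the same two DWORD values described in the steps above.
# ONEDRIVE_KEY, ONEDRIVE_VALUES and set_onedrive_values are illustrative names.

# Path below HKEY_CURRENT_USER, as navigated to in the Registry Editor.
ONEDRIVE_KEY = r"Software\Microsoft\OneDrive"

# The two values from the guide, both set to 1.
ONEDRIVE_VALUES = {
    "EnableAddAccounts": 1,
    "DefaultToBusinessFRE": 1,
}


def set_onedrive_values():
    """Write each value as a 32-bit DWORD under the OneDrive key."""
    import winreg  # Windows-only, so imported here rather than at the top

    # OpenKey (rather than CreateKey) fails loudly if the key is missing,
    # which matches the assumption that OneDrive is already installed.
    with winreg.OpenKey(
        winreg.HKEY_CURRENT_USER, ONEDRIVE_KEY, 0, winreg.KEY_SET_VALUE
    ) as key:
        for name, data in ONEDRIVE_VALUES.items():
            winreg.SetValueEx(key, name, 0, winreg.REG_DWORD, data)
```

On a Windows machine you would then call set_onedrive_values() once. To undo the change, deleting the two values again (by hand in regedit, or with winreg.DeleteValue) should be enough.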

The new view

Initially, this didn’t work for me. I quit, and restarted, OneDrive, but no luck.

  • You could try rebooting your machine. (I didn’t, thanks to some long-running computational tasks I’m performing…)
  • However, if you visit the OneDrive Download site, and click on “Start OneDrive”, a new view of OneDrive will open, where I was able to type my work email address, and then was taken to my work’s login page. After that, I can selectively sync the folders I want.

Expectations of brilliance underlie gender distributions across academic disciplines

I blogged previously about statistical programming in Python. Here I want to say something about the data I used, which is from the paper:

Sarah-Jane Leslie, Andrei Cimpian, Meredith Meyer, Edward Freeland “Expectations of brilliance underlie gender distributions across academic disciplines” Science 347 (2015) 262–265. DOI: 10.1126/science.1261375

The abstract explains the results of the survey and data analysis the authors perform:

The gender imbalance in STEM subjects dominates current debates about women’s underrepresentation in academia. However, women are well represented at the Ph.D. level in some sciences and poorly represented in some humanities (e.g., in 2011, 54% of U.S. Ph.D.’s in molecular biology were women versus only 31% in philosophy). We hypothesize that, across the academic spectrum, women are underrepresented in fields whose practitioners believe that raw, innate talent is the main requirement for success, because women are stereotyped as not possessing such talent. This hypothesis extends to African Americans’ underrepresentation as well, as this group is subject to similar stereotypes. Results from a nationwide survey of academics support our hypothesis (termed the field-specific ability beliefs hypothesis) over three competing hypotheses.

I came across this paper via an excellent AMS blog post, especially section 4.

This led to some meditating on the current state of universities. You can’t help but notice that here I’ve linked to The Guardian and The Times Higher, two publications which spend a lot of time collating and publicising university rankings. The modern university seems to have a fixation on ranking and measuring, often in a deeply impersonal way. We spend a lot of time worrying about prizes: who won the Nobel? Who won a Fields? I wonder what effect an obsession with performance, “are you good enough?”, has on the participation of minorities, given the above research findings.

Mathematics, my past and future academic pursuit, is a field which sometimes feels uniquely obsessed with questions of achievement. It of course comes off “worst” in the analysis (it has nearly the highest belief in a “culture of brilliance”, although it is beaten here by Philosophy, and it has a higher female PhD participation rate than Engineering, Physics, or Computer Science). We spend much time talking about the Fields medal; my professional society journal is always after nominations for prizes. Why don’t we celebrate more quotidian contributions, and our (presumed) collective love of the subject, more?

I resolve to try to use words like “genius” and “brilliant” less, especially in front of students. I will try to stop saying things like “this result is too hard to prove”. It is better to say that we don’t have the time, or the machinery, to give a proof. Better, still, to give some indication why a result is true, or interesting.

I have come late to this essay: The Lesson of Grace in Teaching (http://mathyawp.blogspot.co.uk/2013/01/the-lesson-of-grace-in-teaching.html). I found the following a shocking idea:

Your accomplishments are NOT what make you a worthy human being.

And then I find it shocking that I am shocked by this. But this is, sometimes, what academia does to you. For more about Prof Francis Su, see an interview with Quanta.

Probabilistic programming in Python

Earlier in the week I gave a talk to the Centre for Spatial Analysis & Policy group in Geography, at Leeds Uni. See the GitHub Repo for details.

I had a few aims:

  • Show that it’s very possible to perform standard statistical analysis in Python, using tools like pandas and statsmodels.
    • I prefer Python over R because, well, I know Python well, and I don’t know much R. But Python is a multipurpose programming language, and I like the flexibility to work in a “notebook” style, using tools like pandas, but also the ability to develop (and test, etc.) modules and packages.
    • I am also extremely sympathetic to the argument made by e.g. Richard McElreath in his book Statistical Rethinking:

      That is the reason that this book insists on working with the computational nuts and bolts… This requires knowing the statistical model in greater detail than is customary, and it requires doing the computations the hard way, at least until you are wise enough to use the push-button solutions.

      (See page 4). That is, maybe being forced to think about how to perform statistical analysis in a slightly more verbose environment is no bad thing.

    • I don’t want to get into an R vs Python argument; just to point out that you can use Python.
  • Another big aim was to evangelise about Bayesian methods. Or at least being explicit about statistical models, parameter fitting, and, if you must, what hypotheses you are actually testing. To my, self-taught, mind, a Bayesian approach is rather natural.
  • Finally, to have some discussion of Probabilistic Programming, here using pymc3. That is, the way we code with pymc3 is such that certain variables are actually random variables, from which we typically later take samples using MCMC techniques.
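To make “taking samples using MCMC techniques” a little more concrete, here is a sketch which deliberately does not use pymc3: a hand-rolled random-walk Metropolis sampler, written with only the standard library. Everything in it is an assumption for illustration: the coin-toss data (7 heads in 10 tosses) is made up, the prior on the unknown probability p is uniform, and the function names are my own. In pymc3 you would instead declare p as a random variable and let the library choose and run the sampler.

```python
import math
import random


def log_posterior(p, heads, tosses):
    """Log of (uniform prior) x (binomial likelihood), up to a constant."""
    if not 0.0 < p < 1.0:
        return -math.inf  # zero prior probability outside (0, 1)
    return heads * math.log(p) + (tosses - heads) * math.log(1.0 - p)


def metropolis(heads, tosses, n_samples=20000, step=0.2, seed=0):
    """Random-walk Metropolis sampler for the coin's probability p."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    p = 0.5  # arbitrary starting point
    current = log_posterior(p, heads, tosses)
    samples = []
    for _ in range(n_samples):
        proposal = p + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, heads, tosses)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            p, current = proposal, candidate
        samples.append(p)
    return samples


# Hypothetical data: 7 heads in 10 tosses.
samples = metropolis(heads=7, tosses=10)
kept = samples[2000:]  # discard an initial burn-in period
posterior_mean = sum(kept) / len(kept)
print(f"Posterior mean of p is roughly {posterior_mean:.2f}")
```

With a uniform prior the exact posterior here is Beta(8, 4), with mean 2/3, so the sampled mean can be checked against a known answer. That is the point of the toy: for realistic models there is no closed form, and the samples are all we have.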

I would have liked to have time to discuss seaborn, but time is finite.

For a dataset, I extracted the main data from the article Expectations of brilliance underlie gender distributions across academic disciplines. I want to blog more about this later. I had hoped to get a discussion going with the audience (as I am not a statistician by training or trade) but unfortunately turnout from senior colleagues was rather low.

I did have an interesting chat with my colleague Roger Beecham who is speaking about data visualisation soon. He pointed me to a great paper by Matthew Kay and Jeffrey Heer. This allows me to bang my reproducible research drum. The Kay-Heer paper says:

This paper would not have been possible without the public release of data from Harrison et al. [1]. That release of data contributes to a broader conversation not only about the results of any particular study, but the analysis of data, and the accumulation of datasets and shared knowledge.

Here the authors perform a re-analysis of data which has been made freely and easily available, together with the original R scripts which performed the original analysis. This can be contrasted with the data I used, which was available (yay!) but only in a PDF file, which required some work on my part to get into a usable form, and with no trace of the software stack and precise procedures used to analyse the data. We should all publish our data and methods.


I’m finally doing some work which requires some genuine Bayesian analysis, and so have returned to playing with emcee. I’ve also been looking at PyMC3, which is an impressive piece of work, but also requires a bit of a change of thinking from emcee.

Some notebooks can be found on GitHub.