Friday 29 March 2013

Releasing code used in academic publications

In academia, it's good scientific practice to release code after producing a publication. This allows other researchers to replicate the publication's findings, benchmark against the published approach, and even extend the work in new directions. However, it's actually quite rare to find a given paper's code online, and rarer still to find the documentation required to make use of it. That said, many academics are happy to provide their code on request, especially if they're no longer pursuing that field of research.

Unfortunately, I seem to have fallen into the habit of providing my code on request instead of releasing it online. I think the reason is that releasing code always gets pushed aside by upcoming deadlines, so it stays at the bottom of my to-do list. As a result, I end up handing out an undocumented archive of my code, which I doubt is particularly useful to many people.

Therefore, I've decided to release all my code at the end of my PhD. I'm hoping that after submitting my thesis, I'll have a window of time to tidy up these kinds of loose ends. However, if I get to the end of the year (2013) without releasing anything, please remind me of this post!


  1. Totally agree with you, Oliver! It's usually quite hard to find the source code behind publications released publicly (preferably in repositories). The main reasons I see are poor code quality (code written for a specific purpose under strict time constraints), no documentation (code is usually just a support tool for the publication) and a lack of interest from most peers.

    By the way, I'm a great admirer of your blog and your initiative (I'm also a PhD candidate working on NIALM, at UFRJ in Rio, Brazil).

    Keep up the good work! If possible, perhaps we could collaborate in some way.


  2. I applaud your commitment to release your code; I think it shows a great deal of intellectual honesty and a desire to try to make the literature on disaggregation a little healthier.

    When I first started reading the literature on disaggregation, I was *amazed* that literally *no* disaggregation code was available. In other fields like computer vision and speech recognition, it appears to be the norm to release the code behind the paper. As far as I'm aware, the only open-source code in the entire disaggregation literature is Kolter & Jaakkola (2012), but that code is an implementation of their FHMM, which doesn't do any disaggregation.

    I would argue that the lack of open-source code has a concrete, detrimental effect on the literature: as far as I can tell, it's currently IMPOSSIBLE to objectively compare the performance of any pair of disaggregation algorithms discussed in the literature. That is, if you take paper A and paper B, you cannot robustly decide whether the disaggregation algorithm proposed in A is better than the one in B. Different research groups use their own private data sets and their own performance metrics (or, worse, they don't even describe how they measured performance: they just give a "percentage accuracy" and leave the reader to guess what that means exactly). And, of course, no one releases their code, so it's impossible to do a meta-analysis against a common dataset. This problem is so severe that I don't think we can even robustly claim that modern disaggregation approaches (FHMMs etc.) outperform the work Hart et al. did in the 1980s and 90s.

    Of course, someone could re-implement a bunch of algorithms discussed in the literature and test them against a common dataset using the same performance metrics. But that doesn't count as "original research", so no one's willing to do it (although it'd make an awesome undergrad/MSc project).

    So yes, I thoroughly applaud your desire to release your code. If you can, please also consider releasing your raw data. I hope other people follow you. I certainly intend to ;)

