Thursday 9 June 2016

Please help design a NILM competition!

Cross-posted from Jack Kelly's blog:

Has disaggregation accuracy improved since the 1980s? Which algorithms are most accurate for a given use-case? Which (if any) use-cases are well served by NILM already?

It's pretty much impossible to answer any of these questions with confidence (unless you only consider the tiny number of algorithms for which you have access to executable code). We can't directly compare published results across papers because, when testing the disaggregation accuracy of NILM algorithms, each paper uses different datasets, different metrics, different pre-processing, etc.

This means that we can't measure progress over time. Nor can we decide which NILM algorithms are most promising and which might be dead-ends.

These are bad problems. Let's work towards fixing them.

Some other machine learning communities have had great success running yearly competitions. For example, the ImageNet "Large Scale Visual Recognition Challenge" has been running yearly since 2010. Some regard this competition as having played a crucial role in the recent dramatic increase in the accuracy of image classification algorithms.

The idea of running a NILM competition has been rumbling around for several years. But designing and implementing a NILM competition is hard. The community uses sample rates ranging from monthly to MHz. No single metric is informative for all use-cases. Collecting ground truth data (the power demand of individual appliances) is expensive and time-consuming.
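The point about metrics can be made concrete with a toy Python sketch. This is not part of the draft proposal. It scores two hypothetical disaggregation outputs for a single appliance using two simple metrics: per-sample mean absolute error and relative error in total energy. The readings, the algorithm labels and the function names are all made up for illustration. "Algorithm A" estimates the appliance's total energy perfectly but gets the timing wrong. "Algorithm B" gets the timing right but underestimates the power draw. Each one "wins" under a different metric.

```python
# Toy illustration only: hypothetical appliance readings (watts, one value per sample).
ground_truth = [0, 0, 100, 100, 0, 0]
algo_a = [100, 100, 0, 0, 0, 0]  # right total energy, wrong timing
algo_b = [0, 0, 80, 80, 0, 0]    # right timing, power underestimated


def mean_absolute_error(truth, predicted):
    """Average per-sample absolute error in watts; sensitive to timing."""
    return sum(abs(t - p) for t, p in zip(truth, predicted)) / len(truth)


def relative_energy_error(truth, predicted):
    """|estimated total - true total| / true total; ignores timing entirely."""
    true_total = sum(truth)
    return abs(sum(predicted) - true_total) / true_total


for name, estimate in [("A", algo_a), ("B", algo_b)]:
    mae = mean_absolute_error(ground_truth, estimate)
    energy_err = relative_energy_error(ground_truth, estimate)
    print(f"Algorithm {name}: MAE = {mae:.2f} W, relative energy error = {energy_err:.2f}")
```

Under these assumptions, A scores better on relative energy error and B scores better on mean absolute error. A use-case that only needs energy totals might prefer A, while one that needs to know when an appliance ran might prefer B.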

Maybe we can pull this off. The first step is to decide on a design which will work for everyone.

To give us something concrete to debate, we'll outline one way this could work. This is not meant to be definitive! Think of this as the DNA for a clumsy, inefficient animal 500 million years ago. Together, we need to evolve this design into an elegant, efficient beast, well adapted to its environment.

Please shoot holes in this proposal! What won't work for you? What's impractical? What's unfair? What opens the competition up to cheating? How can we make the competition more attractive to researchers? How can we make the competition more informative for the community? How can we simplify the process?

The draft proposal is available on Google Docs. I've linked to a Google Doc rather than copying and pasting the proposal into this post so that we can update the proposal as the discussion develops. Please add your comments either to the mailing list discussion or to the Google Doc (please sign your comment with your name unless you deliberately want to be anonymous). If you want to keep your comment private, email Jack directly.

Thanks, (in no particular order) Jack, Mario, Oli, Stephen, Grant, Marco, Peter

Thursday 2 June 2016

PNNL NILM vendor survey

Below is a message to NILM vendors I'm sharing on behalf of PNNL:

The Pacific Northwest National Laboratory (PNNL) continues its work to develop Non-Intrusive Load Monitoring (NILM) test protocols. Activities to date have focused on forming a technical working group (NILM vendors, users and potential users, and other stakeholders), researching and deciding on candidate performance metrics, and developing performance protocols. It is for this last activity that we are seeking input from the wider NILM vendor community.

Below is a short feedback form to assist in directing the protocol development.  We would appreciate your responses as soon as possible. Please send responses by this email to  We appreciate your time and participation.

NILM Performance Metrics Project:  Status and Feedback Request

Feedback Goal: To better understand preferences and constraints regarding the proposed approaches to implementing the metrics. Please consider the two approaches to evaluating NILM performance listed below, and then answer the following questions.

Approach 1: Data Driven – NILM devices are tested against a diverse set of previously collected interval data.

Approach 2: Laboratory Testing – NILM devices are connected to actual appliances and/or load simulation systems to test performance.

  1. Is your NILM platform/product capable of accepting 1-second to 1-minute interval data as inputs for disaggregation?
  2. At what sampling interval is your NILM platform/product designed to take measurements or data inputs, e.g., 1 minute, 5 minute, hourly, other?
  3. What specific inputs are necessary for your NILM platform or product, e.g., interval power data, energy data, voltage, current, reactive power, other?
  4. What appliances or end-uses does your NILM product target?
  5. What are the target use cases for your NILM product?
  6. Do you have any other comments or questions regarding the development of the Data Driven or Laboratory Testing protocols?