My name is Oliver Parson, and I'm currently employed as a Senior Data Scientist at Bulb. I'm interested in investigating the ways in which machine learning can be used to break down household energy consumption data into individual appliances, also known as Non-intrusive Appliance Load Monitoring (NILM) or energy disaggregation.
Thursday, 19 December 2013
Thesis defence complete!
Earlier this week I had to defend my PhD thesis against the critique of both an internal and external examiner in what is referred to as a viva. Despite my best efforts to prepare, I was still pretty nervous going into the exam. However, it turned out to be a lot more enjoyable than I had expected. My examiners managed to tease out the finer details of my thesis without being at all aggressive, which I think had a really positive effect on the atmosphere of the viva. The result of the viva was a number of minor corrections which I'll need to make to my thesis over the next couple of months. This means I'll hopefully make the final version of my thesis available online by the end of February 2014, but potentially sooner depending on how busy I am at the start of next year.
Saturday, 14 December 2013
Energy disaggregation research at MLSUST 2013 workshop
The 2013 workshop on Machine Learning for Sustainability was recently held at the NIPS conference in Lake Tahoe, NV, USA. The workshop was organised by Edwin Bonilla (NICTA and ANU), Tom Dietterich (Oregon State University), Theodoros Damoulas (NYU CUSP and NYU-Poly) and Andreas Krause (ETH Zurich). The workshop invited papers which propose and apply machine learning algorithms to solve sustainability problems such as climate change, energy management and biodiversity monitoring. The workshop featured two poster sessions, in which the authors of the accepted papers were invited to present their work. Both poster sessions featured a paper on energy disaggregation, which I have briefly summarised below.
Interleaved Factorial Non-Homogeneous Hidden Markov Models for Energy Disaggregation. Mingjun Zhong, Nigel Goddard, Charles Sutton.
This paper proposes a method for disaggregating 2 minute energy consumption data into individual appliances. The approach is based upon an extension of the factorial hidden Markov model (FHMM), in which the appliance transition probabilities are dependent upon the time of day (non-homogeneous), and also the appliances are constrained such that only one appliance can change state per time slice (interleaved). The authors evaluate their approach on 100 homes from the Household Electricity Study, in which 20-30 days of sub-metered data from each household is used for training, while 5-10 days of data is held out for testing. The results show that both the interleaved and non-homogeneous extensions individually provide better performance than the basic FHMM, while a combination of the two provides the best performance. Finally, the authors identify a key finding in that the disaggregation accuracy varies greatly across different households, and raise this as an open problem for the NIALM community.
Using Step Variant Convolutional Neural Networks for Energy Disaggregation. Bingsheng Wang, Haili Dong, Chang-Tien Lu.
This paper proposes a method for disaggregating 15 minute interval aggregate energy data into individual appliances. The approach is based on Step Variant Convolutional Neural Networks (SVCNN), which use the aggregate energy consumption in the intervals t-2, t-1, t, t+1, t+2 to predict the energy consumption of each individual appliance in interval t. The authors evaluate their approach via cross validation using REDD, in which 3 houses are used to train the model while 2 other houses are used to test the performance. The results show that the SVCNN model achieves greater accuracy than both discriminative sparse coding models and factorial hidden Markov models. However, the results still show a relatively high whole home normalised disaggregation error of approximately 0.8, confirming the difficulty of the disaggregation of 15 minute energy data.
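To make the input described above concrete, here is a minimal sketch of how the five-interval windows (t-2 to t+2) could be built from a series of 15 minute aggregate readings. This only illustrates the windowing step, not the network itself. The function name `make_windows`, the example readings and the choice to skip intervals near the edges of the series are my own assumptions and are not taken from the paper.

```python
def make_windows(aggregate, half_width=2):
    """Pair each interval t with the aggregate readings from t-half_width to t+half_width.

    Intervals too close to the start or end of the series to have a full
    window are skipped (an assumption made for this sketch).
    """
    windows = []
    for t in range(half_width, len(aggregate) - half_width):
        window = aggregate[t - half_width : t + half_width + 1]
        windows.append((t, window))
    return windows


# Hypothetical aggregate energy readings (kWh per 15 minute interval).
aggregate = [0.2, 0.3, 1.1, 0.9, 0.4, 0.2]

for t, window in make_windows(aggregate):
    # Each window would be the model input used to predict appliance energy at interval t.
    print(t, window)
```

With six readings and a half width of two, only intervals 2 and 3 have a complete window, so a model trained this way makes no prediction for the first and last two intervals unless the series is padded.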
Further details on both the REDD and HES data sets are available in my post summarising the publicly available NIALM data sets.
Friday, 6 December 2013
Accuracy metrics for NIALM
Accuracy metrics are essential when evaluating the performance of an energy disaggregation algorithm in a given scenario. However, each paper seems to use a different metric when comparing their proposed approach to the state of the art. As a result, it is impossible to compare numerical results between papers. I don't believe this is the fault of the authors, since there is no single accuracy metric which is unquestionably better than all other accuracy metrics. Instead, the relevance of each metric depends largely on the intended use of the disaggregated data.
For example, if the disaggregated data is to be used to provide a breakdown of the energy consumption in a home, an accuracy metric which allows errors to cancel out over time would be suitable. However, if the disaggregated data is to be used to suggest appliance loads which could be deferred to a different time of day, a less forgiving accuracy metric would be required.
Therefore, inspired by discussions at the EPRI NILM 2013 workshop and by my recent involvement in the foundation of an open source disaggregation toolkit, I have decided to collect and categorise a list of commonly used accuracy metrics as shown below.
This list is mostly intended as a starting point for discussion regarding accuracy metrics, so please leave a comment if you notice any metrics I've left out!
Event based metrics
Event based metrics assess how well a disaggregation algorithm detects appliance change events (e.g. washing machine turns on). However, it is not trivial to determine appliance events from sub-metered power data to be used as ground truth, and as a result this often involves some subjective judgement (e.g. should a washing machine changing state mid-cycle from spin to drain constitute an event?). Furthermore, deciding whether a detected event matches a ground truth event is also not trivial (e.g. should a detected event that is 1 second apart from a ground truth event be matched?).
- True positives, false positives, false negatives, true negatives
- Confusion matrices
- True positive rate, false negative rate, precision, recall, F-score
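As a rough illustration of the metrics listed above, the sketch below matches detected events to ground truth events and then computes precision, recall and F-score. The greedy closest-event matching rule, the `tolerance` parameter and the example event times are all assumptions made for illustration. As noted above, there is no agreed way of matching events, and other rules would give different scores.

```python
def match_events(detected, truth, tolerance=1.0):
    """Greedily match detected event times to ground truth event times (in seconds).

    Each ground truth event can be matched at most once.
    Returns (true_positives, false_positives, false_negatives).
    """
    unmatched = sorted(truth)
    tp = 0
    for t in sorted(detected):
        # Find the closest ground truth event which has not yet been matched.
        best = min(unmatched, key=lambda g: abs(g - t), default=None)
        if best is not None and abs(best - t) <= tolerance:
            unmatched.remove(best)
            tp += 1
    fp = len(detected) - tp
    fn = len(unmatched)
    return tp, fp, fn


def precision_recall_f_score(tp, fp, fn):
    """Compute precision, recall and F-score, returning 0.0 where a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Hypothetical event times in seconds.
truth = [10.0, 50.0, 120.0]
detected = [10.5, 49.0, 80.0, 123.0]

strict = match_events(detected, truth, tolerance=1.0)
lenient = match_events(detected, truth, tolerance=5.0)
print(strict, precision_recall_f_score(*strict))
print(lenient, precision_recall_f_score(*lenient))
```

The same detections score quite differently under the two tolerances, since the event detected at 123 seconds only counts as a true positive when the tolerance is widened to 5 seconds.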
Non-event based metrics
Non-event based metrics assess how well a disaggregation algorithm is able to infer the power demand of individual appliances over time. As such, they are highly dependent upon the sampling rate of the sub-metered appliance data used as the ground truth. Such metrics have the advantage that sub-metered appliance data is easily collected by hardware installations, and requires little subjective judgement. However, non-event based metrics suffer from the disadvantage that disaggregation algorithms can score very highly by predicting all appliances to always draw zero power. This occurs because most appliances remain off for the majority of each day, and therefore the disaggregation algorithm is able to correctly predict each appliance's power for the majority of each day.
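The always-off problem is easy to reproduce. The sketch below uses a simple per-sample accuracy, defined here as the fraction of samples whose predicted power is within a tolerance of the sub-metered power. This particular metric, the 10 W tolerance and the kettle data are assumptions chosen for illustration rather than a metric taken from any specific paper.

```python
def per_sample_accuracy(predicted, actual, tolerance_watts=10.0):
    """Fraction of samples whose predicted power is within a tolerance of the ground truth."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(
        1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance_watts
    )
    return correct / len(actual)


# Hypothetical kettle: one day of 1 minute samples, drawing 2000 W for 6 minutes.
actual = [0.0] * 1440
for minute in range(480, 486):
    actual[minute] = 2000.0

# A "disaggregation algorithm" which always predicts that the kettle is off.
always_off = [0.0] * 1440

score = per_sample_accuracy(always_off, actual)
print(f"{score:.1%}")
```

The always-off prediction is wrong for only 6 of the 1440 samples, so it scores above 99% despite never detecting a single use of the kettle.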
Overall metrics
Overall metrics assess how well a disaggregation algorithm is able to infer the total energy consumed by individual appliances over a period of time. Such metrics are often the most intuitive, since they directly correspond to the pie chart of household energy consumption (e.g. as provided by Neurio). However, overall metrics allow errors to cancel out over time (e.g. an appliance's power is overestimated on day 1, while it is underestimated on day 2, resulting in the algorithm being assigned 100% accuracy since these errors cancel each other out).
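The cancellation effect described above can be shown with two days of made-up data. In this sketch, `total_energy_accuracy` compares only the total energy over the whole period, while `daily_absolute_error` sums the absolute error for each day before normalising. Both definitions and the example figures are assumptions chosen for illustration.

```python
def total_energy_accuracy(predicted_daily_kwh, actual_daily_kwh):
    """1 minus the relative error in total energy over the whole period."""
    predicted_total = sum(predicted_daily_kwh)
    actual_total = sum(actual_daily_kwh)
    return 1.0 - abs(predicted_total - actual_total) / actual_total


def daily_absolute_error(predicted_daily_kwh, actual_daily_kwh):
    """Sum of per-day absolute errors, normalised by the actual total energy."""
    absolute_error = sum(
        abs(p - a) for p, a in zip(predicted_daily_kwh, actual_daily_kwh)
    )
    return absolute_error / sum(actual_daily_kwh)


# Hypothetical appliance which actually uses 2 kWh on each of two days.
actual = [2.0, 2.0]
# Overestimated by 1 kWh on day 1 and underestimated by 1 kWh on day 2.
predicted = [3.0, 1.0]

print(total_energy_accuracy(predicted, actual))
print(daily_absolute_error(predicted, actual))
```

The overall metric reports an accuracy of 1.0 because the predicted and actual totals are both 4 kWh, while the per-day error of 0.5 shows that half of the energy was assigned to the wrong day.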
Monday, 2 December 2013
EPRI NILM 2013 Workshop in Palo Alto
I recently attended the two day 2013 NILM workshop hosted by EPRI at their Palo Alto office. The workshop brought together mostly utilities and vendors from the USA, although there were also attendees from government departments, non-profits and also some academics. The agenda was centred around discussions of data collection, use cases and future collaboration. Unfortunately, this meant there wasn't any discussion of algorithmic detail, given that vendors generally prefer to keep such information private.
In terms of outcomes of the workshop, two working groups were formed: one to study the performance metrics required to assess the accuracy of NIALM approaches, and a second to define a set of data output standards to ensure interoperability between multiple NIALM systems. As yet, I don't have any further information regarding either working group, but if you leave a comment on this post I'd be happy to forward your information on to the group leaders.
From my perspective, the most interesting thing I learned from this workshop was about the smart meter deployments in the states of California and Texas. It turns out that both deployments are already complete, and the existing infrastructure is capable of reporting the household power demand over the home area network at roughly 10 second intervals. However, in order to activate this functionality, each household must ask their utility to remotely flick a software switch so that the smart meter starts communicating with any compatible devices. This situation is particularly interesting to me since it shares the same data rates and availability as the smart meters due to be deployed in the UK by 2020.