Thursday, 25 April 2013

DECC meeting on disaggregating UK smart meter data

Last week I attended an expert panel meeting for the Department of Energy and Climate Change to discuss how smart meter data could be used to better understand household energy use. The meeting was organised by Cambridge Architectural Research Ltd, and brought together a wide range of stakeholders from government, industry and academia. Among the many potential projects and barriers discussed, I've summarised what I believe to be the key points:

Data availability

UK smart meters will only automatically upload 30-minute data for billing purposes. However, 10-second data will also be available to Consumer Access Devices (CADs) via short-range WiFi. This creates two possibilities for disaggregation from 10-second data:
  1. A disaggregation system could be installed in each household as a CAD
  2. A CAD could upload data to cloud storage via the home broadband connection
The second option seems the most realistic to me, given the intrinsic opt-in nature of disaggregation and the benefits of performing disaggregation in the cloud.

Data granularity

The most recent smart meter specification (SMETS v2, 2013) states that only 10-second apparent power data will be available to CADs. However, it would theoretically be possible to increase the reporting rate to once per second through a smart meter firmware upgrade; this is the maximum rate the meters' hardware supports. Furthermore, current, voltage, harmonics, reactive power etc. will not be reported by smart meters at any sub-10-second rate.

Appliance database

Another topic discussed was the potential for a UK appliance database, similar to the Tracebase database, or a disaggregation test set, similar to the REDD data set. One potential source of data is the Powering the Nation database, which DECC/DEFRA plan to release in the near future. The study collected data from 250 homes which were monitored for either 1 month or 1 year to investigate domestic energy consumption habits.

Friday, 19 April 2013

New data set released by Pecan Street Research Institute

Pecan Street Research Institute recently announced the release of a new data set designed specifically to enable the evaluation of electricity disaggregation technology. A free sample data set is available to members of its research consortium, which has now been opened up to university researchers. The sample data set contains 7 days of data from 10 houses in Austin, TX, USA, with both aggregate and circuit-level power readings at 1-minute intervals. In addition to common household loads, 2 of the houses have photovoltaic systems and 1 house has an electric vehicle.

Wednesday, 17 April 2013

Helpful NIALM terminology

When discussing related research, most papers group existing disaggregation approaches into distinct categories. As a result, many taxonomies have emerged, and unfortunately they are not always well defined before they are used. I've therefore compiled the following list of the terminology I've seen used in recent years:

Intrusive vs non-intrusive monitoring

  • Intrusive metering refers to the deployment of one monitor per appliance. This is clearly intrusive, since it requires access to each appliance to install such equipment. It has the benefit that the only uncertainty in such monitoring is due to inaccuracies in the metering hardware.
  • Non-intrusive metering refers to the deployment of one (or sometimes two) meters per household. This is clearly less intrusive, since it causes no inconvenience beyond the installation of government-mandated smart meters. However, it has the disadvantage that the disaggregation process is likely to introduce further inaccuracies.

Supervised vs unsupervised training

  • Supervised training (a.k.a. manual setup) refers to performing disaggregation with the aid of labelled appliance data, generally from the same home in which disaggregation is performed. The training data normally consists of sub-metered appliance power data, or a phase in which appliances are turned on one by one and labelled manually.
  • Unsupervised training (a.k.a. automatic setup) refers to performing disaggregation without any training data from the household in which disaggregation is being performed. However, without any notion of which appliances exist or how they behave, at best a system can only distinguish appliances (e.g. appliance 1, appliance 2); it cannot label them with an appliance name (e.g. refrigerator or washing machine).
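To make the supervised case concrete, here's a minimal sketch of learning each appliance's typical on-power from labelled sub-metered data. The readings, the `train_signatures` helper and the `on_threshold` parameter are all illustrative assumptions, not part of any particular system:

```python
# Minimal sketch of supervised training: estimate each appliance's typical
# "on" power demand from labelled sub-metered readings. All data and the
# on_threshold parameter are illustrative assumptions.
from statistics import mean

def train_signatures(submetered, on_threshold=5.0):
    """Return the mean on-power (watts) of each labelled appliance."""
    return {
        appliance: mean(w for w in readings if w > on_threshold)
        for appliance, readings in submetered.items()
    }

submetered = {
    "refrigerator": [0, 88, 92, 90, 0, 91],  # watts, recorded while labelled
    "kettle": [0, 0, 2010, 1990, 0, 0],
}
print(train_signatures(submetered))
```

The learned signatures could then seed a disaggregation algorithm for the same home, which is exactly what unsupervised approaches have to do without.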

Event-based vs non event-based disaggregation

  • Event-based disaggregation refers to methods which have distinct event detection (e.g. something switched on at 12pm) and event classification (e.g. it was the washing machine). These approaches are often identifiable by a sequential pipeline of algorithms (data collection -> smoothing -> edge detection -> classification). A core advantage of event-based approaches is that decisions are made sequentially, and therefore can easily be deployed as a real-time system.
  • Non event-based disaggregation refers to methods which combine event detection and classification into a single process, in which both are inferred simultaneously. These are often identifiable by their use of time series models, which are able to reason over a sequence of data. The advantage of non event-based approaches is that high-confidence decisions can influence those around them (e.g. a refrigerator is likely to turn on 30 minutes after its last cycle ended).
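The event-based pipeline above can be sketched in a few lines. The smoothing step is omitted for brevity, and the power values and appliance signatures are purely illustrative:

```python
# Sketch of an event-based pipeline: edge detection followed by
# classification against known appliance signatures. The smoothing step
# is omitted for brevity; all power values here are illustrative.

def detect_edges(series, threshold=50.0):
    """Return (index, delta) pairs where power steps by more than threshold."""
    return [
        (i, series[i] - series[i - 1])
        for i in range(1, len(series))
        if abs(series[i] - series[i - 1]) > threshold
    ]

def classify(delta, signatures):
    """Label an edge with the appliance whose on-power is closest to |delta|."""
    return min(signatures, key=lambda name: abs(signatures[name] - abs(delta)))

aggregate = [100, 100, 2100, 2100, 2100, 100, 100]  # watts, one per sample
signatures = {"kettle": 2000, "refrigerator": 90}
for index, delta in detect_edges(aggregate):
    event = "on" if delta > 0 else "off"
    print(index, classify(delta, signatures), event)
```

Because each edge is classified as soon as it is seen, a pipeline like this maps naturally onto a real-time deployment, which is the advantage noted above.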

High frequency vs low frequency sampling

  • High frequency sampling generally refers to meters which sample the current and voltage of a wire at a rate in the order of thousands of times per second (kHz). At this rate, information such as reactive power and current harmonics can be calculated, which are useful features for classification. However, few smart meters are likely to report data at this granularity.
  • Low frequency sampling generally refers to meters which sample at between once per second and once per hour. At this rate only a single power reading is reported, so active power cannot be separated from reactive power, and no harmonic content is available. This is the reporting rate of most smart meters.
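The difference can be illustrated with simulated waveforms: given raw current and voltage samples, active, apparent and reactive power can all be computed, whereas a low-frequency meter reports only a single power value. The sampling rate, amplitudes and phase lag below are all assumptions for illustration:

```python
# With high-frequency current and voltage waveforms (simulated here at
# 2 kHz), active, apparent and reactive power can all be recovered.
# Amplitudes and the 30-degree phase lag are illustrative assumptions.
import math

RATE = 2000          # samples per second
FREQ = 50            # UK mains frequency (Hz)
PHASE = math.pi / 6  # current lags voltage by 30 degrees (illustrative)

t = [n / RATE for n in range(RATE)]  # one second of sample times
v = [325 * math.sin(2 * math.pi * FREQ * x) for x in t]        # volts
i = [5 * math.sin(2 * math.pi * FREQ * x - PHASE) for x in t]  # amps

active = sum(a * b for a, b in zip(v, i)) / len(t)             # watts
v_rms = math.sqrt(sum(a * a for a in v) / len(t))
i_rms = math.sqrt(sum(b * b for b in i) / len(t))
apparent = v_rms * i_rms                                       # VA
reactive = math.sqrt(apparent ** 2 - active ** 2)              # VAr

print(active, apparent, reactive)
```

A meter sampling once per second or slower sees none of the waveform structure, so only one of these quantities can ever appear in its readings.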

Steady-state vs transient-state analysis

  • Steady-state analysis divides a power series into periods of constant power during which no appliances change state. The differences between these levels of constant power are then used to infer which state change(s) have taken place.
  • Transient-state analysis uses the patterns between steady states to classify appliance state changes. However, it is necessary to sample at a high frequency in order to extract transient features for most appliances.
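A minimal sketch of the steady-state case, assuming an already-smoothed aggregate series and an illustrative tolerance:

```python
# Sketch of steady-state analysis: segment an aggregate power series into
# periods of roughly constant power, then take the differences between
# consecutive steady levels as candidate appliance state changes.
# The tolerance and power values are illustrative assumptions.

def steady_states(series, tolerance=10.0):
    """Group consecutive readings whose power stays within tolerance."""
    levels, current = [], [series[0]]
    for w in series[1:]:
        if abs(w - current[-1]) <= tolerance:
            current.append(w)
        else:
            levels.append(sum(current) / len(current))
            current = [w]
    levels.append(sum(current) / len(current))
    return levels

aggregate = [100, 102, 98, 2100, 2105, 2098, 100, 99]
levels = steady_states(aggregate)
deltas = [round(b - a) for a, b in zip(levels, levels[1:])]
print(levels)  # mean power of each steady period
print(deltas)  # step changes between periods
```

Each delta would then be matched against known appliance power demands; transient analysis would instead examine the shape of the readings between the steady periods, which is why it needs high-frequency data.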

As always, please leave a comment if you come across any terminology you think should be in this list.

Thursday, 11 April 2013

Is REDD representative of actual electricity data?

Since its release in 2011, the REDD data set has revolutionised energy disaggregation research. To date, it's been cited 33 times in under two years, and has quickly become the standard data set upon which new approaches are benchmarked. As such, it's now hard to trust results that are instead evaluated on simulated data or a proprietary data set.

However, I've often heard people mention that the data is unrepresentative, for some of the following reasons:

  • Location
    • Since the data was collected only from households in the Massachusetts area, there's very little variation due to location. As a result, it's a little hard to justify that results based on the REDD data set generalise to other countries or continents. In fact, I even wrote a blog post on how different the disaggregation problem is between the US and the UK.
  • Accuracy
    • The REDD data set contains a huge amount of data describing the household aggregate power demand. Even with the low resolution data, in which the power is down-sampled to one reading per second, the accuracy is still far greater than that of many off-the-shelf electricity monitors. Clearly high accuracy isn't a bad thing, but I doubt many researchers will voluntarily add noise to this data.
  • Reliability
    • A common problem with off-the-shelf electricity monitors is their unreliability. Building a disaggregation system that is robust to such random missing readings can be a real challenge. Although this is not a characteristic the REDD data set suffers from, it does contain some long gaps, as I reported in a post on REDD statistics.
  • Circuits
    • Despite the description in the SustKDD paper, the data set only contains household and circuit level recordings, and not plug level data. As a result, it's only possible to test the disaggregation of appliances which appear on their own on a circuit, and are labelled as such.
Despite these issues, REDD is by far the most comprehensive data set for evaluating non-intrusive load monitoring systems. However, there must be other data sets that can offer insight for energy disaggregation systems, even if they only contain aggregate data. If you know of any such data sets, I'd be very interested to hear from you!

Friday, 5 April 2013

Why NIALM shouldn't be modelled as the knapsack or subset-sum problem

In his seminal work, Hart (1992) highlighted the similarities between the appliance disaggregation problem and some well-studied combinatorial optimisation problems. He stated that if the power demand of each appliance is known, the disaggregation problem can be modelled as an instance of the subset sum problem. This approach selects the set of appliances whose power demands sum to the household aggregate power, applied independently to each aggregate power reading. However, Hart identified a core problem of this approach:
  • Small fluctuations in the aggregate power lead to solutions with unrealistically high numbers of appliance switch events.
To confirm this in practice, I evaluated this formulation using simulated data. My simulations showed that this approach performs poorly for realistic numbers of appliances, even under ideal conditions of no noise and only on-off appliances. Clearly, this is not a good model for appliance disaggregation.
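To make the formulation concrete, here is a brute-force sketch of the per-reading subset sum approach; the appliance power demands and readings are illustrative. It also shows Hart's objection: a small fluctuation between two readings flips the inferred appliance set, implying switch events that almost certainly never happened.

```python
# Brute-force version of Hart's subset-sum formulation: for each aggregate
# reading, choose the subset of known appliance power demands whose sum is
# closest to the measurement. Appliance power values are illustrative.
from itertools import combinations

def best_subset(aggregate_reading, appliances):
    """Return the appliance subset whose total power best matches the reading."""
    best, best_err = (), float("inf")
    names = list(appliances)
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            err = abs(aggregate_reading - sum(appliances[a] for a in subset))
            if err < best_err:
                best, best_err = subset, err
    return best

appliances = {"kettle": 2000, "refrigerator": 90, "toaster": 800, "lamp": 60}
# A 20 W fluctuation between readings swaps the refrigerator for the lamp:
print(best_subset(2085, appliances))
print(best_subset(2065, appliances))
```

The search is also exponential in the number of appliances, so even before the spurious-switching problem, this brute-force approach only scales to small appliance sets.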

More recently, Egarter et al. (2013) extended this model, stating that if both the power demand and duration of usage of each appliance are known, the disaggregation problem can be modelled as a two-dimensional knapsack problem. This approach is similar to Hart's proposed method, with the exception that each aggregate measurement is not considered as an individual optimisation problem. Instead, appliance operations must persist for the exact given duration. However, I see two major problems with this approach:
  1. Optimal solutions might give unrealistic results. As the baseload power increases, I'd expect the number of solutions (those that produce a small difference between the sum of appliance power and the aggregate power) to increase rapidly. As a result, I seriously doubt that the actual optimal solution would give the best disaggregation accuracy.
  2. Appliance durations are often highly variable. Unlike power demand, their duration of usage is often determined by the user. However, extending this formalism such that appliances could have a variable duration would hugely increase the size of the solution space.
Although these approaches show promise in simple scenarios, I don't believe they capture the flexibility required by the energy disaggregation scenario. Instead, I believe probabilistic approaches are more appropriate, given their ability to reason over different solutions by estimating the likelihood of different event sequences.

Tuesday, 2 April 2013

Paper accepted at IJCAI on Biodiversity Monitoring

We recently had a paper accepted at the International Joint Conference on Artificial Intelligence 2013 on the detection of the New Forest cicada from audio recordings.


Automated acoustic recognition of species aims to provide a cost-effective method for biodiversity monitoring. This is particularly appealing for detecting endangered animals with a distinctive call, such as the New Forest cicada. To this end, we pursue a crowdsourcing approach, whereby the millions of visitors to the New Forest will help to monitor the presence of this cicada by means of a smartphone app that can detect its mating call. However, current systems for acoustic insect classification are aimed at batch processing and not suited to a real-time approach as required by this system, because they are too computationally expensive and not robust to environmental noise. To address this shortcoming we propose a novel insect detection algorithm based on a hidden Markov model to which we feed as a single feature vector the ratio of two key frequencies extracted through the Goertzel algorithm. Our results show that this novel approach, compared to the state of the art for batch insect classification, is much more robust to noise while also reducing the computational cost.


Davide Zilli, Oliver Parson, Geoff V Merrett, Alex Rogers. A Hidden Markov Model-Based Acoustic Cicada Detector for Crowdsourced Smartphone Biodiversity Monitoring. In: 23rd International Joint Conference on Artificial Intelligence. Beijing, China. 2013.