Friday, 9 May 2014

Training disaggregation algorithms without sub-metered data

I'm keen to include an unsupervised disaggregation algorithm (one that doesn't require appliance data for training) in NILMTK. At the moment, the toolkit includes only two supervised benchmark disaggregation algorithms, which I think really limits its usefulness. This post is intended to be a first step towards a simple, intuitive and robust approach for learning models of household appliances using only household aggregate data. I would be really interested in any feedback from the community on possible improvements or extensions.

Extracting step changes and clustering via a Gaussian mixture model

The approach can be summarised as follows:
  1. Extract a set of step changes by taking the differences between sequential aggregate power readings
  2. Take the absolute value of these differences so that positive and negative step changes of the same magnitude are treated identically
  3. Discard small step changes (e.g. < 200 W) since there is too much noise in this range to extract any meaningful structure
  4. Discard large step changes (e.g. > 3000 W) since these are most likely generated by multiple appliances changing state simultaneously
  5. Cluster the remaining set of step changes using a Gaussian mixture model
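
The five steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the code used for the experiments in this post: the function names `extract_steps` and `fit_gmm_1d` are hypothetical, the 200 W and 3000 W thresholds are just the example values from the list, and the number of clusters `k` is an assumption that has to be supplied by hand. The mixture model is fitted with a plain expectation-maximisation loop so that the sketch has no dependencies; in practice a library implementation would be the more sensible choice.

```python
import math


def extract_steps(power, low=200.0, high=3000.0):
    """Steps 1-4: absolute differences between consecutive readings,
    keeping only those between `low` and `high` watts."""
    diffs = (abs(b - a) for a, b in zip(power, power[1:]))
    return [d for d in diffs if low <= d <= high]


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(data, k, iterations=100, min_var=1.0):
    """Step 5: fit a k-component 1-D Gaussian mixture with plain EM.
    Returns a list of (weight, mean, variance) tuples sorted by mean."""
    data = sorted(data)
    n = len(data)
    # Initialise the means at evenly spaced quantiles of the data.
    means = [data[int((i + 0.5) * n / k)] for i in range(k)]
    overall_mean = sum(data) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in data) / n, min_var)
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each step change.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)  # floor avoids collapsed clusters
    return sorted(zip(weights, means, variances), key=lambda c: c[1])
```

Given a list of aggregate readings `power` in watts, `fit_gmm_1d(extract_steps(power), k=5)` would return one (weight, mean, variance) triple per cluster. Nothing in this sketch decides which clusters correspond to real appliances or how many clusters to use.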

Evaluation using data from real households

I applied this approach to a data set of aggregate data collected from real households. Unfortunately, the data set does not contain any sub-metered data from these households, so no quantitative results can be provided regarding its accuracy. However, a visual inspection of the extracted step changes and identified clusters shows some encouraging results.

This approach worked very well on some houses, such as the one shown below. The plot shows a black and white histogram of the extracted step changes, in which peaks corresponding to appliances are clearly visible at roughly 1000 W, 1800 W, and 2300 W. The plot also shows coloured probability density functions (PDFs) corresponding to the clusters found. Interestingly, the clustering algorithm successfully finds the three appliances, as shown by the cyan and yellow curves. However, it's also worth noting that many other clusters were found which do not correspond to appliances.

There were also households in which the extracted step changes showed no structure, and as a result none of the clusters corresponds to an individual appliance, as in the plot shown below. This is likely due to a large amount of measurement noise in the aggregate data, or to a number of appliances with highly variable step changes.

Conclusions and future work

These experiments have shown that even a very simple approach can successfully learn models for appliances using only aggregate data. However, they have also shown that performance is likely to vary widely between houses. An important challenge that has not been tackled here is that of labelling the identified clusters, e.g. cyan cluster = lighting, red cluster = noise.


  1. Interesting! Thanks Oliver for sharing.

    Is the algorithm part of NILMTK?

    1. Not exactly, though this approach is similar to that proposed by George Hart in 1985, which is implemented in NILMTK :)