Friday, 27 January 2012

Unsupervised learning for NIALM

I've recently been thinking about various training methods for NIALM systems, specifically those which can be applied to unlabelled aggregate power demand data sampled once per minute (or less frequently). Assuming no prior information about the appliances or their usage patterns, this clearly falls into the category of unsupervised learning.

In unsupervised learning, the goal is often to determine the unknown structure of unlabelled data. However, in our case we don't simply want to construct a model that represents the aggregate power data. Instead, we want to build a model of the data in which appliances are explicitly represented. This way, once the learning process is complete, we can formulate the disaggregation task as an inference problem.
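To make the idea of disaggregation as inference concrete, here is a minimal sketch that assumes learning has already produced a model of each appliance as a small set of power levels (0 W meaning "off"). The appliance names and wattages are hypothetical, and the brute-force search is just one simple way to do the inference for a single aggregate reading: it picks the combination of appliance states whose summed power is closest to what the meter observed.

```python
from itertools import product

# Hypothetical learned appliance models: the power levels (W) each
# appliance can draw, where 0 W means the appliance is off.
APPLIANCES = {
    "fridge": [0, 100],
    "kettle": [0, 2000],
    "lighting": [0, 300],
}


def disaggregate(aggregate_w, appliances):
    """Return the appliance states whose summed power best matches a single
    aggregate reading, found by exhaustive search over all state combinations."""
    names = list(appliances)
    best_states, best_error = None, float("inf")
    for levels in product(*(appliances[name] for name in names)):
        error = abs(aggregate_w - sum(levels))
        if error < best_error:  # strict '<' keeps the first best match on ties
            best_states, best_error = dict(zip(names, levels)), error
    return best_states


print(disaggregate(2110, APPLIANCES))
# → {'fridge': 100, 'kettle': 2000, 'lighting': 0}
```

The search space grows exponentially with the number of appliances, so this only illustrates the formulation; a practical system would use a probabilistic model (a factorial hidden Markov model is one common choice) that also exploits how appliance states evolve over time.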

Previous unsupervised approaches to this problem have used clustering to identify the unique behaviour of appliances. These approaches have been shown to work well when applied to multiple features extracted from high granularity data (sampled at kHz rates). However, with low granularity data there is no way to extract features such as reactive power or power factor, and we are instead left with a single feature: (real) power.

To give a visual representation of how clustering might perform on real aggregate data sampled at 1 minute intervals, I ran some experiments on the REDD dataset. To do so, I did the following:

  1. Downsampled all data to 1 minute resolution
  2. Subtracted the power of each circuit from the household mains circuit to calculate the unallocated, or 'unknown', power
  3. Calculated the difference between consecutive power readings for each circuit
  4. Excluded any change in power less than 100 W
  5. Binned the remaining power differences for each circuit (i.e. computed a histogram per circuit)
  6. Plotted these bins as a stacked bar graph for each household
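The steps above can be sketched with pandas and NumPy roughly as follows. This is not the exact code used for the experiments: it assumes each house has been loaded into a DataFrame with a DatetimeIndex, one column per submetered circuit and a column named `mains` (a hypothetical name), and it assumes the 100 W threshold applies to the magnitude of each change, so both switch-on and switch-off events are counted. The bin width is also an arbitrary choice.

```python
import numpy as np
import pandas as pd


def power_change_histograms(df, mains="mains", threshold_w=100, bin_width_w=50):
    """Histogram of power changes per circuit, following steps 1-5.

    Assumes `df` has a DatetimeIndex, one column per circuit and a mains column.
    Returns a DataFrame indexed by the lower edge of each bin (W).
    """
    # 1. Downsample to 1-minute resolution by averaging the readings.
    df = df.resample("1min").mean()
    # 2. Unallocated ('unknown') power = mains minus the sum of all circuits.
    circuits = [c for c in df.columns if c != mains]
    df["unknown"] = df[mains] - df[circuits].sum(axis=1)
    df = df.drop(columns=mains)
    # 3. Difference between consecutive readings (magnitude, by assumption).
    diffs = df.diff().abs()
    # Bin edges from the threshold up to the largest change (at least one bin).
    max_w = max(np.nanmax(diffs.to_numpy()), threshold_w)
    edges = np.arange(threshold_w, max_w + 2 * bin_width_w, bin_width_w)
    counts = {}
    for col in diffs.columns:
        changes = diffs[col].dropna()
        # 4. Exclude changes smaller than the threshold.
        changes = changes[changes >= threshold_w]
        # 5. Count the remaining changes into bins.
        counts[col], _ = np.histogram(changes, bins=edges)
    return pd.DataFrame(counts, index=edges[:-1])
```

For step 6, something like `power_change_histograms(house).plot.bar(stacked=True)` would draw the stacked bar graph, assuming matplotlib is installed.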
As an example, here's the chart for house 1:

There are two key points to take from this plot:
  1. There are two distinct clusters at the higher end of the power axis (the circuits labelled washer dryer and oven, I think). A clustering algorithm would easily identify these clusters because they are clearly separated from the other appliances.
  2. There are two clusters around the 1500 W mark (corresponding to the microwave and kitchen outlets, I think). One cluster completely subsumes the other, making it very difficult, or even impossible, for a clustering algorithm to separate the two.
This is just one example, and although the appliances and their usage will differ across houses, I believe this trend will continue. There are always likely to be appliances with high power demands that are easily clustered; however, for appliances with lower power demands, the corresponding clusters are increasingly likely to overlap.
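The difference between separated and overlapping clusters can be illustrated with a toy experiment. The sketch below uses a basic one-dimensional k-means as a stand-in for "a clustering algorithm" (not necessarily what earlier NIALM work used) on purely synthetic power changes drawn from normal distributions; the means and spreads are made up to loosely mimic the two cases in the plot and are not REDD data.

```python
import numpy as np


def kmeans_1d(x, k, n_iter=100):
    """Basic 1-D k-means; returns (centres, labels)."""
    x = np.asarray(x, dtype=float)
    # Deterministic initialisation at evenly spaced quantiles of the data.
    centres = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        # Assign each point to its nearest centre.
        labels = np.argmin(np.abs(x[:, None] - centres[None, :]), axis=1)
        # Move each centre to the mean of its points (keep it if empty).
        new = np.array([x[labels == j].mean() if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return centres, labels


rng = np.random.default_rng(0)

# Case 1: two well-separated synthetic appliances (cf. oven and washer dryer).
oven = rng.normal(3000, 50, 200)
washer = rng.normal(4500, 50, 200)
separated_centres, _ = kmeans_1d(np.concatenate([oven, washer]), k=2)
print("separated centres (W):", np.round(separated_centres))

# Case 2: one synthetic cluster subsumed by a wider one (cf. microwave and outlets).
microwave = rng.normal(1500, 100, 200)
outlets = rng.normal(1500, 300, 200)
truth = np.repeat([0, 1], 200)
_, labels = kmeans_1d(np.concatenate([microwave, outlets]), k=2)
accuracy = np.mean(labels == truth)
overlap_agreement = max(accuracy, 1 - accuracy)  # best match over label swaps
print(f"overlapping case: labels agree with the true appliance {overlap_agreement:.0%} of the time")
```

In the first case the centres land close to the true means; in the second, k-means can only split the data by magnitude, so its labels agree with the true appliance not much better than chance.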

At first glance, overlapping clusters for low-power appliances might not seem like a problem, since we're generally more interested in the appliances that consume the most energy. However, power demand and energy consumption are not always correlated. Power demand represents the rate of energy consumption, so an appliance's energy consumption depends on both its power demand and its duration of use. Two examples of appliance types with low power demands but high energy consumption are the refrigerator and lighting. Because these appliances are on for such a long time, their energy consumption might turn out to be similar to, or even greater than, that of the kitchen white goods with the highest power demands.
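A quick worked example shows how duration can offset power demand. The wattages and daily usage times below are hypothetical round numbers chosen for illustration, not measurements: energy (kWh) is simply power (W) multiplied by time, converted from watt-minutes to kilowatt-hours.

```python
def energy_kwh(power_w, minutes_per_day):
    """Daily energy in kWh: power (W) x duration (min) / 60 000 (W-min per kWh)."""
    return power_w * minutes_per_day / 60_000


fridge = energy_kwh(100, 8 * 60)  # hypothetical: 100 W compressor running 8 h/day
oven = energy_kwh(2400, 20)       # hypothetical: 2.4 kW oven used for 20 min/day
print(fridge, oven)  # → 0.8 0.8
```

Despite drawing 24 times less power, the fridge in this example uses as much energy per day as the oven.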

I also generated the graphs for the other 5 houses in the data set, which I've included below: