In unsupervised learning, the goal is often to discover the unknown structure of unlabelled data. In our case, however, we don't simply want to construct a model that represents the aggregate power data. Instead, we want to build a model of the data in which appliances are explicitly represented. That way, once learning is complete, we can formulate the disaggregation task as an inference problem.
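To make "disaggregation as inference" concrete, here is a minimal sketch. It assumes each appliance is a two-state (on/off) device with a fixed, known power draw. Inference then means finding the combination of appliance states whose total best explains an aggregate reading. This brute-force search, sometimes called combinatorial optimisation in the NILM literature, is only a simple stand-in for the inference a learned model would support. The function name, appliance names and wattages are hypothetical.

```python
from itertools import product


def disaggregate(aggregate_w, appliance_powers_w):
    """Pick the on/off state of each appliance that best explains one aggregate reading.

    aggregate_w: a single aggregate real-power reading in watts.
    appliance_powers_w: maps a (hypothetical) appliance name to its assumed
    constant 'on' power in watts.
    """
    names = list(appliance_powers_w)
    best_states, best_error = None, float("inf")
    # Enumerate all 2**n on/off combinations, so this is only practical for a few appliances.
    for states in product((0, 1), repeat=len(names)):
        estimate = sum(on * appliance_powers_w[name] for on, name in zip(states, names))
        error = abs(aggregate_w - estimate)
        if error < best_error:  # strict '<' keeps the first combination found on ties
            best_states, best_error = states, error
    return {name: bool(on) for name, on in zip(names, best_states)}


# Hypothetical appliances and wattages, for illustration only.
appliances = {"fridge": 150, "kettle": 2000, "microwave": 1200}
print(disaggregate(2170, appliances))
# {'fridge': True, 'kettle': True, 'microwave': False}
```

The sketch treats each reading independently and ignores how appliance states evolve over time. It only illustrates what it means to read appliance states out of an aggregate signal once appliances are represented explicitly.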
Previous unsupervised approaches to this problem have used clustering to identify the distinctive behaviour of individual appliances. These approaches have been shown to work well when applied to multiple features extracted from high-granularity data (sampled at kHz rates). With low-granularity data, however, features such as reactive power and power factor cannot be extracted, and we are left with a single feature: real power.
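The single-feature case can be sketched as one-dimensional k-means (Lloyd's algorithm) over step changes in real power. This is a minimal sketch assuming the number of clusters `k` is known in advance. k-means is just one possible clustering algorithm, not necessarily the one used by the previous approaches. The function `kmeans_1d` and the example step values are hypothetical.

```python
def kmeans_1d(values, k, max_iterations=100):
    """Cluster scalar values (e.g. step changes in real power, in watts) with Lloyd's algorithm.

    Returns the sorted cluster centroids. Assumes 1 <= k <= len(values).
    """
    data = sorted(values)
    # Deterministic initialisation: spread the k initial centroids across the sorted data.
    centroids = [data[i * (len(data) - 1) // max(k - 1, 1)] for i in range(k)]
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for v in data:
            # Assign each value to its nearest centroid; in 1-D the distance is just |v - c|.
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it in place if the cluster is empty).
        updated = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


# Hypothetical step changes from two appliances with clearly different power draws.
steps = [1195, 1200, 1205, 2995, 3000, 3005]
print(kmeans_1d(steps, 2))  # [1200.0, 3000.0]
```

Because every appliance is reduced to a point on a single axis, two appliances with similar power draws produce overlapping steps. No choice of `k` can then attribute a given step to the correct appliance.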
To illustrate how clustering might perform on real aggregate data sampled at 1-minute intervals, I ran some experiments on the REDD dataset. To do so, I did the following:
- Downsampled all data to 1-minute resolution
- Subtracted the combined power of all individual circuits from the household mains to calculate the unallocated, or 'unknown', power
- Calculated the difference between consecutive power readings for each circuit
- Excluded any change in power less than 100 W
- Binned the power differences for each circuit
- Plotted these bins as a stacked bar graph for each household

Looking at the resulting graphs, a few things stand out:
- There are two distinct clusters at the high end of the power axis (labelled washer dryer and oven, I think). A clustering algorithm should identify these easily, since they are clearly separated from the other appliances.
- There are two clusters around the 1500 W mark (corresponding to the microwave and kitchen outlets, I think). One cluster completely subsumes the other. Since real power is the only feature available, it would be very difficult, if not impossible, for a clustering algorithm to separate the two.
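The preprocessing steps listed above can be sketched in plain Python. This is a minimal sketch assuming each circuit is available as a list of `(timestamp_s, watts)` pairs. Several details are my own assumptions, not part of the original experiment: the function names, the 50 W bin width, treating the 100 W threshold as a threshold on the magnitude of the change, and skipping gaps rather than differencing across them. The plotting step is left out.

```python
from collections import Counter, defaultdict


def downsample(samples, period_s=60):
    """Average (timestamp_s, watts) samples into fixed windows; returns {window_start_s: mean_watts}."""
    sums, counts = defaultdict(float), defaultdict(int)
    for t, watts in samples:
        window = int(t // period_s) * period_s
        sums[window] += watts
        counts[window] += 1
    return {window: sums[window] / counts[window] for window in sums}


def unknown_power(mains, circuits):
    """Mains minus the sum of all sub-metered circuits, for windows present in every series."""
    common = set(mains).intersection(*circuits)
    return {t: mains[t] - sum(c[t] for c in circuits) for t in sorted(common)}


def power_changes(series, period_s=60, min_change_w=100):
    """Differences between consecutive readings whose magnitude is at least min_change_w."""
    times = sorted(series)
    changes = []
    for prev, curr in zip(times, times[1:]):
        if curr - prev != period_s:
            continue  # assumption: skip gaps in the data rather than differencing across them
        delta = series[curr] - series[prev]
        if abs(delta) >= min_change_w:
            changes.append(delta)
    return changes


def bin_changes(changes, bin_width_w=50):
    """Count changes into fixed-width bins keyed by each bin's lower edge in watts."""
    return Counter(int(c // bin_width_w) * bin_width_w for c in changes)


# Hypothetical toy readings: mains plus a single sub-metered circuit.
mains = downsample([(0, 100), (30, 100), (60, 1300), (90, 1300), (120, 100)])
circuit = downsample([(0, 50), (60, 50), (120, 50)])
unknown = unknown_power(mains, [circuit])
print(bin_changes(power_changes(unknown)))
```

To build the stacked bar graph, the same `power_changes` and `bin_changes` calls would be applied to each downsampled circuit and to the unknown series. The per-bin counts could then be stacked, for example with matplotlib's `bar` and its `bottom` argument.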