However, I've often heard people mention the data is unrepresentative. Here are some of the reasons:
- Since the data was collected only from households in the Massachusetts area, there's very little variation due to location. As a result, it's hard to argue that results based on the REDD data set generalise to other countries or continents. In fact, I even wrote a blog post on how different the disaggregation problem is between the US and the UK.
- The REDD data set contains a huge amount of data describing the household aggregate power demand. Even with the low-resolution data, in which the power is down-sampled to one reading per second, the accuracy is still far greater than that of many off-the-shelf electricity monitors. Clearly high accuracy isn't a bad thing, but I doubt many researchers will voluntarily add noise to this data, so results reported on REDD may be more optimistic than what a typical monitor would allow.
- A common problem with off-the-shelf electricity monitors is their unreliability: individual readings often go missing at random. Building a disaggregation system that is robust to such missing readings can be a real challenge. Although the REDD data set doesn't suffer from this kind of random loss, it does contain some long gaps, as I reported in a post on REDD statistics.
- Despite the description in the SustKDD paper, the data set only contains household-level and circuit-level recordings, not plug-level data. As a result, it's only possible to test the disaggregation of appliances which appear on their own on a circuit, and are labelled as such.
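
To make the points about accuracy and missing readings a bit more concrete, here's a minimal sketch of how one might degrade a clean 1 Hz power signal so that it looks more like the output of a cheap monitor. The function name `degrade` and all of its parameters (`step`, `noise_std`, `drop_prob`) are made up for illustration, and the default values are guesses rather than measured properties of any real monitor. The input is a synthetic signal, not REDD data.

```python
import random


def degrade(readings, step=6, noise_std=5.0, drop_prob=0.05, seed=0):
    """Roughly mimic a cheaper monitor, given 1 Hz power readings in watts.

    All default values are illustrative assumptions, not properties
    of any real monitor.
    """
    rng = random.Random(seed)
    degraded = []
    # Keep only one reading every `step` seconds.
    for i in range(0, len(readings), step):
        if rng.random() < drop_prob:
            degraded.append(None)  # this reading was lost at random
            continue
        noisy = readings[i] + rng.gauss(0.0, noise_std)
        degraded.append(max(0.0, noisy))  # clamp, since demand can't be negative
    return degraded


# Synthetic aggregate: 100 W baseline with a 2 kW appliance on for one minute.
clean = [100.0] * 60 + [2100.0] * 60 + [100.0] * 60
coarse = degrade(clean)
print(len(clean), len(coarse))  # 180 30
```

Running a disaggregation algorithm on both `clean` and `coarse` would give a rough idea of how much its performance depends on the quality of the data. Missing readings are marked as `None` here, so whatever consumes the output needs to handle them explicitly.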