Thursday, 11 April 2013

Is REDD representative of actual electricity data?

Since its release in 2011, the REDD data set has revolutionised energy disaggregation research. To date, it's been cited 33 times in under two years, and has quickly become the standard data set against which new approaches are benchmarked. As such, it's now hard to trust results obtained only on simulated data or a proprietary data set.

However, I've often heard people say that the data is unrepresentative. Here are some of the reasons:

  • Location
    • Since the data was collected only from households in the Massachusetts area, there's very little variation due to location. As a result, it's hard to argue that results based on the REDD data set generalise to other countries or continents. In fact, I even wrote a blog post on how different the disaggregation problem is between the US and the UK.
  • Accuracy
    • The REDD data set contains a huge amount of data describing the household aggregate power demand. Even the low-resolution data, in which the power is down-sampled to one reading per second, is far more accurate than the output of many off-the-shelf electricity monitors. Clearly high accuracy isn't a bad thing, but I doubt many researchers will voluntarily add noise to this data.
  • Reliability
    • A common problem with off-the-shelf electricity monitors is their unreliability: readings are often dropped at random. Building a disaggregation system that is robust to such missing readings can be a real challenge. The REDD data set doesn't suffer from random dropouts, although it does contain some long gaps, as I reported in a post on REDD statistics.
  • Circuits
    • Despite the description in the SustKDD paper, the data set only contains household and circuit level recordings, and not plug level data. As a result, it's only possible to test the disaggregation of appliances which appear on their own on a circuit, and are labelled as such.
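
To make the accuracy and reliability points above concrete, here's a minimal Python sketch of how a clean one-reading-per-second power series could be degraded to look more like the output of a cheap monitor. The function name `degrade`, its parameters and all the default values are my own illustrative assumptions; they aren't taken from REDD or from any particular monitor.

```python
import random


def degrade(power_w, period_s=6, noise_sd_w=5.0, drop_prob=0.05, seed=0):
    """Make a clean 1 Hz power series resemble a cheap monitor's output.

    power_w: list of readings in watts, one per second.
    Returns a list of (second_index, watts) pairs; dropped readings
    are simply absent. All defaults are illustrative guesses, not
    measured properties of any real monitor.
    """
    rng = random.Random(seed)  # fixed seed so experiments are repeatable
    degraded = []
    # Keep only one reading every period_s seconds.
    for t in range(0, len(power_w), period_s):
        # Randomly lose some readings altogether.
        if rng.random() < drop_prob:
            continue
        # Add Gaussian measurement noise, then report whole, non-negative watts.
        noisy = power_w[t] + rng.gauss(0.0, noise_sd_w)
        degraded.append((t, max(0, round(noisy))))
    return degraded
```

Running an algorithm on both the original and the degraded series would give a rough idea of how much its reported accuracy depends on the quality of the measurements.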
Despite these issues, REDD is by far the most comprehensive data set for evaluating non-intrusive load monitoring systems. However, there must be other data sets that can offer insight for energy disaggregation systems, even if they only contain aggregate data. If you know of any such data sets, I'd be very interested to hear from you!

1 comment:

  1. Great blog post (as always!)

    However, there must be other data sets that can offer insight for energy disaggregation systems, even if they only contain aggregate data.

    I expect you already know this but, if not: Cosm contains a fair amount of aggregate power data. A search for "power" on Cosm produces 2076 results. I have no idea what proportion of those feeds contain useful data! I believe it's fairly trivial to write a script to pull all those feeds onto your own machine, but I haven't done it myself. I imagine it could be quite time-consuming to figure out which feeds are useful and which aren't (but if you do figure that out, please consider sharing your findings!)

    For a few months last year I monitored my home's aggregate power consumption, some individual appliances, and a bunch of temperatures. Some of those data are on Cosm here and some are on GitHub here. That's pretty crappy data though (not least because the CC IAMs I used often had long "blackouts" lasting an hour or two, which is what drove me to build my own power measurement kit).

    For the past few months I've been collecting power data from my own home. Last night I added the last 5 IAMs. I'm now logging 51 channels (!) including every appliance and the lighting circuit. I'm recording whole-home voltage and current waveforms at 16 kHz (from which the system calculates active and apparent power once a second) as well as whole-home apparent power recorded using a standard home energy monitor CT clamp. I plan to release this dataset soon, along with smaller datasets recorded from a small number of MSc student homes. At release, this dataset certainly won't be as comprehensive as REDD, but hopefully it'll add some variety, and I plan to continue to pester my MSc students to install more IAMs this year so we get more coverage of their home appliances ;)
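
For readers wondering how active and apparent power can be derived from sampled voltage and current waveforms, here's a minimal sketch using the textbook definitions: active power is the mean of the instantaneous product v·i, and apparent power is the product of the RMS voltage and RMS current. It isn't the commenter's actual code; the function name, the 50 Hz / 230 V / 2 A test signal and the 60 degree phase lag are all assumptions made purely for illustration.

```python
import math


def active_and_apparent_power(voltage, current):
    """Return (active power in W, apparent power in VA) for one window.

    voltage, current: equal-length sequences of simultaneous samples,
    in volts and amps, e.g. one second of data.
    """
    n = len(voltage)
    if n == 0 or n != len(current):
        raise ValueError("need two non-empty sequences of equal length")
    # Active power: mean of the instantaneous power v * i.
    active = sum(v * i for v, i in zip(voltage, current)) / n
    # Apparent power: RMS voltage multiplied by RMS current.
    v_rms = math.sqrt(sum(v * v for v in voltage) / n)
    i_rms = math.sqrt(sum(i * i for i in current) / n)
    return active, v_rms * i_rms


# One second of synthetic data at 16 kHz (all values assumed for illustration):
# 50 Hz mains, 230 V RMS, 2 A RMS, current lagging voltage by 60 degrees.
FS = 16000
PHASE = math.pi / 3
voltage = [230 * math.sqrt(2) * math.sin(2 * math.pi * 50 * k / FS)
           for k in range(FS)]
current = [2 * math.sqrt(2) * math.sin(2 * math.pi * 50 * k / FS - PHASE)
           for k in range(FS)]

active_w, apparent_va = active_and_apparent_power(voltage, current)
# Expect roughly 230 W active (230 * 2 * cos 60 degrees) and 460 VA apparent.
print(round(active_w, 3), round(apparent_va, 3))
```

The gap between the two figures is what a monitor that only reports apparent power (such as a simple CT clamp) can't see: it would report 460 VA here even though only about 230 W of real power is being drawn.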