Described in the paper:
Kolter JZ, Johnson MJ. REDD: A Public Data Set for Energy Disaggregation Research. In: Workshop on Data Mining Applications in Sustainability (SIGKDD). San Diego, CA; 2011.
A few months ago I wrote a post giving some high-level details about the data set, and since then I've spent some time using it to benchmark various approaches. However, I've made many mistakes along the way, often because I didn't understand the data set well enough. The data set is so large that even simple questions are hard to answer, such as which appliances consume the most energy or how long each house was monitored. To clear up this confusion, I calculated a bunch of statistics and generated some visualisations, which I decided to share with the world through this post. For all the statistics below, I used the low frequency data (1 reading per second).
The data set contains 6 houses, for each of which I generated the following statistics:
|House||Up Time (days)||Reliability (%)||Average Energy Consumption (kWh/day)|
- Up time - duration for which mains power measurements are available at 1 second intervals
- Reliability - percentage of readings which are available at 1 second intervals
- Average energy consumption - the average energy consumed by each house's mains circuits
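The three definitions above can be turned into a short calculation. What follows is a minimal sketch under stated assumptions, not the exact code behind the table. It assumes the mains readings are already loaded as sorted `(timestamp, watts)` pairs. It counts a reading as available when the next reading arrives within `max_gap` seconds. It takes reliability to be up time as a share of the whole monitored span, which is only one possible reading of the definition. The name `house_statistics` and its parameters are hypothetical.

```python
from typing import List, Tuple

SECONDS_PER_DAY = 86_400
JOULES_PER_KWH = 3_600_000


def house_statistics(
    readings: List[Tuple[int, float]], max_gap: int = 1
) -> Tuple[float, float, float]:
    """Return (up_time_days, reliability_percent, kwh_per_day).

    readings: (unix_timestamp_seconds, power_watts) pairs sorted by timestamp.
    max_gap:  largest spacing (seconds) still treated as continuous data
              (an assumption; the post only says "1 second intervals").
    """
    if len(readings) < 2:
        return 0.0, 0.0, 0.0

    up_seconds = 0
    energy_joules = 0.0
    # Walk over consecutive pairs; only intervals without a gap count.
    for (t0, p0), (t1, _) in zip(readings, readings[1:]):
        gap = t1 - t0
        if 0 < gap <= max_gap:
            up_seconds += gap
            energy_joules += p0 * gap  # watts * seconds = joules

    span_seconds = readings[-1][0] - readings[0][0]
    up_days = up_seconds / SECONDS_PER_DAY
    reliability = 100.0 * up_seconds / span_seconds if span_seconds else 0.0
    kwh_per_day = (energy_joules / JOULES_PER_KWH) / up_days if up_days else 0.0
    return up_days, reliability, kwh_per_day


# Made-up example: a constant 1000 W load with an 8 second gap in the data.
example = [(0, 1000.0), (1, 1000.0), (2, 1000.0), (10, 1000.0), (11, 1000.0)]
up_days, reliability, kwh_per_day = house_statistics(example)
```

Dividing the energy by the up time rather than by the full monitored span means that gaps in the data do not drag the daily average down.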
|Appliance||Number of Houses Present In||Average Energy Consumption (kWh/day)||Percentage of Household Energy|
A problem with the average energy consumption and percentage of household energy columns is that although an appliance might be present in a house, it might not be used. This has a major effect on the average, given that we only have 6 houses.
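To see how much an unused appliance can skew the figure, here is a small sketch with made-up numbers (they are not taken from REDD). The house names, the values and the helper `mean` are all hypothetical, and the sketch assumes that zero consumption means the appliance was never used.

```python
from typing import Iterable


def mean(values: Iterable[float]) -> float:
    """Arithmetic mean of a non-empty collection."""
    values = list(values)
    return sum(values) / len(values)


# Hypothetical kWh/day for one appliance in the three houses that own it.
# house_b has the appliance installed but never switches it on.
consumption = {"house_a": 1.5, "house_b": 0.0, "house_c": 0.5}

# Average over every house where the appliance is present.
mean_present = mean(consumption.values())

# Average over only the houses where the appliance is actually used.
mean_used = mean(v for v in consumption.values() if v > 0)

print(f"present: {mean_present:.2f} kWh/day, used: {mean_used:.2f} kWh/day")
```

With so few houses, one unused appliance is enough to pull the average down by a third in this example.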
Houses and Appliances
A more reliable approach is to consider each house individually and examine how much energy each appliance consumes on average.
Average Energy Consumption (kWh/day)
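A per-house breakdown of this kind can be sketched as follows. It is an illustration only: the function `appliance_shares`, the appliance names and all of the numbers are hypothetical, and it assumes the per-appliance and mains averages (in kWh/day) have already been calculated.

```python
from typing import Dict, List, Tuple


def appliance_shares(
    appliance_kwh: Dict[str, float], mains_kwh: float
) -> List[Tuple[str, float, float]]:
    """Return (appliance, kWh/day, % of mains) rows, largest consumer first."""
    rows = [
        (name, kwh, 100.0 * kwh / mains_kwh)
        for name, kwh in appliance_kwh.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up averages for a single house, in kWh/day.
house = {"refrigerator": 1.5, "lighting": 2.0, "microwave": 0.5}
table = appliance_shares(house, mains_kwh=10.0)

for name, kwh, share in table:
    print(f"{name:<14}{kwh:>6.2f} kWh/day {share:>6.1f} %")
```

Because each house is treated on its own, an appliance that is never used in one house no longer affects the figures for the others.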
I know this is mostly just raw statistics, so I'll look into some visualisations of this soon.