Wednesday, 25 June 2014

WikiEnergy Data Set Statistics

I recently wrote a post about the WikiEnergy data set released by Pecan Street Inc, and have since written a downloader and converter for the data set as part of NILMTK. In total, the data set contains 71 feeds monitored across 239 buildings over the period 1 Jan 2014 - 31 May 2014. However, only a subset of feeds was monitored for each building, and many buildings were not monitored for the full 5 months. This post provides a bit more insight into the content of the data set at the time of writing.

Feeds per building:


The histogram below shows the number of feeds monitored in each of the 239 buildings. It can be seen that the mode of the distribution is around 12 feeds per building, and therefore most of these buildings will be useful for evaluating energy disaggregation approaches.


Duration per building:


The histogram below shows the number of months for which each of the 239 buildings was monitored. It can be seen that the vast majority of buildings were monitored for the full 5 months, while the remaining buildings were distributed between 1-4 months. However, this distribution will change dramatically once data for 2012-2013 is released.


Percentage energy sub-metered:


The histogram below shows the percentage of energy sub-metered in 235 of the 239 buildings. The remaining 4 buildings appeared to have more than 100% of their energy sub-metered, and were therefore excluded from this plot. This distribution has two distinct peaks: one centred around 70% and another around 5%. The 63 buildings for which less than 40% of the energy was sub-metered are likely to be of limited use for evaluating energy disaggregation methods.
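As a rough illustration of how such a percentage could be computed for one building, here is a minimal sketch. It assumes the percentage is the summed sub-metered readings divided by the summed aggregate (use) readings, and that the use, gen and grid feeds are not appliance feeds; both are assumptions made for illustration. The function name and data layout are hypothetical and not part of NILMTK.

```python
# Feeds assumed NOT to be individual appliance/circuit feeds (an assumption).
NON_APPLIANCE_FEEDS = {"use", "gen", "grid"}


def percent_submetered(readings):
    """Return the percentage of aggregate energy covered by sub-metered feeds.

    `readings` maps a feed name to a list of power samples (W) taken at a
    fixed interval, so sums of samples are proportional to energy.
    """
    aggregate = sum(readings["use"])
    if aggregate <= 0:
        raise ValueError("aggregate energy must be positive")
    submetered = sum(
        sum(samples)
        for name, samples in readings.items()
        if name not in NON_APPLIANCE_FEEDS
    )
    return 100.0 * submetered / aggregate


# Hypothetical building with two sub-metered feeds.
example = {
    "use": [1000, 1000, 2000],
    "air1": [500, 500, 1000],
    "refrigerator1": [100, 100, 100],
    "gen": [300, 300, 300],
}
print(percent_submetered(example))
```

A result above 100% would indicate overlapping or inconsistent feeds, which is the kind of building excluded from the plot.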



Buildings per feed:


The table below shows the number of buildings in which each of the 71 feeds was present. A description of each of the feeds is available from the WikiEnergy Knowledge Base. It can be seen that the presence of feeds in buildings is quite sparse. However, the following feeds are present in the majority of buildings: the household aggregate power (use), air conditioning (air1), washing machine (clotheswasher1), dishwasher (dishwasher1), clothes dryer (drye1), electric heating (furnace1) and refrigerator (refrigerator1).

Feed Buildings
use 239
air1 224
air2 38
air3 5
airwindowunit1 3
aquarium1 1
bathroom1 57
bathroom2 7
bedroom1 65
bedroom2 30
bedroom3 4
bedroom4 0
bedroom5 0
car1 62
clotheswasher1 133
clotheswasher_dryg1 28
diningroom1 20
diningroom2 1
dishwasher1 150
disposal1 85
drye1 141
dryg1 29
freezer1 13
furnace1 184
furnace2 29
garage1 25
garage2 3
gen 116
grid 0
heater1 2
housefan1 2
icemaker1 1
jacuzzi1 13
kitchen1 46
kitchen2 17
kitchenapp1 103
kitchenapp2 73
lights_plugs1 79
lights_plugs2 40
lights_plugs3 16
lights_plugs4 4
lights_plugs5 2
lights_plugs6 0
livingroom1 64
livingroom2 10
microwave1 113
office1 31
outsidelights_plugs1 16
outsidelights_plugs2 3
oven1 89
oven2 3
pool1 4
pool2 0
poollight1 2
poolpump1 17
pump1 3
range1 61
refrigerator1 164
refrigerator2 14
security1 7
shed1 3
sprinkler1 9
unknown1 16
unknown2 6
unknown3 1
unknown4 1
utilityroom1 5
venthood1 19
waterheater1 21
waterheater2 2
winecooler1 4
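
Counts like those in the table above could be tallied along the following lines. This is only a sketch: it assumes a feed counts as present in a building if it has at least one non-missing reading, and the function name and data layout are hypothetical.

```python
from collections import Counter


def buildings_per_feed(buildings):
    """Count, for each feed, the number of buildings in which it is present.

    `buildings` maps a building id to a dict of {feed name: list of readings},
    where a missing reading is represented by None (an assumed layout).
    """
    counts = Counter()
    for feeds in buildings.values():
        for name, samples in feeds.items():
            # A feed is treated as present if any reading is non-missing.
            if any(sample is not None for sample in samples):
                counts[name] += 1
    return counts


# Two hypothetical buildings; air1 is empty in the first one.
example = {
    1: {"use": [950.0, 1010.0], "air1": [None, None]},
    2: {"use": [400.0], "air1": [120.0]},
}
print(buildings_per_feed(example))
```

Feeds with no data in any building simply do not appear in the counter, and looking them up returns 0.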

Monday, 23 June 2014

Hart, G.W., Prototype Nonintrusive Appliance Load Monitor, 1985

Hart founded the field of energy disaggregation back in the '80s, and most papers since have cited his 1992 summary article published in the Proceedings of the IEEE. However, I've seen many references to other papers but have rarely managed to get my hands on the full text. For this reason, I was particularly excited when the following technical report surfaced recently:

Hart, G.W., Prototype Nonintrusive Appliance Load Monitor, MIT Energy Laboratory Technical Report, and Electric Power Research Institute Technical Report, September 1985

Apparently, Hart had received a few requests for older papers over the years, which so far he'd been unable to locate. However, he recently came across a copy of the above paper in the online catalog of a library in Singapore. Mario Bergés then requested the paper via an inter-library loan, Suman Giri scanned the paper copy and Simon Leigh (Jack Kelly's MSc student) applied some post-processing to provide you with the beautiful copy you see today. Quite a good community effort in my opinion!

Thursday, 12 June 2014

New data sets released by WikiEnergy and University of California, Berkeley

I've recently come across two new data sets which have been released in the past month:

WikiEnergy data


Pecan Street Inc have released a large amount of domestic electricity data via the WikiEnergy project. The data set currently contains data from 200 homes, in which both the household aggregate power demand and individual appliance power demands are monitored at 1 minute intervals. The data set currently contains 4 months of data from January-April 2014, although more data is likely to be released soon. The data is freely available to University members of the WikiEnergy community, and full details for database access can be found on the WikiEnergy Knowledge Base after registering.

BERDS - BERkeley EneRgy Disaggregation Data Set


The University of California, Berkeley, have released electricity data collected from Cory Hall on the UC Berkeley campus. The data set contains data collected from 4 categories of sub-metered loads: lighting, HVAC, receptacle (sockets) and other, with many feeds available for each load category. The data set contains measurements of active, reactive and apparent power which were collected at 20 second intervals. The data is available for free via Mehdi Maasoumy's website, and a paper briefly describing the data set appeared at the Big Learning workshop at NIPS 2013.

I've updated my blog post of publicly available data sets to include both of these releases.

Sunday, 1 June 2014

Attending NILM 2014

I'm really excited to be travelling to Austin tomorrow to attend NILM 2014. I'm particularly looking forward to meeting other people working in the field, so please come and introduce yourself if you're also attending! I've been involved in two papers which will be presented at the workshop:

  1. A Scalable Non-intrusive Load Monitoring System for Fridge-Freezer Energy Efficiency Estimation. Oliver Parson, Mark Weal, Alex Rogers. This paper summarises the final chapter of my thesis, which covers a large scale application of my work to 117 UK households. I'll be presenting this work in the 'new perspectives' afternoon session of the workshop, and also as a poster in the poster session at the end of the workshop.
  2. NILMTK: An Open Source Toolkit for Non-intrusive Load Monitoring. Nipun Batra, Jack Kelly, Oliver Parson, Haimonti Dutta, William Knottenbelt, Alex Rogers, Amarjeet Singh, Mani Srivastava. This paper summarises the initial release of NILMTK, an open source toolkit for energy disaggregation research. This will be presented by Jack Kelly as a 30 minute demo in the final session of the workshop, and also as a poster in the following poster session.

I'll also be attending the Pecan Street WikiEnergy Data conference on the 4th of June, which also promises a very exciting list of speakers.

See you all in Texas!

Thursday, 22 May 2014

UK energy disaggregation meet-up

The field of energy disaggregation has expanded so much since I started my PhD in 2010, with new companies and research groups joining the field nearly every week. As a result, it's becoming increasingly difficult to keep track of who is working in this space, even when those people are working in your own country. The NILM workshops are starting to address this by bringing the international community together in Austin, Texas, although I'm sure the inherent cost of travel will prevent some people from attending the workshop.

For this reason, I've been talking with Peter Davies of Green Running, and we are keen to organise an event to provide the opportunity to meet other people working in the field of energy disaggregation in the UK. It will likely be a one-day event held this July in London, free for anyone to attend, and will feature a range of presentations and demos, as well as many opportunities for networking. We already have a venue in central London to host the event, which should hopefully encourage people to attend.

Please let us know via email (osp@ecs.soton.ac.uk or p.davies@greenrunning.com) if you would be interested in attending this event, and also feel free to pass this information on to anyone else who might be interested.

Thursday, 15 May 2014

NILM 2014 workshop schedule released

The schedule for the NILM 2014 workshop in Austin has just been released. The workshop will include 3 sessions of paper presentations, a set of lightning talks and poster presentations, an invited keynote from Shankar Sastry, and a demo of NILMTK. I'm really excited about the workshop, and look forward to seeing you there!

Friday, 9 May 2014

Training disaggregation algorithms without sub-metered data

I'm keen to include an unsupervised disaggregation algorithm (one that doesn't require appliance data for training) in NILMTK. At the moment, the toolkit only includes two supervised benchmark disaggregation algorithms, which I think really limits its usefulness. This post is intended to be a first step towards a simple, intuitive and robust approach to learning models of household appliances using only household aggregate data. I would be really interested in any feedback from the community regarding any improvements or extensions.

Extracting step changes and clustering via a Gaussian mixture model

The approach can be summarised as follows:
  1. Extract a set of step changes by taking the differences between sequential aggregate power readings
  2. Take the absolute value of these differences so that positive and negative step changes are treated identically
  3. Discard small step changes (e.g. < 200 W) since there is too much noise at this range to extract any meaningful structure
  4. Discard large step changes (e.g. > 3000 W) since these are most likely generated by multiple appliances changing state simultaneously
  5. Cluster the remaining set of step changes using a Gaussian mixture model
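
The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the experiments in this post: the function names are hypothetical, the number of mixture components k is an assumption (the post does not say how it was chosen), and the mixture is fitted with a plain expectation-maximisation loop from the standard library instead of any particular toolkit.

```python
import math


def extract_step_changes(power, low=200.0, high=3000.0):
    """Steps 1-4: absolute differences of sequential readings, kept if in [low, high]."""
    diffs = [abs(b - a) for a, b in zip(power, power[1:])]
    return [d for d in diffs if low <= d <= high]


def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(data, k, iters=50, min_std=1.0):
    """Step 5: fit a k-component 1-D Gaussian mixture using EM.

    Returns a list of (mean, std, weight) tuples sorted by mean.
    """
    xs = sorted(data)
    n = len(xs)
    # Initialise means at evenly spaced quantiles, with equal weights.
    means = [xs[int((i + 0.5) * n / k)] for i in range(k)]
    stds = [max((xs[-1] - xs[0]) / (2.0 * k), min_std)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each step change.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pj / total for pj in p])
        # M-step: re-estimate each component from its weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            stds[j] = max(math.sqrt(var), min_std)
            weights[j] = nj / n
    return sorted(zip(means, stds, weights))


# Hypothetical aggregate readings (W): two appliances plus one large jump.
power = [100, 100, 1100, 1100, 100, 100, 2100, 2100, 100, 150, 5150, 150]
steps = extract_step_changes(power)
for mean, std, weight in fit_gmm_1d(steps, k=2):
    print(round(mean), round(std, 1), round(weight, 2))
```

The min_std floor stops a component from collapsing onto a single repeated value, which happens easily when the same appliance produces near-identical step changes.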

Evaluation using data from real households

I applied this approach to a data set of aggregate data collected from real households. Unfortunately, the data set does not contain any sub-metered data from these households, so no quantitative results can be provided regarding its accuracy. However, a visual inspection of the extracted step changes and identified clusters shows some encouraging results.

This approach worked very well on some houses, such as the one shown below. The plot shows a black and white histogram of the extracted step changes, in which peaks corresponding to appliances are clearly visible at roughly 1000 W, 1800 W, and 2300 W. The plot also shows coloured probability density functions (PDFs) corresponding to the clusters found. Interestingly, the clustering algorithm successfully finds the three appliances, as shown by the cyan and yellow curves. However, it's also worth noting that many other clusters were found which do not correspond to appliances.

There were also households in which no structure was present in the extracted step changes, and as a result none of the clusters correspond to individual appliances, as in the plot shown below. This is likely due to a large amount of measurement noise in the aggregate data, or a number of appliances with highly variable step changes.

Conclusions and future work

These experiments have shown that even a very simple approach can successfully learn models of appliances using only aggregate data. However, they have also shown that the performance is likely to vary widely between different houses. An important challenge that has not been tackled here is that of labelling identified clusters, e.g. cyan cluster = lighting, red cluster = noise.