Thursday 21 June 2012

Public Data Sets for NIALM

It is essential to use real-world data when comparing the performance of NIALM techniques. However, such data sets are time-consuming, costly, and often inconvenient to collect. To address this, researchers have begun to publicly release their data sets, enabling other researchers to compare their approaches against common benchmarks. Here are some short descriptions of the data sets I'm aware of:

Almanac of Minutely Power Dataset (AMPds)


Stephen Makonin released the first version of the Almanac of Minutely Power Data set. The data set contains 1 minute aggregate meter readings as well as sub-metered readings from 19 individual circuits. Each reading includes measurements of voltage, current, frequency, power factor, real power, reactive power and apparent power. Furthermore, the aggregate gas and water consumption was also measured at 1 minute intervals, in addition to 1 individual usage for each utility. The data set spans an entire year from April 2012 to March 2013 from a single household in the greater Vancouver area, BC, Canada. The data set is available to anyone for free, although the authors require a username and password to be requested for the purposes of usage tracking.

Berkeley Energy Disaggregation Data Set (BERDS)


The University of California, Berkeley, has released electricity data collected from Cory Hall on the UC Berkeley campus. The data set contains data collected from 4 categories of sub-metered loads: lighting, HVAC, receptacle (sockets) and other, with many feeds available for each load category. The data set contains measurements of active, reactive and apparent power which were collected at 20 second intervals. The data is available for free via Mehdi Maasoumy's website, and a paper briefly describing the data set appeared at the Big Learning workshop at NIPS 2013.

A building-level office environment dataset of typical electrical appliances (BLOND)


Thomas Kriechbaumer & Hans-Arno Jacobsen of The Technical University of Munich (TUM) recently released the BLOND data set, which contains voltage and current readings for aggregated circuits and matching fully-labeled ground truth data (individual appliance measurements). The study covers 53 appliances (16 classes) in a 3-phase power grid in Germany. The authors have released two versions of the data set: 1) BLOND-50 contains 213 days of measurements sampled at 50 kHz (aggregate) and 6.4 kHz (individual appliances); 2) BLOND-250 uses the same setup and contains 50 days of measurements sampled at 250 kHz (aggregate) and 50 kHz (individual appliances). The data set is described in more detail in the Scientific Data paper.

Building-Level fUlly labeled Electricity Disaggregation dataset (BLUED)


The BLUED data set contains high-frequency (12 kHz) household-level data from a single US household over a period of approximately 8 days. The data set also contains an event list of each time an appliance within the household changes state (e.g. microwave turns on). This data set was collected primarily for the evaluation of event based NIALM methods. The authors have also password protected access to the data set to keep track of its usage.


Controlled On/Off Loads Library dataset (COOLL)


The COOLL dataset was released by researchers at the PRISME laboratory at the University of Orléans, and contains high-frequency data from 12 different types of appliances. Similar to the tracebase and PLAID datasets, multiple instances of each type were measured, and each instance was measured throughout 20 operations. During each controlled operation, current and voltage data was collected at a sample rate of 100 kHz. The dataset is summarised in an academic paper, and can be downloaded from github after filling in a registration form.

Dataport database (formerly WikiEnergy)


Pecan Street Inc have released a large amount of domestic electricity data via the Dataport initiative. At the time of writing, the database contains data from 669 homes, in which both the household aggregate power demand and individual appliance power demands are monitored at 1 minute intervals. The installations began in January 2011, and data is still being collected for most buildings. The data is freely available to University members of the WikiEnergy community, and full details for database access can be found on the Dataport homepage.

Domestic electricity demand dataset of individual appliances in Germany (DEDDIAG)


Marc Wenninger, Andreas Maier & Jochen Schmidt have released DEDDIAG, a domestic electricity demand dataset of individual appliances in Germany. The data set contains recordings from 15 homes over a period of up to 3.5 years, in which 50 appliances have been recorded at a frequency of 1 Hz. The data set focuses on appliances of significance for load-shifting purposes, such as dishwashers, washing machines and refrigerators. One home also includes three-phase mains readings that can be used for disaggregation tasks. Additionally, DEDDIAG contains manual ground truth event annotations for 14 appliances, that provide precise start and stop timestamps. The authors have also released source code of the data collection system, as well as a python command line tool for loading the data.

Dutch Residential Energy Dataset (DRED)


Delft University of Technology (TUDelft) have released the DRED dataset, which contains both house-level and appliance-level energy consumption information. The live deployment consists of several sensors measuring electricity, occupancy and ambient parameters in a household. The DRED dataset includes electricity data (aggregated energy consumption and appliance-level energy consumption), ambient information (room-level indoor temperature, outdoor temperature, environmental parameters), occupancy information (room-level location information of occupants, WiFi and BT RSSI information for localization) and household information (house layout, number of appliances monitored, appliance-location mapping etc). The dataset is publicly available and can be obtained from the DRED website.

Electricity Consumption & Occupancy data set (ECO)


The ECO data set is a data set for non-intrusive load monitoring and occupancy detection research. It was collected in 6 Swiss households over a period of 8 months. For each of the households, the ECO data set provides 1 Hz aggregate consumption data (current, voltage, and phase shift for each of the three phases in the household) and also 1 Hz plug-level data measured from selected appliances. In addition, the data set also includes occupancy information measured through a tablet computer (manual labelling) and a passive infrared sensor (in some of the households). The data set is described in detail in a paper published at BuildSys 2014.


GREEND


The GREEND data set was released by a collaboration between researchers at the Alpen-Adria-Universität Klagenfurt and WiTiKee s.r.l. The data set contains active power measurements taken at 1 second intervals of 9 individual appliances and the household aggregate power demand from 9 houses in Italy and Austria, over a period of up to one year. Further details can be found in the accompanying arXiv paper. In addition, a NILMTK converter is also available for the data set.

Household Electricity Use Study (HES)


In 2012, the UK Energy Saving Trust, Department of Energy and Climate Change, and Department for Environment, Food and Rural Affairs published a 15 page report called Powering the Nation. This report summarises the full 600 page Household Electricity Use Study, which aimed to better understand how electricity is consumed in UK households. As part of this study, 251 owner-occupier households were monitored across England between April 2010 and April 2011. Of these households, 26 were monitored for 12 months, and 225 were monitored for 1 month. For each household, the energy consumption of 13-51 appliances was monitored at 2 minute intervals. A software portal is currently under development to provide access to the data set, although in the meantime the data can be requested individually from ICF International by contacting efficient.products@icfi.com and providing a postal mailing address and operating system details.


IDEAL Household Energy Dataset 


Released by researchers at the University of Edinburgh, the IDEAL Household Energy Dataset comprises data from 255 UK homes. Alongside electric and gas data from each home the corpus contains individual room temperature and humidity readings and temperature readings from the boiler. For 39 of the 255 homes more detailed data is available, including individual electrical appliance use data, and data on individual radiators. Sensor data is augmented by anonymised survey data and metadata including occupant demographics, self-reported energy awareness and attitudes, and building, room and appliance characteristics.


Indian Dataset for Ambient Water and Energy (iAWE)


The Indraprastha Institute of Information Technology recently released the iAWE data set, which contains aggregate and sub-metered electricity and gas data from 33 household sensors at 1 second resolution. The data set covers 73 days of a single house in Delhi, India. Each individual channel of the data can be downloaded separately in either SQL or CSV format from the download section at the bottom of the webpage.

Individual household electric power consumption Data Set 


EDF Energy released a data set in 2012 containing energy measurements made at a single household in France for a duration of 4 years. Average measurements are available at 1 minute resolution of the household aggregate active power, reactive power, voltage and current, as well as the active power of 3 sub-metered circuits. Although each circuit contains a few appliances, this is the largest data set in terms of duration of measurement. The complete data set is openly available from the UCI Machine Learning Repository.


Pecan Street Research Institute (no longer available)


Pecan Street Research Institute announced the release of a new data set designed specifically to enable the evaluation of electricity disaggregation technology. A free sample data set is available to members of its research consortium, which has now been opened up to university researchers. The sample data set contains 7 days of data from 10 houses in Austin, TX, USA, for which both aggregate and circuit-level power readings are available at 1 minute intervals. In addition to common household loads, 2 of the houses also have photovoltaic systems and 1 house also has an electric vehicle.

Reference Energy Disaggregation Dataset (REDD)


REDD contains both household-level and circuit-level data from 6 US households, over various durations (between a few weeks and a few months). Each house has two-phase mains input, and 10-25 individually monitored circuits. High-frequency (kHz) current and voltage data are available for both mains circuits, while low-frequency power measurements (3-4 second intervals) are available for the appliance circuits. This data set was collected primarily for the evaluation of non-event based NIALM methods. The authors have password protected access to the data set to keep track of its usage.

REFIT Electrical Load Measurements dataset


The REFIT data set was released as part of the Smart Home and Energy Demand Reduction project, by David Murray and Lina Stankovic at the University of Strathclyde. The data set contains active power measurements of the aggregate and 9 individual appliances from 20 homes in the Loughborough area of the UK, at a resolution of 1 sample every 8 seconds. This makes REFIT the only UK data set which contains appliance-level data at a sample rate greater than once per minute. In addition, aggregate gas consumption data was also recorded at 30 minute intervals. However, it should be noted that the data was compressed by removing samples for which the power demand had not changed since the last reading. Further details can be found in a presentation from the EEDAL 2015 conference, a detailed technical report, and the dataset readme file. In addition, a NILMTK converter is also available for the data set.
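The change-only compression described above means a regular time series has to be rebuilt before most disaggregation algorithms can use the data. Here is a minimal sketch of one way to do that with pandas, assuming the readings are already loaded as a time-indexed series and that timestamps fall on an 8 second grid. The function name, the example values and the exact grid alignment are all hypothetical and are not taken from the REFIT documentation.

```python
import pandas as pd

def restore_regular_samples(compressed: pd.Series,
                            period: pd.Timedelta = pd.Timedelta(seconds=8)) -> pd.Series:
    """Rebuild a regularly sampled series from change-only readings."""
    # Place the surviving readings on a regular grid, then carry the last
    # known value forward across the samples that were dropped.
    return compressed.resample(period).last().ffill()

# Hypothetical readings: only samples where the power changed were kept.
index = pd.to_datetime([
    "2014-01-01 00:00:00",
    "2014-01-01 00:00:08",
    "2014-01-01 00:00:32",
])
compressed = pd.Series([100.0, 250.0, 90.0], index=index)
restored = restore_regular_samples(compressed)
```

Forward-filling cannot tell a dropped "no change" sample from a genuine gap in recording, so long gaps may be worth masking out separately.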

Smart* Home Data Set (via the UMassTraceRepository)


Although not collected specifically for energy disaggregation, the Smart* (Smart Star) data set provides power data from 3 thoroughly sub-metered real households. The granularity of data collected for circuit level monitors (premises aggregate and individual circuits) is one reading per second, while individual plug loads are measured roughly every few seconds. Each house contains 21-26 circuit meters and almost all appliances are measured using plug meters. At the moment, aggregate, circuit and appliance data is available for house A, but only aggregate data is available for houses B and C.

Tracebase


The tracebase repository contains individual appliance data with the intention of creating a database for training NIALM algorithms. The repository contains a total of 1883 days of power readings, taken at 1 second intervals, for 158 appliance instances, of 43 different appliance types. Since the aim is to create an appliance database, no aggregate measurements are collected. The data is introduced in Reinhardt et al. 2012 and is available from the tracebase repository. The files are password protected, but a password can be requested via the download page.

UK Domestic Appliance-Level Electricity (UK-DALE) dataset


Jack Kelly released the first version of the UK-DALE data set in January 2015. The data set contains 16 kHz current and voltage aggregate meter readings and 6 second sub-metered power data from individual appliances across 3 UK homes, as well as 1 second aggregate and 6 second sub-metered power data for 2 additional homes. An update to the data set was released in August 2015 which has expanded the data available for house 1 to 2.5 years. Low frequency data is available to download in CSV or NILMTK HDF5 format, while high frequency data can be downloaded in FLAC file format.

As always, please leave a comment if you have released your own data set or know of someone who has. Also, if you notice any errors or updates please let me know. I'll do my best to keep this list up to date!

31 comments:

  1. Hi Oli,

    Great post, as always.

    Just last night I ordered 24 CurrentCost Individual Appliance Monitors to monitor the power consumption of the majority of appliances in my home (each monitor samples once every six seconds).

    I've also ordered an OpenEnergyMonitor which should allow me to monitor voltage, real power and reactive power once a second for my entire house.

    I'm planning to put this data on github soon; I'll let you know when it's available.

    Replies
    1. Hi again,

      I've just put all my existing data on github: https://github.com/JackKelly/domesticPowerData

      This is mostly just a re-packaging of the data.tar.gz file I already sent you.

      This dataset isn't especially useful for NILM work yet because I don't have a "ground truth" record of each appliance's state change. This will change when I install my 24 individual appliance monitors.

      Thanks,
      Jack

    2. Hi Jack,

      I'm not sure if you've messed around with your EmonPi yet, but we used ours for some device classification using ML: https://github.com/buchananwp/EnergyMeter

  2. Hi Oliver,

    Green Button has some datasets, as well. The URL is: http://www.greenbuttondata.org/greendevelop.aspx

    I also have some personal datasets but I have not had a chance to publish them (they also include natural gas and water consumption).

    Cheers,
    Stephen.

  3. Hi Stephen,

    Thanks for the link! The data sets that combine electricity, gas and water consumption are really interesting.

    I'm particularly interested in data sets that provide appliance-level monitoring. As far as I could see the Green Button data sets only contain household-level data. Is this the case or have I missed something?

    Thanks,
    Oli

  4. Hi Oliver, I don't know if you're aware of this but http://www.tracebase.org/traces has a substantial collection of appliance level signatures (traces) as well.

    Suman

    Replies
    1. Hi Suman, thanks for pointing me at this. Looking into tracebase has been on my to do list for a long time and I've only just got round to it!

  5. Hi Oliver
    I'm working on NILM and I want to use REDD, but I have some problems. I would appreciate your help.

    1- The appliances are labelled, but it is not mentioned anywhere which appliance is connected to which phase. This causes problems for small appliances. How do you use them?

    2- To obtain reactive power, I extracted the high-frequency current and voltage and calculated real and reactive power. However, when I compare the calculated power with the provided one (in Mains), the waveforms are similar but the values are different (it looks as if my data has a negative DC offset!). Does anybody have the same problem?

    3- A lot of data is missing for each house. I'm working with houses 3 and 5 because high-frequency data are only available for these houses (to calculate reactive power). For house 3 there are 22 days, but not complete hours of each day, and house 5 has only 7 days of incomplete data. How do you deal with this data set?

    Thanks in advance
    Mostafa

    Replies
    1. Hi Mostafa,

      Since these questions are purely about REDD, I suggest you ask Zico Kolter (kolter@csail.mit.edu). He will be able to give much better answers than I can.

      Oli

    2. Thanks Oli
      Can you at least help me with the first one, about the phase connection of each appliance? I think you probably had the same issue in your project and paper.

    3. Hi Mostafa,

      I summed both phases into a single feed and disaggregated that, so I'm afraid I didn't encounter the appliance-phase problem. You will have to follow it up with Zico.

      Oli

      Mostafa wrote: "To obtain reactive power, I extracted the high-frequency current and voltage and calculated real and reactive power. However, when I compare the calculated power with the provided one (in Mains), the waveforms are similar but the values are different (it looks as if my data has a negative DC offset!)."

      I don't have specific experience with the high-frequency REDD data but here are some notes about measuring reactive power using current readings taken from a CT clamp (you probably know this already...):

      * CT clamps will typically shift the phase of the current signal by about 1 or 2 degrees. I'm not sure if the authors have already compensated for this. They used 200A TED CTs; if you're lucky you might find a data sheet which tells you how much phase shift you should expect.

      * CT clamps will "drift" so you'll probably need to apply a digital high-pass filter to remove the DC offset on the current readings. Again, I don't know if the authors have already done this.

      * CT clamps typically only maintain their quoted accuracy when presented with currents above about 10% of their quoted capacity.

      There's lots of juicy information about this over on the Open Energy Monitor website: http://openenergymonitor.org/emon/
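A minimal sketch of the DC offset point above, assuming the voltage and current waveforms are available as arrays covering a whole number of mains cycles. It removes the offset by subtracting the mean, which is a crude stand-in for a proper high-pass filter, and does not compensate for CT phase shift. The function name and the synthetic 60 Hz signal are hypothetical and are not taken from REDD.

```python
import numpy as np

def power_from_waveforms(voltage, current):
    """Return (real, reactive, apparent) power from sampled waveforms."""
    v = np.asarray(voltage, dtype=float)
    i = np.asarray(current, dtype=float)
    # Crude DC offset removal: subtract the mean over whole mains cycles.
    v = v - v.mean()
    i = i - i.mean()
    real = np.mean(v * i)                                        # watts
    apparent = np.sqrt(np.mean(v ** 2)) * np.sqrt(np.mean(i ** 2))  # volt-amperes
    # Magnitude only: this does not say whether the load is inductive or capacitive.
    reactive = np.sqrt(max(apparent ** 2 - real ** 2, 0.0))
    return real, reactive, apparent

# Synthetic test signal: 10 cycles of 60 Hz sampled at 15 kHz, with the
# current lagging by 30 degrees and carrying a 0.5 A DC offset.
t = np.arange(2500) / 15000.0
voltage = 170.0 * np.sin(2 * np.pi * 60 * t)
current = 10.0 * np.sin(2 * np.pi * 60 * t - np.pi / 6) + 0.5
real, reactive, apparent = power_from_waveforms(voltage, current)
```

For this sinusoidal test signal the results should match the textbook values of 0.5 x 170 x 10 x cos(30°) for real power and 0.5 x 170 x 10 x sin(30°) for reactive power. For distorted waveforms, the square-root formula lumps harmonic distortion in with reactive power.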

  6. This comment has been removed by the author.

  7. Hi Oli,

    Are you aware of any data set that contains power and harmonic measurements for the main circuit?

    Thanks.
    Bundit

    Replies
    1. Hi Bundit,

      As far as I understand, you can extract harmonic information from high frequency measurements of current and voltage (i.e. your data needs to represent the waveforms). However, I think the Belkin Kaggle competition data actually calculated such harmonics from high frequency data, and made them available at low frequency if that's what you're after: https://www.kaggle.com/c/belkin-energy-disaggregation-competition/data

      Hope this helps,
      Oli
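A minimal sketch of extracting harmonics from a sampled waveform with an FFT, assuming the window covers a whole number of mains cycles so that each harmonic falls on an FFT bin. The function name, sample rate and synthetic signal are hypothetical and do not describe how any particular data set computed its harmonics.

```python
import numpy as np

def harmonic_amplitudes(signal, sample_rate, fundamental, n_harmonics=5):
    """Return peak amplitudes of the first n_harmonics multiples of the fundamental."""
    x = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    # Scale the one-sided spectrum so a sine of amplitude A reads as A.
    amplitudes = 2.0 * np.abs(spectrum) / len(x)
    result = []
    for k in range(1, n_harmonics + 1):
        # Pick the FFT bin closest to the k-th harmonic frequency.
        bin_index = np.argmin(np.abs(freqs - k * fundamental))
        result.append(amplitudes[bin_index])
    return np.array(result)

# Synthetic current: 6 cycles of 60 Hz sampled at 12 kHz, with a third harmonic.
t = np.arange(1200) / 12000.0
current = 10.0 * np.sin(2 * np.pi * 60 * t) + 2.0 * np.sin(2 * np.pi * 180 * t)
harmonics = harmonic_amplitudes(current, sample_rate=12000, fundamental=60)
```

With real measurements the mains frequency drifts slightly, so a window function or a frequency-tracking step would usually be needed to limit spectral leakage.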

  8. If you get a moment, please could you add UK-DALE to the list? (we really should get round to putting this all onto a wiki some time so you don't have to spend time maintaining the list!)

    Replies
    1. My goodness I'm not doing a very good job of promoting your data set am I?! I've added it to the list, though feel free to send me an updated description if you like!

  9. Hi Oli,
    GreenD data set isn't in the list.
    Paper URL: http://arxiv.org/abs/1405.3100
    Data set page: http://sourceforge.net/projects/greend/
    It has a nilmtk converter also!

    Replies
    1. I've just added GREEND to the post. Sorry for missing it in the first place!

  10. Hi Oli,
    ECO data set isn't in the list.

    https://www.vs.inf.ethz.ch/res/show.html?what=eco-data

    Replies
    1. Thanks! I've added it to the list and sorted it by alphabetical order to make data sets easier to find.

  11. Hi Oli,

    Thanks very much for sharing the info on databases. I am interested in the data of "Household Electricity Use Study (HES)". So I just write to "efficient.products@icfi.com" then I could get all the data freely? I couldn't find any related info on the official website of "ICF International", so I am not sure if that is the way to obtain the data ..

    Best regards,

    Yue Zhou
    Cardiff University

    Replies
    1. Hi Yue Zhou, glad to hear you found the information useful. Yes, contact the given email address and they will provide you with instructions.

      I sent an email to the above address two weeks ago but have not received a reply. Did you get the data already? I would really appreciate it if you could share the data ...

      Best regards,

      Minh

  12. http://www.ari.vt.edu/research-data/

  13. Great job. Thank you so much. Does anyone have communication data from inside smart grids, such as node-to-node packet information? I would appreciate it if someone could help me with this.

  14. Thank you, I really need Dutch Residential Energy Dataset (DRED).

  15. Hi,
    Does anyone have any recommendations for CTs/smart meters that provide 1-second resolution for measuring appliance loads in the home? Ideally I want something that is WiFi/ZigBee/Modbus enabled.

    Thanks.

  16. Hi all. Where can I find a one-year data set for a transmission or distribution power system?
