Thursday 21 June 2012

Public Data Sets for NIALM

It is essential to use real-world data when comparing the performance of NIALM techniques. However, such data sets are time-consuming, costly, and often inconvenient to collect. To address this, researchers have begun to publicly release their data sets, enabling others to compare their approaches against common benchmarks. Here are some short descriptions of the data sets I'm aware of:

Almanac of Minutely Power Dataset (AMPds)

Stephen Makonin released the first version of the Almanac of Minutely Power Dataset. The data set contains 1 minute aggregate meter readings as well as sub-metered readings from 19 individual circuits. Each reading includes measurements of voltage, current, frequency, power factor, real power, reactive power and apparent power. Furthermore, the aggregate gas and water consumption was also measured at 1 minute intervals, in addition to 1 individual sub-metered usage for each utility. The data set spans an entire year from April 2012 to March 2013 from a single household in the greater Vancouver area, BC, Canada. The data set is available to anyone for free, although the authors require a username and password to be requested for the purposes of usage tracking.
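The power quantities in each reading are related to one another, which is handy for sanity-checking a record after loading it. Here is a minimal Python sketch, assuming purely sinusoidal waveforms (with harmonics the relationship is only approximate). The function names are my own and are not part of AMPds.

```python
import math

def apparent_power(real_w, reactive_var):
    """Apparent power (VA) from real power (W) and reactive power (VAR)."""
    return math.hypot(real_w, reactive_var)

def power_factor(real_w, reactive_var):
    """Power factor as real / apparent power.

    Returns 1.0 for a zero load, which is a convention chosen for this sketch.
    """
    s = apparent_power(real_w, reactive_var)
    return real_w / s if s else 1.0
```

For example, a reading of 300 W real and 400 VAR reactive power corresponds to 500 VA apparent power and a power factor of 0.6.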

Berkeley Energy Disaggregation Data Set (BERDS)

The University of California, Berkeley, has released electricity data collected from Cory Hall on the UC Berkeley campus. The data set contains data collected from 4 categories of sub-metered loads: lighting, HVAC, receptacle (sockets) and other, with many feeds available for each load category. The data set contains measurements of active, reactive and apparent power which were collected at 20 second intervals. The data is available for free via Mehdi Maasoumy's website, and a paper briefly describing the data set appeared at the Big Learning workshop at NIPS 2013.

A building-level office environment dataset of typical electrical appliances (BLOND)

Thomas Kriechbaumer & Hans-Arno Jacobsen of The Technical University of Munich (TUM) recently released the BLOND data set, which contains voltage and current readings for aggregated circuits and matching fully-labeled ground truth data (individual appliance measurements). The study covers 53 appliances (16 classes) in a 3-phase power grid in Germany. The authors have released two versions of the data set: 1) BLOND-50 contains 213 days of measurements sampled at 50 kHz (aggregate) and 6.4 kHz (individual appliances), 2) BLOND-250 uses the same setup but contains 50 days of measurements sampled at 250 kHz (aggregate) and 50 kHz (individual appliances). The data set is also described in more detail in the Scientific Data paper.

Building-Level fUlly labeled Electricity Disaggregation dataset (BLUED)

The BLUED data set contains high-frequency (12 kHz) household-level data from a single US household over a period of approximately 8 days. The data set also contains an event list of each time an appliance within the household changes state (e.g. microwave turns on). This data set was collected primarily for the evaluation of event based NIALM methods. The authors have also password protected access to the data set to keep track of its usage.
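To illustrate what an event-based method looks for, here is a minimal Python sketch that flags step changes in a power series. It is not the BLUED authors' labelling procedure. The 50 W threshold, the function name and the sample readings are all assumptions made for illustration.

```python
def detect_events(power_w, threshold_w=50.0):
    """Return (index, delta) pairs where the power changes by at least
    threshold_w between two consecutive samples."""
    events = []
    for i in range(1, len(power_w)):
        delta = power_w[i] - power_w[i - 1]
        if abs(delta) >= threshold_w:
            events.append((i, delta))
    return events

# Hypothetical readings: an appliance turns on at index 2 and off at index 4.
readings = [100.0, 102.0, 1300.0, 1298.0, 101.0]
events = detect_events(readings)
```

A detector like this can then be scored against the data set's event list, for example by counting how many detected events fall within a small time tolerance of a labelled one.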

Controlled On/Off Loads Library dataset (COOLL)

The COOLL dataset was released by researchers at the PRISME laboratory at the University of Orléans, and contains high-frequency data from 12 different types of appliances. Similar to the tracebase and PLAID datasets, multiple instances of each type were measured, and each instance was measured throughout 20 operations. During each controlled operation, current and voltage data was collected at a sample rate of 100 kHz. The dataset is summarised in an academic paper, and can be downloaded from GitHub after filling in a registration form.

Dataport database (formerly WikiEnergy)

Pecan Street Inc have released a large amount of domestic electricity data via the Dataport initiative. At the time of writing, the database contains data from 669 homes, in which both the household aggregate power demand and individual appliance power demands are monitored at 1 minute intervals. The installations began in January 2011, and data is still being collected for most buildings. The data is freely available to University members of the WikiEnergy community, and full details for database access can be found on the Dataport homepage.

Domestic electricity demand dataset of individual appliances in Germany (DEDDIAG)

Marc Wenninger, Andreas Maier & Jochen Schmidt have released DEDDIAG, a domestic electricity demand dataset of individual appliances in Germany. The data set contains recordings from 15 homes over a period of up to 3.5 years, in which 50 appliances have been recorded at a frequency of 1 Hz. The data set focuses on appliances of significance for load-shifting purposes, such as dishwashers, washing machines and refrigerators. One home also includes three-phase mains readings that can be used for disaggregation tasks. Additionally, DEDDIAG contains manual ground truth event annotations for 14 appliances, which provide precise start and stop timestamps. The authors have also released the source code of the data collection system, as well as a Python command line tool for loading the data.

Dutch Residential Energy Dataset (DRED)

Delft University of Technology (TUDelft) have released the DRED dataset, which contains both house-level and appliance-level energy consumption information. The live deployment consists of several sensors measuring electricity, occupancy and ambient parameters in a household. The DRED dataset includes electricity data (aggregated energy consumption and appliance-level energy consumption), ambient information (room-level indoor temperature, outdoor temperature, environmental parameters), occupancy information (room-level location information of occupants, WiFi and BT RSSI information for localization) and household information (house layout, number of appliances monitored, appliance-location mapping etc). The dataset is publicly available and can be obtained from the DRED website.

Electricity Consumption & Occupancy data set (ECO)

The ECO data set is a data set for non-intrusive load monitoring and occupancy detection research. It was collected in 6 Swiss households over a period of 8 months. For each of the households, the ECO data set provides 1 Hz aggregate consumption data (current, voltage, and phase shift for each of the three phases in the household) and also 1 Hz plug-level data measured from selected appliances. In addition, the data set also includes occupancy information measured through a tablet computer (manual labelling) and a passive infrared sensor (in some of the households). The data set is described in detail in a paper published at BuildSys 2014.

GREEND

The GREEND data set was released by a collaboration between researchers at the Alpen-Adria-Universität Klagenfurt and WiTiKee s.r.l. The data set contains active power measurements taken at 1 second intervals of 9 individual appliances and the household aggregate power demand from 9 houses in Italy and Austria, over a period of up to one year. Further details can be found in the accompanying arXiv paper. In addition, a NILMTK converter is also available for the data set.

Household Electricity Use Study (HES)

In 2012, the UK Energy Savings Trust, Department of Energy and Climate Change, and Department for Environment, Food and Rural Affairs published a 15-page report called Powering the Nation. This report summarises the full 600-page Household Electricity Use Study, which aimed to better understand how electricity is consumed in UK households. As part of this study, 251 owner-occupier households were monitored across England between April 2010 and April 2011. Of these households, 26 were monitored for 12 months, and 225 were monitored for 1 month. For each household, the energy consumption of 13-51 appliances was monitored at 2 minute intervals. A software portal is currently under development to provide access to the data set, although in the meantime the data can be individually requested from ICF International by contacting them and providing a postal mailing address and operating system details.

IDEAL Household Energy Dataset 

Released by researchers at the University of Edinburgh, the IDEAL Household Energy Dataset comprises data from 255 UK homes. Alongside electric and gas data from each home, the corpus contains individual room temperature and humidity readings and temperature readings from the boiler. For 39 of the 255 homes, more detailed data is available, including individual electrical appliance use data, and data on individual radiators. Sensor data is augmented by anonymised survey data and metadata including occupant demographics, self-reported energy awareness and attitudes, and building, room and appliance characteristics.

Indian Dataset for Ambient Water and Energy (iAWE)

The Indraprastha Institute of Information Technology recently released the iAWE data set, which contains aggregate and sub-metered electricity and gas data from 33 household sensors at 1 second resolution. The data set covers 73 days of a single house in Delhi, India. Each individual channel of the data can be downloaded separately in either SQL or CSV format from the download section at the bottom of the webpage.

Individual household electric power consumption Data Set 

EDF Energy released a data set in 2012 containing energy measurements made at a single household in France for a duration of 4 years. Average measurements are available at 1 minute resolution of the household aggregate active power, reactive power, voltage and current, as well as the active power of 3 sub-metered circuits. Although each circuit contains a few appliances, this is the largest data set in terms of duration of measurement. The complete data set is openly available from the UCI Machine Learning Repository.
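Since only 3 circuits are sub-metered, it is often useful to work out how much energy in each minute is not covered by any sub-meter. Here is a minimal Python sketch, assuming the aggregate active power is a 1 minute average in kilowatts and the sub-metered values are in watt-hours per minute. That is my understanding of the UCI documentation, but the units are worth checking against the repository before relying on this. The function name is my own.

```python
def unmetered_energy_wh(global_active_power_kw, sub_metering_wh):
    """Energy (Wh) used in one minute by loads outside the sub-metered circuits.

    Assumes global_active_power_kw is the minute-averaged aggregate power in kW
    and sub_metering_wh is an iterable of per-circuit energies in Wh.
    """
    total_wh = global_active_power_kw * 1000.0 / 60.0  # kW over 1 minute -> Wh
    return total_wh - sum(sub_metering_wh)
```

For example, a minute averaging 1.2 kW is 20 Wh in total, so sub-metered readings of 0, 1 and 17 Wh leave 2 Wh unaccounted for.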

Pecan Street Research Institute (no longer available)

Pecan Street Research Institute announced the release of a new data set designed specifically to enable the evaluation of electricity disaggregation technology. A free sample data set is available to members of its research consortium, which has now been opened up to university researchers. The sample data set contains 7 days of data from 10 houses in Austin, TX, USA, for which both aggregate and circuit data are available as power readings at 1 minute intervals. In addition to common household loads, 2 of the houses also have photovoltaic systems and 1 house also has an electric vehicle.

Reference Energy Disaggregation Dataset (REDD)

REDD contains both household-level and circuit-level data from 6 US households, over various durations (between a few weeks and a few months). Each house has two-phase mains input, and 10-25 individually monitored circuits. High-frequency (kHz) current and voltage data are available for both mains circuits, while low-frequency power measurements (3-4 second intervals) are available for the appliance circuits. This data set was collected primarily for the evaluation of non-event based NIALM methods. The authors have password protected access to the data set to keep track of its usage.

REFIT Electrical Load Measurements dataset

The REFIT data set was released as part of the Smart Home and Energy Demand Reduction project, by David Murray and Lina Stankovic at the University of Strathclyde. The data set contains active power measurements of the aggregate and 9 individual appliances from 20 homes in the Loughborough area of the UK, at a resolution of 1 sample every 8 seconds. This makes REFIT the only UK data set which contains appliance-level data at a sample rate greater than once per minute. In addition, aggregate gas consumption data was also recorded at 30 minute intervals. However, it should be noted that the data was compressed by removing samples for which the power demand had not changed since the last reading. Further details can be found in a presentation from the EEDAL 2015 conference, a detailed technical report, and the dataset readme file. In addition, a NILMTK converter is also available for the data set.
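Because unchanged readings were dropped, a regular time series has to be rebuilt by carrying the last reading forward. Here is a minimal Python sketch, assuming timestamps in whole seconds on a regular 8 second grid. The function name and sample values are hypothetical, and the real timestamps may not align this neatly.

```python
def forward_fill(samples, start, stop, step=8):
    """Rebuild a regular series from (timestamp, watts) pairs in which
    readings that matched the previous value were removed.

    Returns one (timestamp, watts) pair for every step in [start, stop).
    The value is None until the first sample has been seen.
    """
    filled = []
    remaining = iter(sorted(samples))
    upcoming = next(remaining, None)
    last_value = None
    for t in range(start, stop, step):
        # Consume every stored sample at or before this grid point.
        while upcoming is not None and upcoming[0] <= t:
            last_value = upcoming[1]
            upcoming = next(remaining, None)
        filled.append((t, last_value))
    return filled

# Hypothetical compressed readings: the value only changes at 0 s, 16 s and 40 s.
compressed = [(0, 100), (16, 250), (40, 90)]
regular = forward_fill(compressed, start=0, stop=48)
```

Note that forward filling cannot tell an unchanged reading apart from a genuine gap in the recording, so long runs of filled values are worth treating with suspicion.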

Smart* Home Data Set (via the UMassTraceRepository)

Although not collected specifically for energy disaggregation, the Smart* (Smart Star) data set provides power data from 3 thoroughly sub-metered real households. The granularity of data collected for circuit level monitors (premises aggregate and individual circuits) is one reading per second, while individual plug loads are measured roughly every few seconds. Each house contains 21-26 circuit meters and almost all appliances are measured using plug meters. At the moment, aggregate, circuit and appliance data is available for house A, but only aggregate data is available for houses B and C.


tracebase

The tracebase repository contains individual appliance data with the intention of creating a database for training NIALM algorithms. The repository contains a total of 1883 days of power readings, taken at 1 second intervals, for 158 appliance instances, of 43 different appliance types. Since the aim is to create an appliance database, no aggregate measurements are collected. The data is introduced in Reinhardt et al. 2012 and is available from the tracebase repository. The files are password protected, but a password can be requested via the download page.

UK Domestic Appliance-Level Electricity (UK-DALE) dataset

Jack Kelly released the first version of the UK-DALE data set in January 2015. The data set contains 16 kHz current and voltage aggregate meter readings and 6 second sub-metered power data from individual appliances across 3 UK homes, as well as 1 second aggregate and 6 second sub-metered power data for 2 additional homes. An update to the data set was released in August 2015, which expanded the data available for house 1 to 2.5 years. Low frequency data is available to download in CSV or NILMTK HDF5 format, while high frequency data can be downloaded in FLAC file format.

As always, please leave a comment if you have released your own data set or know of someone who has. Also, if you notice any errors or updates please let me know. I'll do my best to keep this list up to date!