My name is Oliver Parson, and I'm currently employed as a Senior Data Scientist at Bulb. I'm interested in investigating the ways in which machine learning can be used to break down household energy consumption data into individual appliances, also known as Non-intrusive Appliance Load Monitoring (NILM) or energy disaggregation.
Monday, 17 December 2012
Improving the efficiency of home heating
My research group recently launched myJoulo, a project aimed at increasing awareness of how energy is being consumed in the home. The project gives away free temperature sensors, which collect data about how the temperature of your home varies relative to the temperature outside. It then asks you to upload the data, in order to provide personalised feedback detailing how you can reduce the price of your heating bill. The whole process is available for free to anyone living in the UK. For more details see the press release.
Friday, 14 December 2012
Video on unsupervised training methods for NIALM
I've created a video describing my recent work on unsupervised training methods for non-intrusive appliance load monitoring systems. A high-quality version of the video is available from the ORCHID project website, although I thought I'd also include it here for convenience:
Wednesday, 12 December 2012
Popular Mechanics predicts NIALM to become reality in next 110 years
A Popular Mechanics article recently compiled a list of technology predictions for the next 110 years. One of the forecasts sounded oddly familiar:
"Smart homes will itemize electric, water, and gas bills by fixture and appliance. Shwetak Patel, a 30-year-old MacArthur Fellow, is working on low-cost sensors that monitor electrical variations in power lines to detect each appliance's signature."
Good to hear people are getting excited about energy disaggregation, although personally I hope it doesn't take the full 110 years to become widely available.
Thursday, 22 November 2012
New name, new domain
I recently decided my blog could do with a little refreshing to more closely reflect its content. I've therefore changed its name to Disaggregated Homes, which can now be found at blog.oliverparson.co.uk. Please update your links, although any visits to the old URL should be automatically redirected.
Wednesday, 14 November 2012
Opening up the “black box” of the Home
Yesterday I attended a Smart Demand seminar titled 'Opening up the “black box” of the Home', organised by Pilgrim Beart, founder of AlertMe. The purpose of the seminar was to present a range of ideas related to smart metering in the UK, and to discuss what would need to be done in order to collect a data set that accurately describes domestic electricity consumption. The seminar was well attended by UK energy monitoring companies, such as Moixa and Onzo, but also represented views from academia. I particularly enjoyed Miroslav Hamouz's talk on the abilities and limitations of disaggregation, and completely agree that it is essential to understand what disaggregation can realistically achieve using actual smart meter data.
Saturday, 27 October 2012
My home energy disaggregation system
This post describes my home energy disaggregation system, using tools from AlertMe and PlotWatt. I've cross-posted it from its own page on my website, and although this post probably won't stay up to date, the page on my website should. If you're interested in setting up a similar system, feel free to leave a comment!
In order to monitor the electricity consumption of my home, I use an AlertMe SmartEnergy kit. This consists of a battery powered SmartMeter reader and a mains powered SmartHub. The SmartMeter reader is a current clamp, which clips onto the electricity input to my home within my circuit breaker box. This clamp calculates the flow of electricity through the wire by measuring the magnetic field surrounding it, and therefore doesn't need to physically break the circuit. The current clamp sends second-by-second readings of my home's power demand to the SmartHub via a ZigBee wireless network. The hub is attached to my router via an Ethernet cable, which allows it to upload my electricity data to AlertMe's cloud storage.
My AlertMe SmartMeter reader.
My PhD has built up my interest in non-intrusive appliance monitoring: software which calculates appliance-level energy feedback using only home-level energy data. Although AlertMe don't offer such a service, this is where PlotWatt, a cloud-based software company, comes in. To make use of PlotWatt's appliance-level analysis, I needed to transfer my data from AlertMe's data cloud to PlotWatt's data cloud. To do so, I set up my Raspberry Pi to periodically download my data from AlertMe and upload it to PlotWatt. I've since open-sourced the project to allow anyone to use or modify my code. Together, this software and PlotWatt's algorithms let me see, via PlotWatt's online dashboard, how much money I spend keeping each appliance running.
My Raspberry Pi.
A breakdown of my household's monthly energy costs is available at plotwatt.com. However, even enthusiasts like myself don't check this daily. Therefore, I wanted an energy display from which I could pick up this information as I walked past. To create such a display, I set up an old monitor attached to my Raspberry Pi, which displays the dashboard from plotwatt.com.
My home energy monitoring system.
Wednesday, 24 October 2012
Public NIALM Reference Library
Today I decided to make my NIALM reference library public. Over the past few years I've collected about 150 references which I'd like to share with the wider NIALM community. Since I use Mendeley to manage my references, the easiest way to share it was through a public group:
Oliver Parson's NIALM library
This group will be updated as new papers are published, so feel free to join it to receive updates.
Happy reading!
UPDATE: Please feel free to follow the group, which should give you updates when I add references to the library. However, I'll just ignore any requests to join the group, which would give you write access!
Tuesday, 16 October 2012
Tracebase - an appliance training data repository
The tracebase repository contains individual appliance data with the intention of creating a database for training appliance recognition algorithms. The repository contains a total of 1883 days of power readings, taken at 1 second intervals, for 158 appliance instances, of 43 different appliance types. Since the aim is to create an appliance database, no aggregate measurements are collected.
The data is introduced in Reinhardt et al. 2012 and is available from the tracebase repository. The files are password protected, but a password can be requested via the download page.
I've also updated my post on Public Data Sets for NIALM.
Sunday, 14 October 2012
alertme2plotwatt - Using PlotWatt to disaggregate AlertMe data
Today I want to open-source a project I've been working on called alertme2plotwatt, a Python library for uploading AlertMe data to PlotWatt. I use an AlertMe system to collect second-by-second electricity data and upload it to the cloud. However, as yet AlertMe doesn't offer any disaggregation capability. Conversely, PlotWatt offers a hardware-agnostic cloud-based data analysis toolkit to disaggregate your energy data. Unfortunately, PlotWatt doesn't yet support AlertMe data out-of-the-box. Luckily, both AlertMe and PlotWatt offer their own APIs to provide data access. This has allowed me to write a script to download second-by-second household aggregate data collected by my AlertMe system and upload it to PlotWatt to be disaggregated into individual appliances.
To use this, you will need:
- An AlertMe account (and subscription)
- An AlertMe MeterReader attached to your household electricity input
- A PlotWatt account (free)
Full details for using the library can be found on the github project page.
So far, I've used the library to copy about a year of second-by-second data from my AlertMe account to my PlotWatt account. However, the project is far from perfect, so please feel free to contribute code to increase the reliability, flexibility or clarity of documentation.
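For anyone curious what such a copy job looks like in outline, here is a minimal sketch of the download-then-upload loop described above. It is not the alertme2plotwatt code: `download` and `upload` are hypothetical stand-ins for the two vendor APIs (neither name comes from AlertMe or PlotWatt), and the window and batch sizes are arbitrary assumptions.

```python
from typing import Callable, Iterable, List, Tuple

Reading = Tuple[int, float]  # (unix timestamp in seconds, power in watts)


def chunk(readings: List[Reading], size: int) -> Iterable[List[Reading]]:
    """Yield successive batches so that no single upload is too large."""
    for start in range(0, len(readings), size):
        yield readings[start:start + size]


def sync(
    download: Callable[[int, int], List[Reading]],
    upload: Callable[[List[Reading]], None],
    start: int,
    end: int,
    window: int = 3600,
    batch_size: int = 1000,
) -> int:
    """Copy readings in [start, end) one time window at a time.

    `download` and `upload` are hypothetical callables supplied by the
    caller; they are NOT real AlertMe or PlotWatt function names.
    Returns the number of readings uploaded.
    """
    copied = 0
    for window_start in range(start, end, window):
        window_end = min(window_start + window, end)
        readings = download(window_start, window_end)
        for batch in chunk(readings, batch_size):
            upload(batch)
            copied += len(batch)
    return copied


# Usage with in-memory fakes instead of real network calls.
def fake_download(t0: int, t1: int) -> List[Reading]:
    return [(t, 100.0) for t in range(t0, t1)]  # one reading per second


uploaded: List[Reading] = []
total = sync(fake_download, uploaded.extend, start=0, end=7200)
```

Passing the two API calls in as functions keeps the loop testable without network access, which is handy when the job runs unattended on something like a Raspberry Pi.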
Friday, 28 September 2012
NIALM in academia v NIALM in industry
Now I've had the chance to experience the field of NIALM from both an academic and industry perspective, I felt compelled to share my thoughts on both. Here they are:
Reality Gap
A luxury of academia is the ability to imagine possible future scenarios, and design approaches which provide value in such a scenario. Although this allows academia to think beyond the current state of the world, it often results in the opening of a reality gap. As such, scenarios are often proposed to fit a model, rather than a model which fits the real-world in either its current state or near future. Conversely, industry needs to provide value today, or within the near future, or it cannot become a profitable business.
Mathematics
In industry, mathematical elegance and theoretical proof matter little. Conversely, these are generally considered to be the core of an academic paper. Instead, industry folk consider performance to be the primary measure worth worrying about. As a result, if an unprincipled heuristic approach is shown to outperform its principled rivals, the unprincipled approach is always the winner.
Novelty
In academia, each publication must have a clear and measurable contribution. This generally takes the form of a description and evaluation of a model being applied within a domain for the first time. By this definition, there is no value in reimplementing an existing approach in the same domain. As such, unless authors have packaged and released their code, there is little chance of a side by side comparison. However, in industry performance is far more important than novelty. Consequently, there is the most value in implementing the state of the art, and any extensions only provide value if justified by the increase in performance.
Scalability
Computer science academics talk a lot about computational complexity. Algorithms are generally discussed in terms of their linear, quadratic, exponential etc. complexity, and practical applicability is often lost in the search for optimal solutions. If an application is the focus of a contribution, I'd prefer to see the complexity grounded in terms of cost when scaling from neighborhood to national scales, or minute to year scales.
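To make that concrete, here is a rough back-of-the-envelope sketch of what I mean by grounding complexity in cost. Every number in it is an assumption chosen purely for illustration: one reading per second, a core that performs a billion basic operations per second, 100 homes for a neighborhood and 25 million for a nation.

```python
SECONDS_PER_DAY = 24 * 60 * 60
OPS_PER_SECOND = 1e9  # assumed throughput of one core; purely illustrative


def n_samples(days: float, interval_s: float = 1.0) -> int:
    """Number of readings one home produces over `days` at `interval_s`."""
    return int(days * SECONDS_PER_DAY / interval_s)


def cpu_hours(homes: int, samples: int, order: int) -> float:
    """Core-hours if each home costs samples**order basic operations."""
    operations = homes * samples ** order
    return operations / OPS_PER_SECOND / 3600


# Hypothetical deployment sizes, chosen only to show the trend.
for label, homes in [("neighborhood", 100), ("national", 25_000_000)]:
    for period, days in [("1 day", 1), ("1 year", 365)]:
        t = n_samples(days)
        print(f"{label:>13}, {period:>6}: "
              f"linear {cpu_hours(homes, t, 1):.3g} core-hours, "
              f"quadratic {cpu_hours(homes, t, 2):.3g} core-hours")
```

Under these assumptions the linear algorithm stays affordable at every scale, while the quadratic one becomes impractical long before national, year-long data is reached. That is the kind of statement I'd find more useful than the complexity class alone.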
Implementation
Academics often dismiss implementation problems as not worth their attention. Instead, it's generally sufficient for a publication to demonstrate a proof-of-concept, in which the proposed approach is shown to work on a small scale example. However, such implementation issues are key for a product to be viable in the real world. As such, industry focuses heavily upon such issues, since it's essential that their products execute reliably and in an unsupervised manner at real world scales.
Sunday, 5 August 2012
UMASS Smart* Home Data Set
Although not collected specifically for energy disaggregation, the Smart* (Smart Star) data set provides power data from 3 thoroughly sub-metered real households. The granularity of data collected for circuit level monitors (premises aggregate and individual circuits) is one reading per second, while individual plug loads are measured roughly every few seconds. Each house contains 21-26 circuit meters and almost all appliances are measured using plug meters.
The data is available for download from the UMassTraceRepository. At the moment, aggregate, circuit and appliance data is available for house A, but only aggregate data is available for houses B and C. I've also updated my post on Public Data Sets for NIALM.
Thursday, 21 June 2012
Public Data Sets for NIALM
It is essential to use real-world data when comparing the performance of NIALM techniques. However, such data sets are time consuming, costly, and often inconvenient to collect. To this end, researchers have begun to publicly release their data sets, therefore enabling other researchers to compare their approaches against common benchmarks. Here are some short descriptions of the data sets I'm aware of:
REDD contains both household-level and circuit-level data from 6 US households, over various durations (between a few weeks and a few months). Each house has two-phase mains input, and 10-25 individually monitored circuits. High-frequency (kHz) current and voltage data are available for both mains circuits, while low-frequency power measurements (3-4 second intervals) are available for the appliance circuits. This data set was collected primarily for the evaluation of non-event based NIALM methods. The authors have password protected access to the data set to keep track of its usage.
The REFIT data set was released as part of the Smart Home and Energy Demand Reduction project, by David Murray and Lina Stankovic at the University of Strathclyde. The data set contains active power measurements of the aggregate and 9 individual appliances from 20 homes in the Loughborough area of the UK, at a resolution of 1 sample every 8 seconds. This makes REFIT the only UK data set which contains appliance level data at a sample rate greater than once per minute. In addition, aggregate gas consumption data was recorded at 30 minute intervals. However, it should be noted that the data was compressed by removing samples for which the power demand had not changed since the last reading. Further details can be found in a presentation from the EEDAL 2015 conference, a detailed technical report, and the dataset readme file. A NILMTK converter is also available for the data set.
Although not collected specifically for energy disaggregation, the Smart* (Smart Star) data set provides power data from 3 thoroughly sub-metered real households. The granularity of data collected for circuit level monitors (premises aggregate and individual circuits) is one reading per second, while individual plug loads are measured roughly every few seconds. Each house contains 21-26 circuit meters and almost all appliances are measured using plug meters. At the moment, aggregate, circuit and appliance data is available for house A, but only aggregate data is available for houses B and C.
The tracebase repository contains individual appliance data with the intention of creating a database for training NIALM algorithms. The repository contains a total of 1883 days of power readings, taken at 1 second intervals, for 158 appliance instances, of 43 different appliance types. Since the aim is to create an appliance database, no aggregate measurements are collected. The data is introduced in Reinhardt et al. 2012 and is available from the tracebase repository. The files are password protected, but a password can be requested via the download page.
As always, please leave a comment if you have released your own data set or know of someone who has. Also, if you notice any errors or updates please let me know. I'll do my best to keep this list up to date!
Almanac of Minutely Power Dataset (AMPds)
Stephen Makonin released the first version of the Almanac of Minutely Power Data set. The data set contains 1 minute aggregate meter readings as well as sub-metered readings from 19 individual circuits. Each reading includes measurements of voltage, current, frequency, power factor, real power, reactive power and apparent power. Furthermore, the aggregate gas and water consumption was measured at 1 minute intervals, in addition to 1 individual usage for each utility. The data set spans an entire year from April 2012 to March 2013 from a single household in the greater Vancouver area, BC, Canada. The data set is available to anyone for free, although the authors require a username and password to be requested for the purposes of usage tracking.
Berkeley Energy Disaggregation Data Set (BERDS)
The University of California, Berkeley, have released electricity data collected from Cory Hall on the UC Berkeley campus. The data set contains data collected from 4 categories of sub-metered loads: lighting, HVAC, receptacle (sockets) and other, for which many feeds are available for each load category. The data set contains measurements of active, reactive and apparent power which were collected at 20 second intervals. The data is available for free via Mehdi Maasoumy's website, and a paper briefly describing the data set appeared at the Big Learning workshop at NIPS 2013.
A building-level office environment dataset of typical electrical appliances (BLOND)
Thomas Kriechbaumer & Hans-Arno Jacobsen of The Technical University of Munich (TUM) recently released the BLOND data set, which contains voltage and current readings for aggregated circuits and matching fully-labeled ground truth data (individual appliance measurements). The study covers 53 appliances (16 classes) in a 3-phase power grid in Germany. The authors have released two versions of the data set: 1) BLOND-50 contains 213 days of measurements sampled at 50 kHz (aggregate) and 6.4 kHz (individual appliances), and 2) BLOND-250 uses the same setup and contains 50 days of measurements sampled at 250 kHz (aggregate) and 50 kHz (individual appliances). The data set is also described in more detail in the Scientific Data paper.
Building-Level fUlly labeled Electricity Disaggregation dataset (BLUED)
The BLUED data set contains high-frequency (12 kHz) household-level data from a single US household over a period of approximately 8 days. The data set also contains an event list of each time an appliance within the household changes state (e.g. microwave turns on). This data set was collected primarily for the evaluation of event based NIALM methods. The authors have also password protected access to the data set to keep track of its usage.
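To give a flavour of what "event based" means here, below is a deliberately naive sketch of a step-change detector run over an aggregate power signal. It is not the method used to produce BLUED's event list, and the 50 W threshold and the toy signal are assumptions made purely for illustration.

```python
from typing import List, Tuple


def detect_events(power: List[float], threshold: float = 50.0) -> List[Tuple[int, float]]:
    """Return (sample index, change in watts) for every step between
    consecutive readings whose magnitude is at least `threshold`."""
    events = []
    for i in range(1, len(power)):
        delta = power[i] - power[i - 1]
        if abs(delta) >= threshold:
            events.append((i, delta))
    return events


# Toy aggregate signal: a 1000 W appliance turns on and off,
# then a 60 W appliance turns on.
aggregate = [100.0, 100.0, 1100.0, 1100.0, 1100.0, 100.0, 100.0, 160.0, 160.0]
events = detect_events(aggregate)
# events == [(2, 1000.0), (5, -1000.0), (7, 60.0)]
```

A labelled event list such as the one in BLUED is what lets you check whether a detector like this fires at the right times, and whether each detected step is then attributed to the right appliance.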
Controlled On/Off Loads Library dataset (COOLL)
The COOLL dataset, released by researchers at the PRISME laboratory at the University of Orléans, contains high-frequency data from 12 different types of appliances. Similar to the tracebase and PLAID datasets, multiple instances of each type were measured, and each instance was measured throughout 20 operations. During each controlled operation, current and voltage data was collected at a sample rate of 100 kHz. The dataset is summarised in an academic paper, and can be downloaded from GitHub after filling in a registration form.
Dataport database (formerly WikiEnergy)
Pecan Street Inc have released a large amount of domestic electricity data via the Dataport initiative. At the time of writing, the database contains data from 669 homes, in which both the household aggregate power demand and individual appliance power demands are monitored at 1 minute intervals. The installations began in January 2011, and data is still being collected for most buildings. The data is freely available to University members of the WikiEnergy community, and full details for database access can be found on the Dataport homepage.
Domestic electricity demand dataset of individual appliances in Germany (DEDDIAG)
Marc Wenninger, Andreas Maier & Jochen Schmidt have released DEDDIAG, a domestic electricity demand dataset of individual appliances in Germany. The data set contains recordings from 15 homes over a period of up to 3.5 years, in which 50 appliances have been recorded at a frequency of 1 Hz. The data set focuses on appliances of significance for load-shifting purposes, such as dishwashers, washing machines and refrigerators. One home also includes three-phase mains readings that can be used for disaggregation tasks. Additionally, DEDDIAG contains manual ground truth event annotations for 14 appliances, that provide precise start and stop timestamps. The authors have also released source code of the data collection system, as well as a python command line tool for loading the data.
Dutch Residential Energy Dataset (DRED)
Delft University of Technology (TUDelft) have released the DRED dataset, which contains both house level and appliance level energy consumption information. The live deployment consists of several sensors measuring electricity, occupancy and ambient parameters in a household. The DRED dataset includes electricity data (aggregated energy consumption and appliance level energy consumption), ambient information (room-level indoor temperature, outdoor temperature, environmental parameters), occupancy information (room-level location information of occupants, WiFi and BT RSSI information for localization) and household information (house layout, number of appliances monitored, appliance-location mapping etc). The dataset is publicly available and can be obtained from the DRED website.
Electricity Consumption & Occupancy data set (ECO)
The ECO data set is a data set for non-intrusive load monitoring and occupancy detection research. It was collected in 6 Swiss households over a period of 8 months. For each of the households, the ECO data set provides 1 Hz aggregate consumption data (current, voltage, and phase shift for each of the three phases in the household) and also 1 Hz plug-level data measured from selected appliances. In addition, the data set also includes occupancy information measured through a tablet computer (manual labelling) and a passive infrared sensor (in some of the households). The data set is described in detail in a paper published at BuildSys 2014.
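Since the aggregate data is given as current, voltage and phase shift per phase rather than as power, here is a small sketch of how real power could be derived from such readings. It assumes RMS values, sinusoidal waveforms and a phase shift expressed in degrees; the tuple layout is hypothetical and is not the ECO file format, so check the data set's documentation before relying on it.

```python
import math
from typing import Sequence, Tuple

# (RMS voltage in volts, RMS current in amps, phase shift in degrees)
Phase = Tuple[float, float, float]


def real_power(phases: Sequence[Phase]) -> float:
    """Total real power in watts: sum of V * I * cos(phi) over all phases."""
    return sum(v * i * math.cos(math.radians(phi)) for v, i, phi in phases)


def apparent_power(phases: Sequence[Phase]) -> float:
    """Total apparent power in volt-amperes: sum of V * I over all phases."""
    return sum(v * i for v, i, _ in phases)


# One hypothetical reading across the three phases of a household.
reading = [(230.0, 2.0, 0.0), (230.0, 1.0, 60.0), (230.0, 0.0, 0.0)]
p = real_power(reading)      # approximately 575 W
s = apparent_power(reading)  # 690 VA
```

The gap between the two figures comes entirely from the second phase, where current lags voltage by 60 degrees and only half of the apparent power does useful work.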
The GREEND data set was released by a collaboration between researchers at the Alpen-Adria-Universität Klagenfurt and WiTiKee s.r.l. The data set contains active power measurements taken at 1 second intervals of 9 individual appliances and the household aggregate power demand from 9 houses in Italy and Austria, over a period of up to one year. Further details can be found in the accompanying arXiv paper. In addition, a NILMTK converter is also available for the data set.
Household Electricity Use Study (HES)
In 2012, the UK Energy Saving Trust, Department of Energy and Climate Change, and Department for Environment, Food and Rural Affairs published a 15 page report called Powering the Nation. This report summarises the full 600 page Household Electricity Use Study, which aimed to better understand how electricity is consumed in UK households. As part of this study, 251 owner-occupier households were monitored across England between April 2010 and April 2011. Of these households, 26 were monitored for 12 months, and 225 were monitored for 1 month. For each household, the energy consumption of 13-51 appliances was monitored at 2 minute intervals. A software portal is currently under development to provide access to the data set, although in the meantime the data can be requested individually from ICF International by contacting efficient.products@icfi.com and providing a postal mailing address and operating system details.
IDEAL Household Energy Dataset
Released by researchers at the University of Edinburgh, the IDEAL Household Energy Dataset comprises data from 255 UK homes. Alongside electric and gas data from each home the corpus contains individual room temperature and humidity readings and temperature readings from the boiler. For 39 of the 255 homes more detailed data is available, including individual electrical appliance use data, and data on individual radiators. Sensor data is augmented by anonymised survey data and metadata including occupant demographics, self-reported energy awareness and attitudes, and building, room and appliance characteristics.
Indian Dataset for Ambient Water and Energy (iAWE)
The Indraprastha Institute of Information Technology recently released the iAWE data set, which contains aggregate and sub-metered electricity and gas data from 33 household sensors at 1 second resolution. The data set covers 73 days of a single house in Delhi, India. Each individual channel of the data can be downloaded separately in either SQL or CSV format from the download section at the bottom of the webpage.
Individual household electric power consumption Data Set
EDF Energy released a data set in 2012 containing energy measurements made at a single household in France for a duration of 4 years. Average measurements are available at 1 minute resolution of the household aggregate active power, reactive power, voltage and current, as well as the active power of 3 sub-metered circuits. Although each circuit contains a few appliances, this is the largest data set in terms of duration of measurement. The complete data set is openly available from the UCI Machine Learning Repository.
Pecan Street Research Institute (no longer available)
Pecan Street Research Institute announced the release of a new data set designed specifically to enable the evaluation of electricity disaggregation technology. A free sample data set is available to members of its research consortium, which has now been opened up to university researchers. The sample data set contains 7 days of data from 10 houses in Austin, TX, USA, with both aggregate and circuit-level power readings at 1 minute intervals. In addition to common household loads, 2 of the houses also have photovoltaic systems and 1 house also has an electric vehicle.
Reference Energy Disaggregation Dataset (REDD)
REDD contains both household-level and circuit-level data from 6 US households, over various durations (between a few weeks and a few months). Each house has two-phase mains input, and 10-25 individually monitored circuits. High-frequency (kHz) current and voltage data are available for both mains circuits, while low-frequency power measurements (3-4 second intervals) are available for the appliance circuits. This data set was collected primarily for the evaluation of non-event based NIALM methods. The authors have password protected access to the data set to keep track of its usage.
REFIT Electrical Load Measurements dataset
The REFIT data set was released as part of the Smart Home and Energy Demand Reduction project, by David Murray and Lina Stankovic at the University of Strathclyde. The data set contains active power measurements of the aggregate and 9 individual appliances from 20 homes in the Loughborough area of the UK, at a resolution of 1 sample every 8 seconds. This makes REFIT the only UK data set which contains appliance level data at a sample rate greater than once per minute. In addition, aggregate gas consumption data was also recorded at 30 minute intervals. However, it should be noted that the data was compressed by removing samples for which the power demand had not changed since the last reading. Further details can be found in a presentation from the EEDAL 2015 conference, a detailed technical report, and the dataset readme file. In addition, a NILMTK converter is also available for the data set.
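Because unchanged readings were dropped, a regularly sampled series has to be rebuilt before most disaggregation methods can be applied. The sketch below shows one way this could be done by carrying the last known value forward. The function name, the (timestamp, watts) tuple layout and the fixed 8 second grid are assumptions made for illustration, not the actual REFIT file format.

```python
def forward_fill(readings, start, end, step=8):
    """Rebuild a regularly sampled series from change-only readings.

    readings: list of (timestamp_seconds, watts) sorted by timestamp,
              holding only the samples at which the power changed.
    Returns a list of (timestamp, watts) every `step` seconds in [start, end].
    """
    filled = []
    idx = 0
    last = None  # no value is known before the first reading
    for t in range(start, end + 1, step):
        # Consume every reading at or before time t, keeping the latest one.
        while idx < len(readings) and readings[idx][0] <= t:
            last = readings[idx][1]
            idx += 1
        filled.append((t, last))
    return filled


# Hypothetical compressed readings: the power changed at 0 s, 24 s and 40 s.
compressed = [(0, 80.0), (24, 2100.0), (40, 85.0)]
series = forward_fill(compressed, start=0, end=48)
```

Note that forward-filling cannot tell an unchanged reading apart from a genuine gap in the recording, so long runs of filled values should probably be treated with some caution.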
Smart* Home Data Set (via the UMassTraceRepository)
Although not collected specifically for energy disaggregation, the Smart* (Smart Star) data set provides power data from 3 thoroughly sub-metered real households. The granularity of data collected for circuit level monitors (premises aggregate and individual circuits) is one reading per second, while individual plug loads are measured roughly every few seconds. Each house contains 21-26 circuit meters and almost all appliances are measured using plug meters. At the moment, aggregate, circuit and appliance data is available for house A, but only aggregate data is available for houses B and C.
Tracebase
The tracebase repository contains individual appliance data with the intention of creating a database for training NIALM algorithms. The repository contains a total of 1883 days of power readings, taken at 1 second intervals, for 158 appliance instances, of 43 different appliance types. Since the aim is to create an appliance database, no aggregate measurements are collected. The data is introduced in Reinhardt et al. 2012 and is available from the tracebase repository. The files are password protected, but a password can be requested via the download page.
UK Domestic Appliance-Level Electricity (UK-DALE) dataset
Jack Kelly released the first version of the UK-DALE in January 2015. The data set contains 16 kHz current and voltage aggregate meter readings and 6 second sub-metered power data from individual appliances across 3 UK homes, as well as 1 second aggregate and 6 second sub-metered power data for 2 additional homes. An update to the data set was released in August 2015 which has expanded the data available for house 1 to 2.5 years. Low frequency data is available to download in CSV or NILMTK HDF5 format, while high frequency data can be downloaded in FLAC file format.
As always, please leave a comment if you have released your own data set or know of someone who has. Also, if you notice any errors or updates please let me know. I'll do my best to keep this list up to date!
Friday, 25 May 2012
NIALM in industry
Since the Pittsburgh workshop I've learnt a lot about industry's perspective of NIALM. Academic work is often criticised for solving simplified problems, and I believe it's important to understand what our friends in industry believe to be the real-world problems. This post is meant to list the companies I'm aware of that are working in this field, and summarise their approach to energy disaggregation. As always, please leave a comment or drop me an email if you know of any inaccuracies or omissions.
AlertMe (now part of Centrica Hive), Cambridge, UK
AlertMe is a company focusing on household monitoring for energy reduction and security purposes. Their Analytics package is able to use second by second measurements to disaggregate the whole-home data to identify individual appliances, analyse their performance and provide personalised feedback and recommendations.
Bidgely (formerly MyEnerSave), CA, USA
Bidgely is a meter-agnostic cloud-based electricity disaggregation company focused on generating actionable appliance-level feedback from second to minute level aggregate power data. Their web interface allows users to connect third party meters (TED, WattVision etc.). In addition, customers are able to upload data collected from smart meters to Bidgely's platform for analysis.
EEme (now part of Uplight), NC, USA
EEme are a spin-out from Carnegie Mellon University, who apply energy disaggregation to 15-minute smart meter data to provide demand side management analytics. EEme released a report describing the accuracy of their disaggregation product as calculated by Pecan Street’s 3rd party evaluation tool. The evaluation tool provided EEme with 15-minute aggregate smart meter data and weather data from hundreds of homes, and required EEme to return monthly totals for four of the largest energy consuming appliances.
Fludia, Paris, France
Fludia is a French company specialising in energy management, which aims to provide its customers with energy efficiency technology. They have developed a device to retrofit non-smart meters, called Fludiameter, such that 1 minute resolution energy data can be collected without installing a whole new meter. Fludia also provide a tool to break down electricity consumption into its end uses, called Beluso, which makes use of household aggregate data and also information entered from a household survey.
Homepulse (formerly WattGo), Aix-en-Provence, France
Homepulse is a French company whose technology disaggregates electricity and gas consumption in near real-time into categories, such as standby, cold appliances, hot water, home appliances and heating. The company also released a whitepaper detailing the aggregate monitoring of thousands of households along with the collection of a range of metadata, which forms the training data for their algorithms.
Informetis, Japan
Informetis are a spin out of Sony R&D, whose technology offers both historic appliance-specific energy breakdowns and real-time disaggregation. Furthermore, their product aims to detect abnormal power consumption of individual appliances. The product currently works with either a custom plug-in sensor, or a smart meter running custom firmware. In July 2015, Informetis expanded the company by opening a second office in Europe, operating out of Cambridge in the UK.
LoadIQ, NV, USA
LoadIQ are a company focusing on electricity disaggregation within commercial and industrial domains. The company offers solutions aimed at reducing businesses' costs as a result of inefficient electricity consumption.
Navetas, Oxford, UK
Navetas is a spin-out company from the University of Oxford who have partnered with a meter manufacturer to focus upon disaggregating appliance energy consumption given the high-resolution aggregate data.
Neurio (from Energy Aware), Vancouver, BC, Canada
Neurio recently raised Kickstarter funding to develop their electricity disaggregation technology, which consists of a hardware sensor featuring two CT clamps, capable of reporting voltage, current, real power and power factor at 1 second intervals. The company also offer a cloud-based service which breaks down your household electricity usage into individual appliances.
Onzo, London, UK
Onzo is a London-based energy analytics company, who recently sold their energy display business to SSE Labs, allowing them to concentrate solely on data analytics. Their website boasts a proprietary energy knowledgebase containing tens of thousands of households' energy data at a range of resolutions, as well as thousands of energy signatures from individual appliances from a range of manufacturers. Furthermore, their disaggregation algorithms not only infer the energy consumption of appliances, but also determine household occupancy schedules and appliance diagnostic information.
PlotWatt, NC, USA
PlotWatt are a meter-agnostic cloud-based electricity disaggregation company focused on disaggregating second to minute level aggregate power data. Their web interface allows users to either connect third party meters (TED, WattVision etc.) or upload files of their consumption data.
Powersavvy, Castlebar, Ireland
Powersavvy is a company looking to highlight energy savings to both households and businesses. They offer their own meter, which can be installed for either 6 days or indefinitely, and is used to provide disaggregated advice based on the data they collect. Their website quotes that savings of 30% can be easily achieved using their products.
Sense, MA, USA
Sense are a consumer-oriented startup based in Cambridge, Massachusetts, doing sub-second level monitoring of current and voltage through 2 current sensors attached to the service mains in the electric panel. On 01.09.16 Sense received $14M in series A funding to grow the business.
SmartB (from Yetu), Berlin, Germany
SmartB are the energy disaggregation arm of Yetu, who offer a smart home platform which connects household appliances, micro-generators, micro-storage and smart meters via a home gateway. Their home energy management system allows household occupants to view their live or historic household aggregate power demand. Furthermore, they offer a software-based disaggregated breakdown of this 1 second smart meter data. The system also notifies the household occupants if an appliance has consumed significantly more energy than the national average, and offers personalised suggestions for saving energy.
Verdigris, CA, USA
Verdigris is a Silicon Valley start-up offering Building.AI, a platform for building intelligence. The product consists of a number of circuit panel CT clamps and a disaggregation software package. As a result, they're able to provide appliance-level energy breakdowns, real-time fault detection and persistent building commissioning.
Verlitics, (formerly Emme), OR, USA
Verlitics is a cloud-based company which uses bespoke high sample rate meters to provide electricity disaggregation results to domestic customers. They offer web and smartphone interfaces to home-owners or businesses, allowing them to view their disaggregated electricity usage.
Watt-IS, Torres Vedras, Portugal
Watt-IS are an analytics company which aim to disaggregate smart meter data collected by utilities to produce appliance-level energy consumption data and actionable feedback which could be provided to customers. Such feedback includes the potential savings from replacing a refrigerator, reducing the whole-home standby power, and also shifting demand to off-peak times.
Wattics, (formerly Veutility), Dublin, Ireland
Wattics is a software company, partnered with EpiSensor, who have previously provided disaggregated appliance level data from a single point of measurement. Their on-line dashboard identified unneeded or deteriorating appliances and suggested energy saving measures. However, Wattics no longer offer disaggregation functionality.
Wattseeker (from Qualisteo), Nice, France
Wattseeker offer a datalogger, which includes a number of current clamps and options to upload data via 3G, Ethernet or Wi-Fi. These current clamps must sample the current and voltage at a kHz rate since real and reactive power are reported, along with harmonics etc. However, the installation does require a short shutdown of the building's power. Their disaggregation system, LYNX, then disaggregates the electricity consumption to provide actionable energy saving suggestions. Their website indicates that each current clamp can disaggregate up to 12 appliances, with an accuracy of +/- 2%.
Watty, Stockholm, Sweden
Watty is a startup company closely linked with the KTH Royal Institute of Technology. Rather than focusing specifically on energy disaggregation, Watty is an energy analytics company that focuses on producing the insight required to save energy and money for specific buildings.
Wednesday, 9 May 2012
Post first NILM workshop thoughts
I've just got back to the UK after attending the 1st International Workshop on Non-Intrusive Load Monitoring in Pittsburgh, USA. First of all, I'd like to congratulate the organisers, Mario Bergés and Zico Kolter, for putting together such a great programme. By my estimate, there must have been 40-50 attendees from academia, industry and other nonprofit organisations, which in my opinion represented the research in this field to date well.
I particularly enjoyed Sidhant Gupta's very well prepared video and video-conference link up from CHI 2012 in Austin. Sidhant's work takes quite a different perspective to most of the other work we saw at the workshop, since it uses the high frequency electromagnetic interference generated by appliances' switch mode power supplies as the basis for disaggregation. If you weren't at the workshop, you can find a small portion of the video he showed linked to from his website.
The topic of data set availability kept coming up throughout the day. I got the distinct impression that the academic community was crying out for more data, while the industry folks were unsure of how they could open up their own data sets to push the community forward. In the panel session, Zico raised the point that people shouldn't wait to perfect their data set before releasing it. This spurred Mario to release a data set from his home, and hopefully many more will emerge in the near future.
With REDD and Mario's data, that makes 2 data sets out in the wild for benchmarking NIALM methods. If I start to see any other data sets appear, I'll start a blog post to help people keep track of them. I'll do my best to keep it up to date, but please email me or comment if you notice any mistakes or omissions.
Thursday, 29 March 2012
Paper accepted at AAAI on NIALM training
I recently received notification that my paper titled Non-intrusive Load Monitoring using Prior Models of General Appliance Types has been accepted at AAAI-2012. The paper will appear in the Computational Sustainability for AI track, for which I will give an oral and poster presentation at the conference. The abstract for the paper is below:
Non-intrusive appliance load monitoring is the process of disaggregating a household's total electricity consumption into its contributing appliances. In this paper we propose an approach by which individual appliances can be iteratively separated from an aggregate load. Unlike existing approaches, our approach does not require training data to be collected by sub-metering individual appliances, nor does it assume complete knowledge of the appliances present in the household. Instead, we propose an approach in which prior models of general appliance types are tuned to specific appliance instances using only signatures extracted from the aggregate load. The tuned appliance models are then used to estimate each appliance's load, which is subsequently subtracted from the aggregate load. This process is applied iteratively until all appliances for which prior behaviour models are known have been disaggregated. We evaluate the accuracy of our approach using the REDD data set, and show the disaggregation performance when using our training approach is comparable to when sub-metered training data is used. We also present a deployment of our system as a live application and demonstrate the potential for personalised energy saving feedback.
Full details can be found on my publications page.
Monday, 19 March 2012
1st International Workshop on Non-Intrusive Load Monitoring
Mario Bergés recently pointed me towards this really exciting workshop on Non-Intrusive Load Monitoring:
http://www.ices.cmu.edu/psii/nilm/agenda.html
Some important information:
- When: May 7th 2012
- Where: Pittsburgh, PA, USA
- Objective: Unite researchers from a variety of backgrounds working on Non-Intrusive Load Monitoring
The agenda contains overview talks about the state of the art and also some spotlights from industry partners. There's also an afternoon poster session to present and gain feedback on any ongoing work.
Overall, I think the event will be a great opportunity to learn about current research in both academia and industry, and I would thoroughly encourage anyone working in this area to consider attending. I'm planning to present some of my current work at the non-event based poster session, and hope to see some of your work at the workshop too!
Friday, 27 January 2012
Unsupervised learning for NIALM
I've recently been thinking about various training methods for NIALM systems, specifically those which can be applied to unlabelled aggregate power demand data sampled once per minute (or less frequently). Assuming no prior information of the appliances or their usage patterns, this clearly falls into the category of unsupervised learning.
In unsupervised learning, the goal is often to determine the unknown structure of unlabelled data. However, in our case we don't simply want to construct a model which represents the aggregate power data. In fact, we want to build a model of the data in which appliances are explicitly represented. This way, once the learning process is complete, we can form the disaggregation task as an inference problem.
Previous unsupervised approaches to this problem have used clustering to identify unique behaviour of appliances. These approaches have been shown to work well when applied to multiple features extracted from high granularity data (sampled at kHz). However, in the case of low granularity data, there is no way to extract features such as reactive power, power factor, etc. and we are instead left with a single feature: (real) power.
To give a visual representation of how clustering might perform on real aggregate data sampled at 1 minute intervals, I ran some experiments on the REDD dataset. To do so, I did the following:
- Down sampled all data to 1 minute resolution
- Subtracted the power of each circuit from the household mains circuit to calculate the unallocated, or 'unknown', power
- Calculated the difference between consecutive power readings for each circuit
- Excluded any change in power less than 100 W
- Counted the power differences into bins for each circuit
- Plotted these bins as a stacked bar graph for each household
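The steps above can be sketched roughly as follows. This is not the code I used at the time; it is a minimal reconstruction in plain Python, and the function names, the averaging used for down sampling, the 100 W bin width and the use of the magnitude of each change for the threshold are all assumptions. The plotting step is left out.

```python
from collections import Counter


def downsample(samples, period=60):
    """Average (timestamp_seconds, watts) samples into `period`-second buckets."""
    buckets = {}
    for t, w in samples:
        buckets.setdefault(t // period, []).append(w)
    return [sum(ws) / len(ws) for _, ws in sorted(buckets.items())]


def unknown_power(mains, circuits):
    """Mains power minus the sum of all sub-metered circuits, per interval."""
    return [m - sum(c[i] for c in circuits.values()) for i, m in enumerate(mains)]


def step_changes(series, threshold=100.0):
    """Differences between consecutive readings, dropping small changes."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    return [d for d in diffs if abs(d) >= threshold]


def histogram(changes, bin_width=100.0):
    """Count step changes into fixed-width bins keyed by each bin's lower edge."""
    return Counter((d // bin_width) * bin_width for d in changes)


# Hypothetical 1 minute readings (watts), not taken from REDD.
mains = [200.0, 1700.0, 1700.0, 200.0, 3200.0]
circuits = {
    "microwave": [0.0, 1500.0, 1500.0, 0.0, 0.0],
    "oven": [0.0, 0.0, 0.0, 0.0, 3000.0],
}
circuits["unknown"] = unknown_power(mains, circuits)

# One histogram per circuit; these are the bars that get stacked in the chart.
binned = {name: histogram(step_changes(series)) for name, series in circuits.items()}
```

Each entry of `binned` maps a power-change bin to a count for one circuit, which is the information a stacked bar chart per household would display.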
As an example, here's the chart for house 1:
You might want to click on the image to enlarge it since the inline resolution isn't great.
There are two key points to take from this plot:
- There are two unique clusters at the higher end of the power axis (labelled washer dryer and oven I think). These clusters would be easily identified by a clustering algorithm due to their clear separation from the other appliances.
- There are two clusters around the 1500 W mark (corresponding to the microwave and kitchen outlets I think). One cluster completely subsumes the other, making it very difficult or even impossible for a clustering algorithm to separate the two.
This is just one example, and although the appliances and their usage will be different across houses, I believe this trend will continue. There are always likely to be appliances with high power demands that are easily clustered. However, for appliances with lower power demands, the corresponding clusters are increasingly likely to overlap.
At first glance this might seem okay, because we're more interested in the appliances that consume the most energy. However, power demand and energy consumption are not always correlated. This is because power demand represents the rate of energy consumption, and therefore energy consumption depends on both the appliance's power demand and its duration of use. Two examples of appliance types with low power demands but high energy consumptions are the refrigerator and lighting. Because these appliances are on for such a long time, their energy consumption might turn out to be similar to, or even greater than, that of the kitchen white goods with the highest power demands.
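To make the distinction between power and energy concrete, here is a small worked example. The wattages and durations are illustrative round numbers chosen for the arithmetic, not measurements from REDD or any other data set, and the refrigerator is simplified to a constant average draw.

```python
def energy_kwh(power_watts, hours):
    """Energy in kWh for a constant power draw over a given duration."""
    return power_watts * hours / 1000.0


# Illustrative figures only (assumed, not measured).
fridge = energy_kwh(power_watts=100, hours=24)  # low power, on all day
oven = energy_kwh(power_watts=2000, hours=1)    # high power, used briefly

print(f"fridge: {fridge:.1f} kWh/day, oven: {oven:.1f} kWh/day")
```

With these assumed numbers the low power appliance ends up using slightly more energy per day than the high power one, which is exactly the case that clustering on power changes alone would handle poorly.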
I also generated the graphs for the other 5 houses in the data set, which I've included below (click to enlarge):