To date, the Dataport data has been available via direct access to the database. While this is an efficient way to query small amounts of data, large extracts can take a long time to download because the data is transferred in an uncompressed format. Furthermore, as with most other data sets, the data is described in a custom format, so researchers must parse both the data and the metadata before they can use it.
For these reasons, we've been working with Pecan Street Inc. to release a subset of the Dataport database in NILMTK HDF5 format. The HDF5 file is 1.09 GB in size and contains one month of data from 669 of the Dataport houses, selected because each contains at least 8 meters. In each house, the circuit names have been converted from the Dataport names to the NILM Metadata controlled vocabulary. The resulting data set can be analysed using the tools described in the NILMTK documentation. The HDF5 file is available via the Dataport portal under the same access control as the Dataport database.
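To make the two preparation steps concrete, here is a minimal sketch in plain Python of selecting houses with at least 8 meters and renaming their circuits. It is not the conversion code we used: the function names, the data layout, and above all the entries in NAME_MAP are assumptions made for illustration, not the actual Dataport to NILM Metadata mapping.

```python
# Illustrative sketch only. NAME_MAP is a made-up example and is NOT the
# real mapping from Dataport circuit names to the NILM Metadata vocabulary.
NAME_MAP = {
    "air1": "air conditioner",
    "refrigerator1": "fridge",
    "dishwasher1": "dish washer",
}

MIN_METERS = 8  # selection threshold stated in the post


def select_houses(meters_by_house, min_meters=MIN_METERS):
    """Keep only the houses that have at least `min_meters` metered circuits.

    `meters_by_house` maps a house id to the list of its circuit names
    (a hypothetical layout chosen for this sketch).
    """
    return {
        house: meters
        for house, meters in meters_by_house.items()
        if len(meters) >= min_meters
    }


def rename_circuits(circuits, name_map=NAME_MAP):
    """Translate source circuit names, leaving unmapped names unchanged."""
    return [name_map.get(name, name) for name in circuits]
```

Leaving unmapped names unchanged is a choice made for this sketch; a real converter would more likely report them so that the mapping can be completed.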
Below is a boxplot showing the proportion of energy consumed by each circuit category across all households in the HDF5 data.
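For readers who want to compute similar statistics on their own data, the sketch below shows one way to get per-category energy proportions and the five-number summary behind a box. It uses only the Python standard library. It assumes that each house's proportions are taken relative to the sum of its submetered circuits, which may differ from the normalisation used for the plot, and it reports the plain minimum and maximum rather than the 1.5 × IQR whiskers that many plotting libraries draw. All function names and the input layout are hypothetical.

```python
from statistics import quantiles


def energy_proportions(energy_by_category):
    """Return each category's share of one house's total submetered energy."""
    total = sum(energy_by_category.values())
    if total == 0:
        return {category: 0.0 for category in energy_by_category}
    return {
        category: energy / total
        for category, energy in energy_by_category.items()
    }


def proportions_by_category(households):
    """Collect, for each category, its proportion in every house that has it.

    `households` is an iterable of dicts mapping category name to energy
    (any consistent unit, e.g. kWh); this layout is an assumption.
    """
    collected = {}
    for energy_by_category in households:
        shares = energy_proportions(energy_by_category)
        for category, share in shares.items():
            collected.setdefault(category, []).append(share)
    return collected


def boxplot_stats(values):
    """Five-number summary of a list holding at least two values."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }
```

Calling boxplot_stats on each list returned by proportions_by_category gives one summary per circuit category, which is the information a box in the plot encodes.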
The data set is described in more detail in the following paper to be presented at the GlobalSIP Smart Buildings workshop:
Oliver Parson, Grant Fisher, April Hersey, Nipun Batra, Jack Kelly, Amarjeet Singh, William Knottenbelt, Alex Rogers. Dataport and NILMTK: A Building Data Set Designed for Non-intrusive Load Monitoring. In: 1st International Symposium on Signal Processing Applications in Smart Buildings at 3rd IEEE Global Conference on Signal & Information Processing, Orlando, FL, USA, 14-16 December, 2015.