Thursday, 25 February 2016

Is deep learning the future of NILM?

Deep learning has recently revolutionised a number of well-studied machine learning and signal processing problems, such as image recognition and handwriting recognition. Furthermore, long short-term memory architectures have demonstrated the effectiveness of applying recurrent neural networks to time series problems, such as speech synthesis. In addition to the impressive performance of these models, the elegance of learning features from data rather than hand crafting intuitive features is a highly compelling advantage over traditional methods.

In the past year, deep learning methods have also started to be applied to energy disaggregation. For example, Jack Kelly demonstrated at BuildSys 2015 how such models outperform common disaggregation benchmarks and are able to generalise to previously unseen homes. In addition, Lukas Mauch presented a paper at GlobalSIP 2015 describing how sub-metered data can be used to train networks to disaggregate single appliances from a building's total load. Most recently, Pedro Paulo Marques do Nascimento's master's thesis compared a variety of convolutional and recurrent neural networks across a number of appliances present in the REDD data set. Each piece of research demonstrates that there's real potential to apply deep learning to the problem of energy disaggregation. 

However, two critical issues still remain. First, are the huge volumes of sub-metered data required to train such models actually available? Second, are the computational requirements of training these models practical? Fortunately, training can be performed offline if only general models of appliance types are to be learned. However, if learning is required for each individual household, surely this will need to take place on cloud infrastructure rather than embedded hardware. I hope we'll get closer to answering these questions at this year's international NILM conference in Vancouver!

2 comments:

  1. > are the huge volumes of sub-metered data available
    > which are required to train such models?

    I believe so, yeah. But you need to use data augmentation to increase the effective size of the training dataset. And we're quite lucky in the NILM world because we can generate a large (effectively infinite?) amount of synthetic aggregate data by randomly combining real sub-metered data. Let me show you the effect of the data augmentation:

    In this experiment, I'm training a 'rectangles' network to recognise washing machines. The network is trained on a 50:50 mix of real aggregate data and synthetic aggregate data. Take a look at this plot of the train and validation costs during training:

    http://www.doc.ic.ac.uk/~dk3810/neuralnilm/e567_washing%20machine_rectangles_costs.png

    The green line is the validation cost. The red dots are train costs for synthetic aggregate data. The synthetic data is generated on-the-fly and never repeats (probably). The blue dots are train costs for real aggregate data. Note that I'm dicking around with the learning rate at several stages in the experiment (hence the change in the variance in the validation costs). But also note that the validation costs don't increase (ignoring the high-variance 'flukes' between 150,000 and 200,000 iterations).

    In this next plot, we see the exact same network being trained again except that I switch off the synthetic aggregate data at 150,000 iterations:

    http://www.doc.ic.ac.uk/~dk3810/neuralnilm/e567_washing%20machine_rectangles_costs_train_on_real_data_only_after_150000_iterations.png

    Note how the network clearly starts to over-fit: the validation costs begin to rise at 150,000 iterations.

    And, of course, all my experiments were trained using only UK-DALE. And there are loads more labelled datasets out there. And it might be possible to pre-train nets on unlabelled data. And there are a bunch of other tricks (like batch normalisation) which further reduce the requirement for lots of data. And I imagine there's scope to pre-train nets on all appliances and then fine-tune one net per appliance.

    So, yes, I'm not too worried about having enough training data.
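    To make the augmentation idea above concrete, here's a minimal sketch of synthesising one aggregate window by summing randomly chosen real sub-metered traces. The function name `make_synthetic_aggregate` and the inclusion probability `p_include` are illustrative choices, not the actual Neural NILM pipeline:

    ```python
    import numpy as np

    def make_synthetic_aggregate(appliance_traces, target_idx, rng, p_include=0.5):
        """Build one synthetic aggregate window by summing randomly chosen
        real sub-metered traces ('distractor' appliances). The target
        appliance's trace is always included so the network has
        something to find.

        appliance_traces: list of 1-D arrays, one per appliance, all the
                          same length (power in watts per time step).
        target_idx:       index of the appliance the network is learning.
        """
        aggregate = appliance_traces[target_idx].copy()
        for i, trace in enumerate(appliance_traces):
            if i == target_idx:
                continue
            if rng.random() < p_include:  # randomly include each distractor
                aggregate += trace
        return aggregate

    # Stand-in for real sub-metered data: five fake appliance traces.
    rng = np.random.default_rng(0)
    traces = [rng.uniform(0, 100, size=600) for _ in range(5)]
    agg = make_synthetic_aggregate(traces, target_idx=2, rng=rng)
    ```

    Because each call draws a fresh random subset of distractors, the number of distinct aggregate windows grows combinatorially with the number of appliances, which is why the synthetic stream effectively never repeats.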

    > Second, are the computational requirements of training
    > these models practical? Fortunately, training can be
    > performed offline if only general models of appliance
    > types are to be learned. However, if learning is required
    > for each individual household, surely this will need to
    > take place on cloud infrastructure rather than embedded
    > hardware.

    I was imagining that, as you say, you'd train models offline on as much training data as you can get your hands on (perhaps both labelled and unlabelled). And then you just have to do inference for each test house. Inference is reasonably fast, especially on a GPU. You could perhaps just about do inference on an embedded device, as long as it has several hundred MBytes of RAM and storage. But I'm pretty convinced that NILM should be done in the cloud for a whole bunch of reasons.



  2. Onzo are now using deep learning for electricity disaggregation: http://www.onzo.com/self-driving-cars-and-smart-home-energy-use%E2%80%8A-%E2%80%8Awhat-do-they-have-in-common/
