Friday, 5 April 2013

Why NIALM shouldn't be modelled as the knapsack or subset-sum problem

In his seminal work, Hart (1992) highlighted the similarities between the appliance disaggregation problem and some well-studied combinatorial optimisation problems. He stated that if the power demand of each appliance is known, the disaggregation problem can be modelled as an instance of the subset sum problem. This approach selects the set of appliances whose power demands sum to the household aggregate power, applied independently to each aggregate power reading. However, Hart identified a core problem of this approach:
  • Small fluctuations in the aggregate power lead to solutions with unrealistically high numbers of appliance switch events.
To confirm this in practice, I evaluated this formulation using simulated data. My simulations showed that this approach performs poorly for realistic numbers of appliances even with ideal conditions of no noise and only on-off appliances. Clearly, this is not a good model for appliance disaggregation.

More recently Egarter et al. (2013) extended this model, stating that if both the power demand and duration of usage of each appliance are known, the disaggregation problem can be modelled as a two-dimensional knapsack problem. This approach is similar to Hart's proposed method, with the exception that each aggregate measurement is not considered as an individual optimisation problem. Instead, appliance operations must persist for the exact given duration. However, I see two major problems with this approach:
  1. Optimal solutions might give unrealistic results. As the baseload power increases, I'd expect the number of solutions (those that produce a small difference between the sum of appliance power and the aggregate power) to increase rapidly. As a result, I seriously doubt that the actual optimal solution would give the best disaggregation accuracy.
  2. Appliance durations are often highly variable. Unlike power demand, their duration of usage is often determined by the user. However, extending this formalism such that appliances could have a variable duration would hugely increase the size of the solution space.
Although these approaches show promise in simple scenarios, I don't believe they capture the flexibility required by the energy disaggregation scenario. Instead, I believe probabilistic approaches are more appropriate, given their ability to reason over different solutions by estimating the likelihood of different event sequences.


  1. Very interesting, thank you for highlighting this paper (and their blog). It's an interesting approach. I have a lot of sympathy with the idea that we should try to take advantage of state duration information where appropriate. But I completely agree with your critiques of this paper.

    As you say, appliance durations are highly variable. Even for appliances whose state durations fall within a relatively tight range (like toasters and kettles) the duration can change by around +/- 50%. And appliances which, at first glance, appear to follow a strict program (like a washing machine) show variation in their state durations (for example, a washing machine's heater only remains on long enough to bring the water temperature up to a set point... and that duration depends on a number of external factors). And, then, of course there are all the appliances whose state durations vary massively (lights, TVs, computers etc etc)

    They say in their "Future work" section that

    "The current version of the algorithm was tested for on/off appliances with constant time duration. In future work, we plan to extend our algorithm for arbitrary shapes of power pro les and to evaluate the approach using real appliances"

    So it'll be interesting to see how well their system works on real data. TBH, it's a shame they didn't evaluate their approach using any real data in this first paper. Granted, gathering your own real data is a PITA, but - as you've suggested in the past - it's easy to synthesise very realistic data using TraceBase

    (by the way, I really value your blog posts on recent NIALM papers. It's like an on-line NILM reading group ;) )

  2. Thanks for discussing our paper in your blog :)
    At the current state we have, as discussed, a strong abstraction of real appliances, but we are already preparing a sophisticated model, where improvements of the model and optimization approach are shown. If I have news I'll let you know.

    1. Hi Dominik, that sounds really interesting, thanks for the update! Are you planning to evaluate your approach using real data? I'd love to see a discussion of the advantages/disadvantages using the REDD data set.

      Also, I had a thought while reading your paper which I'd be interested to hear your opinion on. You model disaggregation as a knapsack problem, in which the profit of each appliance is 1. My understanding is that the optimisation will therefore favour solutions of large numbers of low energy appliances rather than fewer high energy consuming appliances. Did you observe this behaviour in your experiments?

    2. Hi Oliver, for sure we are planning to evaluate our approach using real data, but up to now we have not decided which data set to use.

      Concerning your question, we did not observed this behavior but we also did not evaluated in detail if the algorithm is dependent on the size of the appliance power. But we are planning to evaluate this in future work. In the current work the used appliances are generated randomly in predetermined limits. If we have presentable news I'll let you know!

  3. This comment has been removed by the author.