Thursday, 28 April 2011

Classification a single appliance's state using a hidden Markov model

The simplest scenario is one in which we wish to determine the state of a single two-state appliance, given its power demand. This can be represented as a hidden Markov model as follows:
where:
  • Zt - hidden (latent) variable at time t (appliance state)
  • Xt - observed variable at time t (power reading)
For a given set of observed variables, we can expand the possible chains of hidden variable states as a lattice diagram:
where:
  • k - appliance state
It can be seen from this diagram that the number unique chains of hidden variables through this lattice scales exponentially with the length of the chain. However, the max-sum algorithm (called the Vertibi algorithm in this case) can be used to find the probability of the most likely chain, with complexity that grows only linearly with the length of the chain. The values of the most likely chain can then be found by backtracking through the chain starting from the final time slice of the most likely chain.

The Viterbi algorithm involves calculating the joint probability of a chain of observed and hidden variables, given by:
where:
  • X - set of observed variables
  • Z - set of hidden variables
  • theta = {pi, A, phi} - model parameters
  • pi - probability density of z1
  • A - appliance transition probability matrix
  • phi - emission density (probability density of observed variable given a hidden variable)
  • N - number of time slices
The model parameters parameters can be learnt from the set of observed variables using the Expectation-Maximisation algorithm. However, we may want to introduce a prior belief over some parameters, or even fix some parameters. For example, we might know the state of the appliance at the start of the chain, and can therefore fix pi. We might also know the parameters governing the emission density, and if we assume the observations are normally distributed we might want to fix the mean and variance parameters of the Gaussian function.

Although this model can represent appliances with more than two states, it is unclear how tractable the EM algorithm will be for a factorial hidden Markov model. One solution would be to model the multiple Markov chains as a single Markov chain, whose number of states is equal to the number of combinations of appliance states. However, the complexity of such a model is then exponential in the number of appliances. It is worth noting that we can prune the number of possible transitions based on the assumption that only one appliance can change state between two consecutive time slices, which is a practical assumption assuming a small time interval between time slices. A solution to solving this might be to use sampling methods.

Friday, 15 April 2011

NIALM as a factorial hidden Markov model

After considering various heuristic approaches towards NIALM, it is clear that a more principled probabilistic approach is required. The approach would need to recognise that premise-level power readings are sequentially drawn from a continuous time series, and are not independent of each other. In fact, intuition tells us that in general appliances are far more likely to stay in their current state, and that state changes are comparatively rare. NIALM can also be considered as an online-learning problem, in which disaggregation must occur after each individual premise-level power value has been received. These two assumptions map directly on to the modelling of such a problem as a factorial hidden Markov model.

In a factorial hidden Markov model, there are multiple independent chains of hidden variables, which in this case are the current operating state of each appliance. At each time slice, there is also an observed variable conditional on the states of all of the corresponding hidden variables, which in the case is the premise-level power value. Below is a diagram of a factorial hidden Markov model.

where:

  • Z - hidden (latent) variable, subscript - time slice, superscript - variable number
  • X - observed variable, subscript, time slice
NIALM solutions are interested in evaluating the state of appliances, and given a look-up table of appliance state power values, can therefore evaluate the power of each appliance at each time slice. We are therefore interested in modelling the probability of an appliance being in a certain state, given the appliance's state in the previous time slice and the observed variable for that time slice:
This is an approach I will be investigating over the following weeks, after reading up on the relevant theory on sequential data and expectation-maximisation.

NIALM as a combinatorial optimisation problem - continued

As previously discussed, appliance loads can be disaggregated from premise-level power readings by the minimisation of a function f at each time slice:

where:

  • Pagg - premise-level aggregate power
  • N - number of appliances
  • I - vector of appliance binary indicators (1 if appliance is on, 0 otherwise)
  • P - vector of appliance power when appliance is on
Two commonly cited disadvantages of this approach is how its accuracy would decrease with the number of appliances and level of noise. However, no research has been published investigating the situations in which this approach performs sufficiently accurately. I have tested this approach using simulated data for 24 time slices, relating to one reading per hour over the course of a day. Below are tests evaluating the performance of this approach as the number of appliances is increased and as the noise in the data is increased.

Increasing numbers of appliances

Below shows a graph of this approach's true positive rate (TPR) and false positive rate (FPR) for increasing numbers of appliances, n. No noise was introduced over the simulated data set.


We can see that this approach performs very accurately from n = 1 to roughly n = 10. However, for n > 10 the TPR rapidly decays towards 0.5, while the FPR increases towards 0.25. This approach is clearly only useful for small n (roughly n < 20). However, it might be interesting to evaluate the performance of this method given only a small subset of appliances and the premise-level aggregate power. The appliances should be chosen such that they are the most power hungry appliances.

Increasing noise

To investigate how the accuracy of this approach is affected by noise, Gaussian noise was added to the premise-level power readings of the form:
where:
  • N - Gaussian distribution
  • x - Noisy premise-level aggregate
  • Pagg - actual premise-level aggregate
  • c - noise level
Below shows a graph of this approach's true positive rate (TPR) and false positive rate (FPR) for increasing noise in the aggregate power reading, c.


We can see that instantly after introducing some noise, the TPR rate drops to roughly 0.7 for Gaussian noise of standard deviation just 0.1% of the aggregate power level. It is clear that this approach is not robust to even small levels of noise in the premise-level aggregate power readings, as would be likely in a realistic setting.

Extensions

Two extensions are discussed below which could increase the accuracy of this approach.

First, the assumption that an appliance has only one state for which it can draw power can be relaxed. Instead, we allow appliances to operate in one of a number of K states, each of which has a unique power value. The appliance state indicator variable would therefore be a binary vector taking the form of a 1-of-K coding scheme, such that:
and
The optimal appliance combination would therefore be given by minimising the function f at each time slice:

Second, the assumption that appliance states at each time slice are independent of each other can be relaxed. This allows us to take into account information of known appliance lengths in the form of constraints, e.g. 10 minutes < fridge_cycle_duration < 20 minutes. However, this would significantly increase the complexity of the optimisation problem by including the additional time dimension.

Friday, 1 April 2011

NIALM as a combinatorial optimisation problem

NIALM can be formulated as a combinatorial optimisation problem as follows:

For each time instant, minimise the function f:
where:

  • Pagg - household aggregate power
  • N - number of appliances
  • I - vector of appliance binary indicators (1 if appliance is on, 0 otherwise)
  • P - vector of appliance power when appliance is on
This approach was first proposed by Hart, as explained in the 1992 seminal paper on NIALM. Hart asserts that this is an NP-complete "weighted set" problem, and mentions that the number of combinations of 'on' appliances scales exponentially with the number of appliances. However, despite its computational difficulty, the optimal solution can be found within seconds for a realistic number of appliances, e.g. 30.

I have recently been investigating the performance of this technique using simulated data. 24 hours worth of truth data was generated using 30 appliance profiles, each with realistic power demands and usage patterns. The true household aggregate was calculated by summing the power consumed by each device at that time instant. No noise was added on appliance nor household level.

I have been investigating how the performance of this technique decays as the number of appliances in the household increases. The optimisation function was applied at 1 hour intervals to 24 hours of data, and the accuracy was calculated as the average fraction of correct classifications.

Below is a graph showing the accuracy of this technique for the number of appliances in the house.


The technique is very accurate when there is less than 10 appliances in the household. Although the accuracy starts to drop rapidly, if seems to converge around 0.5 or 0.6. However, further tests of large numbers of appliances are necessary to prove this.

Obvious future work is to investigate how the performance of this approach decays with the introduction of appliance and aggregate level noise to the data.