Friday 6 December 2013

Accuracy metrics for NIALM

Accuracy metrics are essential when evaluating the performance of an energy disaggregation algorithm in a given scenario. However, each paper seems to use a different metric when comparing its proposed approach to the state of the art. As a result, it is often impossible to compare numerical results across papers. I don't believe this is the fault of the authors, since there is no single accuracy metric which is unquestionably better than all the others. Instead, the relevance of each metric depends largely on the intended use of the disaggregated data.

For example, if the disaggregated data is to be used to provide a breakdown of the energy consumption in a home, an accuracy metric which allows errors to cancel out over time would be suitable. However, if the disaggregated data is to be used to suggest appliance loads that could be deferred to a different time of day, a less forgiving accuracy metric would be required.

Therefore, inspired by discussions at the EPRI NILM 2013 workshop and by my recent involvement in founding an open source disaggregation toolkit, I have decided to collect and categorise a list of commonly used accuracy metrics, as shown below.

Event based metrics


Event based metrics assess how well a disaggregation algorithm detects appliance change events (e.g. a washing machine turning on). However, it is not trivial to extract appliance events from sub-metered power data for use as ground truth, and doing so often involves some subjective judgement (e.g. should a washing machine changing state mid-cycle from spin to drain constitute an event?). Furthermore, deciding whether a detected event matches a ground truth event is also not trivial (e.g. should a detected event that is 1 second away from a ground truth event count as a match?).
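To make the matching question concrete, here is a minimal Python sketch of one possible event based metric. It assumes events are given as timestamps in seconds, pairs each ground truth event greedily with the nearest unmatched detected event inside a tolerance window, and then computes precision, recall and F1. The function names, the example timestamps and the greedy matching rule are illustrative assumptions rather than a standard; an optimal one-to-one assignment could give slightly different counts.

```python
def match_events(detected, ground_truth, tolerance_s):
    """Greedily match events; returns (true positives, false positives, false negatives).

    Assumption: each ground truth event is paired with the nearest still-unmatched
    detected event within +/- tolerance_s seconds. Greedy, not globally optimal.
    """
    unmatched = sorted(detected)
    tp = 0
    for t in sorted(ground_truth):
        # Detected events close enough in time to count as the same event.
        candidates = [d for d in unmatched if abs(d - t) <= tolerance_s]
        if candidates:
            nearest = min(candidates, key=lambda d: abs(d - t))
            unmatched.remove(nearest)  # each detection can be matched only once
            tp += 1
    fp = len(unmatched)          # detections with no ground truth partner
    fn = len(ground_truth) - tp  # ground truth events that were missed
    return tp, fp, fn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical switch-on times (seconds) for one appliance.
ground_truth = [10, 50, 120]
detected = [11, 48, 200]

for tol in (1, 2):
    tp, fp, fn = match_events(detected, ground_truth, tol)
    print(tol, (tp, fp, fn), round(f1_score(tp, fp, fn), 3))
# 1 (1, 2, 2) 0.333
# 2 (2, 1, 1) 0.667
```

On the same data, widening the tolerance from 1 s to 2 s doubles the F1 score, which is why the matching rule needs to be reported alongside any event based score.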

Non-event based metrics


Non-event based metrics assess how well a disaggregation algorithm is able to infer the power demand of individual appliances over time. As such, they are highly dependent upon the sampling rate of the sub-metered appliance data used as the ground truth. Such metrics have the advantage that sub-metered appliance data is easily collected by hardware installations and requires little subjective judgement. However, non-event based metrics suffer from the disadvantage that disaggregation algorithms can score very highly by predicting that all appliances always draw zero power. This occurs because most appliances remain off for the majority of each day, so such an algorithm correctly predicts each appliance's power for most of the day.
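The zero-prediction problem can be reproduced in a few lines. This is a minimal Python sketch assuming one sample per minute and a 10 W matching tolerance; the kettle data, the function names and the choice of metrics are hypothetical examples, not metrics taken from a specific paper. It compares a per-sample accuracy with an error normalised by the appliance's total ground truth power.

```python
def sample_accuracy(predicted_w, actual_w, tolerance_w=10.0):
    """Fraction of samples whose predicted power is within tolerance_w of ground truth."""
    if len(predicted_w) != len(actual_w):
        raise ValueError("predicted and actual series must be the same length")
    hits = sum(1 for p, a in zip(predicted_w, actual_w) if abs(p - a) <= tolerance_w)
    return hits / len(actual_w)


def normalised_error(predicted_w, actual_w):
    """Total absolute error divided by total ground truth power."""
    total_actual = sum(actual_w)
    if total_actual == 0:
        raise ValueError("ground truth has no consumption to normalise by")
    return sum(abs(p - a) for p, a in zip(predicted_w, actual_w)) / total_actual


# Hypothetical kettle sampled once per minute for a day (1440 samples):
# on at 2000 W for 30 minutes, off otherwise.
actual = [2000.0] * 30 + [0.0] * 1410
always_off = [0.0] * len(actual)  # "predict zero power all the time"

print(round(sample_accuracy(always_off, actual), 3))  # 0.979
print(normalised_error(always_off, actual))           # 1.0
```

The always-off prediction scores roughly 98% on the per-sample accuracy, yet its normalised error is 1.0 (all of the kettle's consumption is missed), because dividing by the appliance's actual consumption stops the long off periods from dominating the score.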

Overall metrics


Overall metrics assess how well a disaggregation algorithm is able to infer the total energy consumed by individual appliances over a period of time. Such metrics are often the most intuitive, since they directly correspond to a pie chart of household energy consumption (e.g. as provided by Neurio). However, overall metrics allow errors to cancel out over time (e.g. if an appliance's energy consumption is overestimated on day 1 and underestimated by the same amount on day 2, the algorithm would be assigned 100% accuracy, since these errors cancel each other out).
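The cancellation effect can be shown with two hypothetical days of data. This minimal Python sketch assumes daily energy totals in kWh and defines accuracy as 1 minus a relative error; both function names are illustrative, not taken from any toolkit. The first function compares totals over the whole period, as an overall metric does, while the second accumulates the error one day at a time.

```python
def overall_energy_error(predicted_kwh, actual_kwh):
    """Relative error of total energy over the whole period; daily errors can cancel."""
    return abs(sum(predicted_kwh) - sum(actual_kwh)) / sum(actual_kwh)


def per_day_energy_error(predicted_kwh, actual_kwh):
    """Relative error summed day by day; over- and underestimates cannot cancel."""
    return sum(abs(p - a) for p, a in zip(predicted_kwh, actual_kwh)) / sum(actual_kwh)


# Hypothetical appliance: 2 kWh used on each of two days.
actual = [2.0, 2.0]
predicted = [3.0, 1.0]  # overestimated on day 1, underestimated on day 2

print(1 - overall_energy_error(predicted, actual))  # 1.0 -> "100% accurate"
print(1 - per_day_energy_error(predicted, actual))  # 0.5
```

Both functions see the same data, but only the per-day version penalises the 1 kWh errors, which is why an overall metric suits a consumption breakdown but not decisions about when loads were running.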


This list is mostly intended as a starting point for discussion regarding accuracy metrics, so please leave a comment if you notice any metrics I've left out!
