Meter-Based Methods from Coast to Coast

Demand Side Analytics (DSA) recently conducted two similar studies on the accuracy of using smart meter data for evaluation and settlement of energy efficiency (meter-based methods). Different localities refer to these meter-based methods with distinctive terminology. Across our two studies, California^[1] (for Pacific Gas & Electric) refers to these methods as normalized metered energy consumption (NMEC) while Vermont^[2] (for the Vermont Department of Public Service) refers to them as Advanced Measurement & Verification (M&V).

While these studies were distinct, they were both concerned, to varying degrees, with:

Estimating energy efficiency (EE) program impacts.
Assessing the accuracy of meter-based methods.
Providing recommendations for the types of populations, interventions, and locations where meter-based methods should be applied.

What follows is a review of the benefits of using meter-based methods for estimating EE program impacts, a brief overview of each study’s goals, and our findings.

What are Meter-Based Methods?

The primary challenge of estimating energy savings is the need to accurately detect changes in energy consumption due to the energy efficiency intervention, while systematically eliminating plausible alternative explanations for those changes. Did the introduction of energy efficiency measures cause a change in energy use? Or can the differences be explained by other factors (such as the effects of the COVID-19 pandemic)? To evaluate energy savings, it is necessary to estimate what energy consumption would have been in the absence of program intervention—the counterfactual or baseline.

Meter-based methods rely on whole-building, site-specific electric and/or gas consumption data, either at the hourly or daily level. This data is used to estimate energy savings associated with the installation of individual or multiple energy efficiency measures (EEMs) at the site.

Why rely on Meter-Based Methods?

Many methods exist to estimate savings associated with EEMs, all with varying degrees of modeling complexity, data requirements, accuracy, and precision. The benefits of using meter-based methods include:

Eliminating the need for sampling because data is available for nearly all participants.
Reducing the burden on participants because technicians don’t need to visit the home or business to install metering equipment.
Producing faster feedback on energy-saving performance.
Enabling program administrators to look beyond the average customer and explore how savings vary across segments of interest.
Opening new opportunities for program design and delivery (e.g., pay-for-performance programs).
Producing granular savings estimates that are useful for a wide range of planning and valuation functions.

California

Pacific Gas and Electric Company (PG&E) currently uses the CalTRACK Version 2.0 method (CalTRACK) to estimate avoided energy use for its energy efficiency programs based on the Population-Level NMEC methodology. A notable feature of the population NMEC method has been the lack of comparison groups, which are used to adjust the energy savings baseline and normalize the savings estimate for factors beyond weather. The pre-post method without a comparison group relies almost exclusively on weather normalization and effectively assumes that the only difference between the pre- and post-intervention periods is weather and the installation of EEMs. The COVID-19 pandemic laid bare the limitations of the adopted method. The pandemic led to changes in our commutes, business operations, and home use patterns. Not surprisingly, it has also changed how, when, and how much electricity and gas we use. Moreover, the impact on energy use differs for residential customers and various types of businesses.

Given the changes in energy consumption that have occurred over the course of the COVID-19 pandemic, the need for alternative approaches to CalTRACK and similar, simple pre-post regression methods for estimating EE impacts is paramount. While adding comparison groups typically improves the accuracy of these energy saving estimates, there are three main logistical challenges:

Privacy of non-participant customer data. California laws and regulations protect the privacy of advanced metering infrastructure (AMI), or smart meter, data for individual customers.
Transparency challenges. Many evaluation methods that rely on a comparison group require extensive calculation to construct the group. This complexity can hinder independent review and replication of the findings.
Complexity and frequency. PG&E and third-party EE program implementers target a wide range of customer segments and geographic areas, each of which requires regular, specifically targeted non-participant data for evaluation. This requirement adds complexity to existing program administration processes.

To determine if there are viable alternative models that can accommodate the effects of the COVID-19 pandemic or other wide-scale non-routine events, DSA conducted an accuracy assessment of the existing Population NMEC methods as well as a variety of other methods with and without comparison groups.
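To make the role of a comparison group concrete, the sketch below contrasts a simple pre-post estimate with a difference-in-differences estimate when a system-wide shock hits the post period. All numbers, variable names, and function names are hypothetical illustrations, not values or code from the study.

```python
def pre_post_savings(pre, post):
    """Savings as the drop in average use from the pre to the post period."""
    return pre - post


def did_savings(part_pre, part_post, comp_pre, comp_post):
    """Difference-in-differences: participant change net of comparison change."""
    return (part_pre - part_post) - (comp_pre - comp_post)


# Hypothetical weather-normalized average daily kWh per home.
true_savings = 2.0  # assumed effect of the installed measures
shock = 3.0         # assumed stay-at-home load increase affecting every home

part_pre, comp_pre = 30.0, 29.0
part_post = part_pre - true_savings + shock  # participants: measures plus shock
comp_post = comp_pre + shock                 # comparison group: shock only

naive = pre_post_savings(part_pre, part_post)
did = did_savings(part_pre, part_post, comp_pre, comp_post)
print(naive, did)
```

In this toy case the pre-post estimate reports negative savings because the shock is attributed to the program, while the comparison group absorbs the shock and the difference-in-differences estimate recovers the assumed effect. This only holds if the shock affects both groups equally, which is why the choice of comparison group matters.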

What did we do?

Accurate and unbiased estimates of energy efficiency impacts are critical for utility program staff, third-party program implementers, and regulators. In evaluating the accuracy of the existing Population NMEC methods used in the PG&E territory, we tested a variety of other methods, with and without comparison groups, to simulate a competition and identify the methods that are unbiased and accurate (Figure 1).

The accuracy of these methods is assessed by applying a placebo treatment to customers that did not participate in EE programs during the period analyzed. The impact of a program (or in this case, a pseudo-program) is calculated by estimating a counterfactual and comparing it to the observed consumption during the post-treatment period. Because no EEMs were installed in this simulation, any deviation between the counterfactual and actual loads is due to error. The process is repeated hundreds of times – a procedure known as bootstrapping – to construct the distribution of errors.
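The placebo procedure described above can be sketched in a few lines. This is a minimal illustration on synthetic data, not the study's code: the function name `placebo_errors`, the assumed 1% model bias, the site-level noise, and the choice to resample sites with replacement are all assumptions made for the example.

```python
import random
import statistics


def placebo_errors(actual, predicted, n_sites, n_draws, seed=0):
    """Bootstrap percent errors of a baseline model on non-participants.

    actual[i] and predicted[i] are post-period totals (kWh) for site i:
    the metered value and the model's counterfactual. No measures were
    installed, so true savings are zero and any gap is model error.
    """
    rng = random.Random(seed)
    indices = range(len(actual))
    errors = []
    for _ in range(n_draws):
        # Draw a pseudo-program of n_sites customers (with replacement).
        draw = rng.choices(indices, k=n_sites)
        total_actual = sum(actual[i] for i in draw)
        total_pred = sum(predicted[i] for i in draw)
        # Positive error means the counterfactual is too high (false savings).
        errors.append((total_pred - total_actual) / total_actual)
    return errors


# Synthetic non-participants: a hypothetical model that over-predicts by 1%
# on average, with 10% site-level noise.
data_rng = random.Random(42)
actual = [data_rng.uniform(5000, 15000) for _ in range(2000)]
predicted = [a * (1.01 + data_rng.gauss(0, 0.10)) for a in actual]

small = placebo_errors(actual, predicted, n_sites=25, n_draws=500)
large = placebo_errors(actual, predicted, n_sites=1000, n_draws=500)
print("25 sites:  ", statistics.mean(small), statistics.stdev(small))
print("1000 sites:", statistics.mean(large), statistics.stdev(large))
```

The mean of each error distribution reflects bias, and its spread reflects precision. In this synthetic setup the spread narrows as more sites are aggregated, while the assumed 1% bias does not shrink.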

Figure 1: General Approach for Accuracy Assessment

What did we find?

Population NMEC methods without comparison groups cannot account for the effects of the COVID-19 pandemic.
The existing population NMEC methods without comparison groups show upward bias even prior to the effects of the pandemic.
Comparison groups improve accuracy of the CalTRACK method.
When constructing a matched control group, the choice of segmentation and matching characteristics matters more than the method of matching customers.
Synthetic controls may perform well but are highly sensitive to the choice of segmentation used.
Using aggregated granular profiles instead of individual matched controls in Difference-in-Differences methods yields comparable results to using individual customer matched controls.
Accuracy and precision depend on the number of sites aggregated together (Figure 2).

Figure 2: Distribution of Error across Comparison Groups

No method is completely free of error.

Given these findings, rather than try to produce a single prescriptive method for NMEC analyses of energy efficiency programs, we instead recommend a framework by which proposed NMEC methods can be tested, certified, and used to estimate savings:

	Certification needs to be implemented by an independent party. The party that develops a Population NMEC method cannot self-certify. We recommend that a party such as CALMAC or one of the National Laboratories be responsible for certification.
	Population NMEC methods need to be tested for reproducibility. Reproducibility means obtaining consistent computational results using the same input data, computational steps, methods, and code.
	Population NMEC methods must meet pre-defined input analysis dataset structures and pre-defined output structures. Defining the input data structure(s) ensures the method can be tested for reproducibility and also allows an independent party to produce metrics for accuracy and precision. It also ensures that different NMEC methods can be applied to the same datasets. At minimum, the input dataset must include AMI hourly data, public hourly weather data, and the dates when energy efficiency measures were installed. Defining the output data structure allows utilities, vendors, and public entities to build dashboards and tools to display the results regardless of which underlying NMEC algorithm is applied.
	Population NMEC metrics of accuracy (bias) and precision should be calculated out-of-sample at a portfolio level. Metrics of accuracy measure the tendency to over- or underestimate the baseline. They are used to assess if a method or model is biased. Metrics for precision measure how close individual hourly or daily estimates are to the actual answer and measure noise. While evaluators and other interested parties may choose to pick a model that is accurate and precise for individual sites, certification must be done on an aggregate program basis rather than for individual participants. Out-of-sample validation is critical since models that are over-fitted can perform well in sample and poorly out of sample.
	The measurement of accuracy (bias) and precision metrics should be calculated by the independent party certifying the method using a blind test. Using an independent party ensures consistency and independence of the metric calculations. We also recommend that the test be a blind test, meaning that proponents of the Population NMEC method do not have access to the dataset used to test the proposed method.
	To be certified, an NMEC method must meet specific criteria for accuracy and precision. Accuracy and precision, as noted above, are dependent upon the size of the participant population of interest. Therefore, targets for model acceptability are similarly size-dependent. Table 1 shows the proposed targets as a function of sample size. The metrics for bias and precision may need to be modified for sites with solar to account for the fact that large energy users can have lower energy consumption at the meter. These cutoffs were chosen on the basis of the bootstrapped accuracy test results and were set such that 15% of all models tested in this study met the criteria.
	Table 1: Proposed Out-of-Sample Accuracy and Precision Targets for Certification
	Population NMEC methods must be separately certified for residential, small and medium businesses, and large businesses and for sites with and without solar. The approach allows methods that work for specific segments to be applied.
	The out-of-sample metrics for accuracy and precision of Population NMEC methods tested for certification should be posted on a public repository such as CALMAC. Public data on the performance of different models is useful for helping develop new methods and avoiding redundant efforts.
	The code for estimating savings needs to be publicly available and include examples of how it is applied. A key goal of NMEC is transparency, which means everyone has access to the analysis code and examples of how to apply it to estimate savings. The code to estimate the savings needs to be in a standard statistical computing language such as Python, R, SAS, Stata, or Julia.
	The method used must be selected and certified in advance of program implementation. Requiring up-front identification of the estimation procedure ensures that there is no post-hoc model selection that would produce more favorable results.
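As an illustration of portfolio-level accuracy and precision metrics, the sketch below aggregates site loads by interval and then computes a normalized mean bias and a coefficient of variation of the root mean squared error, CV(RMSE). These two metrics are common in M&V practice, but they are an assumption here: the framework above does not prescribe specific formulas, and the function name and data are hypothetical.

```python
import math


def portfolio_metrics(actual_by_site, predicted_by_site):
    """Return (bias, cvrmse) for the aggregate portfolio load.

    Each argument is a list of per-site lists with one value per interval.
    Predictions are assumed to come from a holdout (out-of-sample) period.
    """
    # Aggregate all sites into a single portfolio series per interval.
    actual = [sum(vals) for vals in zip(*actual_by_site)]
    predicted = [sum(vals) for vals in zip(*predicted_by_site)]
    n = len(actual)
    mean_actual = sum(actual) / n
    # Accuracy: average signed error relative to average load.
    bias = sum(p - a for p, a in zip(predicted, actual)) / n / mean_actual
    # Precision: typical size of interval-level error relative to average load.
    rmse = math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)
    return bias, rmse / mean_actual


# Two hypothetical sites, four intervals each (kWh).
actual_by_site = [[1.0, 2.0, 3.0, 2.0], [2.0, 2.0, 1.0, 3.0]]
predicted_by_site = [[1.5, 1.5, 3.0, 2.0], [2.0, 2.5, 1.5, 2.5]]

bias, cvrmse = portfolio_metrics(actual_by_site, predicted_by_site)
print(f"bias = {bias:.4f}, CV(RMSE) = {cvrmse:.4f}")
```

In the second interval the two site-level errors cancel once the sites are summed, which is one reason metrics computed on the aggregate can look quite different from metrics computed site by site.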

Vermont

The primary objective of the Hourly Impact of Energy Efficiency Evaluation Pilot was to better understand the time-value of energy efficiency measure savings and the implications for program design, delivery, and evaluation. Because energy efficiency in the Northeast qualifies for capacity value, accurate estimates of the contribution of energy efficiency to peak hours is critical. Using high-frequency 15-minute consumption data from Green Mountain Power’s AMI and program tracking data from Efficiency Vermont, the study team modeled energy consumption of participating homes and businesses separately in the pre-installation and the post-installation periods. These two periods were compared to understand how consumption changed following installation of an energy efficiency or beneficial electrification measure. A secondary objective of the study was to compare Advanced M&V methods, or regression-based modeling of utility meter data, with the approaches traditionally used in Vermont. This comparison helped to determine where Advanced M&V could offer cost savings, improve the accuracy and granularity of savings estimates, and identify lessons for program operations.

What did we do?

To generate savings for the 21 prescriptive measures and the 124 custom projects in Vermont, we implement Advanced M&V procedures that build upon the International Performance Measurement and Verification Protocol (IPMVP) Option C Whole Facility approach to energy savings estimation. We do this through a regression model that follows Lawrence Berkeley National Laboratory’s (LBNL) Time-of-Week Temperature (TOWT) Model, where the dependent variable is hourly electric consumption from the meter and the independent variables contain information about the weather, day of week, and time of day.
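The sketch below shows the general shape of a time-of-week and temperature baseline on synthetic data. It is deliberately simplified and is not the LBNL TOWT specification or the study's model: it fits one straight line in temperature for each hour of the week, whereas TOWT uses a piecewise-linear temperature response. The function names, the simulated load shape, and the assumed 10% load reduction are all hypothetical.

```python
import random
from collections import defaultdict

HOURS_PER_WEEK = 168


def fit_time_of_week(hours, temps, loads):
    """Fit load = a + b * temp by least squares for each hour of the week."""
    groups = defaultdict(list)
    for h, t, y in zip(hours, temps, loads):
        groups[h % HOURS_PER_WEEK].append((t, y))
    model = {}
    for how, points in groups.items():
        n = len(points)
        mean_t = sum(t for t, _ in points) / n
        mean_y = sum(y for _, y in points) / n
        sxx = sum((t - mean_t) ** 2 for t, _ in points)
        sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
        slope = sxy / sxx if sxx > 0 else 0.0
        model[how] = (mean_y - slope * mean_t, slope)
    return model


def predict(model, hours, temps):
    """Counterfactual load for each hour given its outdoor temperature."""
    out = []
    for h, t in zip(hours, temps):
        intercept, slope = model[h % HOURS_PER_WEEK]
        out.append(intercept + slope * t)
    return out


def simulate(rng, n_weeks, reduction):
    """Synthetic hourly kWh with an evening bump and linear heating load."""
    hours = list(range(n_weeks * HOURS_PER_WEEK))
    temps = [rng.uniform(-10.0, 30.0) for _ in hours]
    loads = []
    for h, t in zip(hours, temps):
        evening = 0.5 if h % 24 >= 17 else 0.0
        heating = 0.04 * (30.0 - t)
        base = (1.0 + evening + heating) * (1.0 - reduction)
        loads.append(base + rng.gauss(0.0, 0.05))
    return hours, temps, loads


rng = random.Random(1)
pre_hours, pre_temps, pre_loads = simulate(rng, n_weeks=20, reduction=0.0)
post_hours, post_temps, post_loads = simulate(rng, n_weeks=8, reduction=0.10)

# Fit on the pre-installation period, then predict the post period baseline.
model = fit_time_of_week(pre_hours, pre_temps, pre_loads)
baseline = predict(model, post_hours, post_temps)
savings_kwh = sum(baseline) - sum(post_loads)
savings_pct = savings_kwh / sum(baseline)
print(round(savings_pct, 3))
```

Because the baseline is predicted hour by hour, the same output can be summed for annual savings or kept at hourly resolution to study when savings occur.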

This methodology estimates efficiency impacts in each hour of the year. Granular results provide insight into the distribution of energy savings across a year. For example, Figure 3 shows a heat map of the average energy savings from installing a variable speed heat pump. This measure’s model estimates a large load increase during the winter months (blue regions). Negative savings is a good thing in this case because it means Vermont homes are using the heat pump for heating and displacing delivered fuel consumption. There is also a pocket of denser load increase in the summer months during the middle of the day (orange regions), presumably due to homes that may not have had air conditioning previously using the heat pump as an air conditioner.

Figure 3: Variable Speed Heat Pump Heat Map
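One way to arrange hourly savings for a heat map like this is to average them by month and hour of day. The sketch below assumes hourly baseline and metered values are already available; the function name and the synthetic heat pump load are hypothetical and do not reproduce the values behind Figure 3.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def savings_grid(timestamps, baseline, actual):
    """Average hourly savings (baseline - actual) by (month, hour of day)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, b, a in zip(timestamps, baseline, actual):
        key = (ts.month, ts.hour)
        sums[key] += b - a
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical home: the heat pump adds 1 kWh of load in every winter hour.
start = datetime(2021, 1, 1)
timestamps = [start + timedelta(hours=i) for i in range(365 * 24)]
baseline = [1.0] * len(timestamps)
actual = [2.0 if ts.month in (12, 1, 2) else 1.0 for ts in timestamps]

grid = savings_grid(timestamps, baseline, actual)
print(grid[(1, 6)], grid[(7, 12)])
```

Each of the 288 cells (12 months by 24 hours) can then be colored by its value, with negative cells marking load increases such as winter heating.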

What did we find?

Modeling success for prescriptive measures is a function of effect size and number of participants.
Challenges are present when using Advanced M&V for “market opportunity” measures, where the baseline is a hypothetical new piece of equipment with code-minimum efficiency. This assumption creates issues because the pre-installation meter data reflects the replaced equipment at the end of its useful life.
For custom projects, Advanced M&V methods work best for sites with predictable load patterns and large savings as a percent of total consumption (Figure 4).

Figure 4: Example of a Well-Behaved Custom Project

With the level of noise present, we caution against using site-specific results to determine incentive levels in Vermont and suggest Advanced M&V is more useful as a program evaluation tool.
Advanced M&V is a powerful tool, but it is not the right tool for every job.

Given these findings, we offer the following guidance for estimating savings from efficiency measures accurately and precisely:

	Data preparation is key. Imperfect linkages can lead to sites that do not have meters appropriately aggregated and, therefore, do not display accurate consumption records. Or it can lead to inadequate matching, since we are unable to guarantee that a matched control did not receive treatment.
	Filters identify those eligible for analysis. We apply filters, most notably uncertainty filters, so that the sites we analyze are limited to those that qualify as good candidates for Advanced M&V methods. Without these filters, the results would likely be inaccurately quantified.
	Precision needs to be a consideration. We use an accuracy assessment to assess bias across our various methods. This is a useful step because the error in a model can never be less than its bias. For example, a model that overpredicts by 2% cannot have a margin of error of less than 2% in the real world.
	Matching helps a lot. In the face of a disruption like COVID-19, matching is the only chance we have of producing unbiased estimates that separate the effect of efficiency from exogenous changes in energy consumption related to the pandemic.

[1] https://pda.energydataweb.com/api/view/2587/PGE_NMEC_Accuracy_Assessment_Report_02-15-2022.pdf
[2] https://publicservice.vermont.gov/sites/dps/files/documents/VT%20PSD%20Hourly%20Impact%20of%20Efficiency%202021.pdf