This post is by Phil Price, not Andrew.
Before I get to my question, you need some background.
The amount of electricity that is provided by an electric utility at a given time is called the “electric load”, and the time series of electric load is called the “load shape.” Figure 1 (which is labeled Figure 2 and is taken from a report by Scottmadden Management Consultants) shows the load shape for all of California for one March day from each of the past six years (in this case, the day with the lowest peak electric load). Note that the y-axis does not start at zero.
In March in California, the peak demand is in the evening, when people are at home with their lights on, watching television and cooking dinner and so on.
An important feature of Figure 1 is that the electric load around midnight (far left and far right of the plot) is rather stable from year to year, and from day to day within a month, but the load in the middle of the day has been decreasing every year. The resulting figure is called the “duck curve”: see the duck’s tail at the left, body in the middle, and head/bill at the right?
The decrease in the middle of the day is due in part to photovoltaic (PV) generation, which has been increasing yearly and is expected to continue to increase in the future: when the sun is out, the PV panels on my house provide most of the electricity my house uses, so the load that has to be met by the utility is lower now than before we got PV.
Other technologies also change the load shape. For example, electric vehicle charging will cause increases in mid-morning as people charge their cars after driving to work, and in the evening as they charge them when they return home. “Time-of-use” electric rates, in which the charge for electricity depends not just on the amount that is used but also on when it was used, will get people and businesses to shift some of their loads to off-peak periods.
There are large weather-related changes to the load shape. On a hot summer day the peak is usually some time in the mid-afternoon, when home and commercial air conditioners are working hard. Figure 2 (which is from a CU Boulder class website and I presume shows Colorado, not California) shows how the load shape can change with season.
Electric utilities need to set electric rates now, and also to plan for the future. For instance, the annual peak load — the highest system-wide load reached in the entire year — is a very important parameter because both the generation and transmission systems must be able to meet the demand. Installing giant transformers and transmission lines everywhere would be wasteful and expensive, but if they are under-sized compared to future needs then they will need to be supplemented or replaced, which is also wasteful and expensive.
Naturally, utilities are very interested in how the load shape is changing, and in forecasting how (and why) it will change in the future. As part of that forecasting, they are interested in quantifying how things are changing already: When do people charge their electric vehicles? How much do rooftop solar installations contribute throughout the day and the year? (This of course depends on the orientation of the roofs, and how and when they are shaded, and so on.) Even if they knew exactly what is happening now, this wouldn’t be sufficient to make an accurate forecast: electric vehicles will continue to increase in battery capacity, the roofs that get solar panels in the future will differ from those that already have them, and so on. But one has to start somewhere.
When the technology of interest is a piece of equipment that generates or consumes power, it is possible in principle to monitor it directly: one can install a meter that measures the load of the electric vehicle charger on its own, for instance. But installing a meter is expensive, and even when meters exist the data may be available only to the system installer and the customer, not to the utility (this is common with solar panels). And when trying to evaluate the effect of something that is not a piece of equipment, such as time-of-use metering, there is no way to directly meter the change, even in principle. In these cases, there are two basic approaches: (1) a before-and-after comparison (this is called pre/post analysis) and (2) a case-control study.
In a pre/post analysis, one creates a statistical model that fits the load data from before the new system or policy was installed, and uses it to predict the load afterwards. This is compared to the actual load afterwards, and the difference is quantified. For instance, one could create a statistical model to predict the peak load, given the afternoon outdoor air temperature and the day of the week, and fit the model to data from before the building went onto a time-of-use metering plan. This model could be used to predict what the peaks would have been in subsequent weeks if the building had remained on its original rate plan. These predictions can be compared to the actual peaks.
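To make the pre/post idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the numbers are synthetic, the function names (`fit_line`, `fit_baseline`, `predict_peak`, `estimated_effect`) are hypothetical, and "day of the week" is reduced to a weekday/weekend flag with a separate straight-line temperature fit for each. A real analysis would use a richer model.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def fit_baseline(pre_days):
    """Fit one temperature line per day type, using pre-period data only.

    pre_days: list of (is_weekend, afternoon_temp_c, peak_kw) tuples.
    """
    model = {}
    for day_type in (False, True):
        temps = [t for w, t, _ in pre_days if w == day_type]
        peaks = [p for w, _, p in pre_days if w == day_type]
        model[day_type] = fit_line(temps, peaks)
    return model

def predict_peak(model, is_weekend, temp_c):
    """Predicted peak had the building stayed on its original rate plan."""
    intercept, slope = model[is_weekend]
    return intercept + slope * temp_c

def estimated_effect(model, post_days):
    """Mean of (actual peak - predicted peak) over the post period."""
    return mean(p - predict_peak(model, w, t) for w, t, p in post_days)

# Synthetic pre-period: (is_weekend, afternoon temperature in C, peak in kW).
pre_days = [
    (False, 15.0, 160.0), (False, 20.0, 180.0),
    (False, 25.0, 200.0), (False, 30.0, 220.0),
    (True, 15.0, 70.0), (True, 20.0, 80.0),
    (True, 25.0, 90.0), (True, 30.0, 100.0),
]
# Synthetic post-period, after the (made-up) switch to time-of-use rates.
post_days = [(False, 22.0, 178.0), (False, 27.0, 198.0), (True, 18.0, 66.0)]

model = fit_baseline(pre_days)
effect = estimated_effect(model, post_days)
print(f"Estimated change in peak load: {effect:.1f} kW")
```

The key design point is that the model only ever sees pre-period data; the post-period data is used solely for the comparison.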
Pre/post analysis usually works pretty well if the “post” period is fairly short — a few months, say. After that it gets increasingly uncertain because buildings tend to change: the tenant on the third floor moves out, the retail shop on the ground floor changes its operating hours, etc. So if time periods of interest are a long way apart — comparing this summer to last summer — a pre/post analysis is subject to a lot of uncertainty. Still, this is the standard approach to quantifying changes.
Fortunately, utilities, including my client, usually don’t care all that much about any individual building. They are interested in large portfolios: “What was the effect of switching 5,000 small commercial customers in a specific region onto a particular time-of-use rate plan last year?” Aggregation obviously helps enormously: tenants move out of some buildings but into others, some businesses increase their operating hours but others decrease them, etc. But even with aggregation, pre/post analysis has some problems. For one thing, a change in the local economy can raise or lower all the boats, compared to the model prediction: fewer vacancies, more workers in office buildings, longer operating hours. For another, changes in technology (such as gradually switching from fluorescent to LED lighting, or from desktop to laptop computers) can lead to a modest but non-negligible change in load in most buildings, on a timescale of a few years.
An alternative to pre/post analysis is case/control analysis: match those 5,000 small commercial customers who switched rate plans with 5,000 (or even 50,000) who did not. If you do the selection so that the average load shape of the cases was the same as that of the controls prior to going onto time-of-use metering (or prior to getting electric vehicle chargers, or prior to getting solar panels, or whatever), then you can compare the cases to the controls later on, and the difference should give you the effect of the change.
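The case/control comparison can be sketched as follows. This is an illustration under stated assumptions, not a recipe: the loads are synthetic, each "day" has only four time steps to keep it short, and the function names are hypothetical. One thing the sketch adds that the description above does not require is subtracting any leftover pre-period gap between cases and controls (a difference-in-differences style adjustment); if the matching were perfect, that gap would be zero and the adjustment would change nothing.

```python
def mean_load_shape(buildings):
    """Average load at each time step across a group of buildings.

    buildings: list of load shapes, each a list of loads (same length).
    """
    n = len(buildings)
    return [sum(step) / n for step in zip(*buildings)]

def case_control_effect(cases_pre, controls_pre, cases_post, controls_post):
    """Change in the case-minus-control gap from the pre to the post period."""
    pre_gap = [
        c - k
        for c, k in zip(mean_load_shape(cases_pre), mean_load_shape(controls_pre))
    ]
    post_gap = [
        c - k
        for c, k in zip(mean_load_shape(cases_post), mean_load_shape(controls_post))
    ]
    return [post - pre for pre, post in zip(pre_gap, post_gap)]

# Synthetic loads in kW at four times of day (night, morning, midday, evening).
cases_pre = [[10, 20, 30, 20], [12, 22, 32, 22]]
controls_pre = [[11, 21, 31, 21], [11, 21, 31, 21]]
cases_post = [[10, 15, 20, 20], [12, 17, 22, 22]]      # e.g. after adding solar
controls_post = [[11, 21, 31, 21], [11, 22, 32, 21]]   # drifted slightly upward

effect = case_control_effect(cases_pre, controls_pre, cases_post, controls_post)
print(effect)
```

Because the controls drift along with the cases, changes that affect everyone (the local economy, a gradual switch to LED lighting) cancel out of the estimate, which is the main appeal over a pre/post model.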
Which (finally!) brings me to my question: how should I choose the control group?
Suppose we are interested in how the load shape of a group of mid-size commercial buildings in a specific county was affected when those buildings added solar panels. One approach to choosing the control group would be to use “propensity score matching” or some similarly motivated approach. One of the problems is that I have very little information about the buildings: all I have is what is known to the utility, namely the rate plan they are on (there are a lot of different rate plans, especially for commercial buildings), their energy usage, and possibly something about the physical size of the building. I could choose a control group of buildings that has the same mix of rate plans and approximate total energy usage and approximate physical size…but surely the companies that have bought solar panels will differ in important ways from those that haven’t.
But also, this approach would discard an enormously useful type of information that I _do_ have, which is the load shape of the buildings. Surely I should try to choose control buildings that have load shapes (prior to installing solar panels) similar to the case buildings. In an ideal case, suppose there are two office buildings that have identical load shapes. In summer 2014 one of them adds solar panels. Surely the other one should go into the control group. So one tempting approach is to do something akin to medical case-matching. Perhaps, for each case building, I could choose N control buildings that have highly correlated loads (in the year before the solar panels were installed), and then create a “virtual” control building by taking a linear combination of the loads in those buildings. Something like that. So, ok, “something like that.” But what, exactly?
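To pin down one possible reading of “something like that,” here is a rough sketch: rank candidate buildings by the correlation of their pre-period load with the case building, keep the top N, fit least-squares weights (no intercept) on the pre-period, and apply those weights to the post-period to get the virtual control. This is only one of many ways to do it and I am not claiming it is the right one. The data are synthetic, the names (`virtual_control`, `apply_weights`, and so on) are hypothetical, and the linear algebra is written out in plain Python only to keep the example dependency-free; in practice one would use a numerical library.

```python
from statistics import mean

def correlation(x, y):
    """Pearson correlation between two equal-length load series."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def solve(a, b):
    """Solve the square system a w = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w

def virtual_control(case_pre, candidates_pre, n_controls):
    """Choose the n most correlated candidates and least-squares weights for them."""
    ranked = sorted(
        candidates_pre,
        key=lambda name: correlation(case_pre, candidates_pre[name]),
        reverse=True,
    )
    chosen = ranked[:n_controls]
    cols = [candidates_pre[name] for name in chosen]
    # Normal equations (X^T X) w = X^T y for the pre-period fit.
    gram = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(p * y for p, y in zip(ci, case_pre)) for ci in cols]
    return dict(zip(chosen, solve(gram, rhs)))

def apply_weights(weights, loads):
    """Weighted sum of the chosen buildings' loads at each time step."""
    names = list(weights)
    series = [loads[name] for name in names]
    return [sum(weights[name] * v for name, v in zip(names, step))
            for step in zip(*series)]

# Synthetic pre-period loads (kW); the case happens to equal 0.5*a + 0.5*b.
candidates_pre = {
    "a": [10, 20, 30, 20, 10, 15],
    "b": [12, 18, 33, 25, 9, 14],
    "c": [30, 10, 12, 28, 31, 11],
}
case_pre = [11, 19, 31.5, 22.5, 9.5, 14.5]

weights = virtual_control(case_pre, candidates_pre, n_controls=2)

# Synthetic post-period loads, after the case building adds solar panels.
candidates_post = {"a": [10, 20, 30, 20], "b": [12, 18, 34, 24], "c": [30, 10, 12, 28]}
case_post = [11, 14, 22, 22]
virtual_post = apply_weights(weights, candidates_post)
effect = [actual - virtual for actual, virtual in zip(case_post, virtual_post)]
print(weights, [round(e, 3) for e in effect])
```

Correlation only measures similarity of shape, not of scale; it is the fitted weights that handle scale. Whether the weights should be constrained (non-negative, summing to one) or fit with an intercept is exactly the kind of choice I am unsure about.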
Can anyone point me to relevant publications, or tell me about past experience, or give me useful advice, about creating a control group for this kind of application?
Thanks in advance!