## Unhappy with improvement by a factor of 10^29

I have an optimization problem. I have a complicated physical model that predicts the energy use and thermal behavior of a building, given the values of a slew of parameters such as insulation effectiveness, window transmissivity, etc. I’m trying to find the parameter set that best fits several weeks of thermal and energy-use data from the real building that we modeled. (Of course I would rather explore parameter space and come up with probability distributions for the parameters, and maybe that will come later, but for now I’m just optimizing.)

To do the optimization, colleagues and I implemented a “particle swarm optimization” algorithm on a massively parallel machine. This involves giving each of about 120 “particles” an initial position in parameter space, then letting them move around, each trying to move to better positions according to a specific algorithm. We gave each particle an initial position sampled from our prior distribution for each parameter. So far we’ve run about 140 iterations, and I just took a look at where the particles are now. They are indeed converging; that is, they’re coming to some agreement on what the best region of parameter space is. But the standard deviation of each parameter across the swarm is still about 0.4 times what it was at the start. (For instance, we put in a very wide prior distribution for the thermal mass of the furnishings in the building’s offices, and after 140 iterations the particles’ spread in that parameter is about 0.4 times as wide as it was at the start.)
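For readers who haven't met particle swarm optimization, here is a minimal sketch of a basic global-best swarm. It is not the implementation described above: the inertia and attraction constants (`w`, `c1`, `c2`) are common textbook-style choices assumed for illustration, uniform sampling in a box stands in for drawing from each parameter's prior, and the particles are evaluated serially rather than in parallel. In the real problem each call to `objective` would be a full building simulation plus a goodness-of-fit calculation.

```python
import random

def pso(objective, lower, upper, n_particles=120, n_iter=140,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over the box [lower, upper] with a basic
    global-best particle swarm. Returns (best position, best value,
    final particle positions)."""
    rng = random.Random(seed)
    dim = len(lower)
    # Uniform sampling stands in for drawing each parameter from its prior.
    pos = [[rng.uniform(lower[d], upper[d]) for d in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest_pos = [p[:] for p in pos]              # each particle's best so far
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest_pos, gbest_val = pbest_pos[g][:], pbest_val[g]  # swarm's best so far

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia, plus random pulls toward the particle's own best
                # and toward the swarm's best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                # Move, then clip back into the allowed box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lower[d]), upper[d])
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest_val[i], pbest_pos[i] = val, pos[i][:]
                if val < gbest_val:  # global best is updated immediately
                    gbest_val, gbest_pos = val, pos[i][:]
    return gbest_pos, gbest_val, pos

# Usage on a toy 5-parameter quadratic penalty (not a building model).
best, best_val, final = pso(lambda p: sum(x * x for x in p),
                            [-5.0] * 5, [5.0] * 5)
```

Comparing `statistics.stdev` of each column of `final` with the same quantity for the initial positions gives the kind of per-parameter spread ratio (0.4 here) discussed in this post.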

I was, and still am, a bit disappointed by that factor of 0.4, but: we have 74 parameters. Our particles were spread through a huge volume of parameter space, and now they’re spread through a region that is about 0.4 times as wide along each parameter’s axis. That means they’ve agreed on a volume of parameter space that is about 0.4^74 times as large as before, i.e. smaller by a factor of about 10^29. Maybe it’s not so bad.
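The volume arithmetic can be checked in a few lines. This sketch assumes, as the paragraph does, that the swarm's region can be treated as an axis-aligned box, so the per-axis shrink ratios simply multiply; it accepts a separate ratio for each parameter, since in practice they won't all be exactly 0.4.

```python
import math

def volume_fraction(sd_ratios):
    """Fraction of the starting parameter-space volume left when the spread
    along each axis shrinks by the corresponding ratio. Treats the region
    as an axis-aligned box, i.e. ignores correlations between parameters."""
    return math.prod(sd_ratios)

frac = volume_fraction([0.4] * 74)
print(f"remaining fraction: {frac:.1e}")
print(f"shrinkage factor: about 10^{-math.log10(frac):.1f}")  # about 10^29.4
```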

1. Igor Carron says:

Phil,

Maybe you want to take a look at the Experimental Probabilistic Hypersurface that tries to deal with this type of issue:

for a small implementation:

Here is a small description with Aleks in the comment section of:
http://www.stat.columbia.edu/~cook/movabletype/ar

Of special interest is its use in the evaluation of the Cathare code, a thermal-hydraulics code used in nuclear power plants (Relap 5 is the equivalent code in the U.S.).

Cheers,

Igor.

2. Tom Fid says:

Is there a reason to think that the particle swarm will outperform some kind of derivative-free hill-climbing method? 120×140 simulations isn't much in 74D space, but it's a fair amount for a coarse search.

Seems like the appropriate metric is as much the improvement in the payoff as it is in reduction of the SD of the swarm. You wouldn't care if 119 of the particles were bonkers as long as one found a good solution.

3. Phil says:

Igor, Thanks for the suggestion. I've glanced at those pages and will take a closer look.

Tom,

Wetter and Wright (Building and Environment 39, 2004) looked at optimization for these models and found that discontinuities in outputs as a function of inputs degraded the performance of hill-climbing (or -descending) methods. Discontinuities arise for several reasons, including the fact that buildings (and simulated buildings) include if-then-else logic. For instance, if the indoor temperature exceeds the thermostat cooling setpoint then the air conditioner kicks on, otherwise it doesn't, so a change of epsilon in the setpoint can lead to a big change in the predicted cooling energy use. Anyway, the answer to your first question is yes. Of course that doesn't mean particle swarm is the best approach, but it's very easy to implement in parallel and is known to work reasonably well for our problem.
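A toy illustration of the thermostat example. The model, temperatures, and power rating below are made up for illustration and have nothing to do with any real simulation engine. Predicted cooling energy becomes a step function of the setpoint, so a finite-difference slope is zero almost everywhere and enormous at the threshold, which gives methods that rely on local slope information very little to work with.

```python
def cooling_energy(setpoint, indoor_temps, ac_power_kw=5.0, dt_hours=1.0):
    """Hypothetical toy model: the air conditioner runs at full power for any
    hour whose indoor temperature exceeds the cooling setpoint, else not at all."""
    return sum(ac_power_kw * dt_hours for t in indoor_temps if t > setpoint)

def central_difference(f, x, h=1e-6):
    """Finite-difference slope estimate, as a local search method might use."""
    return (f(x + h) - f(x - h)) / (2 * h)

temps = [22.0, 24.0, 25.0, 25.0, 25.0, 23.5]   # made-up hourly temps, deg C

def energy(setpoint):
    return cooling_energy(setpoint, temps)

print(energy(24.999), energy(25.001))    # 15.0 0.0
print(central_difference(energy, 24.5))  # 0.0: flat away from the threshold
print(central_difference(energy, 25.0))  # huge negative slope at 25 deg C
```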

You're right that a better metric is the improvement in the penalty function, which in fact is pretty good. And the particles will eventually converge on the best point they can find (the problem is still running, I was looking at an intermediate solution), whether that's actually a good solution or not. I'm not really complaining about our solution so far, nor suggesting that looking at the size of the region of parameter space is the right approach to judging success. Rather, I'm trying to point out the simple fact — simple, but I hadn't thought about it before — that even a modest reduction in uncertainty about each parameter can lead to an almost unimaginably huge reduction in uncertainty about the multi-dimensional distribution.

4. Ted Dunning says:

You may also be seeing strong correlations in the values of your parameters. It is likely that there is some kind of trade-off between different parameters. This could mean that you actually have a knife-thin region remaining, but when projected onto each of your parameter axes, it looks pretty under-determined.

Your 10^29 improvement in volume assumes radial symmetry or nearly so. Try computing effective volume from your data points in some fashion that isn't as constrained to the parametric axes. You may be pleasantly surprised.
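One way to act on Ted's suggestion (the choice of measure here is an assumption, not something proposed in the thread) is to compare the product of the per-parameter standard deviations with the square root of the determinant of the swarm's covariance matrix. Their ratio equals 1/sqrt(det R), where R is the correlation matrix, so it is 1 for uncorrelated parameters and grows as the swarm collapses onto a thin, tilted region. The sketch uses only the standard library; it detects linear correlation only, and with 74 parameters the covariance matrix is invertible only if there are more than 74 non-degenerate particle positions.

```python
import math
import random

def covariance(points):
    """Sample covariance matrix of a list of equal-length parameter vectors."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    return [[sum((p[a] - means[a]) * (p[b] - means[b]) for p in points) / (n - 1)
             for b in range(d)] for a in range(d)]

def log_det(m):
    """log(det m) via Cholesky factorization; m must be symmetric positive
    definite, otherwise a ValueError or ZeroDivisionError is raised."""
    d = len(m)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = m[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(d))

def box_over_ellipsoid(points):
    """Natural log of prod(per-axis SD) / sqrt(det(cov)). Zero when the
    parameters are uncorrelated; large when the swarm lies on a thin ridge."""
    cov = covariance(points)
    log_box = sum(math.log(cov[j][j]) for j in range(len(cov)))
    return 0.5 * (log_box - log_det(cov))

# Synthetic 2-parameter swarms, not real data: one on a thin diagonal ridge,
# one an uncorrelated "shotgun blast".
rng = random.Random(1)
ridge = []
for _ in range(120):
    t = rng.gauss(0, 1)
    ridge.append([t, t + rng.gauss(0, 0.01)])
blob = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(120)]
print(box_over_ellipsoid(ridge))   # large: around 4.6, a factor of ~100
print(box_over_ellipsoid(blob))    # close to 0
```

On a ridge like this, the axis-aligned view overstates the occupied volume by roughly a factor of 100 in just two dimensions; across many correlated pairs the discrepancy compounds.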

5. Phil says:

Ted,
Nah, I've looked at all of the pairwise plots of the final locations (using the R "pairs()" function) and they mostly look like shotgun blasts. But the convergence isn't finished yet; it's quite possible (in fact, I think likely) that once they settle down more, we'll end up with some correlations.
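As a numeric complement to eyeballing `pairs()` plots: with 74 parameters there are 74·73/2 = 2701 pairs, so it can help to scan them for strong correlation and look at the flagged plots first. This is a sketch; the parameter names and positions below are hypothetical, and Pearson's r only detects linear relationships, so a curved trade-off would still need the plots.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either column has zero spread."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def strong_pairs(points, names, threshold=0.5):
    """All parameter pairs whose |correlation| across the swarm is at least
    `threshold`, strongest first."""
    cols = list(zip(*points))   # one tuple of values per parameter
    hits = []
    for a, b in combinations(range(len(names)), 2):
        r = pearson(cols[a], cols[b])
        if abs(r) >= threshold:
            hits.append((names[a], names[b], round(r, 3)))
    return sorted(hits, key=lambda h: -abs(h[2]))

# Hypothetical parameter names and four made-up particle positions.
names = ["wall_insulation", "window_transmissivity", "furnishing_mass"]
points = [[1, 2, 5], [2, 4, 1], [3, 6, 4], [4, 8, 2]]
print(strong_pairs(points, names))
# [('wall_insulation', 'window_transmissivity', 1.0)]
```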

6. Markk says:

What reason do you have to assume convergence in the phase space, even to a particular region as you say? Could there be long-term chaotic paths in these trajectories? Maybe you don't care as long as the critical region is small enough.

7. Gabe says:

Uh, how about trying fewer variables?

Maybe look for some multicollinearity between them, and pare down a few?

8. Phil says:

Markk, you're right, we don't know that there is only one good solution (or one region of parameter space that has all of the good solutions). The parameter space is obviously far too large to explore completely. We're trying a few methods to look at the stability of our results (e.g. by changing the goodness-of-fit metric slightly and seeing if we end up with about the same parameters). But ultimately what we care about is whether the parameter values that we end up with are a good match to the real world, and that, thankfully, we can answer for a lot of these parameters because in this specific test case (but not in many of the real-world cases to which we will apply these techniques) we know the right answers.

Gabe, we actually need to add more variables, not fewer! Reality is what it is, and in the real world there really are more than 70 parameters that have a substantial effect on the phenomena we're studying.