## Help with this problem, win valuable prizes

Corrected equation

This post is by Phil.

In the comments to an earlier post, I mentioned a problem I am struggling with right now. Several people mentioned having (and solving!) similar problems in the past, so this seems like a great way for me and a bunch of other blog readers to learn something. I will describe the problem, one or more of you will tell me how to solve it, and you will win…wait for it….my thanks, and the approval and admiration of your fellow blog readers, and a big thank-you in any publication that includes results from fitting the model.  You can’t ask fairer than that!

Here’s the problem.  The goal is to estimate six parameters that characterize the leakiness (or air-tightness) of a house with an attached garage.  We are specifically interested in the parameters that describe the connection between the house and the garage; this is of interest because of the effect on the air quality in the house  if there are toxic chemicals (gasoline, car exhaust, etc.) in the garage, but I won’t go into the motivation of the experiments, I’ll just describe them. (See below the fold for the rest)

A researcher puts a “blower door” — basically just a big fan — in the front door of the house. The fan is ramped up in speed, or is sped up in stages, so as to gradually pressurize the house relative to the outdoors.  The flow rate through the fan is measured, so you know how much air is going into the house.  That amount has to leak out of the house, either to the outdoors or into the garage.  The researcher measures the pressure difference between the house and the outdoors, and between the garage and the outdoors; of course, from these s/he can determine the pressure difference between the house and the garage.

All of the air that flows in (Q_{ho}, and don’t try to tell me it should be Q_{oh} because I know) has to flow out.  The first equation in the graphic is the conservation equation if you consider drawing a boundary around just the house: the air that flows into the house is the amount that flows directly to the outdoors plus the amount that flows into the garage.  P_ho and P_hg are the pressure differences between the house and the outdoors and between the house and the garage, which are measured. n_ho and n_hg are “flow exponents,” and C_ho and C_hg are “flow parameters”; these are to be estimated.  The flow exponents are expected to be between about 0.5 and 0.7. If a pressure is negative, then P^n is to be interpreted as sign(P)*abs(P)^n.

The second equation shows what happens if you draw look at the flows through the entire house-garage boundary. Again, everything that flows in has to flow out, either from the house to the outdoors or the garage to the outdoors. The flow from the garage to outdoors introduces two additional parameters.

So, finally, the problem.  I have a bunch of measurements of pressures (P_ij) and flows through the blower door (Q_ho).  If the model and data were perfect, there would be a unique set of C_ij and n_ij values such that Qho could be predicted from P_ho, P_hg, and P_go with no error.  But of course the data are not perfect. One of the main problems is that the pressure measurements can be systematically wrong: for example, if the outdoor pressure measurement is made on the lee side of the building, the house-outdoor pressure difference will tend to be overestimated.  Doing something simple like minimizing the RMS difference between predictions and measurements turns out to lead to systematic bias (e.g. n_ho is overestimated), in a sort of nonlinear analog to regression dilution. The obvious solution is to fit a statistical model that incorporates the error.  I think I could code this up myself, but I decided that this is a great opportunity to learn JAGS, which is similar to BUGS (which I have used before).  But I ran into trouble, perhaps from not fully understanding the JAGS or BUGS coding rules.

Here’s what I want:

1. The actual pressure difference between the house and outdoors P_{ho} is normally distributed about the measured pressure Pmeas_{ho}, with uncorrelated errors with standard deviation of 2 Pascals. Actually the mean error will not be zero as this implies, but for now let’s start this way. (In practice the blower door operator sets a desired pressure, and the blower door adjusts its flow automatically until the measured pressure matches the desired pressure).
2. The measured pressure difference between the garage and outdoors Pmeas_{go} is normally distributed about the actual pressure P_{go}, uncorrelated errors with s.d. of 2 Pa.
3. The error in the house-garage pressure difference is the difference between the P_{ho} error and the P_{go} error.
4. The measured flow Qmeas_{ho} is normally distributed about the actual Q_{ho} with error 20 cubic feet per minute.
5. The actual value of Q_{ho} that is predicted from the right side of either of the equations in the graphic has normal error with standard deviation 20 cubic feet per minute.

I run into some problems when trying to implement this in JAGS. I am tempted to post my JAGS model here — it’s only 20 lines — but I’m afraid of contaminating the mind of you, the reader, with bad ideas. (The code does not run).  I’ll post it in the comments tomorrow if people want it.

But here’s what I’d love: could one of you JAGS or BUGS experts help me out here?  There are a few issues that I don’t know how to handle, such as the fact that the error in Pmeas_hg is equal to the difference between the error in Pmeas_ho and Pmeas_go, and the fact that the model seems to need two equations that have Qmeas_{ho} on the left side but I think that that is not allowed.

Inputs are sets of Pmeas_{ho}[i], Pmeas_{go}[i], Qmeas_{ho}[i]; desired outputs are estimates (with uncertainty) of all three of the C_{ij} values and all three of the n_{ij} values, and of course, while we’re at it, estimates of the actual values P_{ho}[i], P_{go}[i], P_{hg}[i].

C’mon, twenty lines of JAGS or BUGS code, how hard can it be?

1. Phil says:

Oops, for cryin’ out loud, made a mistake on the very first equation in the figure: all of the subscripts in the terms on the far right should be hg:
C_{hg} P_{hg} ^ n_{hg}

2. My local energy district offers this test for free… am I to surmise from this post that the flow hasn’t been well characterized before? So all the tax incentives for retrofitting insulation is based on anecdotal observations??

3. Michael Blasnik says:

Phil –

It is quite a surprise to see something on zonal pressure diagnostics in this blog– I actually developed these testing procedures 20 years ago . But you seem to be working from only half of the test procedure. If your goal is to measure the connection between the house and garage then you need to implement a second test — specifically you need to open the garage door to the outside and re run the test. Now you can solve the system with much more leverage. In addition, you could also open the garage door to the house and close the garage door to the outside and use that test too. Also, the test procedure should involve measuring baseline pressures before and after the test and adjusting for these. You can further reduce measurement error by using averaging built into the gauges and also by measuring the pressure differences to outside using more than one hose, but that may be overkill on days when it isn’t windy.

I have used least squares adjustment as I mentioned in my earlier post about this. In reality, you can often get good results simply by assuming that the flow exponents equal 0.65 and testing at just one pressure difference (50 pa) and then solving the two equations for two unknowns.

4. Just show your code, it’s probably something dumb.

5. Manuel Moe G says:

In the real world, I wouldn’t trust this setup:

changes in humidity would change the ability of air flow through walls and gaps

the temperature differential between the floor and the ceiling would allow different flows to manifest at different times of day, different seasons

what is the prevailing wind? mother nature may be setting up a “blower door” herself across an entire wall

for consideration:

Fumes, from garage to house: I would use two ammonia sensors (1) nose (2) digital, and set a pan of ammonia in the garage right next to the door to the house. Sniff near the door.

Heated air, from house to garage: Use a space heater pointing at a sheet of plywood painted black, see how well the room next to the garage holds heated air, at the same time seeing how the air next to the door in the garage heats up. Repeat, with fan blowing heated air directly at door. The difference between is escape of heated air instead of radiation, more or less.

Might be just as accurate as your setup, but with a lot less work.

6. Phil says:

Michael, you’re right, one of the possibilities to explore more of parameter space is to open the garage door. That is actually one of about a dozen different configurations that the team (listed below) has tested both experimentally and with numerical simulation. For instance, you can put the blower door in the main door of the house, close the door between the house and garage, and close the garage door. Or you can open the garage door. Or you can leave the garage door closed and open the door between the house and the garage. Or you can put the blower door in the doorway between the house and the garage. You can use two blower doors. Etc. The point of this project is to figure out which procedure or set of procedures is most robust to errors in the pressure measurements, and which is/are most sensitive to the garage-house leakage parameters (which are the ones of interest). The project is run by Max Sherman, Darryl Dickerhoff, and David Faulkner, with a new grad student (Erin Hult) doing most of the analysis at this point. I’m not sure if you know those names but they are among the world’s experts on characterizing houses using these methods and they helped develop, test, and codify the standard procedures.

Manuel, and m@random musings, blower door testing works fine for grossly characterizing how leaky a house (or garage) is. Quantifying the leakiness of the interface between the house and the garage is a lot harder: it is a small fraction of the surface area of the house, and although it usually has a door it often has no other leaky features such as windows or plumbing penetrations. It is unclear whether you can get a good estimate of the leakiness of that interface using blower door tests; that’s part of what this project is about.

Manuel, what you are talking about would measure the leakage across the interface, but not the leakiness of the interface. Your suggestion is reasonable and you are very far from the first person to think of it. There has been a huge amount of work on measuring leakage and predicting it from measurable parameters and the upshot is that if you want to characterize the interface rather than just measuring the leakage at a given moment, you can’t do it with the approach you suggest.

Patrick, it is likely that there is at least one dumb mistake in my code, but there is also at least one fundamental thing I’m not getting. I’ll post my code tomorrow afternoon but I guarantee it is not a small fix like forgetting to specify a prior for a variance parameter.

7. Phil says:

Oops, jeez, I said Erin Hult is a grad student, she’s not, she’s a highly capable postdoc, which I knew perfectly well. Sorry, Erin!

8. Michael Blasnik says:

Phil-

Yes I know the LBL people — it’s a little surprising that they didn’t provide you much background on any of the work that has already been done on this issue of interzonal leakage measurement.

In the late 80’s I invented the “add-a-hole” method for quantifying leakage to zones without doorways to open and also “invented” the “open a door” method (but later found out that something very similar had been previously done by someone in Europe). I wrote an introductory article about this back in 1992 in Home Energy Magazine http://www.homeenergy.org/show/article/nav/blowerdoor/page/2/id/906 and spent much of the next couple of years training contractors around the country in these methods.

Years later I was also part of a research project that did a fairly detailed investigation on how best to use these techniques in weatherization programs and worked on much of the error analysis. The report — An Investigation into Zone Pressure Diagnostic Protocols for Low Income Weatherization Crews, Energy Center of Wisconsin, 2001 — can be downloaded at http://www.ecw.org/prod/208-1.pdf

I have also done a fair amount of unpublished work using the least squares adjustment procedure to solve more complex systems and also employ more tests so that the system is over-determined.

The bottom line result that you will likely find is that the leakage of the connection between house and garage can be quantified quite reliably in the vast majority of cases using the door opening tests I described — especially if good field data collection protocols are used. There are already many contractors out there doing this work and have been for years.

9. Certainly not a JAGS or BUGS expert, but for the question of “the fact that the model seems to need two equations that have Qmeas_{ho} on the left side but I think that that is not allowed.” couldn’t you eliminate Qmeas_{ho} from one of the equations by saying that the right hand side of equation 1 equals the right hand side of equation 2?

Also it’s quite confusing to try to understand the model with the errors in the equations. Can you post the proper equations here using the LaTeX facilities provided by this blog?

Testing Latex: $e^x - f_{ho}$ is it this plugin: http://wordpress.org/extend/plugins/wp-latex/faq/

or this one: $$e^x – f_{ho}$$ http://wordpress.org/extend/plugins/latex/

whichever shows up right..

• Phil says:

Daniel, hey, that’s great, I didn’t realize wordpress speaks LaTeX! Anyway, I’ve fixed the equations in the original (by replacing the graphic). Now there’s no schematic of the problem anymore, but I think it’s simple enough that people can picture it.

As for your suggestion about coping with the two Qho equations, I can think of several ways to do it, including yours, but can’t get any of them to run in JAGS. I’ve generated several error messages using different approaches; usually I get “Unable to resolve (one of the parameters). This may be due to an undefined ancestor node or a directed cycle in the graph.”

10. Phil says:

Michael, you say “it’s a little surprising that they didn’t provide you much background on any of the work that has already been done on this issue of interzonal leakage measurement.” I’m not sure what makes you think they didn’t provide much background. They provided quite a bit of background. They gave me papers to read, and I’ve had private discussions with them, and I’ve seen presentations. I also went out in the field to make measurements, to see what happens in the real world.

You say “The bottom line result that you will likely find is that the leakage of the connection between house and garage can be quantified quite reliably in the vast majority of cases using the door opening tests I described — especially if good field data collection protocols are used.”

This turns out not to be true. That’s why I’m here with this blog entry!

You also say “There are already many contractors out there doing this work and have been for years.”

This is true, there are plenty of people who have been doing this for years…and their estimates are often wrong by 60% or more. We’re looking for better accuracy.

Michael, Manuel, m@random musings, and whatever other people whose names start with M might choose to comment: I am trying not to take it personally that you all assume that I don’t know what I’m talking about, but my genial facade is gradually being chipped away. I have a very anodyne question about fitting a statistical model. If you don’t want to help me with it, that’s fine! Perfectly understandable! I wouldn’t spend my time helping you either, if the situations are reversed! But we are really at cross purposes if all you are going to do is tell me that I shouldn’t fit the model in the first place (unless you want to suggest a better one! Always happy to have a suggestion about how to improve the model!).

• Another question Phil, it seems like in your notation i indexes say different buildings, and j indexes something else but I can’t figure out what it is, perhaps repeated measurements? Not sure. Can you clarify?

• Phil says:

Ah, no, sorry for the confusion. When I said I have a bunch of measurements P_{ij}, i and j were just stand-ins for each of the possible pressure measurements: I have measurements of P_{ho}, P_{hg}, and P_{go}.

11. Michael Blasnik says:

Phil-

I did try to help — I actually pointed you to a research project on this issue and mentioned the least squares adjustment approach. I offered suggestions and assumed you hadn’t yet read a lot of background on this because your posting omitted some key elements to the problem. You don’t show baseline pressure measurements and, more importantly, you seemed to show the testing procedure as simply varying the blower door flow to induce varying pressures across the system and then solving. That approach is not very sensible since it depends on differences between the flow exponents (which are likely quite small) for identification. It is testing with a different configuration (e.g., opening a door or using a second fan) that makes it solvable. I’m sorry if you are insulted by this, but I based my comments on what you wrote.

In terms of your claim that people can’t get accurate results using the procedures I described, I’d be interested in a source — I have a fair amount of experience that disagrees with that claim. An error of 60% may occur when the leakage is tiny but then the absolute error is still small. The key to doing better is better measurement procedures in the field and not just a better statistical model. That said, I’d certainly be interested in any innovative statistical approaches to this problem that you may come up with and I’d be curious to see how they perform compared to least squares adjustment.

• Phil says:

Michael,
Perhaps I’m too thin-skinned, or whatever the right word is. You’re right that I didn’t give a lot of background to the problem. I didn’t think most blog readers would be interested, and the background isn’t really relevant to the specific statistics problem I am trying to solve. I’m really just trying to get help fitting this specific model (or a better one), and giving a page or even a few paragraphs of explanation as to why this problem interests people seemed a bit off-topic.

Erin (the aforementioned post-doc) is working on a report to document the large relative errors. You are right that the biggest problems occur when the house-garage leakiness is small, but of course that is the normal case: the leakage across that interface is normally a very small percentage of the leakage across the whole rest of the house’s envelope. Also, the effects of variations in wind, and systematic bias due to placement of the pressure taps relative to the wind, have been underestimated we believe; the latter can be handled, perhaps, by “better measurement procedures in the field,” and I guess the former can as well if you take enough measurements over a long enough time and only in favorable (low) winds…but our experimental team (principally Dickerhoff and Faulkner) is more careful and follows better procedures than most practitioners, and yet these issues do come up.

• Michael Blasnik says:

You really can’t make precise pressure measurements when it’s windy out — especially gusty. Test protocols often include an assessment of wind and stop testing under extreme circumstances.

I wasn’t suggesting that you needed to give a lot of background to the problem — but that you needed to describe what the tests actually were and provide the right equations. If you don’t include an equation for a case with the door open, you end up with lots of problems with collinearity.

Now I’ll try to be even more helpful and suggest a slightly different way to cast the equations

The first equation stays the same
Qho = Cho * Pho ^ nho + Chg * Phg ^ nhg

But the second equation should be the flow balance for the garage during this same test
Chg * Phg ^ nhg = Cgo * Pgo ^ ngo

Now you can add a third equation, which is the pressure summation constraint
Phg + Pgo = Pho

You then open the garage door to the outside (or inside) and now you have the first equation again but with the new flow and pressures

Qho2 = Cho * Pho2 ^ nho + Chg & Phg2 ^ nhg

but now you don’t have the garage flow balance and the pressure summation becomes just
Pho2 = Phg2

You could just use Pho2 in place of Phg2 above but I think adding this equation may be better

A similar two equations cna be added if the test is repeated with the doors reversed. This test will be especially important if Phg was very near Pho in the first test.

In all of these equations, wherever I write P, I really mean (P – Pbaseline). You can use maybe 10 second pressure averaging at the gauge and calculate standard deviations from a set of 10 second intervals to get each measurement and also get an estimate of random noise (but not bias). If you re-measure baselines after each test it can provide some sort of indicator of the part of bias due to systematic wind or dT changes.

You can run each of these tests at a range of pressure differences to try to estimate the values of the flow exponents nhg and ngo and nho but I think it will be hard in practice to get good estimates for these given the range of Pho typically used. In practice, the values of nho from whole house tests are almost always in the range 0.60 – 0.70 and outlying values tend to be noise more than signal. I analyzed a set of automated multi-point blower door tests from 4000 homes and found that a third to a half of the observed variance in flow exponents can be explained by the estimation uncertainty of the individual values. So using a default value of 0.65 isn’t such a bad idea — at least for nho and nhg but maybe not as much for ngo due to different types of leaks. sorry to go off on such a tangent but I hope it may be helpful

Good luck

• Erin says:

Michael- Thanks for your reply. In the analysis I’ve done so far, I did use the set of equations you list as well as testing all of the other possible configurations (varying location of blower door, opening/closing various openings). Phil is aware of this aspect of the testing procedure–I think he was just trying to simplify the problem description. I also found that assuming a constant nhg reduced the uncertainty in the results for synthesized data. It also seems to remove some of the apparent bias in CHG & nhg when all 6 parameters are fit. For the field data that we have, even under the most calm conditions, the leakage flow varies by a factor of ~2. I have been using least squares optimization to determine the remaining parameters and have tried other methods including single pressure tests, but I will look into your adjusted least squares approach.

12. Manuel Moe G says:

Sorry, Phil. What happened here was a combination of male-on-male “mansplaining” and “Dear Lazyweb” but without jwz’s snarky reputation preceding.

I blame you for drawing a semi-recognizable house + garage at the top of your post. ;-) If the top of the post would have been your code, the commenters would have silently withdrew with their tails between their legs.

Your approach is very interesting – I was completely taken over with the physicality of the problem, and would have never considered entering it into a general purpose Bayesian analysis tool to deal with both finding the solution and discounting measurement bias and bias due to failings of the model. The settings on the blower + 3 pressure readings to characterize “quantifying the leakiness of the interface between the house and the garage” is very interesting – the complex physicality of the wall & door, floor-to-ceiling boggles my mind, probably unnecessarily.

Please blog about the results. Daniel Lakeland’s excellent blog, highlighted here recently, might be a better audience for this question.

• Phil says:

Manuel, thank you for your generous reply, and I apologize to you (as to Blasnik) for being irritated when people were only trying to help. I will post results here when I have some.

13. Phil says:

Here’s one way I’ve tried to fit this model in JAGS. This generates a runtime error: “Compilation error on Line 32: tau.qho[1] is a logical node and cannot be observed.”

There are N data points (each point consisting of pressure measurements PhoMeas and PgoMeas, and flow measurement QhoMeas).

model {
for (i in 1:N) {
# relationship between measured pressure and actual pressure
errPho[i] ~ dnorm(0,tau.pho)
errPgo[i] ~ dnorm(0,tau.pgo)
Pho[i] <- PhoMeas[i]+errPho[i]
Pgo[i] <- PgoMeas[i]+errPgo[i]
Phg[i] <- Pho[i] + errPho[i]-errPgo[i]

# relationship between predicted flow and actual flow
errModel[i] ~ dnorm(0,tau.model)
Qho[i] <- Cho*sqrt(Pho[i]^2)^nho + Cgo*sqrt(Pgo[i]^2)^ngo+errModel[i]
Qho2[i] <- Cho*sqrt(Pho[i]^2)^nho + Chg*sqrt(Phg[i]^2)^nhg+errModel[i]

# Relationship between measurements and true values
QhoMeas[i] ~ dnorm(Qho[i],tau.qho)
Qho2Meas[i] ~ dnorm(Qho2[i],tau.qho)
}
# prior distributions
Cho ~ dunif(0,200)
Cgo ~ dunif(0,200)
nho ~ dunif(0.4,0.8)
ngo ~ dunif(0.4,0.8)
nhg ~ dunif(0.4,0.8)
tau.model <- pow(sigma.model,-2)
sigma.model ~ dunif(0,1000)
tau.chg <- pow(sigma.chg,-2)
sigma.chg ~ dunif(0,1000)
tau.qho <- pow(sigma.qho,-2)
sigma.qho <- dunif(0,1000)
}

• Phil says:

Ah, forgot to mention, in this particular attempt I duplicated QhoMeas (to create identical vector Qho2Meas), and I have used different variable names for the two separate model equations. If there is another way to handle the two equations, that actually works, I’d be delighted to hear it!

• Michael Blasnik says:

see my comment above — you may want to cast the second equation as the flow balance for the garage — without any Qho term needed or Pho or nho. You can also add the flow summation term that says Pho = Phg + Pgo.

• On this line:

sigma.qho <- dunif(0,1000)

don't you want ~ not <-
?

• Phil says:

You’re right, I sure do! Unfortunately this does not change anything, at least as far as the error message is concerned. I think the compiler terminates as soon as it finds the earlier error.

Also, the line that defines Phg[i] should be
Phg[i] <- Pho[i] – Pgo[i] + errPho[i]-errPgo[i]

but of course that doesn't fix my problem either, same error.

• tau.pho and tau.pgo don’t seem to be defined anywhere, is that part of the problem?

The error message is cryptic. I can’t figure out what it’s really saying.

• My theory here is that without tau.pho and tau.pgo defined there’s no way to connect everything up and therefore it’s giving an error which is technically correct but spuriously confusing.

• Phil says:

tau.pho and tau.pgo are fixed values, specified in the initialization (where I also gave initial guesses for all of the other parameters).

• fred says:

What variables do you specify in the data? I’m not 100% certain, but suspect you are doing something like the following;

model{
z <- x+y
x~dnorm(mu1,1)
y~dnorm(mu2,1)
}
#data
list(z <- 10)
}

… see item 14 of http://www.mrc-bsu.cam.ac.uk/bugs/faqs/contents.shtml for why this can't be done, at least not directly.

• fred says:

More on this;

* Put the observed data on the LHS of “~” statements

* Don’t make 2 copies of Qho2meas. This doesn’t reflect how your model generates data. What is the mean value of the Qho(2)meas measurements? Currently you seem to define two choices, which can’t be right.

* If Phgmeas is just the difference between Phomeas and Pgomeas, it may be easier to just write the whole model in terms of Phomeas and Pgomeas. (With only 3 random quantities the model can’t get too complex, I hope)

* For a bunch of measurement error examples, see http://www.stat.tamu.edu/~carroll/eiv.SecondEdition/rwinbugs.php

• Phil says:

Fred, I know the two copies of QhoMeas can’t be right. I’m trying to find some JAGS-legal way to constrain all of (Chg, Cho, Cgo, nhg, nho, ngo) such that both equations for Qho are consistent. I know this won’t do it but I don’t know what will.

Also, although I know that generally I’d want something like PhoMeas ~ dnorm(Pho,tau.pho), in this case we are adjusting the blower door to set a given value of PhoMeas, so shouldn’t I use Pho ~ dnorm(PhoMeas,tau.pho)? Well, maybe not…I will look at the measurement error examples that you mention.

14. Hey Phil, there’s sort of two issues here, one is the computing issue of getting JAGS to run your model which I’m interested in because I don’t have a lot of experience with the JAGS/BUGS language and would like to have more. But the other question is what is the best type of model to specify? I just put up a post about how to re-specify the mathematical model into a form that is more conducive to analysis. Take a look at my blog entry here: http://models.street-artists.org/?p=1329

• modified this morning to fix a few things that I was foggy on at 1AM.

15. […] my post on Likelihoods for physical type models, Phil Price posted an example problem about modeling air leakage in buildings. His equations […]

16. Phil says:

Fred and Daniel, thanks, I will look at that stuff this morning (I’m on the West Coast) and see what I can do with it. Thank you very much.

• Phil, can’t nest replies very deep, so in reply to your question above of shouldn’t you have Predicted ~ Normal(Measured, error.precision) vs Measured ~ N(predicted, error.precision). Mathematically you have:

Measured = Predicted + error which is mathematically equivalent to Predicted = Measured – error. If your errors have symmetric distribution about zero, then -error and error have the same distribution. If the errors are not symmetric then they don’t. An interesting issue.

17. Phil says:

Fred, your comment from 9:12 on Feb 14 is very helpful. It prompted me to try a much simpler problem, so simple that I had been sure there was no need to bother typing it in because I knew it would work: a simple linear model with measurement error in both x and y. xx[i] is the true x value of the ith point (unknown), xxmeas is the measured x value, yymeas is the measured y value. Assume a linear relationship between xx[i] and yymeas[i] with slope a and intercept b. Simple, right? So I took a stab at it:

model {
for (i in 1:NN) {
yymeas[i] ~ dnorm(a*xx[i]+b,tau.y)
xxmeas[i] ~ dnorm(xx[i],tau.x)
}
a ~ dunif(0,10)
b ~ dunif(0,10)
sigma.y ~ dunif(0,100)
sigma.x ~ dunif(0,100)
tau.x <- pow(sigma.x,-2)
tau.y <- pow(sigma.y,-2)
}

Doesn't compile: "unable to resolve xx[1]"

Following Fred's helpful link to the BUGS documentation, I find a somewhat cryptic note about why this won't work, suggesting that if I know the marginal distribution of xx then I should be fine. Unfortunately, adding xx[i] ~ dnorm(mu,tau.xx), with suitable values for each, does not in fact fix the problem: the problem seems to compile OK but when I run it I get "Error in update.jags(m, 1000) : JAGS model must be recompiled". But recompiling doesn't help.

Obviously I need to understand this stuff a lot better than I do: if I can't even do a simple measurement error regression problem I am not going to be able to solve one with two coupled nonlinear equations! So, I will look at the measurement error examples that Fred also pointed me to.

I have to put this project aside for a while, so I probably won't work on it again until next week. When I do, I'll make a new post if I have anything worth saying.

• Yes, without getting JAGS to run it’s not possible to entertain different modeling approaches. But when you do figure out the JAGS issue (and I’m very interested to find out how it turns out), I think you’ll find the equation I came up with to be a better way to approach the problem. First off it eliminates the need to deal with two versions of the equation for Q_ho, and second off it should regularize your problem numerically and provide interpretability of differences in coefficients between different experiments.

I may play with the simple linear regression problem myself to better understand the JAGS stuff. Can you send me the files you’re using? You can email me at the email address provided in the Mail field to this comment.

• Phil says:

Daniel, I don’t think the single equation contains all of the information that is in the pair of equations. To illustrate:

The i’th experimental measurement gives me (Qho[i], Pho[i], Phg[i], Pgo[i])

Suppose I have three measurements, i=1,2, and 3, and let’s pretend there were no experimental errors at all and that the equation is a perfect model. With two equations per measurement, I have six equations and six unknowns (Cho, Cgo, Chg, nho, ngo, nhg). I can solve that. But combining the equations as you have done, I have three equations and six unknowns.

I think combining the equations as you have done, though perfectly legal mathematically and perhaps helpful conceptually, discards information. Consider a simpler problem:
5 = x + y
5 = x+ z

y=z, so I can add 0 = y-z to the top equation and get

5 = x + 2y – z

And that’s right. But now someone tells me that x=1. If I have the original system, this tells me both y and z, completely solving my problem. But in the latter equation, all I know is that 2y – z = 4.

I would be happy (though also embarrassed!) to be corrected, but I think your proposed equation will not suffice.

• The “second equation” besides 5 = x + 2y – z that you need is y = z which of course needs to be retained. The question then arises in a statistical context how much modeling error is there in the y=z equation (ie. in conservation of mass).

If you decide that conservation of mass is exact then you can do 5 = x + 2y – y and you can ignore z. In the context of your equation that would be one way to go, eliminate the garage to outside leakage coefficient (since anything getting into the garage must eventually go outside you only need to know how much goes into the garage).

On the other hand, perhaps there is modeling error in your equations, in which case you need something like y-z ~ N(0,something) and you should think of the modeling error in the single equation as inflated by the modeling error in assuming y-z = 0 exactly.

Does that make sense? remember I wrote my blog post at 1 AM while not able to sleep, I think it has a variety of good ideas in it, but I wouldn’t call it a research quality paper ;-)

• but notice that y=z has nothing to do with the left hand side (5 or in your original equations Qho) so we do eliminate the need to have say two copies of your flow data.

• Conservation of mass isn’t quite exact since air is slightly compressible and the rooms have some volume and the speed of sound is finite and soforth but it might be a good idea to start by assuming that it is exact, or close enough to exact for government purposes :-)

• Phil says:

The x,y,z equations were deliberately simplified to show that you still need two equations to retain all of the information. You can shift terms from one equation to the other — as you correctly note, you can either use the original pair of equations, or 5 = x + 2y – z and y = z — but you still need two equations. This was not intended to be a simplified version of the actual pair of equations, which has six unknowns. And it’s certainly no stand-in for the actual problem, for which the presence of error is very important and in fact is the whole point of what I am trying to do, which is to estimate the parameters in a way that is not biased by the errors.

Which is to say, I still don’t think your single equation does the trick. I think it may be useful and helpful in other ways, so thank you for writing it!

18. Michael Blasnik says:

Phil-

You are correct — combining the equations doesn’t help. You need the flow balance equations for the garage to get at the errors and fully exploit the information in the measured data.

I actually wanted to point out a different issue with your code — you use terms like sqrt(Pho[i]^2) to try to deal with negative pressures, but this approach gives you the absolute value but does not preserve the sign of the flow. You need to know the direction of flow for the equations to be correct.

• Phil says:

Michael, you’re right…I’m not sure why I made that mistake. What I want is sign(P) * abs(P)^n. Oddly, JAGS does not seem to have a sign() function, but it does have abs(). What I should have in my code is something like (P/abs(P))*abs(P)^n. Thanks for pointing that out.

Of course, it still doesn’t run, but at least if it did it would have a chance of being right.

19. You’re right that you can’t get rid of the second equation, but you can reduce the second equation to the one that I used and which doesn’t involve the $Q_{ho}$. In nondimensional form:

$\epsilon_{hg} (p_{hg}/p_0)^{n_{hg}} = \epsilon_{go} (p_{go}/p_0)^{n_{go}}$

now you can ask yourself, is this an exact equation for which there is no modeling error (conservation of mass is an exact principle as far as we know but this version of conservation of mass has some assumptions, namely that there is no accumulation and no “other places to go”) or is this equation an approximation which has its own error? That addresses the “modeling error” and then of course there’s also the measurement error.

In your problem certainly the $p_{gh}$ and $p_{go}$ as *measured* have errors.

However in your JAGS attempt you required that the errors in the hg and the go terms were exactly opposite so that the conservation of mass principle was exact. In this case, you can perhaps say that the “second equation” is simply:

$\epsilon_{hg} (p_{hg}/p_0)^{n_{hg}} + N(0,tau.hg) = \epsilon_{go} (p_{go}/p_0)^{n_{go}}$ and then there’s really just ONE random component that models the error between prediction from measured values and the exact value.

Hopefully this is all eventually helpful. When I wrote the original blog post I was thinking mainly of how to eliminate the double dependency on Q_ho, I should have been more explicit about the conservation of mass equation.

20. […] at Andrew Gelman's blog, Phil and I have been having a conversation about issues related to the building envelope problem I […]