Use of Jeffreys prior in estimating climate sensitivity

William Morris writes:

A discussion of the use of Bayesian estimation in calculating climate sensitivity (to doubled CO2) occurred recently in the comments at the And Then There’s Physics (ATTP) blog.

One protagonist, ‘niclewis’, a well-known climate sensitivity researcher, uses the Jeffreys prior in his estimations. His estimations are always at the low end of the range and the suggestion is that using the Jeffreys prior may be part of the reason. The prior has a peak at zero – a physically implausible value for climate sensitivity. There is much to and fro in the discussion . . . which you will doubtless not want to read in full.

I know from reading your blog that you are an authority on Bayesian estimation. Would you have the patience to give your opinion on the blog (or just to me if you prefer) – is the Jeffreys prior a good choice for estimating climate sensitivity?

My reply:

Despite what the Wikipedia entry says, there’s no objective prior or subjective prior, nor is there any reason to think the Jeffreys prior is a good idea in any particular example. A prior distribution, like a data distribution, is a model of the world. It encodes information and must be taken as such. Inferences can be sensitive to the prior distribution, just as they can be sensitive to the data model. That’s just life (and science): we’re always trying to learn what we can from our data.

I’m no expert on climate sensitivity so I can’t really comment on the details except to say that I think all aspects of the model need to be evaluated on their own terms. And there is no reason to privilege “the likelihood,” which itself will be based on modeling assumptions.

P.S. See here for some general discussion of objectivity and subjectivity.

49 thoughts on “Use of Jeffreys prior in estimating climate sensitivity”

  1. Thanks for this. Very useful to get your thoughts. To be fair, Nic Lewis’s work is actually quite interesting and produces reasonable results, if a bit lower than other methods. On the other hand, there is a tendency to suggest that by using an “objective prior” the result is somehow more robust than that of other methods, which require some level of subjectivity. As you say, the goal is to learn from our data. The goal isn’t simply to apply a method blindly and then assume that because the method has been applied properly, the result is a good representation of the system we’re trying to understand.

  2. From the paper: “In this model, it is assumed that the total radiative feedback can be described by a constant feedback coefficient λ multiplied by the globally averaged surface temperature anomaly.”
    https://bskiesresearch.files.wordpress.com/2014/06/annan_current-climate-change-reports_2015.pdf

    This is a good example of the prior as just another part of model specification. Such a simple model is dubious to begin with, no one really believes it is “true” (although it may still be a useful approximation). If the choice of prior is weakly constrained, that reflects uncertainty in how this phenomenon should be correctly modeled. Still, the parameter is constrained and this should be reflected in the model. No one is suggesting sensitivity of +/- 100 C as a plausible value. So I say tinker away with more or less complicated models and see how well they fit future data.

  3. I love every discussion that provides an opportunity to emphasize how little agreement there is, even among the experts of a given domain, as to what constitutes a good (nay, even a barely acceptable) prior for a given problem.

    • Nah, I don’t see that here. What I see here is total lack of understanding about what Bayesian models do… “Objective” vs “Subjective” priors is like saying “citrusy vs peppery nebulae”

      What I do see in this article is

      1) Everyone agrees that the “uniform” prior gives unrealistic weight to “high” values of sensitivity
      2) An attempt to satisfy some kind of “linguistic” criteria (I used an “Objective” prior) instead of trying to encode a state of information.

      If we really do KNOW that sensitivity can’t be 0 or less, and can’t be greater than say 6 or so, then we should try to encode that in a prior. Something like a gamma(1.1,1/2.0) or gamma(2,1/1.5) or a hierarchical model where the two gamma parameters are linked together…

      If you don’t know what your prior MEANS, then you wind up just pushing buttons and using linguistic weasel words to justify what you did.
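
      For concreteness, here is a minimal sketch (Python/SciPy, my own illustration rather than anything from the thread) of what those suggested gamma priors actually say about the supposedly ruled-out regions. I am reading gamma(1.1, 1/2.0) and gamma(2, 1/1.5) as shape and rate, i.e. scales of 2.0 and 1.5; that reading is an assumption.

        from scipy import stats

        # Two candidate priors for climate sensitivity S (C per doubling),
        # read as gamma(shape, rate), i.e. scale = 1/rate (an assumption).
        priors = {
            "gamma(shape=1.1, scale=2.0)": stats.gamma(a=1.1, scale=2.0),
            "gamma(shape=2.0, scale=1.5)": stats.gamma(a=2.0, scale=1.5),
        }
        for name, prior in priors.items():
            low = prior.cdf(0.5)   # prior mass on implausibly low sensitivities
            high = prior.sf(6.0)   # prior mass on implausibly high sensitivities
            print(f"{name}: P(S < 0.5) = {low:.3f}, P(S > 6) = {high:.3f}")

      Either choice keeps most of its mass in the broadly plausible range while still tapering, rather than piling up at zero or spreading uniformly out to large values.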

      • If we really do KNOW that sensitivity can’t be 0 or less, and can’t be greater than say 6 or so, then we should try to encode that in a prior. Something like a gamma(1.1,1/2.0) or gamma(2,1/1.5) or a hierarchical model where the two gamma parameters are linked together…
        I think something like this has been done in this paper which – I think – was mainly a response to the earlier work using a uniform prior, but seems to have shown results using an expert prior which was small at 0C and above 6C.

        • Yes, it looks like page 12 there shows basically the story. The difference between the uniform, Webster, and Cauchy priors is very limited in the “core” (i.e. the shape of the posterior around 1-3) but has a big effect on the tail (where the vertical line in the lower graphs falls, which is the upper 95th percentile point). It doesn’t look like the lower end is much affected by the prior, as the likelihood drops off extremely rapidly around 1.

          How important this all is depends a lot on the next step, where you assign some kind of decision theoretic costs to various values. With the uniform prior you’re obviously going to be considering sensitivities that go higher, and with nonlinear costs you may be winding up with the decision being controlled not by what we really think (ie. the region near the core) but how willing we are to consider extreme cases (out in the tail).
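
          To make that last point concrete, here is a rough sketch (my own toy numbers, not anything from the paper): two distributions with the same core around 2-3 but different upper tails, pushed through a convex damage function. The restriction to [0, 20] and the cubic damage function are arbitrary illustrative choices.

            import numpy as np
            from scipy import stats

            # Two toy "posteriors" with the same core (location 2.8, scale 0.8)
            # but different upper tails, both restricted to [0, 20].
            lo, hi = 0.0, 20.0
            grid = np.linspace(lo, hi, 20001)
            dx = grid[1] - grid[0]
            a, b = (lo - 2.8) / 0.8, (hi - 2.8) / 0.8
            pdf_thin = stats.truncnorm(a, b, loc=2.8, scale=0.8).pdf(grid)
            cauchy_pdf = stats.cauchy(loc=2.8, scale=0.8).pdf(grid)
            pdf_fat = cauchy_pdf / (cauchy_pdf.sum() * dx)   # renormalised on [0, 20]

            damage = grid ** 3            # a toy convex damage function
            tail = grid > 6
            print("P(S > 6):", pdf_thin[tail].sum() * dx, pdf_fat[tail].sum() * dx)
            print("E[damage]:", (pdf_thin * damage).sum() * dx, (pdf_fat * damage).sum() * dx)

          Both put most of their mass near 2-3, but the fat-tailed version carries far more expected damage, which is the sense in which the decision ends up being controlled by the tail rather than the core.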

        • So the real question is, what state of knowledge are we in? Do we truly, before having run the model, believe that the range 10 to 20 is actually plausible? If not, then it needs to have lower probability in the prior than the range 0 to 10. Also, if we believe that 0 to, say, 0.1 is also highly implausible, then we should make that have low probability. It seems to me that, based on the general discussion, most people should agree at least approximately with the “Cauchy” in that paper, though maybe increasing the dispersion of that Cauchy by 50% would be less controversial… but either way it’s going to give a smaller upper tail than the uniform.

        • I think the Greenhouse effect itself largely rules out equilibrium climate sensitivity values above 10C (probably even above something like 6C). The Greenhouse Effect is 33K at a CO2 of 280ppm, so 10C per doubling of CO2 seems implausibly high. Also, since the overall process is essentially CO2-driven warming plus feedback responses, something close to 0C is probably also ruled out on the basis that feedbacks can’t entirely cancel out the CO2-driven warming, given that they would then turn off. There are probably more sophisticated arguments, but that’s probably a reasonable way to think of this.

        • Ok, and that’s exactly the kind of reasoning we need to get a prior. So a prior like maybe even gaussian centered at 4 with SD of 4 truncated to [0,20] should be kind of “good enough” for the info you just gave.

        • Hopefully you can point me in the correct direction. In the paper they use deltaN = deltaF – lambda*deltaT, where

          N=total heat uptake
          F=net radiative forcing at the top of the atmosphere
          T= global mean temperature
          lambda = radiative feedback parameter (ie sensitivity)

          A quick search tells me deltaT (change in temperature) has apparently varied substantially (at least 2-fold) by latitude.[1] My understanding is that deltaF (change in radiative forcing) should be proportional to the 4th power of deltaT.[2] I don’t see why they are using a model with a linear relationship between deltaF and deltaT. How is this linear relationship arrived at and why do they not calculate different values for each latitude?

          [1] http://cdiac.ornl.gov/trends/temp/hansen/three_hemispheres_trend.html
          [2] https://en.wikipedia.org/wiki/Stefan%E2%80%93Boltzmann_law

        • Indeed, one way you can think of this is that the flux is sigma T^4, but if dT is small, then you can expand this as dF = 4 sigma T^3 dT, so – for small changes in dT – the response is approximately linear in dT. Starting at the bottom of page 433 of this is a discussion involving Isaac Held in which he explains why we have plenty of evidence that the system does appear to respond linearly.
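
          A quick numerical check of that expansion (my own sketch, with a representative surface temperature):

            # Exact Stefan-Boltzmann flux change vs. the linear term 4*sigma*T^3*dT.
            sigma = 5.67e-8            # W m^-2 K^-4
            T, dT = 288.0, 1.0         # a typical surface temperature and a 1 K change
            exact = sigma * ((T + dT) ** 4 - T ** 4)
            linear = 4 * sigma * T ** 3 * dT
            print(exact, linear)       # about 5.45 vs 5.42 W m^-2, roughly a 0.5% difference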

        • The 2-fold variation by latitude is, I think, irrelevant, as the temperature being discussed is the global average, so the model is already taking into account the variation from place to place on the earth by averaging the effect over all the different locations. It’s only the change *in the average* which is accounted for in the model, and that change is small, so a linear Taylor series is probably appropriate.

        • Thanks, it looks like you are talking about the derivation here, but I felt they omitted some steps:
          http://www.acs.org/content/acs/en/climatescience/atmosphericwarming/climatsensitivity.html

          In step one they seem vulnerable to the error discussed in this paper. S_ave is not really related to the surface temperature in that way:

          “Because this emissions scales as T^4, it can be demonstrated that the spatial average of the local equilibrium temperature (Eq. (1)) is necessarily smaller than the effective equilibrium temperature defined by Eq. (2). Rigorously, this stems from the fact that T_eq;mu is a concave function of the absorbed flux which, with the help of the Rogers-Hölder inequality (Rogers 1888; Hölder 1889), yields

          […an equation I don’t want to rewrite…]

          In other words, regions receiving twice as much flux do not need to be twice as hot to reach equilibrium and cold regions have a more important weight in the mean temperature.

          While any energy redistribution by the atmosphere will tend to lower this difference, one needs to remember that even the mean surface temperature can be much lower than the effective equilibrium temperature.

          These simple considerations show that equilibrium temperature can be misleading and should be used with great care.”
          http://arxiv.org/abs/1303.7079

          Maybe this is addressed somehow but it is not clear to me from that ACS document.

        • I don’t know the specifics, but if you imagine that radiation per unit area is dR = K T^4, and you have T for a whole bunch of locations on the earth, then R = integral(dR) = integral(K T^4, dA) across all the patches of area dA is the total radiation.

          Calculate integral(KT^4,dA)/A = K*T*^4 for special values K* and T* (if K is constant then K* = K, otherwise let K* = 1/A integral(K dA)).

          T* is the “effective temperature” that is, the temperature for which our lumped model gives the right result. I’m pretty sure that’s the concept being used here.
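
          As a tiny numerical sketch of that idea (made-up patch temperatures and area fractions, not real data):

            import numpy as np

            K = 5.67e-8                                            # take K constant, so K* = K
            T_patches = np.array([230., 255., 270., 288., 300.])   # hypothetical local temperatures
            area_frac = np.array([0.10, 0.20, 0.30, 0.25, 0.15])   # area fractions, summing to 1

            mean_flux = np.sum(area_frac * K * T_patches ** 4)     # area-averaged K T^4
            T_star = (mean_flux / K) ** 0.25                       # effective temperature T*
            T_mean = np.sum(area_frac * T_patches)                 # ordinary area-mean temperature
            print(T_star, T_mean)                                  # T* comes out a bit above the plain mean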

        • >”Calculate integral(KT^4,dA)/A = K*T*^4 for special values K* and T* (if K is constant then K* = K, otherwise let K* = 1/A integral(K dA)).

          T* is the “effective temperature” that is, the temperature for which our lumped model gives the right result. I’m pretty sure that’s the concept being used here.”

          It seems circular to me if that is what is going on. Why would they use an equation to calculate a value T* then plug that value back into the same equation? I’d need to see all the steps in one place.

        • It’s all pretty simple, I think. If there is a change in forcing dF, then in the absence of any temperature response, and if the system was initially in energy balance, the change in forcing will produce a change in system heat uptake rate dN, such that

          dN = dF

          If the only feedback is the Planck response, then the equation becomes

          dN = dF – 4 sigma T^3 dT

          However, there are other feedbacks (water vapour, lapse rate, clouds) which depend roughly linearly on dT, so you can rewrite the above as

          dN = dF – 4 sigma T^3 dT + W_{feed} dT

          Typically, the latter two terms are combined into a single term so that

          dN = dF – lambda dT.

          Of course, these are global averages, the model is very simple, and so it isn’t some perfect representation, but it is useful. One problem – for example – is that the assumption is typically that lambda is constant, which may not be true – one reason for this being the spatial pattern of the warming, which a simple model like this can’t capture.

        • @ aTTP: So then climate sensitivity will vary by latitude. For the first term of lambda I get:

          lambda1_equator = 4*5.67e-8*295^3 = 5.82
          lambda1_poles = 4*5.67e-8*273^3 = 4.61

          Qualitatively, this agrees with increasing dT away from the equator since dT= (dF- dN)/lambda

        • No, not really, because climate sensitivity is defined as a globally averaged quantity. Of course, how much we warm in reality will vary with latitude and be much more complex than a simple energy balance model can possibly indicate. That’s why we end up using much more complex models (like Global Circulation Models) but these basic energy balance models do still provide useful information.

          Also, I realise that I made a mistake when I did the above. What’s actually relevant is the TOA flux, not the surface flux. So really the Planck response should be

          4 epsilon sigma T^3 dT,

          where epsilon is some term that takes into account the composition of the atmosphere (the Greenhouse effect). Given a non-Greenhouse average temperature of 255K, and an average surface temperature of 288K, epsilon is about 0.61 (the ratio 255^4/288^4).
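
          Putting those pieces together as a back-of-envelope sketch (the non-Planck feedback number below is a made-up illustrative value, not something from this thread): at equilibrium dN = 0, so dT = dF / lambda, and the forcing from doubled CO2 is roughly 3.7 W m^-2.

            sigma = 5.67e-8
            eps = (255.0 / 288.0) ** 4                  # ~0.61, the ratio used above
            planck = 4 * eps * sigma * 288.0 ** 3       # Planck response, ~3.3 W m^-2 K^-1
            other_feedbacks = 2.0                       # hypothetical net non-Planck feedbacks, W m^-2 K^-1
            lam = planck - other_feedbacks              # net feedback parameter lambda
            dF_2x = 3.7                                 # rough forcing per CO2 doubling, W m^-2
            print(lam, dF_2x / lam)                     # lambda ~1.3, ECS ~2.8 K with these toy numbers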

        • When I say “calculate the integral” what I mean is *conceptually*.

          Conceptually, there exists a K* and a T* such that K* T*^4 = integral(K T^4 dA) and K* = 1/A integral(K dA).

          We’re modeling the way the WHOLE thing works (ie. the integral) via the same functional form, but with values of the K and T that are whatever they need to be to get the thing to work… this means T* is NOT an observed quantity, though with enough observations of K the K* could be calculated directly, the T* can not.

          in some ways this is closely related to the “renormalization group” but saying that doesn’t make anything easier to understand necessarily.

        • clarification, with enough observations of K and T at different places you can calculate both K* and T* but you can’t calculate T* by just having observations of T.

          You can, potentially, introduce a statistical model such that T* = f(T_1,T_2,T_3…) + model_and_measurement_noise. and one way you might do that is to average T_1,T_2,…. and then express the model in terms of the average

          T* = average(T) + Correction_Factor(average(T))

          and you might then, for small perturbations from a typical initial average assume the correction factor is linear…

          T* = T_init + lambda * (average(T) – T_init)

          and etc etc … sometimes physicists take things for granted that outsiders might not recognize.

          The “global net average” model is essentially treating the entire earth as a point and using the equations that would be used to model say a cubic centimeter of atmosphere. Can that work? yes, but you have to remember to use a thermometer about the size of the earth when you take the temperature ;-)

        • >”The “global net average” model is essentially treating the entire earth as a point and using the equations that would be used to model say a cubic centimeter of atmosphere.”

          Yes, this is what concerns me. The 1-D model does not account for the fact that the surface of the earth curves away from the energy source. If R0 of radiation are hitting the earth at the place closest to the sun, call it coordinates [lat0,long0], then as we move in any direction this will decrease to R=R0*cos(lat-lat0)*cos(long-long0).

          The surface temperature is measured locally and then averaged together. So if we wish to compare this to the output of the equation, what we want is T= mean((K*R)^0.25). Instead this model uses T= mean(K*R)^0.25, which is an upper bound on the former that occurs when R1=R2=…=Rn.

          Please check the explanation in the paper. However, it is still not clear to me whether this creates a problem when calculating the climate sensitivity, because that is dealing with differences.
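
          Here is a small numerical version of that comparison (my own toy setup: an airless, non-rotating planet with the cosine insolation pattern above, lit hemisphere only), showing mean((K*R)^0.25) coming out below mean(K*R)^0.25:

            import numpy as np

            lat = np.linspace(-np.pi / 2, np.pi / 2, 721)
            lon = np.linspace(-np.pi / 2, np.pi / 2, 721)       # day side only
            LAT, LON = np.meshgrid(lat, lon, indexing="ij")
            area_w = np.cos(LAT)                                # spherical area weighting
            R0, K = 1361.0, 1.0 / 5.67e-8                       # solar constant; K = 1/sigma
            R = R0 * np.cos(LAT) * np.cos(LON)                  # subsolar point at (0, 0)

            mean_of_T = np.average((K * R) ** 0.25, weights=area_w)
            T_of_mean = (K * np.average(R, weights=area_w)) ** 0.25
            print(mean_of_T, T_of_mean)    # roughly 315 K vs 331 K with this toy setup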

        • Yes, this is what concerns me. The 1-D model does not account for the fact that the surface of the earth curves away from the energy source.
          Except, they’re really energy balance models. They’re treating the whole planet as a single system and estimating how temperatures – on average – change if the energy balance changes. Clearly there is much they can’t do, but as a simple approximation, they can be quite useful.

        • > Except, they’re really energy balance models. They’re treating the whole planet as a single system and estimating how temperatures – on average – change if the energy balance changes. Clearly there is much they can’t do, but as a simple approximation, they can be quite useful.

          Yes. Actually, I think of dT=lambda*dF as a prior. We don’t live on a uniform planet but average characteristics can provide useful information. Change in average surface temperature is a useful proxy or predictor of other phenomena. (For example, given that diurnal fluctuations are on the order of 10 C I probably wouldn’t care about a 1 C change in temp if it were +1 C 24/7/365 but that’s not what’s going on. The 1 C change in avg temp is a proxy for increases in extreme weather events, drought, etc., which I do care about.) Is it useful to think of dT=lambda*dF as a prior which we use to assess the plausibility of GCM outputs? (I don’t know how I’d constrain a GCM based on a dT=lambda*dF prior but I could assess plausibility of GCM output presuming a pdf for lambda*dF.)

        • >”Clearly there is much they can’t do, but as a simple approximation, they can be quite useful.”

          Fair enough. However, as shown in the paper linked above, T_3d~0.57*T_1d, using the same overall reasoning. The T_ave = 255 K value is off by ~100 K just due to model simplifications. That is a pretty big difference just from going from one dimensional to three dimensional; it is probably worth using the more realistic latter model. There is such a thing as “too simplified”.

          But, once again, this may not matter in the case of the climate sensitivity since that is dealing with differences in forcings/temperatures rather than the actual values. How different are estimates of climate sensitivity using a 3D model?

        • The figure in this comment shows almost all the different types of estimates, including energy balance-type estimates (Instrumental), models, paleoclimate, and some attempts to combine estimates. In general, there’s a pretty large overlap. Some tend to be lower than others, but the generally accepted likely range (66%) is about 1.5C – 4.5C. Some argue that the lower end should be a bit higher (2C) and some argue that the higher end should be a bit lower (4C), but a likely range of 1.5C – 4.5C seems pretty reasonable.

        • >”The figure in this comment shows almost all the different types of estimates”

          This doesn’t show how the estimates are arrived at though. If they all use the same 1D model in one way or another they can be consistent but wrong due to the same (possibly) flawed calculation.

        • No, only the Instrumental ones use basic energy balance, and even some of them may be somewhat more sophisticated. Paleoclimatology estimates are based on analyses of ice cores and other paleo indicators that are used to estimate temperature changes and forcing changes. The climate model results are from 3D global circulation models. Essentially, there are a number of different methods that are broadly consistent.

        • >”No, only the Instrumental ones use basic energy balance, and even some of them may be somewhat more sophisticated. Paleoclimatology estimates are based on analyses of ice cores and other paleo indicators that are used to estimate temperature changes and forcing changes. The climate model results are from 3D global circulation models.”

          I checked one GCM paper (Andrews et al 2012) and a presentation about a Paleo paper (Schmittner et al 2012):

          “N=F -alphaΔT”
          http://onlinelibrary.wiley.com/doi/10.1029/2012GL051607/abstract

          “CS = ΔT/ΔF”
          http://cicar.ei.columbia.edu/sitefiles/file/Schmittner-sensitivity.pdf

          Both use the equations apparently derived from the 1D model to calculate climate sensitivity. Can these equations also be derived from a 3D model as described above?

        • Climate sensitivity is defined in terms of global averages (there is only one number) but a GCM is a fully time-dependent, three-dimensional simulation that typically includes atmospheric and ocean processes. So, to determine the ECS from a GCM, typically the atmospheric CO2 is increased at 1% per year until it has doubled, and then you let the simulation continue running until equilibrium is attained. The resulting temperature change is the ECS. For paleo estimates, many different proxies are used to estimate the change in global average temperature and the change in external forcing, from which one can estimate the ECS. You seem to be confusing the fact that one really has to use globally averaged quantities to determine the ECS (because it is – by definition – a globally averaged quantity) with models that are only 1D.

        • >” You seem to be confusing the fact that one really has to use globally averaged quantities to determine the ECS (because it is – by definition – a globally averaged quantity) with models that are only 1D.”

          The real world temperatures are measured in a situation closer to the 3D model. IE they say this much CO2 will raise the average of the real temperature X amount leading to Y consequences. Calculating climate sensitivity for a 1-D earth may or may not give very misleading results here.

          If you know of anyone who has used the 3D model described in the paper I linked above (or a more complex one) to estimate this climate sensitivity parameter, please just share the link so I can see what they do and how they derive it. Note: I do not mean take averages of 3D GCM output and then plug into the 1D model.

        • Note: I do not mean take averages of 3D GCM output and then plug into the 1D model.
          That isn’t what they typically do. In the figure in that comment, the ECS from climate models is determined by doubling CO2 in the model and letting the model run to equilibrium. The ECS is then the resulting change in globally averaged temperature. Even in the Sherwood paper that you quote above they do this, I think. What I think they’re doing is using the 1D formalism to then determine the forcings and feedbacks, not the ECS specifically.

        • >”That isn’t what they typically do. In the figure in that comment, the ECS from climate models is determined by doubling CO2 in the model and letting the model run to equilibrium.”

          I just picked a model from that figure (in the comment below what you linked actually), searched CMIP5 climate sensitivity, and clicked a paper that came up. In that paper they use the 1D model to calculate climate sensitivity from averages of CMIP5 output. The figure in that comment does not reference a particular paper or person for the GCM-based estimates, but the spot check I did disagrees with what you are saying. Do you have a reference to support your claim?

          Anyway, unless either of us finds a 3D derivation published somewhere, or does it ourselves, I don’t see continuing this thread as being productive. Ensuring that the sensitivity calculated for a 1D earth is similar to that of 3D seems like quite an important issue though.

        • The “climate sensitivity” is a fiction, it’s a useful fiction, but a fiction nonetheless. It’s kind of like the “calorie sensitivity” of your weight. If you eat 1% more calories what will it do to your equilibrium weight? Well, it depends a lot on a lot of things! If you measure your weight with a scale capable of detecting 0.01 gram differences, you’ll see differences in your weight all day long as you sweat, drink water, cut your hair, cut your fingernails, eat a snack, use the restroom…

          The fact that when you cut your hair you lose mass on your head isn’t a problem though for defining your weight because “your weight” is a constant times the sum of the mass of all the little bits of your body. The bigger problem is in a sense defining “what is your body?” at what point is the food you eat part of your body? When it enters your mouth? leaves your stomach? Enters your bloodstream? Is incorporated inside a cell membrane?

          A 3D GCM calculates millions of temperatures across many many locations, a 1D point-mass model calculates one. The second is a statistical model for a statistic of the first.

          One way to calibrate the 1D model is to simply run one of the 3D models and calculate what the difference is before-vs-after. Then if you plug this number into the 1D model, the 1D and the 3D model will predict the same thing for that particular scenario. That’s all.

        • Ensuring that the sensitivity calculated for a 1D earth is similar to that of 3D seems like quite an important issue though.
          Climate sensitivity is a single number. Whether you calculate it using a 1D model or a 3D model, what you determine is a single number that represents climate sensitivity for the whole planet. Now consider Figure 1 in the Sherwood paper to which you linked. It shows net top-of-the-atmosphere radiative flux on the y-axis and temperature on the x-axis. It describes them as

          Relationships between the change in net top-of-atmosphere radiative flux, N, and global-mean surface-air-temperature change, ΔT, after an instantaneous quadrupling of CO2. Data points are global-annual-means.

          In other words, these are 3D global simulations from which globally averaged TOA fluxes and temperatures are determined, which are then used to determine the climate sensitivity. So, this is climate sensitivity for a 3D earth. It’s simply that to actually determine it requires globally averaging the different quantities. Given that climate sensitivity is – by definition – a single number for the entire planet, there isn’t really any other way to do this.

        • >”Climate sensitivity is a single number.”

          Yes, that is fine. My question is whether this number is substantially different when arrived at by averaging climate sensitivity over a 3D surface or calculating climate sensitivity for a 1D average surface. All examples I have seen use the latter 1D approach; I would like to see what happens using the former 3D one.

          Also, I have no idea what you mean by “Sherwood” paper but it seems to refer to Andrews et al 2012.

          >”One way to calibrate the 1D model is to simply run one of the 3D models and calculate what the difference is before-vs-after.”

          So what happens if we calculate dT, dN, and dF at every gridpoint of the model, use that to solve for climate sensitivity and then take the average to have a global climate sensitivity number? Will this answer be the same? That is what I want a reference or calculation regarding.

        • Also, I have no idea what you mean by “Sherwood” paper but it seems to refer to Andrews et al 2012.
          Sorry, yes, I meant Andrews.

          So what happens if we calculate dT, dN, and dF at every gridpoint of the model, use that to solve for climate sensitivity and then take the average to have a global climate sensitivity number? Will this answer be the same? That is what I want a reference or calculation regarding.
          In a sense this is what they’re doing, because they’re averaging dT, dN and dF to then calculate the ECS. I think you can’t calculate ECS locally and then average that, because – in a 3D model – there is circulation. So, the dN and dF are TOA values, while dT is at the surface, and so there is no trivial mapping between a local TOA dF and dN, and a local surface dT.
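
          A toy illustration of why those two routes differ (made-up gridpoint numbers, equal-area cells assumed): the average of locally computed lambdas is not the lambda you get from globally averaged dF, dN and dT unless the warming is uniform.

            import numpy as np

            # Hypothetical values at four equal-area gridpoints.
            dT = np.array([0.8, 1.0, 1.5, 2.2])    # local warming, K
            dF = np.array([3.5, 3.7, 3.8, 3.6])    # local forcing, W m^-2
            dN = np.array([0.9, 0.7, 0.5, 0.3])    # local heat uptake, W m^-2

            lam_local_mean = np.mean((dF - dN) / dT)            # average of local lambdas
            lam_global = (dF.mean() - dN.mean()) / dT.mean()    # lambda from global means
            print(lam_local_mean, lam_global)                   # ~2.49 vs ~2.22 here: not the same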

        • >”there is no trivial mapping between a local TOA dF and dN, and a local surface dT.”

          I don’t think that’s it. There are papers where they make maps of “the local feedback parameter lambda (W m^-2 K^-1)”. See fig 5 here:
          http://journals.ametsoc.org/doi/full/10.1175/JCLI3613.1

          Some initial experimenting with a simple model of a no-atmosphere planet in synchronous orbit led me to believe the method of calculation can make quite a difference in deltaT/deltaF. Averaging over a sphere is turning out to be fraught with pitfalls though, so I would love to see some previous work on this.

          If the two methods do lead to different estimates of climate sensitivity, I find it difficult to believe that the 1D model is more appropriate than 3D to making claims about how much the real average temperature will rise due to a given influence.

    • Rahul:

      I would replace the word “prior” in your comment by “model.” I want to avoid the all-too-common attitude of blithe acceptance of whatever data model or likelihood happens to be in use.

  4. Am I right to interpret “nor is there any reason to think the Jeffreys prior is a good idea in any particular example” to mean that the Jeffreys prior should not be used?

    • Probably not, but you need to consider the specifics of each case before using it. In this case, describing the Jeffreys prior as objective is misleading if you use objective in its common sense. A better description might be ignorant, but that term is usually used for the uniform prior.

      There is also an argument to be had that a prior that does not take into account what is known about the example should not be used in a Bayesian analysis. Where you can get an argument is on what is known about the example, in which case the Jeffreys prior might be looked at as a decider, or a variety of priors that incorporate more or less prior knowledge might be compared. See the Annan and Hargreaves article linked above.

      ATTP’s argument is that the Jeffreys prior is not physical, that it overweights an outcome that is known to be impossible. The counterargument is that the weighting function takes care of that; however, that would only be completely the case if the cutoff of the weighting function were infinitely sharp, because the prior still overweights regions close to the origin, not only at the origin.

  5. I think there’s a lot to recommend E. T. Jaynes’s maximum-entropy approach to constructing priors, when it’s computationally feasible. The steps are these:

    (1) Construct an appropriate ignorance prior m(x) via symmetries in one’s state of information.

    (2) Express your prior information as a collection of expected values E[f_i(x)] = c_i, 1 <= i <= n.

    (3) Use as your prior the maximum-entropy probability distribution satisfying the constraints of (2), using m(x) from (1) as the invariant measure in defining the entropy. (This is often called the entropy relative to m(x).) This distribution is in the exponential family, having the form

    (1 / Z(theta)) * m(x) * exp(SUM(i: 1 <= i <= n: theta_i * f_i(x)))

    where Z(theta) is the normalization constant.

    The downside to this approach is that finding the parameter vector theta can be computationally difficult, and if you need the normalizing constant Z(theta), that's often intractably difficult to compute.
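
    As a minimal worked example of steps (1)-(3) (my own toy setup, not tied to the climate problem): take m(x) uniform on [0, 20], impose the single constraint E[x] = 3 (an arbitrary illustrative value), and solve numerically for the one exponential-family parameter theta. The resulting maximum-entropy prior is a truncated exponential.

      import numpy as np
      from scipy.optimize import brentq

      lo, hi, target_mean = 0.0, 20.0, 3.0
      x = np.linspace(lo, hi, 4001)

      def mean_given(theta):
          # Mean of the max-ent density p(x) proportional to m(x) * exp(theta * x),
          # with m(x) uniform on [lo, hi], evaluated on a uniform grid.
          w = np.exp(theta * x)
          return np.sum(x * w) / np.sum(w)

      # Find theta so that the constraint E[x] = target_mean is satisfied.
      theta = brentq(lambda t: mean_given(t) - target_mean, -5.0, -1e-6)
      print(theta)    # negative, so the prior density decays with x

    In this one-constraint case Z(theta) never needs to be computed explicitly; with many constraints, fitting theta is exactly the computational difficulty mentioned above.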

  6. One wouldn’t use a prior that includes negative values for a variance term for obvious reasons. Or for that matter, it wouldn’t make sense to use a prior that includes negative values if you are modeling the number of students per school. Don’t know if this is the same but the issue here doesn’t seem to be objective vs. subjective.
