Progress! (on the understanding of the role of randomization in Bayesian inference)

Leading theoretical statistician Larry Wasserman in 2008:

Some of the greatest contributions of statistics to science involve adding additional randomness and leveraging that randomness. Examples are randomized experiments, permutation tests, cross-validation and data-splitting. These are unabashedly frequentist ideas and, while one can strain to fit them into a Bayesian framework, they don’t really have a place in Bayesian inference. The fact that Bayesian methods do not naturally accommodate such a powerful set of statistical ideas seems like a serious deficiency.

To which I responded in the second-to-last paragraph of page 8 here.

Larry Wasserman in 2013:

Some people say that there is no role for randomization in Bayesian inference. In other words, the randomization mechanism plays no role in Bayes’ theorem. But this is not really true. Without randomization, we can indeed derive a posterior for theta but it is highly sensitive to the prior. This is just a restatement of the non-identifiability of theta. With randomization, the posterior is much less sensitive to the prior. And I think most practical Bayesians would consider it valuable to increase the robustness of the posterior.

Exactly! I completely agree with 2013 Larry (and it’s what we say in our Bayesian book, following the ideas of Rubin and others).

I’m happy to see this development. Much of my recent work has involved Bayesian analysis of sample surveys. And, indeed, our models typically assume simple random sampling within poststratification cells. Such models are never correct (even if the survey is conducted by a probability sampling design, nonresponse will not be random) but it’s a useful starting point that we try to approximate in many of our designs. In other settings, we simply don’t have random sampling or random assignment, and then, indeed, our inferences can be more sensitive to our assumptions. The only place I’d disagree with Larry is when he writes “sensitive to the prior,” I’d say, “sensitive to the model,” because the data model comes into play too, not just the prior distribution (that is, the model for the parameters).

P.S. Beyond appreciating Larry’s recognition of this particular issue, I find his larger point interesting, that we add noise in different ways to achieve robustness or computational efficiency.

56 thoughts on “Progress! (on the understanding of the role of randomization in Bayesian inference)”

  1. > “sensitive to the prior,” I’d say, “sensitive to the model,”

    But here, say when you only find out about the outcomes for the groups being compared, wouldn't a multidimensional informative prior that adequately reflects the imbalances between the groups, but has no way to be updated, be all there is (i.e., no data model at all for the confounding)?

    I remember being very dismayed reading papers (from the 1980s and 90s) by well-known Bayesians arguing that randomization did not play any part.

  2. I’m very curious about your comments on “sensitivity to the prior” vs “sensitivity to the model.” I take it that you see the prior as the “model for the parameters”, and the model as the prior plus the “data model.” Can you please elaborate on the difference between the model for the parameter and the data model? And in turn, between the model and the prior? I want a better understanding of the concepts at play here. Thanks!

    • This plays into my post on “Where do Likelihoods Come From” from several years back: http://models.street-artists.org/2011/12/13/mommy-where-do-likelihoods-come-from/

      I still have basically the same issue, and in fact these days I'm fitting a data timeseries to an ODE and I have to decide what the likelihood of the observed error is. Independent random errors within a single timeseries are clearly wrong: when things are close together in time they tend to have similar errors. Gaussian process errors could be more right, but there's no reason to think the errors are really Gaussian; in fact some whole timeseries don't fit well because something occurred which isn't modeled by my ODE. Robust errors, like a t distribution, can help by emphasizing the fit of the timeseries that fit well and "ignoring" the cases where "something else happened." But independent t errors aren't right, and a t-process, if such a thing exists, is out of the question from a practical perspective. I've settled on what I'm calling "tempered independent t errors." I have maybe 80 observations, but more observations close together don't really improve the information about the fit, since the errors are necessarily similar when the time values are similar. So I'm taking the sum of the log t densities, dividing by the number of observations, and multiplying by a fixed number per cycle times the number of cycles. Essentially this says that there are effectively 3 or 4 independent errors per cycle regardless of how many time points I have in a cycle. It has no probabilistic justification really, but it leads to answers that make sense, in a similar way that independent random errors or AR(1) errors can often lead to good results for timeseries even though we know they are not really independent or AR(1).

      • All that is a long-winded way of saying that sometimes, in sort of "standard" simple analyses, the likelihood function is pretty obvious, but it's extremely easy to think up situations where the likelihood function that will give you good answers in reasonable time and computational complexity is not at all obvious.
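
        For concreteness, here is a rough sketch of the "tempered independent t errors" scheme described above, as I read it (this is an illustration, not the commenter's actual code; the function name, the scale, the degrees of freedom, and the 3 effective errors per cycle are placeholders):

        ```r
        # Average the per-observation log t densities, then rescale by an assumed
        # effective number of independent errors (eff_per_cycle * n_cycles).
        tempered_t_loglik <- function(resid, scale, df = 4, n_cycles, eff_per_cycle = 3) {
          avg_ll <- mean(dt(resid / scale, df = df, log = TRUE) - log(scale))
          (eff_per_cycle * n_cycles) * avg_ll
        }

        # Usage: resid = observed series minus the ODE solution at the observation times,
        # e.g. tempered_t_loglik(y_obs - y_ode, scale = 0.1, n_cycles = 5)
        ```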

      • Recently I’ve been getting interested in fitting ODE models via bayesian inference – can you point me to some seminal reviews/references on the topic?

      • Regarding “t-processes”, you can always pass a Gaussian process through the Gaussian quantile function to obtain something I’d be naturally inclined to call a Gaussian copula process. From there, you can make the marginals whatever you want.

        …A quick google for “Gaussian copula process” turns up this.

        • This is a quite interesting idea. If you're looking to sample such a thing, that would do it. However, I am not clear on how I could calculate the likelihood of some observed errors under this kind of model. I guess maybe I could take the observed values, pass them through the t CDF to get uniform values, pass these through the Gaussian quantile function to get Gaussian values, and then calculate the multivariate Gaussian density at those values and multiply by some kind of complicated Jacobian and … ugh… or I could sample a lot from this Gaussian copula process and get an empirical likelihood by using small bins around the measured points, but that's hugely computationally intensive.

          Anyway, next time I'm thinking about modeling an unobserved parameter as a Gaussian process I'll consider the Gaussian copula process; it seems quite useful.

        • Let me see if I understand this idea: I have some errors E_i which represent the difference between my ODE and my observed data, and I say that these errors are distributed according to a transformation of a Gaussian process X_i. This Gaussian process has standard normal marginals and covariance from some covariance function. The particular transformation I use is to take each sample value and create variables that have uniform marginals, U_i = pnorm(X_i), and then say E_i = qt(U_i), which says that the marginal distribution of the errors is t; this helps deal with the cases where the fit fails due to unmodeled effects.

          To get the likelihood for observed E_i I need to interpret things in the other direction. The cumulative distribution of my E_i values is pmvnorm({qnorm(pt(E_i))}), and I take the derivative of this with respect to all the E_i values, d/(dE_1 dE_2 .. dE_n), which after all the chain rule stuff will turn out to look like dmvnorm({qnorm(pt(E_i))}) times prod_i dt(E_i) / dnorm(qnorm(pt(E_i)))?? and then tell R to fool around with everything to give me the logarithms of all this mess in a stable way, and voila??

          I will look into this. But for the moment, am I being obtuse and misreading what I’m supposed to be doing?

        • No, you’ve got it. But start with just two E_i, not an arbitrary number. Gaussian processes are handy in that if you understand two points, you basically understand the whole thing. (OK, that’s an oversimplification, but getting the bivariate distribution down for two arbitrary points is a good first step.)
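
          For anyone who wants to try this, here is a minimal sketch of that log likelihood in R (the function name is made up; it assumes the mvtnorm package, a correlation matrix Sigma built from some covariance function at the observation times, and t marginals with df degrees of freedom):

          ```r
          library(mvtnorm)

          # Log density of residuals E under a Gaussian copula with t marginals:
          # with z_i = qnorm(pt(E_i)), density = dmvnorm(z; Sigma) * prod_i dt(E_i) / dnorm(z_i)
          copula_t_loglik <- function(E, Sigma, df = 4) {
            z <- qnorm(pt(E, df = df))                # map residuals to the Gaussian scale
            dmvnorm(z, sigma = Sigma, log = TRUE) +   # joint Gaussian copula term
              sum(dt(E, df = df, log = TRUE)) -       # t marginals
              sum(dnorm(z, log = TRUE))               # divide out the standard normal marginals
          }
          ```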

  3. Thank you for quoting the passage from Wasserman (2008). But I don't see that you really answer Wasserman's charge that: “Some of the greatest contributions of statistics to science …are unabashedly frequentist ideas and, while one can strain to fit them into a Bayesian framework, they don’t really have a place in Bayesian inference. The fact that Bayesian methods do not naturally accommodate such a powerful set of statistical ideas seems like a serious deficiency.” The “powerful set of statistical ideas” gets its rationale from affording error probability computations. You say randomization helps to warrant the model assumptions more generally. But it isn't clear how something that makes the posterior less sensitive to the prior helps to make the prior more correct as a representation of prior belief or prior strength of evidence, or the like, in the null hypothesis (e.g., of no causal effect).

    • In my opinion, the randomization makes the likelihood more correct; by making the likelihood more correct, the data become more informative, and hence the results don't depend so strongly on the prior, because the data speak more strongly than they would have if the likelihood were basically bogus.

      The big problem with bayesian models isn't the prior, for the most part. In many problems, it's usually easy to come up with some kind of moderately informative prior for parameters that people won't choke on if they aren't vehemently anti-bayesian to begin with. What you want is a way to make your data tell you more than you knew before, i.e. to make your likelihood informative. But the choice of likelihood is not always so clear cut (see some comments above). Not only are there different probabilistic assumptions that could be made, but there are also different structural assumptions, such that several models of the process could be reasonable in many cases.

      By adding in randomness we can make the likelihood easier to specify, since at least the portion of the likelihood caused by the random number generator is known exactly. This can make our likelihood more informative, and hence result in better posterior distributions.
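
      (A toy simulation sketch of this point; the confounder, effect size, and selection mechanism below are made up for illustration. With randomized assignment the simple comparison of group means recovers the treatment effect, while with self-selection the same comparison leans entirely on what you assume about the confounding.)

      ```r
      set.seed(1)
      n <- 1e4
      ability <- rnorm(n)    # unobserved confounder
      effect  <- 0.5         # true treatment effect

      # Randomized assignment: treatment is independent of ability by construction.
      z_rand <- rbinom(n, 1, 0.5)
      y_rand <- effect * z_rand + ability + rnorm(n)

      # Self-selection: higher-ability units are more likely to take the treatment.
      z_obs <- rbinom(n, 1, plogis(2 * ability))
      y_obs <- effect * z_obs + ability + rnorm(n)

      mean(y_rand[z_rand == 1]) - mean(y_rand[z_rand == 0])  # close to 0.5
      mean(y_obs[z_obs == 1]) - mean(y_obs[z_obs == 0])      # badly biased upward
      ```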

  4. I am not sure I understand this notion of randomization giving robustness to the posterior. If we want to get rid of the prior influence, what’s the point in doing a Bayesian analysis in the first place?!

    • But I don't think you want to put an informative prior on the imbalances between the groups compared – especially since there is unlikely to be much more than a flat likelihood to update it, when this can be avoided if there is randomisation.

      Early papers and comments by Don Rubin were clear about this – e.g., Rubin, D. B. (1978), Bayesian inference for causal effects: The role of randomization. Annals of Statistics.

    • What’s the point? Being Bayesian allows us to express what’s known about the underlying parameters via (posterior) probability distributions. This isn’t available unless one is at least approximately Bayesian.

      Surely Bayesian analyses don’t *have* to have prior sensitivity in order to be useful – as you seem to suggest?

    • The point of doing a Bayesian analysis is to learn something from the data. If the posterior is the same as the prior, it means we have learned nothing from the data.

  5. Christian: really? You think the billions of dollars we spend doing randomized trials
    is wasted?

    Why even collect data at all? Just state your prior and declare victory?

    Surely the fact that randomization converts an unidentified parameter into an easily estimated identified parameter is useful even for Bayesians?

    Larry

    • Anon/Larry: I think perhaps what is missed by those who scoff at randomized trials is the point you raise about the value of rendering a parameter identifiable or estimable or the like. To some of us knowledge-seekers (it's too tiring to find an acceptable statistical label), the fundamental role of statistical methods, as with all measurement tools in science, is to enable us to discern, by indirect and clever means, something that we could not detect directly. By indirect means, we may find out what it would be like, statistically, if the treatment had no effect. This provides a standard for learning from observable data. Designing the method of data collection and modeling is central. As Fisher approximately said someplace, to bring the statistician in after the data is at hand is merely to ask him to conduct a post mortem, to say what the experiment died of. But a lot of background knowledge enters into the design of RCTs, he emphasizes, to control known effects and figure out when it may not be worth further control. And then again in linking the statistical inference to substantive claims and theories. We jump in, work the standard tool, and jump back out again.

      • Who in the world is scoffing at randomized trials? Randomization addresses a causal problem – what is being estimated – whereas the choice of bayesianism vs. frequentism addresses how it’s estimated and what constitutes a useful measure of the uncertainty.

        I also think most bayesians would agree that modeling the data collection is essential. The complexities that arise in the data collection process are often a motivation for using bayesian approaches (e.g. for missing data imputation).

  6. Larry has the same quality that Jaynes, and anyone else worthy of respect in statistics, possessed: trying as much as possible to reduce philosophical differences down to mathematical ones. That way there's at least a fighting chance that people with very strong ideas can resolve some of those differences.

    The dogmatic/philosophical/refuse-to-examine-the-math approach to probability is such an obvious dead end it’s a wonder it was ever tried, let alone that it’s still being attempted.

    • Agree it is the first step, but understanding representations in and of themselves is not enough to assess what is most purposeful for learning about our (empirical) world.

      What we want are methods for enquiring communities that facilitate them quickly and importantly getting less wrong about the world – nothing more, nothing less.

      • > try as much as possible to reduce philosophical differences down to mathematical ones

        I don't get this, at least for statistics (and maybe other philosophical differences too). Surely we need to have some agreement on what the actual questions being asked are? Someone presents some data, a parameterized family of models M_theta, and a demonstration that – given the data – M_0 is rejected at such-and-such confidence level. I'm probably not going to be at all confused about, or in disagreement with, the mathematics. I may not like the model family. Or worse, I may well be completely perplexed as to what purpose the analysis was done for or what "rejected" actually means (to which an assertion: "here's what I define 'rejected' to mean in this context and here is why it is mathematically true given my definition" is deeply non-responsive).

        There seem a lot of important questions that are controversial, and mutually misunderstood, even if each side concedes perfect mathematical omniscience to the other. This seems to be true in statistical philosophy a lot.
        What mathematical question is at issue (or whose resolution would be at all helpful) in Lindley’s paradox, for instance?

        Likely I’ve misunderstood you but… I’ve seen plenty of questions where the approach “write down a clear, formal if needed, definition of what you are actually claiming and then let’s try to prove it or find counterexamples” has been a magic bullet for settling disagreements. But I just don’t see it, at all, for these questions in statistics.

        • The point is that the Bayesian-frequentist debate is characterised by an unresolvable (at least thus far) disagreement on what the questions are. This does not prevent us from agreeing to disagree on that and making further progress by investigating whether particular methodologies are mathematically correct given the stated aims in one or the other framework.

          As for Lindley’s paradox, there are no (unresolved) mathematical questions at issue. From the wikipedia entry: “Although referred to as a paradox, the differing results from the Bayesian and Frequentist approaches can be explained as using them to answer fundamentally different questions, rather than actual disagreement between the two methods.” Once we accept that different people prefer to ask different questions, no problems remain.

        • > This does not prevent us from agreeing to disagree on that and making further progress by investigating whether particular methodologies are mathematically correct given the stated aims in one or the other framework

          Are there _any_ well-known and interesting questions in the Bayesian-frequentist debate that have this nature? I.e. (if I am interpreting you correctly) where someone grants another's stated aims (and presumably definitions, etc.?) and it boils down to whether they are doing the math correctly relative to their assumptions? I sincerely do not know of any, and would love an example.

          Wrt Lindley's paradox, the most serious problems DO remain. Suppose someone (not a statistician) comes along to a statistician and says "Help me please; I'd like to know whether theta is zero or not" (H_0 is true or not?). That's almost certainly not what they really want to know, but, even if it is, _that_ question is NOT the question either the Bayesian or frequentist approach to the LP tries to answer. We just can't say they want the Bayesian question or the frequentist one or something else without further probing. And presumably it would be bad to just pull one of these two approaches off the shelf because that's what the statistician being approached is familiar with. So the problem that remains is: what do we ask the client in order to work out what _his_ question is? This is not a matter of pure mathematics; quite the opposite.

          (I do wonder whether the client’s problem will ever really map, remotely plausibly, onto a hypothesis-test question, but maybe this happens – still, it’s far from a given.)

        • Bxg:

          Here's an example, sort of. Anti-Bayesian know-nothings used to go around saying that they didn't like hierarchical models because they didn't "believe" in exchangeability. For example, in the 8 schools (see chapter 5 of BDA), what if some of the schools in the data are much better than others, or identifiably different in some important way? What if, say, 7 of the schools are public and 1 is private? One useful way to shift the discussion is to move to some technical issues. If you have identifying information on some schools, this info can be included as regression predictors. If you think that one of the schools might be much different from the others, this suggests that it's best not to model the 8 effects as coming from a common normal distribution; a long-tailed distribution might make more sense. The point is to get people away from arguing about exchangeability and focusing on (potentially) measurable aspects of the data. As Rubin would say, he'd like for the scientists to be focusing on the science, not on the statistical properties of estimators. Bayesian analysis when done right can transform statistical or even philosophical questions into scientific and technological questions.

          P.S. I hope someone’s still reading this thread!
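
          (For readers who want something concrete, here is a minimal sketch of the kind of model described above: the 8-schools data with a made-up public/private indicator as a regression predictor and a long-tailed t population distribution. This is an illustration, not code from BDA; the school_type values, the priors, and the degrees of freedom are assumptions.)

          ```r
          library(rstan)

          model_code <- "
          data {
            int<lower=0> J;              // number of schools
            vector[J] y;                 // estimated treatment effects
            vector<lower=0>[J] sigma;    // standard errors of the estimates
            vector[J] school_type;       // hypothetical indicator, e.g. 1 = private, 0 = public
          }
          parameters {
            real mu;                     // overall mean effect
            real beta;                   // shift for school type
            real<lower=0> tau;           // population scale
            vector[J] eta;               // standardized school effects
          }
          transformed parameters {
            vector[J] theta = mu + beta * school_type + tau * eta;
          }
          model {
            eta ~ student_t(4, 0, 1);    // long-tailed instead of normal population model
            tau ~ normal(0, 10);         // weakly informative; an assumption for this sketch
            y ~ normal(theta, sigma);
          }
          "

          schools <- list(J = 8,
                          y = c(28, 8, -3, 7, -1, 1, 18, 12),
                          sigma = c(15, 10, 16, 11, 9, 11, 10, 18),
                          school_type = c(0, 0, 0, 0, 0, 0, 0, 1))  # made up for illustration
          fit <- stan(model_code = model_code, data = schools)
          ```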

        • > Bayesian analysis when done right
          For "right" I would read "purposefully," and though we have moved forward (are now less wrong) we still need agreement on _right_, _purposefully_, etc., which are not mathematical concepts.

          Somewhat like Rubin's statement, David Andrews would suggest trying to discern whether the client wants help with the problem or the technique (and if it's just the technique, it's likely a waste of time).

        • quoting Bxg: “has been a magic bullet for settling disagreements”

          I don’t know that anyone claims it’s a magic bullet. I merely claimed when it’s possible, it gives a fighting chance of reducing some of the differences.

          Frequentists and bayesians agree on more specific points than in the past, and that progress has come from those few cases where philosophical points could be reduced to mathematical ones. Many big differences remain though, and future progress will likely come in the way described by Max Planck: "Science advances one funeral at a time."

        • I think optional stopping is at or near the core of the frequentist-Bayesian divide, and it’s eminently amenable to being reduced to math. (I’ve been poking at an optional stopping toy problem for a while now.)

        • Corey:

          We discuss stopping rules a bit in chapters 6 and 7 of BDA (chapters 6 and 8 of the forthcoming third edition). The paradox is often presented that the stopping rule evidently makes a difference, but in Bayesian inference it doesn’t seem to matter. We resolve this in two ways: First, we point out that the stopping rule does matter in model checking because it affects the predictive distribution and thus affects the hypothetical replications to which the data will be compared. Second, we point out that the stopping rule does make a difference in Bayesian inference if the model is changing over time.

        • Corey,

          I had noticed that and had been curious where you had taken it. I've long since concluded the real divide is simply the definition of probability. One group thinks:

          Def 1: P(x) is the limiting frequency of x in repeated trials.
          Def 2: P(x) defines a region (the high probability manifold) where the true x lies.

          Def 1 people imagine they're mini-physicists and are modeling physical laws of the universe. Def 2 people are essentially taking "majority votes" over the high probability region in order to make best guesses about functions of x. Def 2 is more useful and more realistic even in problems that involve frequencies in repeated trials, and vastly more general to boot. All distributions, sampling, priors, and posteriors are on exactly the same footing and are to be judged in the same ways. That's the unbridgeable divide between the Bayesians and the Frequentists; and really, going forward, the problem isn't the Frequentists, who are permanently lost in the sauce, it's that most Bayesians retain too much Def 1 intuition from their early encounters with statistics.

          I don't see the stopping rule stuff as fundamental. To cut a long story short, a likelihood L(f) comes from the functional form f = F(x), where x is some deeper space (which ultimately represents in some way the unknown state of the universe). If F is known with certainty then observed values of f have the same implications for x regardless of how they were observed. If there is uncertainty about F, then things like stopping rules affect what conclusions are drawn about x from the observed f. A straightforward Bayesian analysis of the case when there is uncertainty about the form of F takes care of the situation. Or in some cases it's simply that the wrong F is being used. I had the impression this had been worked out well enough for most practical purposes in Gelman's book.

        • Corey:

          Jim Berger did tell me in 2008 that the stopping rule stuff was what convinced him to become a Bayesian. There was not the opportunity to fully discuss why, so I can't say (but you are not the only one).

          Entsophy: I believe a large amount of the divide you point to comes from confusing what's being represented with the representation (e.g., thinking randomization does not matter in Bayes because it does not explicitly appear in Bayes' theorem). I do like your majority-vote description.

        • Andrew and Entsophy: I'm abstracting away most of the practical issues, economics-style, to maintain a tight focus on the irreconcilable differences. My toy problem is: collect one sample from a Gaussian with unknown mean and variance 1; if it's within some interval that's symmetric about 0, collect another data point with the same mean and variance sigma^2.

          This model is an abstraction of a general optional stopping situation with one decision point, arbitrary initial sample size and arbitrary optional follow-up sample size. It also doesn’t take us too far away from the fixed sample-size case, where the correspondence/agreement between Bayesian and frequentist inference is well-understood.

          I’m considering a comparison of confidence intervals vs. credible intervals. I got bogged down looking for a confidence interval procedure that was optimal according to some criterion (instead of one that was hacked together out of inequalities and that ignores some of the data, per the standard literature on the subject).
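
          (For anyone who wants to play with it, a quick simulation sketch of that data-generating process; the function name, the cutoff, and sigma here are arbitrary placeholders.)

          ```r
          simulate_once <- function(mu, cutoff = 1, sigma = 2) {
            y1 <- rnorm(1, mu, 1)                    # first observation, variance 1
            if (abs(y1) < cutoff) {
              c(y1 = y1, y2 = rnorm(1, mu, sigma))   # optional follow-up, sd sigma (variance sigma^2)
            } else {
              c(y1 = y1, y2 = NA)                    # stop after the first observation
            }
          }

          # e.g. replicate(10, simulate_once(mu = 0.3))
          ```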

        • bxg, regarding Lindley’s paradox: But that question _is_ directly answerable in the Bayesian framework – by calculating the probability of H_0 being true. (Unless the client just wanted a yes/no answer with no uncertainty involved – in that case, refer them to a theologian.)

          Of course, like any problem in statistics or applied mathematics, one cannot get an answer without first supplying a sufficiently complete model, and the answer cannot be expected to correspond to reality except to the extent that the model does. So there is still some more work to do before the Bayesian mechanism can come into play.

          And of course if the client has not yet decided what the question is, it is not yet time to select a methodology for answering it. (Granted, most of our effort in science goes into formulating questions rather than answering them, and availability of methodology does affect the questions we choose.)

        • konrad, if I were to agree with you (at the meta-level I somewhat do), what space does that leave for the frequentist answer to LP? Part of this discussion is about the legitimacy of “different questions” so if Bayes gets to take the “is theta 0?” question, what is the other question (the frequentist one) good for and who cares? You either have a diplomatic answer to that or re-raise the philosophical dispute that people are trying to bottle up.

          In the specifics, I actually disagree with you that Bayes is going to be useful in dealing with a client whose true question really is to discover whether theta is 0 (truly, absolutely, no noise, equal to zero).

          If someone asks me whether theta is truly 0 or not, then I’m going to search for a domain-specific argument why it is almost certainly not so. And for most such questions (certainly any in the social sciences, but much more broadly than that) I’ll quickly find such an argument, and furthermore the argument will be so strong that no reasonable amount of data will shake it. So I didn’t need any statistics to satisfy that client. And if you had data I would (rightly!) ignore it.
          But maybe we have another problem where theta truly = 0 is a live possibility. Then I'll first approach it as a mathematician or logician, trying to deduce a proof of such or find a counterexample. No data, no statistics either.
          So, yes, there are some cases (I think rare) where theta truly = 0 is a live possibility, we can't deduce our way to truth or falsity, and sampling/statistics actually might help. I'll go out on a limb and speculate that these are going to be very special situations, i.e. where we think sampled data can actually assist in deciding the truth of such a precise claim, so that the right way of using this data is going to be very idiosyncratic. Bayes? Something entirely new? Who can guess in the abstract?

        • @bxg: I agree with all of that. I was assuming we are talking about an example where the client really does have a nonzero prior for H_0; I might point out that some nuance is possible, e.g. the client might be thinking of =0 as meaning “within a distance epsilon from 0”, where they have a principled argument for choosing epsilon. In that case a continuous model will do the job without point priors.

          Regarding the frequentist question, a diplomatic answer would be that many people do find it compelling. Since I personally do not, I’ll not try to defend it myself (without denying that a valid defense could be offered in principle).

  7. Anonymous: I was merely alluding to the standpoints behind (a) Gelman’s citation of Wasserman:
    “Some of the greatest contributions of statistics to science involve adding additional randomness and leveraging that randomness. Examples are randomized experiments, permutation tests, cross-validation and data-splitting. These are unabashedly frequentist ideas and, while one can strain to fit them into a Bayesian framework, they don’t really have a place in Bayesian inference. The fact that Bayesian methods do not naturally accommodate such a powerful set of statistical ideas seems like a serious deficiency”.

    and (b) the previous remark:

    “Christian: really? You think the billions of dollars we spend doing randomized trials
    is wasted?”

    I don’t really understand the “what” vs “how” distinction here:
    “Randomization addresses a causal problem – what is being estimated – whereas the choice of bayesianism vs. frequentism addresses how it’s estimated.”

    If there is no dispute on the “what” then is the issue whether Wasserman is correct about the “how”?

  8. Pingback: Gaussian Copula Process errors for ODE models | Models Of Reality

      • I would love to, but at each iteration I need to run an ODE solver for each timeseries to get the predicted values for that timeseries. I don't think Stan can do this, can it?

        • It's not clear to me how Stan could do this and still use NUTS; wouldn't you have to be able to calculate the derivative of the ODE solver's highly multivariate output with respect to all the parameters? In any case, it would be quite wonderful to be able to fit ODEs to timeseries data using Stan, so I am excited to hear that this is something you're considering.

  9. The Bayesian defence seems pretty unprincipled to me. If you're not confident about your prior, then randomisation reduces prior sensitivity, and if you're not confident about your ability to model data collection, then randomisation can give you an ignorable model. But that kicks off by assuming that you're not a perfect statistician and are compensating for weakness. The perfect Bayesian would be sure about the prior and able to model data collection – and wouldn't need randomisation.

    It just seems the Bayesian defence is from weakness ("we're not perfect but this compensates") while the Fisherian defence is from strength ("the best way of analysis is through randomisation"). Very different perspectives.

    • Alex:

      I can't speak for others, but the discussion of randomization in BDA follows Bayesian principles. I think your remark, "If you're not confident about your prior," is a red herring. First, a model is a set of assumptions. We go with what assumptions we have, but we typically know they're wrong. Second, the "prior" is only part of the model, and I strenuously object to those statisticians who unquestioningly accept whatever data model comes to them but then balk at a probability model for the parameters.

    • To add to Andrew’s comments and paraphrase what Daniel Lakeland said above: the purpose of randomization is to increase the amount of (useful) information in the data set being collected. That’s an argument from strength, not weakness.

      Also, the absence of a good data collection model and the need to circumvent this is an intrinsic part of the problem, not of the proposed solution (whether frequentist or Bayesian).
