Let’s stop talking about published research findings being true or false

Posted on June 29, 2017 9:47 AM by Andrew

I bear some of the blame for this.

When I heard about John Ioannidis’s paper, “Why Most Published Research Findings Are False,” I thought it was cool. Ioannidis was on the same side as me, and Uri Simonsohn, and Greg Francis, and Paul Meehl, in the replication debate: he felt that there was a lot of bad work out there, supported by meaningless p-values, and his paper was a demonstration of how this could come to pass, how it was that the seemingly-strong evidence of “p less than .05” wasn’t so strong at all.

I didn’t (and don’t) quite buy Ioannidis’s mathematical framing of the problem, in which published findings map to hypotheses that are “true” or “false.” I don’t buy it for two reasons: First, statistical claims are only loosely linked to scientific hypotheses. What, for example, is the hypothesis of Satoshi Kanazawa? Is it that sex ratios of babies are not identical among all groups? Or that we should believe in “evolutionary psychology”? Or that strong powerful men are more likely to have boys, in all circumstances? Some circumstances? Etc. Similarly with that ovulation-and-clothing paper: is the hypothesis that women are more likely to wear red clothing during their most fertile days? Or during days 6-14 (which are not the most fertile days of the cycle)? Or only on warm days? Etc. The second problem is that the null hypotheses being tested and rejected are typically point nulls—the model of zero difference, which is just about always false. So the alternative hypothesis is just about always true. But the alternative to the null is not what is being specified in the paper. And, as Bargh etc. have demonstrated, the hypothesis can keep shifting. So we go round and round.
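
A minimal simulation sketch of the point-null argument above. Everything specific in it is an assumption made for illustration rather than something from the papers discussed: a one-sample z-test with known standard deviation 1, a hypothetical true effect of 0.01 standard deviations, and the helper name `rejection_rate`. It compares how often p < .05 occurs when the effect is exactly zero and when it is tiny but nonzero, as the sample size grows.

```python
import random
from statistics import NormalDist


def rejection_rate(true_effect, n, rng, reps=2000, alpha=0.05):
    """Share of simulated studies with a two-sided p-value below alpha.

    Each study is a one-sample z-test with known SD = 1, so the sample
    mean is drawn directly from Normal(true_effect, 1 / sqrt(n)).
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = 1 / n ** 0.5
    rejections = 0
    for _ in range(reps):
        sample_mean = rng.gauss(true_effect, se)
        if abs(sample_mean / se) > z_crit:
            rejections += 1
    return rejections / reps


rng = random.Random(2017)  # fixed seed so the sketch is reproducible
results = {}
for n in (100, 10_000, 1_000_000):
    exact_null = rejection_rate(0.0, n, rng)    # effect is exactly zero
    tiny_effect = rejection_rate(0.01, n, rng)  # hypothetical 0.01-SD effect
    results[n] = (exact_null, tiny_effect)
    print(f"n={n:>9,}  exact null: {exact_null:.3f}  tiny effect: {tiny_effect:.3f}")
```

If the sketch behaves as intended, the exact-null column should stay near .05 at every n while the tiny-effect column climbs toward 1. Neither column says anything about whether a substantive hypothesis is “true.”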

Here’s my point. Whether you think the experiments and observational studies of Kanazawa, Bargh, etc., are worth doing, or whether you think they’re a waste of time: either way, I don’t think they’re making claims that can be said to be either “true” or “false.” And I feel the same way about medical studies of the “hormone therapy causes cancer” variety. It could be possible to coerce these claims into specific predictions about measurable quantities, but that’s not what these papers are doing.

I agree that there are true and false statements. For example, “the Stroop effect is real and it’s spectacular” is true. But when you move away from these super-clear examples, it’s tougher. Does power pose have real effects? Sure, everything you do will have some effect. But that’s not quite what Ioannidis was talking about, I guess.

Anyway, I’m still glad that Ioannidis wrote that paper, and I agree with his main point, even if I feel it was awkwardly expressed by being crammed into the true-positive, false-positive framework.

But it’s been 12 years now, and it’s time to move on. Back in 2013, I was not so pleased with Jager and Leek’s paper, “Empirical estimates suggest most published medical research is true.” Studying the statistical properties of published scientific claims, that’s great. Doing it in the true-or-false framework, not so much.

I can understand Jager and Leek’s frustration: Ioannidis used this framework to write a much celebrated paper; Jager and Leek do something similar—but with real data!—and get all this skepticism. But I do think we have to move on.

And I feel the same way about this new paper, “Too True to be Bad: When Sets of Studies With Significant and Nonsignificant Findings Are Probably True,” by Daniel Lakens and Alexander Etz, sent to me by Kevin Lewis. I suppose such analyses are helpful for people to build their understanding, but I think the whole true/false thing with social science hypotheses is just pointless. These people are working within an old-fashioned paradigm, and I wish they’d take the lead from my 2014 paper with Carlin on Type M and S errors. I suspect that I would agree with the recommendations of this paper (as, indeed, I agree with Ioannidis), but at this point I’ve just lost the patience for decoding this sort of argument and reframing it in terms of continuous and varying effects. That said, I expect this paper by Lakens and Etz, like the earlier papers by Ioannidis and Jager/Leek, could be useful, as I recognize that many people are still comfortable working within the outmoded framework of true and false hypotheses.
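
A minimal sketch of what a Type S / Type M calculation looks like, in the spirit of the design-analysis idea referred to above. It is not the paper's code: the function name `type_s_and_m`, the normal sampling distribution, and the example values (a true effect of 0.1 measured with standard error 0.5) are all assumptions chosen for illustration.

```python
import random
from statistics import NormalDist, fmean


def type_s_and_m(true_effect, se, alpha=0.05, n_sims=100_000, seed=1):
    """Power, Type S error rate, and exaggeration ratio for one design.

    Assumes true_effect > 0 and an estimate distributed as
    Normal(true_effect, se).
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Probability of a significant estimate with the correct (positive) sign.
    p_right_sign = 1 - norm.cdf(z_crit - true_effect / se)
    # Probability of a significant estimate with the wrong (negative) sign.
    p_wrong_sign = norm.cdf(-z_crit - true_effect / se)
    power = p_right_sign + p_wrong_sign
    type_s = p_wrong_sign / power
    # Type M: average size of significant estimates relative to the truth.
    rng = random.Random(seed)
    estimates = (rng.gauss(true_effect, se) for _ in range(n_sims))
    significant = [abs(est) for est in estimates if abs(est) > z_crit * se]
    exaggeration = fmean(significant) / true_effect
    return power, type_s, exaggeration


power, type_s, exaggeration = type_s_and_m(true_effect=0.1, se=0.5)
print(f"power={power:.3f}  Type S={type_s:.2f}  exaggeration ratio={exaggeration:.1f}")
```

With a small true effect and a noisy estimate, the sketch should show low power, a substantial chance that a significant estimate has the wrong sign, and significant estimates that overstate the effect many times over. None of that fits a true/false label.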

P.S. More here and here.

119 thoughts on “Let’s stop talking about published research findings being true or false”

Jon Baron on June 29, 2017 at 11:51 am said:

Andrew,

You keep saying that the null hypothesis is (usually, just about always, etc.) false. I agree with this for most research I’ve seen in political science or psychometrics (personality tests, IQ tests, self-report tests, etc.). However, in experimental psychology and other experimental sciences, we sometimes go to great lengths to make sure that the null hypothesis is exactly true if an effect isn’t there. Admittedly, the lengths are sometimes not great enough, but that is a practical problem of design of perfectly matched control conditions, not a conceptual problem with what we are trying to do. Thus, I get irked when I see a general statement about the null being always false. (You don’t go that far, but some people do.) It is like someone is saying that large parts of my research are hopeless. I think that what you see here depends somewhat on where you sit. And, in my sometimes unfortunate experience, the null hypothesis can indeed be true!
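
A minimal sketch of the situation described in the comment above, where the null is exactly true by design: outcomes in both arms come from the same (deliberately skewed) distribution, and a permutation test compares the arms. The function names, group size, number of shuffles, and exponential outcome distribution are all hypothetical choices, not taken from any study mentioned here.

```python
import random


def permutation_p_value(treated, control, n_perm, rng):
    """Two-sided permutation p-value for a difference in group means."""
    observed = abs(sum(treated) / len(treated) - sum(control) / len(control))
    pooled = treated + control
    n_treated = len(treated)
    n_control = len(control)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(sum(pooled[:n_treated]) / n_treated
                   - sum(pooled[n_treated:]) / n_control)
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the p-value valid with a finite number of shuffles.
    return (at_least_as_extreme + 1) / (n_perm + 1)


def null_rejection_rate(n_experiments=200, group_size=20, n_perm=200,
                        alpha=0.05, seed=2017):
    """Fraction of simulated experiments rejected when no effect exists."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        # Same skewed distribution in both arms: the null is exactly true.
        outcomes = [rng.expovariate(1.0) for _ in range(2 * group_size)]
        treated, control = outcomes[:group_size], outcomes[group_size:]
        if permutation_p_value(treated, control, n_perm, rng) < alpha:
            rejections += 1
    return rejections / n_experiments


rate = null_rejection_rate()
print(f"rejection rate with a true null: {rate:.3f}")
```

Under these assumptions the rejection rate should sit near the nominal .05 even though the outcomes are far from normal, because random assignment alone justifies the test.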

- Andrew on June 29, 2017 at 12:01 pm said:
  
  Jon:
  
  One example I like to give where an effect can be zero is genetics, where a gene can be on a particular chromosome, or not. Even there I’m sure this is an approximation, but I can believe it’s close enough of an approximation to be reasonable to work with.
  
  - CuriousGeorge on June 29, 2017 at 12:43 pm said:
    
    There are trans-acting elements in genetics as well. Genes at a distance can affect each other.
    
    Anyway, I thought your argument that “all effect sizes are non-zero” was simply the one that effect size is a continuous random variate, so even if centered at zero, the probability of an effect being exactly zero is zero.
    
  - Mark W. on June 29, 2017 at 1:44 pm said:
    
    I’ve heard the “point-estimate zero null hypothesis is always false” argument from a number of people, and I get that with an n approaching an arbitrarily large number, a p-value is basically a tautology telling you that “yes, you have a big sample!”
    
    However, what about in the case of something like ESP? So ESP is not a thing. Period. So the point-estimate zero null hypothesis IS true. If we run n = 100,000,000 people through a tightly-controlled ESP experiment, we should NOT get a small p, correct? Again, because ESP is just not a thing that exists.
    
    But it seems like you would, simply because any small deviation would be significant. If this is true, then either ESP is a thing or frequentist assumptions are patently wrong? Or am I missing something? Because, to be honest, anything that shows evidence for something supernatural existing–regardless of effect size–would be mind-blowing.
    
    I’m just not sure how to handle understanding the frequentist paradigm when it would argue that something that does not exist is a thing simply because we have an arbitrarily large n.
    
    - Andrew on June 29, 2017 at 2:19 pm said:
      
      Mark:
      
      As I wrote here:
      
      is if we think ESP exists at all, then an effect that’s +0.003 on Monday and -0.002 on Tuesday and +0.001 on Wednesday probably isn’t so interesting. This becomes clearer if we move the domain away from possible null phenomena such as ESP or homeopathy, to things like social priming, which presumably has some effect, but which varies so much by person and by context to be generally unpredictable and indistinguishable from noise. I don’t think ESP is such a good model for psychology research because it’s one of the few things people study that really could be zero.
      
      ESP is a weird field of research and I don’t think it’s a good model for most of science.
    - Paul Alper on June 29, 2017 at 2:31 pm said:
      
      Andrew: Why pick on ESP when there exists data on the Tooth Fairy?
      
      https://www.usatoday.com/story/money/2015/08/17/tooth-fairy-budget-giving-less-2015/31853399/
      
      “Losing a tooth just isn’t as lucrative as it used to be. The Tooth Fairy is leaving an average of $3.19 per tooth under kids’ pillows this year, according to an annual Tooth Fairy survey by Visa out Monday. It’s the second year in a row the amount for a lost tooth has dropped, down 24 cents from last year. In 2014, the Tooth Fairy left an average of $3.43, down 27 cents from 2013.”
      
      “The most profitable area to lose a tooth is in the Northeast, where kids get an average of $3.56 per tooth — that’s where the most kids get $5 and $20 bills, or more. The Tooth Fairy leaves an average of $3.13 in the Midwest, $3.09 in the West and $3.07 in the South.”
    - Mark W. on June 29, 2017 at 3:12 pm said:
      
      My question is less about psychology and more about the theory behind frequentist methods. You write:
      
      > I don’t think ESP is such a good model for psychology research because it’s one of the few things people study that really could be zero.
      
      Set aside psychology; I care about the implications this has for frequentist methods in any field.
      
      If it “*really could be zero*”, then finding an effect of .003, p < .01 is interesting, right? Then we have evidence for something supernatural happening! What I cannot reconcile in my mind is how, if anything is significant with a large enough n, we can take frequentist methods seriously *at all*, given that a large enough n would say something that doesn't exist actually exists.
      
      I know it isn't practical that we would do an n = 100,000,000 randomly assigned ESP experiment, but I think it is important for considering how seriously we take inferences we are making from frequentist analyses.
    - Carlos Ungil on June 29, 2017 at 3:59 pm said:
      
      If the effect is small, you will get a significant p-value when the sample is large enough. But this is not true if there is no effect at all (the 0.05 p-value should not happen more than once in 20 trials).
    - Daniel Lakeland on June 29, 2017 at 4:15 pm said:
      
      Assuming perfect correctness of your frequency distribution model.
      
      Even something that really is a normal distribution, whose mean is corrupted by a small deterministic signal, like normal(1 + epsilon* sin(omega * t), 1) will show up as an effect depending on your duration of sampling and omega…
      
      In the real world, model misspecification seems very likely to make every enormous sample show a “significant” (statistically) effect.
    - Daniel Lakeland on June 29, 2017 at 4:38 pm said:
      
      This is the problem with something like ESP. Failure of the test of no effect simply leads us to believe that there’s probably something wrong with our test, since in every practical experiment, we know of so many ways that there can be something wrong.
      
      Imagine this setup: a person sits in a sealed soundproof room in Italy and flips coins. After every coin flip, he presses a button, and a picture of the coin is taken, and a signal is sent over the internet to a person in a sealed soundproof room in North Dakota, where a light lights up. Every time the light lights up the subject presses one of two buttons, heads or tails, to telepathically predict what happened in Italy.
      
      Now, after 400,000 tests, suppose you detect an effect? It seems very likely that this is due to dirt on the contacts of the switches or denial of service attacks on the internet, or computer code bugs causing certain pictures of the coin to get accidentally overwritten, or … blah blah blah
    - Ben Prytherch on June 29, 2017 at 5:49 pm said:
      
      I don’t understand how these things could show up as an “effect” if your null is H0: p(correct prediction) = 0.5. How would dirt on the contacts of switches or DoS attacks create an imbalance in correct vs. incorrect predictions?
      
      Generally, if we have an experiment in which subjects are randomly assigned to groups, and we are testing for an effect that does not exist, wouldn’t imperfections in the frequency distribution model have to apply to different groups differently in order for the p-value to go to zero as n increases? And doesn’t random assignment preclude this? There are still other problems that could create an apparent effect, such as lack of blinding or demand effects, or maybe in your example a computer bug causes the system to falsely identify a match when really the person in ND picked the wrong outcome. But given a good experimental design, random assignment, and a true null, I don’t see how huge sample sizes are guaranteed to result in small p-values even when distributional assumptions are violated.
    - Daniel Lakeland on June 29, 2017 at 6:34 pm said:
      
      How indeed, but do you think it’s inconceivable that any of the 345,000 different mechanisms I can think of over the next decade did occur?
      
      The point is just that when there are many many ways an individual measurement could go wrong, and you measure hundreds of thousands or millions of individual measurements…. eventually a bunch of these measurement corruptions will occur, and they’ll violate your distribution assumptions. With enough data you’ll detect *the corruptions* but it doesn’t mean you’re detecting ESP or whatever tiny effect you’re trying to detect. It takes (much) more than proving that a null hypothesis of p = 0.5 is incorrect to prove that ESP *is* correct.
    - Carlos Ungil on June 29, 2017 at 5:11 pm said:
      
      A randomization test solves that, if I understand your concern.
    - Daniel Lakeland on June 29, 2017 at 5:27 pm said:
      
      1/Tfinal * integrate(sin(omega*t),t,0,Tfinal) goes to zero when Tfinal goes to infinity. In this sense, sampling infinitely long will result in a mean 0 random variable with a not quite normal distribution.
      
      But, for any finite sample, it will exhibit some deviation from 0 which is deterministic. And the deviation is related to omega*Tfinal, and since we haven’t specified omega, it’s always possible if you tell me a Tfinal for me to choose something like omega = 2*pi*(3/4)/Tfinal, as an example of something that might happen which will bias your result away from zero.
      
      I don’t think a randomization test can get away from this. Your finite sample is biased by a deterministic function which in your sample is asymmetric around zero, but in the infinite limit isn’t. If you assume exactly zero mean… you will detect that it’s violated even though in infinite limit… it isn’t.
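
      A worked version of the time-average argument above, as a sketch. It writes T for Tfinal and assumes the contamination is exactly sin(omega*t) scaled by a small epsilon, sampled evenly over [0, T]; those are simplifying assumptions, not part of the comment.

      ```latex
      \frac{1}{T}\int_0^{T} \sin(\omega t)\,dt
        = \frac{1-\cos(\omega T)}{\omega T},
      \qquad
      \left|\frac{1-\cos(\omega T)}{\omega T}\right|
        \le \frac{2}{\omega T} \xrightarrow[T\to\infty]{} 0
        \quad (\omega \text{ fixed}).

      % Choosing \omega = 2\pi(3/4)/T gives \omega T = 3\pi/2 and \cos(3\pi/2) = 0:
      \frac{1-\cos(3\pi/2)}{3\pi/2} = \frac{2}{3\pi} \approx 0.212 .
      ```

      So for a fixed omega the average vanishes in the limit, but for that choice of omega the sample mean is shifted by roughly 0.21 epsilon no matter how large T is.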
    - Carlos Ungil on June 29, 2017 at 6:41 pm said:
      
      I’m talking about using a model free analysis, where you check if the results for the gifted subjects are different from those obtained by the control subjects using a permutation test. How does your noise distinguish one group from the other?
    - Daniel Lakeland on June 29, 2017 at 6:59 pm said:
      
      I’m assuming we’re just trying to detect if there are subjects who are “gifted” not that we are trying to detect whether the person who labeled them for us was correct.
    - crh on June 29, 2017 at 4:16 pm said:
      
      You could also get p < 0.05 because there is some systematic bias in your measurements. If n is huge and the true effect size is tiny (or zero) then even a very subtle bias could trigger an (ahem) 'false positive'.
- Stephen Martin on June 29, 2017 at 12:45 pm said:
  
  You cannot ensure the null is exactly true though. There could be SOME very, very tiny effect of the manipulation.
  
  People look at me like I’m crazy when I use this as an example, but this is a decent example I think.
  
  The sun affects what URL you visit in a browser. You’d specify the null is that there’s no correlation; r = 0. Because it’s “ridiculous”, the sun does not affect what page you visit.
  This is not true though. The solar radiation can influence your computer. At any given point, you have a VERY VERY small probability of experiencing ‘bit flipping’, which can turn google.com to googlb.com upon transmission; so you may type google.com, but actually visit googlb.com without realizing it.
  
  This effect is extremely tiny, but across millions of replications (i.e., people visiting URLs across the world, all the time, every day), people are bound to wind up at the incorrect page. Hackers know this, and will monetize sites like googlb.com, gougle.com, etc. These aren’t for typos, these are for bit flips which are bound to happen.
  
  With that said, there are very few instances where I could buy that the null is actually feasibly true; for instance, ESP as a real effect would run counter to known laws of physics and biology; because of these hard constraints of our physical system, the null could actually be true.
  
  But in social science? No; people are way too flexible. The null can almost certainly be considered false to some millionth of a decimal.
  
- Martha (Smith) on June 29, 2017 at 5:48 pm said:
  
  Jon said, “However, in experimental psychology and other experimental sciences, we sometimes go to great lengths to make sure that the null hypothesis is exactly true if an effect isn’t there.”
  
  Can you give examples?
  
  - AnonAnon on June 29, 2017 at 7:02 pm said:
    
    I can’t speak for what Jon means but there is some pretty nifty stuff being done in https://en.wikipedia.org/wiki/Psychophysics which gives rise to research like this:
    
    Gunnar Borg and Elisabet Borg. 1991. A General Psychophysical Scale of Blackness and Its Possibilities as a Test of Rating Behaviour. Department of Psychology, Stockholm University.
    
    which can be used to do things like this: https://www.autodeskresearch.com/publications/effect-visual-appearance-performance-continuous-sliders-and-visual-analogue-scales
    
    /threadjacking
    
    - Martha (Smith) on June 29, 2017 at 10:20 pm said:
      
      I don’t think this really answers my question, but it is interesting.
  - Jon Baron on July 1, 2017 at 11:07 am said:
    
    So I was thinking about this and I did think of some examples from my own research. There are many, but here are two:
    
    Baron, J., & Thurston, I. (1973). An analysis of the word-superiority effect. Cognitive Psychology, 4, 207–228.
    
    This one is interesting, because the control condition is “imperfect”. The matching with the experimental condition is rough. With thousands more subjects and/or stimuli, the null would have been rejected, but not because the null hypothesis was false but rather because the control condition wasn’t quite right. (I’m thinking of the difference between familiar words and “equally pronounceable” nonwords, such as “work” and “sork”. The paper argued that there was no difference, but both of these were perceived better than unpronounceable nonwords such as osrk. The latter result was no big deal.)
    
    Baron, J. (1974). Facilitation of perception by spelling constraints. Canadian Journal of Psychology, 28, 37–50.
    
    This was the best controlled study I’ve ever done: completely counterbalanced, perfectly controlled. If subjects had experienced AB and CD, they were better at perceiving the letters (in a forced choice between two letters) than if they had experienced AB, CD, AD, and CB, with each pair of letters presented the same number of times (so that frequency of exposure to the individual pairs was held constant). I didn’t think it would work, but it did. There was a difference.
    
    Of course there are thousands of others, by me and other people.
    
    - Martha (Smith) on July 1, 2017 at 4:36 pm said:
      
      Jon,
      I haven’t looked at the papers, but I don’t see how the discussion you have given in this latest post addresses the point “we sometimes go to great lengths to make sure that the null hypothesis is exactly true if an effect isn’t there.” (Possibly you need to clarify what you mean by the quoted sentence?)
    - Jon Baron on July 2, 2017 at 3:18 pm said:
      
      By “great lengths” I just mean “taking care to make sure everything is held constant except the manipulation of interest”. This paper makes a point:
      http://www.sas.upenn.edu/~baron/papers/outcomebias.pdf
      We found that people judge the quality of a decision by its outcome, even when they know everything that the decision maker knew, and that knowledge is held constant. What is interesting here was that others had claimed before that outcome mattered, but they had not taken the kind of care that we took to hold everything else constant. This “taking care” is not routine.
      
      (Another interesting thing about this paper was when I described the result to an even-more-senior researcher than I am, one I highly respect. He said, “Oh. That’s nice. I tried that around 15 years ago, but it didn’t work, so I gave up.” I’m not sure whether he thought the null hypothesis was true or that he had just failed to do the right experiment.)
- Christian Hennig on June 29, 2017 at 8:09 pm said:
  
  Your null is a normal distribution with some parameters, you can only observe a finite number of digits after the decimal point of your measurement => your null is technically false. If you seriously try to make the null exactly true, no chance.
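
A rough arithmetic sketch for the ESP coin-guessing discussion in this thread, under a simple assumed setup: independent guesses and a two-sided z-test of a hit rate of 0.5 at the .05 level. The function name `smallest_detectable_shift` is made up for illustration. It shows how small a deviation from 0.5, whether from ESP or from a subtle measurement bias, would be flagged at the sample sizes mentioned above (400,000 and 100,000,000), with a hypothetical n = 400 added for contrast.

```python
from statistics import NormalDist


def smallest_detectable_shift(n, alpha=0.05):
    """Smallest gap between the observed hit rate and 0.5 that a
    two-sided z-test flags as significant with n independent guesses."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = (0.5 * 0.5 / n) ** 0.5  # SE of a proportion under p = 0.5
    return z_crit * standard_error


shifts = {n: smallest_detectable_shift(n) for n in (400, 400_000, 100_000_000)}
for n, shift in shifts.items():
    print(f"n={n:>11,}: observed hit rate must differ from 0.5 by at least {shift:.6f}")
```

At n = 100,000,000 the threshold is roughly one part in ten thousand, which is why a rare glitch in the apparatus is a more plausible explanation for a rejection than telepathy.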
  
Adam Pegler on June 29, 2017 at 12:16 pm said:

I agree. Ioannidisian philosophy of scientific results, as I would like to call it, draws very heavily on NHST. Its mathematical framing is NHST.

It says

(a) that a true result is when the null hypothesis is rejected and the null hypothesis is indeed false (the effect size was not 0 in the study and is not 0 in the population). p was < .05 (whatever alpha level the researcher chooses) and the null is false.

Now if,

(b) the null hypothesis is almost always false (contentious, but an assumption held by many)

Then it follows that,

(c) a rejection of the null hypothesis result is almost always correct.

so

(d) most research findings are indeed "true", not "false"

I'm confused

- Chris Wilson on July 1, 2017 at 8:03 pm said:
  
  No! They could very well be in the wrong direction (type S error), or highly exaggerated (type M error)!
  
Rajesh on June 29, 2017 at 12:32 pm said:

Andrew,

…_statistical claims are only loosely linked to scientific hypotheses_…

This point is not emphasized enough in introductory classes. I was wondering if there have been attempts at mathematical formalization of the differences between statistical truths and scientific truths in the philosophy of statistics literature. I often try to emphasize these issues in discussions and in classes but both peers and students alike have a hard time appreciating these differences.

Is there any quick-go-to literature on this issue?

- Anoneuoid on June 29, 2017 at 1:14 pm said:
  
  Meehl, P. E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 34, 103-115. http://www.fisme.science.uu.nl/staff/christianb/downloads/meehl1967.pdf
  
  Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46(4), 806.
  
  Meehl, P. E. (1990). Appraising and Amending Theories: The Strategy of Lakatosian Defense and Two Principles That Warrant It. Psychological Inquiry, Vol. 1, No. 2, 108-141. https://pdfs.semanticscholar.org/2a38/1d2b9ae7e7905a907ad42ab3b7e2d3480423.pdf
  
  Meehl, P. E. (1997). The problem is epistemology, not statistics: Replace significance tests by confidence intervals and quantify accuracy of risky numerical predictions. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests (pp. 395-425). Mahwah, NJ: Erlbaum.
  
  Maybe figure 2 of Meehl 1990 and page 398 of Meehl 1997 are most likely to catch your interest as working towards a mathematical formalization. All those papers cover this topic though.
  
  - Rajesh on June 29, 2017 at 3:38 pm said:
    
    Thank you very much for this list.
    
  - Noah Motion on June 30, 2017 at 9:29 am said:
    
    I second the “thanks.”
    
    Also, here’s a pdf of the second one, and here’s a pdf of the fourth one.
    
- awm on June 29, 2017 at 4:24 pm said:
  
  There is a really nice paper by Robert Kass that takes pragmatism as a school of thought in the philosophy of science and epistemology and relates it to statistics, making a distinction between the conceptual realm of models and the actual realm of data. I think more work along these lines would be really great, and if anyone knows of anything I’d love some additional references.
  
  http://www.stat.cmu.edu/~kass/papers/bigpic.pdf
  
  I think the idea that statistical models are neither true nor false but useful is reasonably well accepted (at least the quote by Box is well known), but I don’t think it’s quite penetrated in a lot of places that all statistical estimates rest on some sort of model, even when it’s not explicit. This is particularly an issue in my own field of survey research.
  
  - ojm on June 29, 2017 at 5:37 pm said:
    
    Might not quite be what you’re after, and excuse the promotion, but Laurie Davies’ book ‘Data Analysis and Approximate Models’ is summarised:
    
    “The First Detailed Account of Statistical Analysis That Treats Models as Approximations.
    
    The idea of truth plays a role in both Bayesian and frequentist statistics. The Bayesian concept of coherence is based on the fact that two different models or parameter values cannot both be true. Frequentist statistics is formulated as the problem of estimating the “true but unknown” parameter value that generated the data.
    
    Forgoing any concept of truth, Data Analysis and Approximate Models: Model Choice, Location-Scale, Analysis of Variance, Nonparametric Regression and Image Analysis presents statistical analysis/inference based on approximate models. Developed by the author, this approach consistently treats models as approximations to data, not to some underlying truth.”
    
    See: https://www.crcpress.com/Data-Analysis-and-Approximate-Models-Model-Choice-Location-Scale-Analysis/Davies/p/book/9781482215861
    
    - ojm on June 29, 2017 at 5:46 pm said:
      
      (Besides coherence, the Cox-Jaynes approach also builds on a Boolean logic in which a proposition is true or false and its negation the opposite. If you don’t want models to be true or false then they’re not propositions and they’re not subject to probability statements in the Cox-Jaynes sense.
      
      One thing I don’t understand is how Andrew can say he doesn’t like talking about the probability of a model but will apply probability to parameters – don’t parameters index models? Applying probability to parameters seems to assume only one value can be correct, i.e., an identifiability assumption).
    - Daniel Lakeland on June 29, 2017 at 6:40 pm said:
      
      I think it’s the difference between conditional probability and “unconditional” probability, whatever that is. My read on this is Andrew is happy to first assume that a model could be correct and then give probabilities over parameters *given that assumption*. But he’s on record being not so sure that picking a small number of models and assigning probabilities to them individually is meaningful. I’m not as skeptical… I think it’s all very very useful, even though I’m happy to agree with you at a fundamental level that “the true value of the parameter” probably doesn’t exist, etc.
      
      I think of the high posterior probability region of parameter space as “parameter values that are most consistent with the data and the prior plausibility” not really “the location where the true parameter lies”. Taking your model too seriously is I think a scientific fault, not a mathematical one.
    - ojm on June 29, 2017 at 7:00 pm said:
      
      But there is a gap between your informal intuition and the Cox-Jaynes axioms you are fond of mentioning. That’s what my comment is about, and I think what Andrew acknowledges. You can’t have it both ways.
      
      If you drop that mathematical assumption that conflicts with intuition then the Cox-Jaynes argument no longer works and other approaches seem more reasonable. And I think it is a practical issue because non-identifiability is the typical case not the exception in real problems.
    - Daniel Lakeland on June 29, 2017 at 7:23 pm said:
      
      I don’t think the issue you are raising is any different from the issue that there is no guaranteed correct way to make connections between real-world objects and set theoretic objects. Mathematics is always simply an approximation to reality. This is true for ALL of mathematics.
      
      The advantage of Cox axioms is not that they are the correct way to do real world inference really. It’s that they are fully consistent as a system and therefore do not produce additional avoidable internal contradictions.
    - Daniel Lakeland on June 29, 2017 at 7:26 pm said:
      
      But I think you are right to be very cautious: proving some mathematical fact does not guarantee that you have proven a real world fact.
    - Daniel Lakeland on June 29, 2017 at 7:42 pm said:
      
      1) Consider real world problem
      2) Build description of real world problem into set theoretic objects (math).
      3) Do some mathematical manipulations to produce new mathematical statements.
      4) Interpret (3) back into new real world hypotheses
      
      The advantage of Cox axioms is that step (3) has been mathematically proven to give unique internally consistent answers that agree with the desirable properties expressed in the axioms. The proof involves first proving the Cox uniqueness results (that there is only one mathematical system that is consistent with the axioms), and then exhibiting Kolmogorov probability theory as a model of the set theoretic construct, and then invoking the proof that a set theoretical object is consistent if and only if it has a model.
      
      I don’t think Cox gives us anything other than a consistent (3) which satisfies the desirable properties in the axioms. I accept that they’re desirable.
      
      But when it comes to science, I still want to check step 2 and step 4 to see if I’ve done something “wrong” in the sense of failed to make a good correspondence between the world, and the set-theoretic stuff.
      
      I consider that “outside” the Bayesian framework… It’s the “art” of mathematical modeling in science, the part that can’t be axiomatized.
    - ojm on June 29, 2017 at 7:43 pm said:
      
      The formal link is broken as soon as multiple parameters can be equally consistent with an arbitrary amount of data. For example if any combination of k1+k2 maps to the same observation model.
      
      What you call conditional probability is, I think, an indexed family of probability distributions, indexed by ‘background’. Then you get a single probability distribution given a fixed background. The quantities that this is a distribution over are now assumed to be mutually exclusive possibilities. A prior ‘conditioned’ on background p(theta | B) says that different values of theta can’t be equally true. If one becomes more plausible, the rest must become less.
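
      A minimal worked example of the non-identifiability mentioned above, as a sketch. The model is hypothetical and chosen only for illustration: independent observations y_i distributed Normal(k_1 + k_2, 1), so the two parameters enter only through their sum.

      ```latex
      L(k_1, k_2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}}
        \exp\!\left(-\tfrac{1}{2}\,(y_i - k_1 - k_2)^2\right)
      \quad\Longrightarrow\quad
      L(k_1 + c,\; k_2 - c) = L(k_1, k_2) \quad \text{for every } c .
      ```

      Every pair on a line k_1 + k_2 = s has the same likelihood, so no amount of data separates them; any preference among those pairs in the posterior comes from the prior alone.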
    - Daniel Lakeland on June 29, 2017 7:43 PM at 7:43 pm said:
      
      picking a small number of models and assigning them probabilities seems to me to fall firmly into step (2) and so it’s fair to say *outside* of Bayes that it’s a poor way to do science… without saying that therefore Bayes must be the wrong way to do step (3).
    - Daniel Lakeland on June 29, 2017 7:48 PM at 7:48 pm said:
      
      Sure if one becomes more plausible the others must become less… but I don’t see why the limit has to be p(X) = 1, why can’t it be p(X or Y) = 1 or the like? The limiting truth statements don’t have to be atomic, like “Sweet Georgia Brown is the best Jazz standard ever written” why can’t they be statements like “All of the songs in the set {X_i} are equally good, and better than any other songs” or whatever.
      
      If Bayes gives you “the values X1,X2,X3 are all equally good” and you feel that there scientifically needs to be a SINGLE value that is best, then this doesn’t indicate a problem with my step (3) it indicates a problem with step (2).
    - Daniel Lakeland on June 29, 2017 8:00 PM at 8:00 pm said:
      
      To give some kind of simple version of this. Suppose you do some QM measurement and it comes back “either the spin of the electron is up, or is down, both equally likely” are you saying this is an invalid system of inference, because it can’t distinguish between the two spins? And this is a mathematical fault of Bayes, even though say the laws of physics guarantee that whatever the experiment you did is, the results of the experiment are symmetric with respect to spin? I consider that a feature not a bug. You didn’t measure anything that is affected by spin, so the system refuses to tell you that one spin is more probable than the other.
    - ojm on June 29, 2017 8:11 PM at 8:11 pm said:
      
      Daniel – I’ve tried to make my point as best I can.
      
      Side point re falsificationist Bayes. This deals with the issue by retreating to a weaker logic at the ‘outermost’ level, but then I still worry about similar issues ‘inside’ the model. Eg nonidentifiability can manifest here just as much as the outer level right?
    - Daniel Lakeland on June 29, 2017 8:14 PM at 8:14 pm said:
      
      I think where you go wrong is “The quantities that this is a distribution over are now assumed to be mutually exclusive possibilities.”
      
      That’s a very plausible assumption for many scientific models, but it’s also perfectly reasonable, for example the spin up/down scenario, for two or more different parameter values to have equal scientific (ie. outside bayes) consistency.
      
      Sometimes in step (2) we want a quantity to eventually wind up being a unique thing… step (3) (Bayesian inference) then tells us it isn’t true within the model. We interpret this in step (4) as “something is scientifically wrong with our model… back to step 2”
      
      That inference about models can be partially axiomatized when we do it explicitly within Bayes, but I don’t think you can axiomatize all of science. Step (2) is needed as a distinct step… choosing a scientific model can’t be axiomatized. If it could, we’d be doing Math not science.
    - Daniel Lakeland on June 29, 2017 8:18 PM at 8:18 pm said:
      
      “Eg nonidentifiability can manifest here just as much as the outer level right?”
      
      Inside a Bayesian model, you can easily get non-identifiability. Whether this is a scientific problem is external to Bayes. If at the “outer” level you think “there can be only one… parameter value” then when Bayes says “either of these is true” you have two choices: re-evaluate whether it’s scientifically true that there can be only one…. or design an experiment that collects additional data to inform you about which is true.
      
      If, in principle, you can’t design an experiment to get your Bayesian model the needed information…. then you need to re-evaluate the scientific assumptions that led to the Bayesian model. Either the model is wrong, or the scientific assumption of uniqueness is wrong.
    - Daniel Lakeland on June 29, 2017 8:24 PM at 8:24 pm said:
      
      ojm: I should say that your determination in this matter has been very helpful though. I now acknowledge explicitly that you NEED something outside of Bayes. There have to be two levels at least. I think before having these discussions with you, this was an implicit assumption that I just accepted without thinking about it.
      
      I am pretty sure that this is strongly related to Godel’s theorem. There are always going to be truth statements within the Bayesian model that we can’t evaluate inside Bayes whether they are “true” in the scientific sense.
    - ojm on June 29, 2017 8:30 PM at 8:30 pm said:
      
      Daniel – to clarify, I don’t think non-identifiability is bad, I think it is normal. I think Bayes handles it in a bad way by assuming there has to be a truth of the matter.
      
      If two possibilities are equally consistent with given observational data then I can report that ‘these two possibilities are equally consistent with the data’. This is not a probability distribution. If it was, I would have to report ‘there is a 50% chance that p1 is the true value and 50% chance that p2 is the true value’ since probability theory is built on Boolean logic.
      
      If I now add a third, equally consistent, value I can report either ‘these three values are equally consistent’ or ‘there is probability 1/3, 1/3, 1/3 of p1, p2, p3’. Notice the former is stable wrt introducing another equally consistent possibility while the latter is not. The former reasoning is possibilistic, the latter probabilistic.
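The stability contrast described above can be checked with a few lines of arithmetic. This is only a minimal sketch: the likelihood value 0.2 is made up, the "probabilistic" report assumes a uniform prior over the candidates, and the "possibilistic" report is taken to be the likelihood relative to its maximum (one common reading, not necessarily the only one intended in the comment).

```python
def normalise(weights):
    # Probabilistic report: weights are forced to share one "probability pie".
    total = sum(weights)
    return [w / total for w in weights]

def relative_likelihood(likelihoods):
    # Possibilistic-style report: each value is compared to the best-fitting one.
    best = max(likelihoods)
    return [lik / best for lik in likelihoods]

# Hypothetical likelihood values for equally consistent parameter values.
two_values = [0.2, 0.2]
three_values = [0.2, 0.2, 0.2]

prob_two = normalise(two_values)        # each old value gets 1/2
prob_three = normalise(three_values)    # each old value drops to about 1/3
poss_two = relative_likelihood(two_values)      # each old value gets 1.0
poss_three = relative_likelihood(three_values)  # each old value still gets 1.0
```

Under these assumptions, adding the third value changes the probabilistic weights of the two old values but leaves their relative likelihoods untouched.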
    - Daniel Lakeland on June 29, 2017 9:03 PM at 9:03 pm said:
      
      Hmm. I think this is in your interpretation of the “meaning” (outside Bayes) of 1/3, 1/3, 1/3 probability, formally *within* Bayes, these each just get numbers assigning plausibility measures, so interpreting it as “all these have equal plausibility” is entirely consistent with the numbers.
    - Daniel Lakeland on June 29, 2017 9:34 PM at 9:34 pm said:
      
      Notice to make this explicit, that when you add a 3rd possible value to the parameter so that you get 1/3,1/3,1/3 you change the model. You pass through “step (2)” again. It may feel like these are the same model with just another possibility, but they’re formally different.
    - ojm on June 29, 2017 9:42 PM at 9:42 pm said:
      
      Yes you change the model family when you add new possibilities. But then this is very unstable – I have to change all of my plausibility weights every time I add an equally consistent model to the set under consideration, even though each old model may stay equally consistent with the data.
      
      This makes sense if they are mutually exclusive and have to share the same probability pie, but not so much if you think there are always arbitrarily many models equally consistent with given observations.
    - ojm on June 29, 2017 9:50 PM at 9:50 pm said:
      
      Note the likelihoods, say p(y0 | theta) are unchanged but the priors p(theta | B) have to be changed constantly. This is because in the former theta appears on the right (and can in fact be considered an indexing variable rather than a conditioning variable) while in the latter it occurs on the left and is hence subject to the probability calculus.
      
      I’m far from the only one to make such points btw. Laurie has made similar points, Fisher did too, etc.
    - Daniel Lakeland on June 29, 2017 10:41 PM at 10:41 pm said:
      
      If I write a computer program in C++ and I decide I need to add a variable to do a calculation, I have to assign a value to it before I can begin using it… I don’t see it as a failure of a system of inference that you have to tell it something about what you want it to do inference on before you can begin your calculations.
      
      I think the confusion between us is that I believe your complaints are external to Bayes, and you believe they are essential aspects. Or perhaps you are unhappy that the application of Bayes requires these external aspects, that you’d like to have some of them within the formal system. I’m pretty sure Godel’s result says there will always be an outer layer of the onion, but that doesn’t mean we can’t formalize some of that outer layer and push back a different fraction to the next outer layer. Maybe what you need is a formalization of the external shell which adds some additional formal rules that allow you to make additional things consistent about the “sciencey” part, model selection for example. You can do Bayesian model selection within Bayes through mixture models, but you could also do model selection outside Bayes using one-at-a-time comparison of a model to some “desirable model properties”. I don’t think there’s anything necessarily logically inconsistent about wanting to do that part outside Bayes.
    - ojm on June 29, 2017 11:26 PM at 11:26 pm said:
      
      In the analogy, not only would you need to assign a value for that variable, you would also have to update the values of all other variables since they satisfy a joint constraint.
    - Daniel Lakeland on June 30, 2017 3:04 AM at 3:04 am said:
      
      That concern sounds related to normalization of probability distributions, but as a computational issue that has largely been solved through MCMC which doesn’t require normalized pdfs. It’s a little as if the computer language handles the renormalization of your variables automatically. It has to be done, but not necessarily by you.
      
      But I don’t quite think that’s your concern. It seems more like you’re concerned about the “space of all models” and model selection issues? I never quite can figure out what the concern is.
      
      Within the context of a model, Bayes does a well defined thing. And selecting between N distinct models via mixture modeling with weight parameters, it can be a computational challenge, but the logic seems fine.
      
      Choosing and considering from among the nebulous “all the models I might want to think about eventually related to scientific question X” is I think a real issue, but it’s an issue I don’t think needs to be solved by Bayes, any more than we need a formalized system for “generating and choosing between all paintings, sculptures, novels, poems, or music that will ever be of interest to humans” or something like that. For many models there’s a purpose, and a human sense of “good enough” and so “finding the truth” isn’t really even a goal. The truth for say the effect of subsidizing mass transit on children’s educational attainment in rural Arkansas is given by a massive Lagrangian and initial conditions for 10^45 atoms… or whatever.
    - Daniel Lakeland on June 30, 2017 12:12 PM at 12:12 pm said:
      
      ojm: thinking about this a bit, I see that in some sense your concern is about infinities. Suppose there are countably infinitely many models of phenomenon x. Then, you can’t place a probability distribution over them that gives at least some fixed finite probability to each one. In other words, there is no uniform distribution over a countably infinite set of model possibilities.
      
      As you know, I’m a big fan of IST nonstandard analysis. So in some sense, my preferred way of describing this scenario is, to suppose that there is a nonstandard but finite integer number of models, and we place at least some infinitesimal probability on each one. So suppose there are N models, and we put at least 1/N^2 probability on each one (some obviously have some appreciable amount of prior probability, others just infinitesimal amounts).
      
      We can then take the standardization of the probability distribution, and we’ll get something, supposing that there’s an ordering of the integers such that the probability distribution is continuously decreasing, we’ll get something that decays sufficiently rapidly as n goes to N.
      
      Now suppose you want to work with model selection *within* Bayes, as opposed to outside Bayes. Suppose there are N model structures you want to consider, and you have a Dirichlet distribution over N dimensional vectors that describes your mixture model. I think this is the same thing as a Dirichlet process mixture model but I don’t have a proof. In any case, it’s a pretty obviously sane nonstandard mixture model provided that the likelihood remains limited for all models.
      
      Is there really a concern that this IST nonstandard formalized version of things is insufficient to describe real science?
      
      One of the reasons I love IST is that it formalizes the intuition in applied problems that everything is really actually finite, and infinity is just an approximation we use when we can’t a-priori give a particular bound.
    - Christian Hennig on June 29, 2017 8:25 PM at 8:25 pm said:
      
      ojm:
      “One thing I don’t understand is how Andrew can say he doesn’t like talking about the probability of a model but will apply probability to parameters – don’t parameters index models?”
      
      I can’t speak for Andrew of course but to me models are idealised thought constructs that we use to view reality “through them”. You can set up an artificial mechanism distributing parameters and then observations given parameters, and give it some interpretation that connects this to what you think are the real mechanisms (which can of course be discussed and revised). Then you can run the Bayesian machinery on your data, which will redistribute the probability weight between parameters, showing you what you learn from the data in the given model framework. Technically, of course, this gives you probabilities for parameters and therefore probabilities for models (in Davies’ terminology, which I like). But at no point do you claim anything like a model defined by any of your parameters being literally true.
      
      What *is* correct is that implicitly the Bayesian model framework assumes that there’s only one “true” model (“true” here not in a realist but in a technical sense), but I’m fine with this as a thought construct; the claim is not that this one model is “really” or even only “approximately really” true.
    - ojm on June 29, 2017 8:43 PM at 8:43 pm said:
      
      Hi Christian,
      
      The thing I dislike about ‘truth’ is not so much the metaphysical side, but that this means a given model/parameter has to balance its probability weight against its negation, which is usually an infinite set for continuous models.
      
      I think the natural logic of continuous models should be more ‘topological’ where ‘closed’ does not mean ‘not open’ etc.
    - ojm on June 29, 2017 8:45 PM at 8:45 pm said:
      
      Technically speaking, building on something like a topos or Heyting algebra instead of a Boolean algebra.
    - ojm on June 29, 2017 8:48 PM at 8:48 pm said:
      
      Blocking double negation = true is also a great, if technical way to avoid falsification of a null = true theory!
    - Christian Hennig on June 29, 2017 8:51 PM at 8:51 pm said:
      
      Fair enough; at the moment to me this doesn’t seem to be such a big issue but I’m curious to see elaborated what you propose.
    - Anonymous on June 29, 2017 9:36 PM at 9:36 pm said:
      
      “What *is* correct is that implicitly the Bayesian model framework assumes that there’s only one “true” model”
      
      This is incorrect. While the typical scenario considered by Bayesians involves a collection of mutually exclusive and exhaustive models, only one of which is true, the formalism doesn’t require it. The relevant generalization needed of the sum rule of probability theory which relaxes this requirement can be found here:
      
      https://en.m.wikipedia.org/wiki/Inclusion–exclusion_principle
    - ojm on June 29, 2017 11:44 PM at 11:44 pm said:
      
      Can you elaborate?
    - ojm on June 29, 2017 11:51 PM at 11:51 pm said:
      
      For example, using your proposed generalisation do probability distributions still integrate to one?
    - Daniel Lakeland on June 30, 2017 12:33 PM at 12:33 pm said:
      
      I think the point here is that if there are N models, and the concern is that in some sense both model A and model B can be true, then the probability calculation associated with P(A or B) which maintains conservation of probability (ie. sum to 1) is P(A) + P(B) – P(A and B) and generalizations thereof.
      
      So, yes, probability distributions still sum to 1, that’s the goal, but we relax the constraint sum(P(X_i),i=1,N) = 1 into something that accounts for the possibility of overlap.
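A small numerical check of the two-event case may help. This is a sketch on a made-up sample space of four equally weighted "worlds"; the sets A and B (worlds in which each model counts as true) are hypothetical and chosen only so that the two models overlap.

```python
from fractions import Fraction

# Hypothetical sample space: four mutually exclusive worlds, weights sum to 1.
worlds = {"w1": Fraction(1, 4), "w2": Fraction(1, 4),
          "w3": Fraction(1, 4), "w4": Fraction(1, 4)}

# Hypothetical events: the worlds in which model A (resp. model B) is true.
A = {"w1", "w2", "w3"}
B = {"w2", "w3", "w4"}

def prob(event):
    # Probability of an event is the total weight of the worlds it contains.
    return sum(worlds[w] for w in event)

p_a = prob(A)
p_b = prob(B)
p_a_and_b = prob(A & B)
p_a_or_b = prob(A | B)

naive_sum = p_a + p_b                      # double counts the overlap
inclusion_exclusion = p_a + p_b - p_a_and_b
```

Here the weights of the underlying worlds still sum to one, while P(A) + P(B) on its own exceeds one because A and B are not mutually exclusive.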
    - Keith O'Rourke on June 30, 2017 10:09 AM at 10:09 am said:
      
      ojm:
      
      The “models are idealised thought constructs” is a mere possibility that is used to represent actual existence – so it should not be taken to have to balance its probability weight against its negation?
      
      Drawing from the earlier paper on Peirce we discussed by email- “A continuous line contains no points or we must say that the principle of excluded middle does not hold of these points. The principle of excluded middle only applies to an individual. . . But places being mere possible, without actual existence, are not individuals. Hence a point or indivisible place really does not exist unless there actually be something there to mark it, which, if there is, interrupts the continuity. (MS 1597, CP 6.168, 1903 Sep 18)”
    - ojm on June 30, 2017 8:02 PM at 8:02 pm said:
      
      “so it should not be taken to have to balance its probability weight against its negation?”
      
      Yes, I agree – but standard Bayes does require this. Eg a prior over a set of models or parameters requires a sum to one, right?
    - Keith O'Rourke on July 1, 2017 9:58 AM at 9:58 am said:
      
      OK, but all models are wrong, and the assumption that a prior over a set of models or parameters really does sum to one is definitely wrong – I guess it’s unclear how to get a less wrong model in this regard?
    - Daniel Lakeland on July 1, 2017 6:36 PM at 6:36 pm said:
      
      Keith: but in what sense is it wrong? I say the sense is in the real-world outside of the model. Like Godel’s theorem where every formal system can construct a sentence which is true but impossible to prove within the system, every fixed model (or finite mixture model of several models) of a complex realistic scientific study is very likely to exclude something that we might later want, for scientific / external reason, to consider within a Bayesian framework.
      
      The probabilities that Bayes assigns shouldn’t be taken as information that is final at the “outer” level of model choice and model adequacy, and so forth; they should be taken as what they are, the relative plausibility measures of the options you did consider.
      
      In some sense, that’s the difference between Jaynes’ robot, aka Stan, and the scientist who is programming it.
    - ojm on July 1, 2017 8:07 PM at 8:07 pm said:
      
      Keith – if you take Bayes as a model for the inference process itself, then it is essentially a model of an agent who is trying to allocate weights among possibilities, subject to a sum constraint, to decide which possibility is true.
      
      There are alternative models of the same sort of inference problem. Some are explicitly this but don’t seem very popular eg possibility theory, Dempster-Shafer theory etc. Some are implicitly this (eg likelihood provides a semantics for possibility theory) and quite popular but used in different ways, and some approaches (eg data analysis, neural networks maybe) seem to change the question entirely.
      
      So I think there are numerous options out there if you look for them. Personally I have become less enthusiastic about Bayes when I found myself doing things in real problems that weren’t easy to formalise in the Bayesian framework. I found it intellectually freeing to think about alternatives rather than trying to squeeze everything in to one approach. Having said that, there are benefits to knowing one thing well and formulating everything as that.
    - Chris Wilson on July 2, 2017 8:32 AM at 8:32 am said:
      
      ojm I’m interested in the practical applications where you have found possibility theory more useful than Bayes. For me at least a concrete example is worth a thousand words..:)
    - ojm on July 2, 2017 3:43 PM at 3:43 pm said:
      
      Chris – take a simple nonidentifiable ode system to which you know the analytical solution. Carry out inference (using eg synthetic data) via a) Bayes using MCMC and b) profile likelihood. Compare the ‘concentrated’ inferences for individual parameters obtained via marginalisation and profiling respectively.
    - Chris Wilson on July 2, 2017 8:38 PM at 8:38 pm said:
      
      OK I have a model in mind that I’ve worked with a bit. What software/package do you recommend for profile likelihood? If you have some clear cut case studies, why not publish them in your blog?
    - Corey on July 2, 2017 9:32 PM at 9:32 pm said:
      
      chris, ojm, can you tell me of or point me to such an ODE? I haven’t used ODEs for modelling myself; I want to try this out and I don’t want to be barking up the wrong tree. ojm, I assume we’re using normal errors on the observations?
      
      I have to say that as I thought through the issue I became a bit baffled. My understanding is that by definition, a model that is nonidentifiable has a likelihood function with a ridge (or greater-than-zero-dimensional manifold) maximum that extends throughout parameter space. Unless the projection of the parameter space down to a lower dimension via profiling happens along a direction aligned with the ridge, the profile likelihood is going to be flat. ojm, have I misunderstood something about the situation, are you using a different definition of “nonidentifiable”, or…?
    - Chris Wilson on July 2, 2017 9:59 PM at 9:59 pm said:
      
      I’m also not entirely clear about the “non identifiable” criterion ojm has in mind. I have a pet model that suffers from what I might call “partial non-identifiability”, in that I have tended to require very strong priors on one parameter in order to accurately estimate the other (or worse, to avoid bi-modality). I am curious what profile likelihood would do, assuming we could get a stable estimate for anything. But I confess that I know jack about profile likelihood, and have never tried to fit anything that isn’t a basic GLM using likelihood methods…
    - ojm on July 2, 2017 10:28 PM at 10:28 pm said:
      
      > What software/package do you recommend for profile likelihood?
      
      For a simple example, anything that can solve ODEs, do for loops and optimisation. For example, Python (using e.g. scipy.optimize), Matlab, R.
      
      Just compute
      
      Lp(theta1) = max_(theta2) (L(theta1,theta2))
      
      etc over grids.
      
      > If you have some clear cut case studies, why not publish them in your blog?
      
      I’m lazy, for one. Maybe one day though.
      
      > the profile likelihood is going to be flat.
      
      Yup.
      
      Unlike a flat probability distribution, a flat likelihood represents no information quite well.
      
      > can you tell me of or point me to such an ODE?
      
      Maybe a simple chemical kinetics model?
      
      Like
      xdot = k1 - k2*x
      ydot = k3*(k1 - k2*x)
      
      though this one is quite contrived (I’m sure you can do better). And yup, just assume say additive Gaussian error.
    - ojm on July 2, 2017 10:41 PM at 10:41 pm said:
      
      (where only y is observed…)
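The grid recipe Lp(theta1) = max_(theta2) L(theta1, theta2) can be sketched for this toy system in plain Python. Several things here are assumptions rather than part of the comments above: the initial conditions x(0) = y(0) = 0, the "true" parameter values, the noise level, the grids, and all function names. With those initial conditions the system has the closed-form solution y(t) = k3*(k1/k2)*(1 - exp(-k2*t)), so no ODE solver is needed, and because y is linear in k3 the inner maximisation over k3 has a closed form.

```python
import math
import random

def y_model(t, k1, k2, k3):
    # Closed-form y(t) for xdot = k1 - k2*x, ydot = k3*(k1 - k2*x),
    # assuming x(0) = y(0) = 0.
    return k3 * (k1 / k2) * (1.0 - math.exp(-k2 * t))

def sum_sq(ts, ys, k1, k2, k3):
    # Sum of squared residuals between data and model.
    return sum((y - y_model(t, k1, k2, k3)) ** 2 for t, y in zip(ts, ys))

def best_k3(ts, ys, k1, k2):
    # y is linear in k3, so the least-squares k3 is available in closed form.
    g = [y_model(t, k1, k2, 1.0) for t in ts]
    return sum(y * gi for y, gi in zip(ys, g)) / sum(gi * gi for gi in g)

def profile_loglik_k1(ts, ys, k1, k2_grid, sigma):
    # Lp(k1) = max over (k2, k3) of the Gaussian log-likelihood,
    # dropping the additive constant that does not depend on parameters.
    best = -math.inf
    for k2 in k2_grid:
        k3 = best_k3(ts, ys, k1, k2)
        best = max(best, -sum_sq(ts, ys, k1, k2, k3) / (2.0 * sigma ** 2))
    return best

# Synthetic data with additive Gaussian error (all values are illustrative).
random.seed(0)
sigma = 0.05
ts = [0.25 * i for i in range(1, 41)]
ys = [y_model(t, 2.0, 0.5, 1.5) + random.gauss(0.0, sigma) for t in ts]

k1_grid = [0.5, 1.0, 2.0, 4.0, 8.0]
k2_grid = [0.1 + 0.01 * j for j in range(100)]
profile = [profile_loglik_k1(ts, ys, k1, k2_grid, sigma) for k1 in k1_grid]

# Range of the profile log-likelihood across the k1 grid.
spread = max(profile) - min(profile)
```

Under these assumptions only k2 and the product k1*k3 enter y(t), so any change in k1 can be absorbed by k3 and the profile over k1 comes out flat up to rounding error. A Bayesian comparison would additionally need priors and a sampler, which this sketch does not attempt.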
    - ojm on July 3, 2017 2:37 AM at 2:37 am said:
      
      PS Chris, feel free to send me your example system. I’d be interested to take a look at what sort of problems you have in mind.
    - Corey on July 3, 2017 12:22 PM at 12:22 pm said:
      
      ojm, Mike Evans’s program of measuring statistical evidence by relative belief addresses this problem in a Bayesian framework.
    - ojm on July 3, 2017 3:11 PM at 3:11 pm said:
      
      Corey – I read Evans’ book and like a lot of aspects of his work. I definitely use elements of it when wearing my Bayesian hat but it’s not quite enough to convince me to keep the hat on consistently.
    - Andrew on June 29, 2017 9:56 PM at 9:56 pm said:
      
      Ojm:
      
      You write, “One thing I don’t understand is how Andrew can say he doesn’t like talking about the probability of a model but will apply probability to parameters – don’t parameters index models?”
      
      See the discussion on p.76 of this paper from 2011.
    - Martha (Smith) on June 29, 2017 10:26 PM at 10:26 pm said:
      
      Also in response to Ojm’s question:
      
      The way I see it is that there are two aspects of a specific model: The form of the model, and the specifications of the parameters. So it makes sense to talk about the probability of a parameter given the form of a model.
    - Daniel Lakeland on June 29, 2017 10:44 PM at 10:44 pm said:
      
      Yes, I think that “choose a form of the model, and some quantities associated to the objects inside it” is my (2) in my comment linked below:
      
      http://statmodeling.stat.columbia.edu/2017/06/29/lets-stop-talking-published-research-findings-true-false/#comment-517091
    - ojm on June 29, 2017 11:15 PM at 11:15 pm said:
      
      Martha: see my comments on families of probability models (families of model instances) and identifiability. I am explicitly aware of and even referring to what you are getting at, but this doesn’t answer my point.
    - ojm on June 29, 2017 11:36 PM at 11:36 pm said:
      
      Andrew: yes, nice article and nice discussion. I appreciate that you are keenly aware of the tensions within your own approach, as we should all be. I suppose I’m wondering how to go beyond the tension.
      
      You summarise the issue well when you say
      “This is a silly example but it illustrates a hole in my philosophical foundations: when am I allowed to do normal Bayesian inference about a parameter θ in a model, and when do I consider θ to be indexing a class of models, in which case I consider posterior inference about θ to be an illegitimate bit of induction?”
    - ojm on June 29, 2017 11:39 PM at 11:39 pm said:
      
      I would say something like ‘identifiability’ or ‘I believe only one possibility is true, given enough data’ is a necessary, prob not sufficient condition.
    - Corey on June 29, 2017 7:48 PM at 7:48 pm said:
      
      Davies gets coherence wrong. De Finetti only admitted previsions (i.e., expectations) on observables; in his framework, parameters only arose as devices for summarizing exchangeable predictive distributions. It’s ironic that I, a Jaynesian, am the one to point this out.
    - ojm on June 29, 2017 8:06 PM at 8:06 pm said:
      
      Interesting- will have to ask him about that. I did wonder since de Finetti always seems to restrict probability to observables. Some of DF’s stuff is nice but I do have other issues with it.
    - Christian Hennig on June 29, 2017 8:30 PM at 8:30 pm said:
      
      Corey: Originally my reading of Davies’ argument was that he made reference to the way many Bayesians sell their models and results to their clients and readers. I have known this argument of his since 1995 or so, and at least once made him aware that this is not in line with de Finetti for the reason that you give. I’m still not quite sure how generally he thinks the argument applies to Bayesian analyses.
  - Martha (Smith) on June 29, 2017 5:58 PM at 5:58 pm said:
    
    Yes, the “big picture” paper of Kass is really nice. I’ve often referred to it, using the following summary:
    
    Points this picture is intended to show include:
    • Both statistical and scientific models are abstractions, living in the “theoretical” world, as distinguished from the “real” world where data lie.
    • Conclusions straddle these two worlds: conclusions about the real world typically are indirect, via the scientific models.
    • “When we use a statistical model to make a statistical inference we implicitly assert that the variation exhibited by data is captured reasonably well by the statistical model, so that the theoretical world corresponds reasonably well to the real world.” (p. 5)
    • Thus “careful consideration of the connection between models and data is a core component of … the art of statistical practice…” (p. 6)
    
  - Andrew on June 29, 2017 9:21 PM at 9:21 pm said:
    
    Awm:
    
    Here’s “Bayesian Statistical Pragmatism,” my discussion of Kass’s paper from 2011.
    
Richard D. Morey on June 29, 2017 5:38 PM at 5:38 pm said:

Too good to be true
Too true to be bad

I’m calling dibs on “Too bad to be false,” “Too good to be bad,” and “Too true to be false.”

- Corey on June 29, 2017 7:53 PM at 7:53 pm said:
  
  I claim “Too X to be ugly” and all “Too ugly to be X” for all X.
  
  - Daniel Lakeland on June 29, 2017 10:49 PM at 10:49 pm said:
    
    I’ll take “A fistful of X” and “For a Few X More” for all X
    
- Phil on June 30, 2017 7:59 AM at 7:59 am said:
  
  Not a parallel construction, but offered for your consideration nonetheless: I often use a variant of Lewis Carroll’s “Contrariwise, if it was so, it might be; and if it were so, it would be; but as it isn’t, it ain’t. That’s logic.”
  
  I shorten this “if it were so, it would be, but since it isn’t, it ain’t.”
  
  - Andrew on June 30, 2017 8:03 AM at 8:03 am said:
    
    Phil:
    
    That’s an appropriate characterization of the last sentence of the abstract of this paper.
    
  - Martha (Smith) on June 30, 2017 10:47 PM at 10:47 pm said:
    
    +1 to Lewis Carroll and Phil
    
Imaging guy on June 29, 2017 6:44 PM at 6:44 pm said:

Is there anything useful or practical ever come out of social science or psychological research? I am not trolling. I just want to know. Below is what Richard Feynman said in his famous “cargo cult science” speech (Caltech’s 1974 commencement address).

“There are big schools of reading methods and mathematics methods, and so forth, but if you notice, you’ll see the reading scores keep going down–or hardly going up in spite of the fact that we continually use these same people to improve the methods. There’s a witch doctor remedy that doesn’t work. It ought to be looked into; how do they know that their method should work? Another example is how to treat criminals. We obviously have made no progress–lots of theory, but no progress– in decreasing the amount of crime by the method that we use to handle criminals. Yet these things are said to be scientific. We study them. And I think ordinary people with commonsense ideas are intimidated by this pseudoscience. A teacher who has some good idea of how to teach her children to read is forced by the school system to do it some other way–or is even fooled by the school system into thinking that her method is not necessarily a good one. Or a parent of bad boys, after disciplining them in one way or another, feels guilty for the rest of her life because she didn’t do “the right thing,” according to the experts. So we really ought to look into theories that don’t work, and science that isn’t science. I think the educational and psychological studies I mentioned are examples of what I would like to call cargo cult science.”

- AnonAnon on June 29, 2017 6:51 PM at 6:51 pm said:
  
  See for example: https://en.wikipedia.org/wiki/Prospect_theory
  
- jrkrideau on June 29, 2017 8:31 PM at 8:31 pm said:
  
  Behaviourism and its derivative cognitive behaviour therapy?
  Various other flavours of learning theory.
  
  You also want to look up the field of ergonomics or look up Risk Homeostasis and Risk Compensation http://riskhomeostasis.org/.
  
  Behavioural economics, which, as far as I can see, is nothing much more than cognitive psychology with a few bells and whistles. Daniel Kahneman is a cognitive psychologist, not an economist, as is Dan Ariely.
  
  And on a more personal level, a friend from grad school with a Ph.D. in perception and cognition (a psych sub-field) went directly to NASA, so he must have been doing something right.
  
  I am sure Richard Feynman was a fine physicist, but I question a) his knowledge of all (or much) of psychology and b) his understanding of the politics and budget constraints in education and criminal rehabilitation, which have much more influence on teaching or criminal rehabilitation than psychological practice does.
  
  I’m trained as a psychologist so I can comment knowledgeably on the futility of String Theory.
  
- Andrew on June 29, 2017 9:25 PM at 9:25 pm said:
  
  Imaging:
  
  You ask, “Is there anything useful or practical ever come out of social science or psychological research?” I think it’s fair to say that nothing in social science has been as useful or practical as the steam engine, or the transistor, or all sorts of other innovations that have come from the physical sciences. Social science is what it is: it’s our best attempt to understand the social world. One way to see the importance of social science is that people are doing amateur social science all the time. We try our best, that’s all we can do. I say this as someone who’s been publishing papers in social science for nearly thirty years.
  
  - Thomas on June 30, 2017 1:56 AM at 1:56 am said:
    
    “Social science is what it is: it’s our best attempt to understand the social world.” Isn’t this exactly what Feynman and Imaging are questioning?
    
    It’s obviously true of physical science. It is our best attempt to understand the physical world. People of course do amateur physical science all the time too. Every time they build or repair something, they are engaging with the physical world and trying to understand it better. The important thing, however, is that physics subsumes all of these everyday experiments. Moreover, physics is often right in a “counter-intuitive” way, i.e., it trumps common sense.
    
    Feynman is here pointing out something important: “…ordinary people with commonsense ideas are intimidated by this pseudoscience. A teacher who has some good idea of how to teach her children to read is forced by the school system to do it some other way–or is even fooled by the school system into thinking that her method is not necessarily a good one.” (This, I can tell you, is also very true of writing instruction.) But while commonsense physics is usually wrong when it disagrees with physical science, it’s not clear that commonsense sociology is usually wrong when it disagrees with social science.
    
    Is social science really “our best attempt at understanding the social world”? Is it better than literature, for example? Are novels and poems “worse” attempts? Is it even demonstrably better than common sense? I think that’s the thrust of Imaging’s question. And I think it’s a good one.
    
    - Glen M. Sizemore on June 30, 2017 8:13 AM at 8:13 am said:
      
      AG: “Social science is what it is: it’s our best attempt to understand the social world.” Isn’t this exactly what Feynman and Imaging are questioning?
      
      Thomas: It’s obviously true of physical science. It is our best attempt to understand the physical world. People of course do amateur physical science all the time too.
      
      GS: What are you referring to here? Are you just referring to the fact that people respond WRT aspects of the physical world (e.g., catching a ball implies some sort of “folk knowledge” of physics)? Or are you referring to something else?
      
      Thomas: Every time they build or repair something, they are engaging with the physical world and trying to understand it better.
      
      GS: Is fixing an old TV with a loose wire by smacking it “amateur physics”? Catching a baseball? Hitting a moving person with a thrown football? The answer to this question becomes important when
      
      Thomas: The important thing, however, is that physics subsumes all of these everyday experiments.
      
      GS: Nothing you have said so far clarifies the answer to my question.
      
      Thomas: Moreover, physics is often right in a “counter-intuitive” way, i.e., it trumps common sense.
      
      GS: I’ll say. Is “behaving WRT the physical world” what you are calling “common sense” and “amateur physics”?
      
      Thomas: Feynman is here pointing out something important: “…ordinary people with commonsense ideas are intimidated by this pseudoscience. A teacher who has some good idea of how to teach her children to read is forced by the school system to do it some other way–or is even fooled by the school system into thinking that her method is not necessarily a good one.” (This, I can tell you, is also very true of writing instruction.) But while commonsense physics is usually wrong when it disagrees with physical science, it’s not clear that commonsense sociology is usually wrong when it disagrees with social science.
      
      GS: See, this is a really complex statement. Or, rather, rich with implication and innuendo. I think you are implying that “commonsense psychology” is “better” than academic psychology (though you do not specifically mention psychology). And I guess you are probably saying that the fact that *we,* as part of the lay community, who “behave with respect to others” and intentionally or unintentionally manipulate the behavior of others, are doing things that are “better” than what science can do.
      
      Thomas: Is social science really “our best attempt at understanding the social world”?
      
      GS: What – exactly – is the definition of “social science”? That isn’t a rhetorical question.
      
      Thomas: Is it better than literature, for example? Are novels and poems “worse” attempts? Is it even demonstrably better than common sense? I think that’s the thrust of Imaging’s question. And I think it’s a good one.
      
      GS: For one thing, an important question is “Is [whatever is called] ‘social science’ really ‘science’ at all?” I guess that is what you are asking.
    - Thomas on June 30, 2017 10:48 AM at 10:48 am said:
      
      Glen: I think your last question does capture the general thrust of my comment. But I was being a bit more specific: even if psychology, say, didn’t call itself “science”, we could ask whether it “better” understands the human psyche than poetry.
      
      It’s possible to define “science of X” as “our best attempt to understand X”. That seems to be the definition that underlies Andrew’s “social science is what it is: our best attempt…” “Our,” I presume, simply means “our culture’s”. My point was that in whatever sense we would straightforwardly grant that “physical science is what it is: our best attempt…” I’m not sure that we’d grant the same of social “science”. Minimally, “science” has to be a “better attempt” than common sense. But also a better attempt than anything else, like literature.
    - Daniel Lakeland on June 30, 2017 12:15 PM at 12:15 pm said:
      
      I think the shenanigans we go through to quantify GDP, or the total population, or median income in a region, or inflation or the like are science. Sure, they’re “just” measurement, but then so is a milligram digital balance for measuring out chemicals.
      
      Perhaps one question is, does social science *theory* add anything on top of measurement, and/or to what extent, and in what fields.
    - Thomas on June 30, 2017 1:28 PM at 1:28 pm said:
      
      I admit I think of economics and demography as physical sciences. They are more like epidemiology than psychology. But I appreciate the prod back to reality. I think the social sciences must be approached individually, not as a whole. Indeed, as we’re constantly reminded here, it’s good to look at one study at a time.
    - Glen M. Sizemore on June 30, 2017 1:22 PM at 1:22 pm said:
      
      Thomas: Glen: I think your last question does capture the general thrust of my comment. But I was being a bit more specific: even if psychology, say, didn’t call itself “science”, we could ask whether it “better” understands the human psyche than poetry.
      
      GS: Just for the record, I do call one kind of psychology “science,” but it is a small, small part of psychology. Perhaps some of what is now considered “hard psychology” (i.e., beginning with “post behavioristic” memory research and the “birth” of cognitive psychology) could be considered science, but I guess I would say, ultimately bad science. And, of course, I’m not big on “the psyche.”
      
      Thomas: It’s possible to define “science of X” as “our best attempt to understand X”. That seems to be the definition that underlies Andrew’s “social science is what it is: our best attempt…” “Our,” I presume, simply means “our culture’s”. My point was that in whatever sense we would straightforwardly grant that “physical science is what it is: our best attempt…” I’m not sure that we’d grant the same of social “science”. Minimally, “science” has to be a “better attempt” than common sense. But also a better attempt than anything else, like literature.
      
      GS: Just so it’s clear, I consider a lot of psychology (at least) to be not science at all, and I’m guessing I would feel the same about other areas that might actually call themselves “social science.” I think the late, great Marvin Harris would have considered himself to be doing social science, for example, but I think that cultural materialism is a natural science that studies what is commonly called “cultural anthropology.” Other approaches to this field I would probably not consider to be science at all or, at best, lame science. But I think that there is no question that a science of behavior is far, far superior to folk methods or the arts, though there is no question that some people become skilled at controlling or interpreting behavior in a direct, contingency-shaped sense.
    - Martha (Smith) on June 30, 2017 10:58 PM at 10:58 pm said:
      
      I’d say, “Social science *should be* our best attempt to understand the social world.” I don’t think that social scientists’ current attempts to understand the social world are (on average; some exceptions, of course) as good as they could be if social scientists were more, well, scientific (as opposed to “that’s the way we do it in our field”) in their thinking.
    - Glen M. Sizemore on July 1, 2017 6:47 AM at 6:47 am said:
      
      “…if social scientists were more, well, scientific…”
      
      GS: But the questions are: “Why is social ‘science’ not scientific?” and, relatedly, “What does ‘it’ need to do in order to be scientific?”
- Cliff AB on June 29, 2017 9:36 PM at 9:36 pm said:
  
  A few years ago, Facebook did a study showing that manipulating the Facebook feed could alter people’s mood.
  
  Fast forward a few years, and now Facebook makes everyone so angry I had to delete my account, and the American presidency was basically won by people getting fired up over tweets, many of which may have been from bots. So that’s something.
  
  - Ben Curtis on June 30, 2017 5:45 PM at 5:45 pm said:
    
    +1
    
- Martha (Smith) on June 29, 2017 10:48 PM at 10:48 pm said:
  
  Commenting specifically on the difficulties of improving teaching methods: I spent a fair amount of my time as a mathematician thinking about the topic of mathematics teaching. It is a multi-problematic question. Even if someone comes up with “methods” that look good on paper, different people interpret them differently (e.g., in mathematics, I discovered that “conceptual understanding” can mean quite different things to an elementary or high school teacher or math education person than it means to me). One might think that “teaching by setting an example” might help here, but some of the students may think they are “doing as their teacher did” when they are just imitating superficial aspects of their teacher’s behavior but missing the most important aspects of it. (Still, I think teaching teaching by example is on average better than teaching teaching by writing or talking about it.)
  
  - jrkrideau on June 30, 2017 1:13 AM at 1:13 am said:
    
    OT, but have you seen anything about JUMP Mathematics, https://www.jumpmath.org/ ? Elementary school level as far as I know, but quite intriguing.
    
    From a learning theory approach I find it very interesting (and logical) and the preliminary results look good.
    
    - Martha (Smith) on July 1, 2017 4:27 PM at 4:27 pm said:
      
      I’ve stopped looking at math ed stuff — just don’t have enough time for everything.
- Glen M. Sizemore on June 30, 2017 5:19 AM at 5:19 am said:
  
  “Is there anything useful or practical ever come out of social science or psychological research? I am not trolling. I just want to know. Below is what Richard Feynman said…”
  
  A question and a couple of points – first the question: What is “social science”? Is it *just* the subject matter that distinguishes it or is it something deeper having to do with methods etc.? But…on to the “couple of points”:
  
  1.) Whatever “social science” is, there *is* a science of behavior, and that science is NOT “social science” if “social science” is something other than natural science. The science of behavior is a natural science and it is now called “behavior analysis.” It used to be called “the experimental analysis of behavior,” but that has somewhat gone out of fashion. Anyway, that brings me to my second point:
  
  2.) The natural science of behavior has, indeed, spawned a technology (which implies “useful” and “practical”). That field is usually called “applied behavior analysis.” An overlooked aspect, too, is how behavior analysis changed how much research using non-humans is done, even stuff that isn’t called “behavior analysis.” Anyway, if you are interested in what behavior analysis has done over the years you can consult The Journal of the Experimental Analysis of Behavior and The Journal of Applied Behavior Analysis.
  
  3.) It is worth noting that behavior analysis is conceptually and methodologically distinct from “mainstream psychology” which is now called “cognitive psychology.” Behavior analysis has never employed NHST and, in fact, virtually never uses between-group experimental designs. Further, behavior analysis recognizes the indispensability of *sophisticated* conceptual/philosophical analysis and does not, ahem, reify natural-language concepts as in saying, for example, that “beliefs [etc. etc.] cause behavior.” Nor does it rely extensively on metaphor as in the computer metaphor and, specifically, the nonsensical notions of memory storage and retrieval.
  
  4.) As to Feynman, I consider it a tragedy that no one ever explained to him what the natural science of behavior was.
  
  - Alex Gamma on July 2, 2017 9:42 PM at 9:42 pm said:
    
    Glen,
    
    so, to turn the question around: Is there anything useful or practical ever come out of behavior analysis?
    
    Also, since beliefs don’t exist, can you give us an idea of what kind of explanation will not only replace but prove to be superior to “Andrew took his umbrella when he went out in the morning because he believed it would rain later”?
    
    - Glen M. Sizemore on July 2, 2017 11:23 PM at 11:23 pm said:
      
      AG: Glen,
      
      so, to turn the question around: Is there anything useful or practical ever come out of behavior analysis?
      
      GS: Well…as I said, “yes,” a technology. A technology of behavior. Some of that would be recounted in The Journal of Applied Behavior Analysis. You may not know about the technology because the basic science and its applied arm have been deliberately misrepresented by mainstream psychology and philosophy beginning with the Chomsky debacle…but the field is well-known enough WRT developmental disabilities and autism spectrum. Educational technologies are also another more-or-less direct result. Needless to say, there are those that would dispute what I say. Ironically, one criticism is frequently that it (behavior analysis) has little to show by way of placebo-controlled, random yada yada yada. It is ironic because in behavior analysis the behavior of individual subjects is directly controlled. In addition, often not stated, is that behavior analysis has provided a framework from within which behavioral experiments with nonhumans (as well as humans) can be conducted, even when the experiments are conceptually at odds with the philosophy of the natural science of behavior, radical behaviorism. Every time you see some “cognitive” or “cognitive neuroscience” experiment where, say, monkeys make some easily-executable, short-duration response within a small enclosure that constitutes the simple (or “simple”) environmental context of the response, think “behavior analysis.” The fact that the animal’s behavior is controlled by its consequences is, in such experiments, often hidden behind the throwaway line, “the monkeys received a squirt of juice for X.” These experiments are all offshoots of the ol’ “rat presses a lever for food” type experiment. Relatedly, behavioral pharmacology would not be what it is if pharmacological questions had not been investigated using nonhumans in prototypical behavior-analytic experiments. Hope that helps…
      
      AG: Also, since beliefs don’t exist, can you give us an idea of what kind of explanation will not only replace but prove to be superior to “Andrew took his umbrella when he went out in the morning because he believed it would rain later”?
      
      GS: Does the alleged “belief” allegedly come from somewhere? Maybe, I don’t know, some past event or, in this case, a long series of events relevant to the behavioral history of the person? If Andrew had heard on the TV that the chance of rain was very high later that day, how is it that those “words” (it’s just a bunch of sound, after all) exerted control over his behavior? But…just the fact that “exerted control over his behavior” is a reasonable locution gives you the general form of the answer to your question. Of paramount importance to a natural science of behavior is the relation between current behavior and past events (or “relations between events” would be better). As used by “folk-psychology” and by the related “technical” cognitive psychology (or, at least, one faction of it – the one that takes ordinary-language “mental” terms as the basis of a technical vocabulary), terms like “belief,” “knowledge” etc. are invented “things” that bridge the temporal gap between an animal’s personal history and its behavior. I could be more long-winded, but suffice it to close by saying that terms like “belief” and “knowledge” are, essentially, names for behavior and its context (as in, “Man, that Bobby Jones really knows how to hit the low inside fastball!” when what occasions this comment is simply that Jones frequently hits low inside fastballs).
      
      Behavior is an event that occurs, so to speak, in the intersection of three histories, that relevant to the species, that relevant to the culture, and personal history. Behavior is literally a function of variables that lie in these histories. This will remain true even if the inevitable temporal gaps in causation are “closed.” That, for example, the mechanism of “trait transmission” is known does not diminish the truth of an historical account in terms of selection contingencies (i.e., “natural selection”). Not gonna proof this…late…sorry about any typos…
Marcus on June 30, 2017 12:44 AM at 12:44 am said:

Published research findings can certainly be false inasmuch as they can be fabricated. The claims made on the basis of the falsified results may still be true to some degree but the claim that “this data was collected and analyzed in this way and this is what the analysis tells us” is demonstrably false in a disturbing proportion of empirical articles in psychology.

Glen M. Sizemore on June 30, 2017 2:19 PM at 2:19 pm said:

“Let’s stop talking about published research findings being true or false…”

GS: Right. Let’s talk about data as possessing reliability and generality.


Statistical Modeling, Causal Inference, and Social Science


119 thoughts on “Let’s stop talking about published research findings being true or false”
