“I mean, what exact buttons do I have to hit?”

While looking for something else, I happened to come across this:

Unfortunately there’s the expectation that if you start with a scientific hypothesis and do a randomized experiment, there should be a high probability of learning an enduring truth. And if the subject area is exciting, there should consequently be a high probability of publication in a top journal, along with the career rewards that come with this. I’m not morally outraged by this: it seems fair enough that if you do good work, you get recognition. I certainly don’t complain if, after publishing some influential papers, I get grant funding and a raise in my salary, and so when I say that researchers expect some level of career success and recognition, I don’t mean this in a negative sense at all.

I do think, though, that this attitude is mistaken from a statistical perspective. If you study small effects and use noisy measurements, anything you happen to see is likely to be noise, as is explained in this now-classic article by Katherine Button et al. On statistical grounds, you can, and should, expect lots of strikeouts for every home run—call it the Dave Kingman model of science—or maybe no home runs at all. But the training implies otherwise, and people are just expecting the success rate you might see if Rod Carew were to get up to bat in your kid’s Little League game.
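The statistical point above can be made concrete with a small simulation. This is a hypothetical sketch (the true effect, noise level, and sample size are made-up numbers, not taken from any study): when the true effect is small relative to the measurement noise, few studies reach significance, and the significant estimates badly overestimate the effect—the exaggeration ("Type M") error discussed in my paper with Carlin.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical numbers: a small true effect measured with lots of noise.
true_effect = 0.1   # small true effect
sigma = 1.0         # noisy measurement
n = 25              # small sample per study
n_sims = 100_000    # number of simulated studies

se = sigma / np.sqrt(n)
# Each simulated "study" produces one estimate of the effect.
estimates = rng.normal(true_effect, se, size=n_sims)
significant = np.abs(estimates) > 1.96 * se  # two-sided 5% test

power = significant.mean()
exaggeration = np.abs(estimates[significant]).mean() / true_effect

print(f"power:        {power:.2f}")
print(f"exaggeration: {exaggeration:.1f}x")
```

With these numbers, power comes out under 10%, and the statistically significant estimates overstate the true effect several-fold: the winner's curse in action.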

To put it another way, the answer to the question, “I mean, what exact buttons do I have to hit?” is that there is no such button.

33 thoughts on ““I mean, what exact buttons do I have to hit?””

  1. As an aside, I loved the retro baseball players used as archetypes. Kingman! Carew! I’m looking forward to a Juan Samuel reference at some point.

  2. I think that the article of Button et al. is tricky. Depending on what you take as the alternative (in their notation, this is determined by the Cohen effect size d), a different measure of power can be computed. They provide a table summarizing the meta-analysis; each study seems to be examined with a different alternative, which seems reasonable, as different problems may admit stronger deviations from the null than others. But this does not mean that one necessarily has to consider these values, and even minor changes can affect the powers considerably, so this meta-analysis of powers is somewhat subjective. If they had considered slightly different alternatives, powers could be much higher…or smaller. After all, for a large sample the power function will be a smooth curve moving from a value alpha around the null to nearly one in other regions of the parameter space. With small samples you could get less power, but it is tricky to measure at a specific point, or even worse, at selected different points for each study. Hopefully they were not picky…but still it does not seem too objective to me.
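The sensitivity described in this comment is easy to see numerically. As a minimal sketch (a normal approximation to the two-sided power of a two-sample comparison of means; the choice of 30 subjects per group is an arbitrary assumption for illustration), power swings widely as the assumed Cohen's d changes:

```python
import math

def approx_power(d, n_per_group, alpha_z=1.96):
    """Normal approximation to the two-sided power of a two-sample
    comparison of means, with true standardized effect (Cohen's d)."""
    phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    ncp = d * math.sqrt(n_per_group / 2.0)  # noncentrality of the z statistic
    return (1.0 - phi(alpha_z - ncp)) + phi(-alpha_z - ncp)

for d in (0.2, 0.3, 0.5, 0.8):
    print(f"d = {d}: power ~ {approx_power(d, 30):.2f}")
```

Moving the assumed alternative from d = 0.2 to d = 0.8 takes the nominal power from roughly 0.12 to roughly 0.87 for the exact same design, which is the subjectivity the comment is pointing at.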

    • Jose:

      As discussed in my paper with John Carlin, I don’t think that “power” is the best way to look at these things, but I respect the key idea of the paper of Button et al., which is that, when measurements are noisy, statistical significance doesn’t mean much. It’s the winner’s curse: if you win, you lose.

      • I agree with you; power is just one feature, and there are many other considerations. After all, testing is just a decision problem: the optimal decision rule changes with the objective function, and one can consider multiple goals. This again depends on the decision maker in a way that can be written down with precision, as opposed to a more qualitative view of testing where the subjectivity is not clearly formalized. This is a bigger problem, I think.

  3. One consequence of this point, that there is no such button, is that there is no substitute for studying the methods we use to do inference. Once someone comes up with SPSS-like software for Bayesian methods, we are going to see the button-pressing happening again, only this time with Bayes.

    A further issue is that the deployment of statistics—and this has nothing to do with whether p-values or Bayesian methods are used—is just considered a tool to assert what you already believe. The evidence comes from people always finding empirical evidence for their own position (are Amy Cuddy or Tracy+Beall ever going to find evidence that shows they were wrong on power posing or the color red/pink/reddish-pinkish hues?). Some people assure me that “this is how science works in practice.” You take a position, and then you are loyal to that position, come what may. It’s like an American courtroom drama: you are either arguing for the prosecution or the defence; it doesn’t really matter what the truth is. You have to stand your ground.

    I don’t see the underlying problem as being the use of Bayesian or frequentist methods.

    Maybe what we need is an Opponent Exchange. You want to study a phenomenon, find an opponent who doesn’t believe a word of your theory, and collaborate with them on the study. Let them in on your game and let them make sure you are not going to fool yourself.

    • Why can’t the reviewers be these people who do not believe a word of your theory? My impression is that conscientious, sincere, painstakingly careful review would improve the situation a lot.

      E.g., if Andrew says that he could catch the obvious methodological flaws of the Bem or fertility studies, why didn’t the actual reviewers?

      • There are several reasons why reviewing doesn’t achieve these goals (at least in my experience).

        1. Even the most dedicated reviewer cannot know all the details of the monkey business that led to the paper and conclusions. For example, papers get published in linguistics where a t-value is reported, but if the degrees of freedom had been reported, the alert reviewer would have realized something was wrong. This sort of thing goes undetected. Or the authors will do hundreds of tests and report the one or two that came out significant.

        2. Reviewers have a one-sided power over the authors, and they often use it, esp. when they don’t sign their reviews. They hand down a decision, and the editor says: no go, submit somewhere else. It’s often not possible to have a conversation with a reviewer. Reviewing is often not just a quality-control process, it is also a way to assert one’s authority over a field, e.g., by rejecting results that do not match your beliefs as a reviewer. This discourages dissenting opinions from even being considered.

        3. I have never reviewed a paper (70-80 journal articles so far) that didn’t have methodological flaws, and I have never done a study that didn’t have them either. The patient is already dead by the time the paper gets to the reviewer. The reviewer can only pronounce the patient dead. One could minimize these methodological flaws by working with an opponent from the outset. Just do a joint paper, so both parties get credit for whatever the outcome is.

        • 1. If getting published is your motivation, how big is your incentive to detect monkey business?

          3. The question isn’t about some flaw. But was the flaw big enough that it was fatal? Did it materially impact the conclusions? Would you sign your name on a paper that had that flaw? I’d rather the reviewers didn’t certify a corpse as alive and thriving. No shame in sending a corpse to the mortuary. Would we be worse off had reviewers refused to sign off on that fertility paper?

          You can, of course, collaborate with as many people as you want to reduce methodological flaws. My point is that conscientious reviewing still has great value because the reviewer is a disinterested third party.

        • I do concede that conscientious reviewing is very valuable. I don’t know what percentage of reviewing falls in that category though. I wonder if someone has researched this; it would be very useful to quantify reviewing quality. It would be a tough study to conduct, but without such information it’s hard to argue about it.

          Maybe journals can allow one to rate a reviewer, with the editor providing an independent rating that has twice as much weight. That would help one to see what happens in practice. I currently have only a narrow view from the perspective of one person.

        • Agreed. All good ideas.

          What puzzles me is when I get this reaction: “Oh, peer review by three people can never be perfect. So let’s just get rid of the peer review process.”

          That’s like a car company that, when its assembly line is plagued by crappy cars, decides it is a good idea to fire its entire Quality Control Department & just rely on Blue Book customer ratings of its models.

        • Shravan:

          Sorry, I didn’t mean to say you were suggesting that. But I’ve gotten this reaction from some others. On this blog & elsewhere.

        • I would add to this list that, in clinical medical research at least, there are not enough willing reviewers with an adequate knowledge of statistics to spot and articulate methodological problems for every submission to every journal.

        • Why not try paying reviewers for their time? That might make reviewers more willing?

          How many $$ is the typical clinical research grant?

        • Rahul:

          Yes, I agree, this would help. But you’d have to pay a lot, and once you start paying, perhaps there’s a fear that it would destroy the existing gift economy. Also, as a previous commenter noted, there just aren’t enough statisticians around to review all the millions of papers submitted each year to journals.

          I have a friend who started some online journals about 20 years ago where he had an explicit rule that if you submitted a paper, you were required to contribute three referee reports of your own in his journal system. It sounded like a good idea to me, but I think he abandoned the plan; I’m not sure why. Then he sold the journals, which pretty much destroyed everything: one of these journals started charging for access, and the statistical society just flat-out abandoned it and started its own replacement with a nearly identical title.

          I guess one way to interpret that story is that there is still money to be made from the traditional publishing model of gouging the libraries while getting authors to write for free, referees to review for free, and editors to edit for free.

        • Andrew:

          How much do you think we’d have to pay for the incentive to work? How much does a typical post doc or Professor make every month?

          Now contrast that with the costs of running a typical clinical trial or research project.

        • Some of the larger journals, e.g. the Lancet family, do pay statistical reviewers around $150 or so in order to ensure fast and well-informed feedback. I believe that some smaller journals do occasionally pay a fee for a statistical review, but it would not be economically viable for most publishers to do this routinely.

          I’m not an expert on research grant funding, but large flagship projects and clinical trials should have adequate funding for statistical work. However, work carried out by clinical research fellows etc may not have any specific attached funding for collaboration with qualified statisticians.

        • Yes, Oliver, some funds are set aside for statistics, but they are often spent on covering shortfalls in other activities.

          More than once, on large funded studies, I was told there were no resources actually left for statistical analysis, so you have to do it for free. Once, I had only been paid to be on the data safety monitoring committee, and the investigator tried to make the argument that because of that I had to do the final analysis of the completed trial. Depending on who the investigators are and the politics, they sometimes get away with this. (Best to trap the resources at the start of the trial, but that can’t always be done.)

          Also, I once reviewed the statistical reviews of a statistician paid by a major journal – they were often simply wrong (just because the journal pays some statistician does not fix the problem).

        • Rahul:

          Given that a funding agency has decided to fund very expensive research, do you think they have any interest in a better evaluation of that decision they made than the hyped-up journalistic claims that those they have funded can fan?

          (When I first applied for funding to evaluate the reproducibility of a subset of that funding agency’s recently completed projects, my director said, “I don’t want to spend any time on reviewing this for you, as it’s not something they will want to fund.” He was right.)

        • Rahul:

          I did have some encouraging discussions with the director of a health-based charity (e.g., a cancer society), where he indicated it would be a hard sell to his members (not specifically for our disease), but he did say he would try to get other charities to do something jointly…

          That was around 2005; 10 years later there is some pressure from those who fund the funders, including the US president’s science office – but it’s not the field I am in anymore. Maybe someone who is can give us an update.

        • Can I add #4, Shravan? I often get a paper where there is some obvious (to me) piece missing, some aspect of what was done without which the paper cannot be reviewed. I respond, usually quickly, and say this information is needed before a proper review can be undertaken. I get three types of reply. Most common is that the editor waits until the other two reviews are in and treats my comment like a review. The second is that they say to ignore that bit and review it anyway (which I say no to). The third is where they email the authors, get the information, and I rapidly review. In all of these, the other reviewers, even if they notice the problem, still think the paper is reviewable. The problem is that editors often don’t read the manuscript in any more detail than is needed to choose reviewers.

    • This reminds me of one of my favorite economics papers, which has as its second paragraph: “This paper began as a sharp disagreement between the two authors as to the proper explanation of the puzzle. We investigate three approaches to reconciling theory and experiment.” (Friedman and Ostroy, Competitivity in Auction Markets: A Theoretical and Experimental Investigation, Economic Journal 105).

    • Shravan:

      You make some very good points in this thread, but I am biased, because I suggested months ago that two Ivy League universities try to set this up together: on some things our faculty will be advocates and yours critics, both with full access. Stuff that gets through this process would be branded as more likely to be true (and because we do this, give us more grant money, please!)

      Probably not feasible; random audits of research are more feasible – but now I expect someone to ask me how I would like it if we said we don’t trust you and we want to audit you (and Corey will have to come to my defense again)….

    • “Once someone comes up with a SPSS like software for Bayesian methods, we are going to see the button-pressing happening again, only this time with Bayes.”

      From the new Stata v. 14 manual:

      Menu
      Statistics > Bayesian analysis > Estimation

        • From a quick scan of the intro – it’s not too terrible:

          “Building a reliable Bayesian model requires extensive experience from the researchers, which leads to the second difficulty in Bayesian analysis—setting up a Bayesian model and performing analysis is a demanding and involving task. This is true, however, to an extent for any statistical modeling procedure.”

          Given the second sentence has never been heeded, every reason to suspect the first sentence will simply be ignored.

  4. In their book, “Evidence-Based Policy: A Practical Guide to Doing it Better,” Nancy Cartwright and Jeremy Hardie write, “We do not think that it is possible to produce unambiguous rules for predicting the results of social policies. … So in our world, those who make the decisions will have to deliberate using their judgment and discretion. … [Y]ou—or someone on your behalf—will have to think, and think specifically about your situation.” (p. 93)
