
Forum in Ecology on p-values and model selection

There’s a special issue of the journal Ecology (vol. 95, no. 3) featuring several papers on p-values. There’s also a discussion that I wrote, which does not appear in the journal (for reasons explained below) but which I excerpt and link to below. First, the papers in the special section:

P values, hypothesis testing, and model selection: it’s déjà vu all over again
Aaron M. Ellison, Nicholas J. Gotelli, Brian D. Inouye, Donald R. Strong

In defense of P values
Paul A. Murtaugh

The common sense of P values
Perry de Valpine

To P or not to P?
Jarrett J. Barber, Kiona Ogle

P values are only an index to evidence: 20th- vs. 21st-century statistical science
K. P. Burnham, D. R. Anderson

Model selection for ecologists: the worldviews of AIC and BIC
Ken Aho, DeWayne Derryberry, Teri Peterson

In defense of P values: comment on the statistical methods actually used by ecologists
John Stanton-Geddes, Cintia Gomes de Freitas, Cristian de Sales Dambros

Comment on Murtaugh
Michael Lavine

Recurring controversies about P values and confidence intervals revisited
Aris Spanos

Rejoinder
Paul A. Murtaugh

Finally there’s my own contribution, The problem with p-values is how they’re used:

I agree with Murtaugh (and also with Greenland and Poole 2013, who make similar points from a Bayesian perspective) that with simple inference for linear models, p-values are mathematically equivalent to confidence intervals and other data reductions, so there should be no strong reason to prefer one method to another. In that sense, my problem is not with p-values but in how they are used and interpreted.
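As a minimal sketch of that equivalence, consider the simplest case: a single estimate with a known standard error and a normal sampling distribution. This is an assumption made for illustration (a fitted linear model would use a t distribution), and the function names and the example estimates are hypothetical. Under that assumption, the two-sided p-value falls below 0.05 exactly when the 95% interval excludes zero, so the two summaries carry the same information.

```python
from statistics import NormalDist


def z_test_summary(estimate, std_error, alpha=0.05):
    """Two-sided p-value and (1 - alpha) interval for a normally distributed estimate."""
    norm = NormalDist()
    z = estimate / std_error
    p_value = 2 * (1 - norm.cdf(abs(z)))
    z_crit = norm.inv_cdf(1 - alpha / 2)
    interval = (estimate - z_crit * std_error, estimate + z_crit * std_error)
    return p_value, interval


def excludes_zero(interval):
    """True when zero lies outside the interval."""
    low, high = interval
    return low > 0 or high < 0


# Hypothetical estimates, each with standard error 1.
for estimate in (0.5, 1.9, 2.1, -3.0):
    p_value, interval = z_test_summary(estimate, std_error=1.0)
    # The two data reductions always agree on "significance" at the 5% level.
    assert (p_value < 0.05) == excludes_zero(interval)
    print(f"estimate={estimate:+.1f}  p={p_value:.3f}  "
          f"95% CI=({interval[0]:+.2f}, {interval[1]:+.2f})")
```

The choice between the two is then a matter of presentation, which is why the concern raised here is about use and interpretation rather than about the arithmetic.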

Based on my own readings and experiences (not in ecology but in a range of social and environmental sciences), I feel that p-values and hypothesis testing have led to much scientific confusion by researchers treating non-significant results as zero and significant results as real. . . .

I have, on occasion, successfully used p-values and hypothesis testing in my own work, and in other settings I have reported p-values (or, equivalently, confidence intervals) in ways that I believe have done no harm, as a way to convey uncertainty about an estimate (Gelman 2013). In many other cases, however, I believe that null hypothesis testing has led to the publication of serious mistakes . . .

The article under discussion reveals a perspective on statistics which, by focusing on static data, is much different from mine. Murtaugh writes:

Data analysis can always be redone with different statistical tools. The suitability of the data for answering a particular scientific question, however, cannot be improved upon once a study is completed. In my opinion, it would benefit the science if more time and effort were spent on designing effective studies with adequate replication, and less on advocacy for particular tools to be used in summarizing the data.

I do not completely agree with this quotation, nor do I entirely agree with its implications. First, the data in any scientific analysis are typically not set in stone, independent of the statistical tools used in the analysis. Often I have found that the most important benefit derived from a new statistical method is that it allows the inclusion of more data in drawing scientific inferences. . . .

My second point of disagreement with the quotation above is in the implication that too much time is spent on considering how to perform statistical inference. (Murtaugh writes of “advocacy” but this seems to me to be a loaded term.) It is a well-accepted principle of the planning of research that the design of data collection is best chosen with reference to the analysis that will later be performed. We cannot always follow this guideline—once data have been collected, they will ideally be made available for any number of analyses by later researchers—but it still suggests that concerns of statistical methods are relevant to design.

In conclusion, I share the long-term concern (see Krantz 1999, for a review) that the use of p-values encourages and facilitates a sort of binary thinking in which effects and comparisons are either treated as zero or are treated as real, and also an old-fashioned statistical perspective under which it is difficult to combine information from different sources. The article under discussion makes a useful contribution by emphasizing that problems in research behavior will not automatically be changed by changes in data reductions. The mistakes that people make with p-values could also be made using confidence intervals and AIC comparisons, and I think it would be good for statistical practice to move forward from the paradigm of yes/no decisions drawn from stand-alone experiments.

Hypothesis testing and p-values are so compelling in that they fit in so well with the Popperian model in which science advances via refutation of hypotheses. . . . But a necessary part of falsificationism is that the models being rejected are worthy of consideration. . . . In common practice, however, the “null hypothesis” is a straw man that exists only to be rejected. In this case, I am typically much more interested in the size of the effect, its persistence, and how it varies across different situations. I would like to reserve hypothesis testing for the exploration of serious hypotheses and not as an indirect form of statistical inference that typically has the effect of reducing scientific explorations to yes/no conclusions.

The journal editors sent me Murtaugh’s paper and invited me to write a short comment, which I did, and it was all set to be published when I found out that there was a $300 publication fee. I couldn’t bring myself to pay money to have the journal publish something that I wrote for them for free! I explained this to the editors who graciously let me withdraw the paper. So instead I’m posting it here, for the marginal cost of approximately $0.

53 Comments

  1. Alexander says:

    Doesn’t Columbia have a fund to support this kind of thing? Quickly googling finds me this: http://scholcomm.columbia.edu/services/coap-fund/

  2. I can’t believe they find anyone to pay $300 to publish a comment!

    As to Columbia paying, I always wondered where our overhead dollars went.

    Paying these costs to journals strikes me as silly. Every field should just launch cheaper, quicker-to-review, quicker-to-publish, open-access journals. But then I doubt Columbia would give me $3000/year for that.

  3. Dan says:

    As an ecologist who places a high value on Andrew’s opinion, this is disappointing. The editors sent the invitation to comment for a good reason, which should have been enough to warrant waiving the fee. I’m surprised that any of these invited comments would have required payments by the authors.

  4. Jay says:

    Andrew, you’re too nice. I’d have billed them for writing the article and, if they protested, graciously offered to credit their publication fee.

  5. My view is that for a scientific publication, it is the author (not the journal and not the reader) that is the principal beneficiary. Therefore authors should pay (at least part of the costs). I think the other view (that the journal or the reader or the library should pay) is one of the ideas that has led to all the closed publishing, papers behind paywalls, crazy IP and copyright policies, and inaccessibility of the literature in the developing world.

    • Andrew says:

      David:

      I don’t consider myself the beneficiary in this case. I was writing the article as a public service, which I think describes a lot of what I do professionally!

      • Shravan Vasishth says:

        Andrew, elsewhere on this thread you wrote:

        “In statistics and political science, it is my impression that putting reviewing on your CV gives you approximately zero credit for hiring and promotion.”

        What about being an editor of a journal? It has a lot of similarities to reviewing: you do it for free, out of a sense of obligation, etc. (but it’s a lot more work). Does it give academic credit? It seems a lot of statisticians put their editorial work on their CVs, so I assume yes.

    • Rahul says:

      If so, academic publishing is broken. We are producing a product that we must pay people to consume?

      • Shravan Vasishth says:

        Rahul, is this news to you? See, for example:

        http://thecostofknowledge.com/

        • Rahul says:

          I’m saying I find people having to pay to publish a bit more abhorrent than people having to pay to read.

          I’m on board for protesting against the ridiculous charges by commercial publishers. But the solution is reasonable charges or arxiv or Journals paid via grants or via society dues.

          But forcing authors to pay is a solution worse than the underlying problem.

          • Shravan Vasishth says:

            What about PLoS One, which charges authors (although one can opt out of paying)? Isn’t that effectively “Journals paid via grants”? Is that also a bad thing?

            • Rahul says:

              I’m not familiar with their system: what does it mean that they charge but one can opt out? So the author fee is entirely optional?

              If so, I’m ok with that. If it’s totally optional I treat it like a donation. I’m not against you donating money to a journal. Go right ahead.

              But I don’t think that’s what you meant by “one can opt out”. And then yes, the PLoS model is a bad thing.

              • Shravan Vasishth says:

                It was misleading of me to say that one can opt out. People who can’t afford to pay can get a waiver. Probably useful for researchers who have no funding, in India, for example. I am assuming that if I were broke in terms of research funds, I could get a waiver even though I live in Germany.

                Excerpt from http://www.plosone.org/static/policies#publication

                “PLOS Global Participation Initiative (Low- and Middle-Income Country Assistance)

                Authors’ research which is funded primarily (50% or more of the work contained within the article) by an institution or organization from eligible low- and middle-income countries will receive partial (group 2 countries) or full (group 1 countries) fee funding paid by the PLOS Global Participation Initiative (GPI). Group 2 PLOS GPI country authors who need to request additional support should apply for PLOS Publication Fee Assistance instead of the PLOS GPI.”

                I did understand what you meant by “funded from grants”. But the German NSF (called the DFG) allows us to ask for publication money in advance when we apply for a research grant. It’s not much, but it’s there. And I think in EU research grants they demand that the articles be open access; I think they will cover payment of articles via grants. So that’s another model.

                I’m on the fence with this paid-for-by-author system.

              • Rahul says:

                I think the whole model is pretty bizarre: Just because some bakers overcharge for bread let’s ask the farmer to pay when he “sells” corn?

                There’s no way this would have worked in the absence of the perverse incentives that exist in academia, i.e. (1) publication has signalling value that gets the author monetary benefits via other routes, and (2) the possibility of passing on costs to an external body (e.g. NSF, DFG) so that it doesn’t pinch the authors’ pockets.

                Why pay PLoS Biology $2400 to publish? You should post your paper on arxiv & attach comments from three peers to it. Hell, you could even pay each guy the $800 from the $2400 PLoS was going to pocket anyways. Why do we need the middleman?

              • Shravan Vasishth says:

                “Why pay PLoS Biology $2400 to publish? You should post your paper on arxiv & attach comments from three peers to it. Hell, you could even pay each guy the $800 from the $2400 PLoS was going to pocket anyways. Why do we need the middleman?”

                Those three peers wouldn’t do it. Their incentive is to do the review AND put it on their cv that they review for prestigious journal such-and-such. They can’t put on their cv that they did online reviews (well, they can, but they won’t get credit for it). Ultimately it’s the desire for money and prestige (besides the love of pure naked power) that’s driving reviewers.

                But I agree that the whole setup is crazy.

              • Rahul says:

                @Shravan:

                Right. It’d only work with authors & reviewers both drawn from an Andrew-like demographic. Ones not caring about CVs & careers & citation counts desperately any more. :)

              • Andrew says:

                Shravan:

                You write, “Their incentive is to do the review AND put it on their cv that they review for prestigious journal such-and-such. . . . Ultimately it’s the desire for money and prestige (besides the love of pure naked power) that’s driving reviewers.”

                I disagree. In statistics and political science, it is my impression that putting reviewing on your CV gives you approximately zero credit for hiring and promotion. I think people review because of a feeling of obligation, either to the field in general or toward the editors who ask them to review. Also the naked power thing, sure. But reviewing is one thing that’s not about the CV.

            • Rahul says:

              To clarify: When I said “Journals paid via grants” I meant, say, the $10 Million PLoS got from the Gordon and Betty Moore Foundation.

              If they ran their journal entirely out of such grants I’m ok with that.

              • Shravan Vasishth says:

                “It’d only work with authors & reviewers both drawn from an Andrew-like demographic. Ones not caring about CVs & careers & citation counts desperately any more.”

                Even people with tenure are usually pre-programmed to always want more of everything. Maybe if they all had the stature of Tim Gowers (or Andrew) it’d be different (big maybe), but that’s obviously not going to happen.

              • Hi Rahul,

                coming back to the PLoS ONE issue: I recently submitted my second PLoS ONE paper, and it was accepted after a two-month review cycle (the review had two expert reviewers).

                By contrast, the average waiting time of my students’ papers in conventional journals is 3 *years*. The more interesting work from them shakes up the common ground, and this upsets reviewers, who try to slow the publication down.

                From a decision theoretic perspective, what leads to a lower expected loss, paying 1350 USD or whatever to PLoS ONE, or waiting three years to publish one’s first paper? I would say 1350 USD is peanuts compared to the loss accruing from waiting, and waiting, and waiting (month 9) revising, revising, and revising (year 1.5) and waiting, waiting, waiting (year 2.2) revising, revising, revising (year 2.6) and waiting, waiting, waiting (year 3)….

          • David Bell says:

            Just to clarify, Ecology is a journal published by the Ecological Society of America, not a commercial publisher. I do not mind reasonable page charges for publishing, but I do have an issue with my professional society (I am an ecologist) inviting Andrew to comment while sticking its hand out for a couple hundred dollars.

  6. Mayo says:

    Andrew: I haven’t read Murtaugh’s article, though I will when I get it. I think your point could well be in sync with what he says about planning:
    “But a necessary part of falsificationism is that the models being rejected are worthy of consideration. . . . In common practice, however, the “null hypothesis” is a straw man that exists only to be rejected. In this case, I am typically much more interested in the size of the effect, its persistence, and how it varies across different situations.”

    Intelligent planning allows formulating a null (however ‘unworthy’ or oversimple it might appear) so that one can learn something of interest when it is rejected or not. E.g., It was the particular information one could glean from how the various null hypotheses in the Higgs boson experiments were rejected that made the data informative. This was an outgrowth of planning. And each time they fail to reject a null that asserts, in effect, “no exotic decay” they rule out one more possibility, constraining the space. [Even though many in HEP are seething (and even expecting) to find an exotic decay or something indicative of beyond standard model (BSM) physics.] This is very rough, but you get the idea. The misuses of tests that serious people have long deplored, I say, are the result of bad methodology/bad science. People should stop blaming the tools and learn how they can work well. We hear some in statistical forensics declare that QRPs (questionable research practices) cannot be helped if one expects to publish in such and such a field. If true, then I say either design more constrained studies or shut down the field.

    • “If true, then I say either design more constrained studies or shut down the field.”

      +1

      The downside for linguists and psycholinguists if we do what Mayo suggests is that much less stuff will get published. This is probably a good thing.

    • Andrew says:

      Mayo:

      The methods are what they are; I agree that you can’t blame them for anything. And, indeed, statistical theory, when looked at carefully, does require some model for the data-collection process (which is often forgotten, for example, by textbooks that present methods for obtaining classical confidence intervals using the likelihood function alone, or which present the data-collection rule as being irrelevant to Bayesian inference). But I do blame many of the promoters of various statistical methods, for two reasons. First, it seems common for people to downplay the assumptions associated with their methods and to downplay the value of checking these assumptions. When Bayesians go around telling people that they couldn’t and shouldn’t check their models (based on the bizarre (to me) motivation that if a model is subjective it should be sacrosanct) or when classical statisticians categorize their methods as assumption-free, this bothers me. I’m also bothered by various misleading expressions in the literature such as “unbiasedness” or the so-called Fisher exact test which is exact only under a model that almost never makes sense. I have the feeling that most if not all the writers who express these views do so out of ignorance and thoughtlessness (recall how lack of interest in philosophy is a point of pride among many working scientists); nonetheless they can still spread confusion in their writing.

      OK, that’s reason #1. Reason #2 is that it seems to me that statistics is often presented as a form of alchemy that converts uncertainty into certainty. Eric Loken and I discuss this in detail in our recent article, The AAA tranche of subprime science. This is related to the thing that’s been bothering me so much recently, the widespread attitude that, once something happens to get by 3 referees and get published in a peer-reviewed journal, it should be considered to be almost above criticism.

      • Michael Lew says:

        Mayo:

        “People should stop blaming the tools and learn how they can work well.” You are so right! I’ll use that idea in a talk I’m giving tomorrow. Thank you.

        Andrew:

        Which statistical textbooks derive classical confidence intervals from likelihood functions? I’ve looked for likelihood in dozens of basic statistics textbooks that are in my Uni’s libraries. “Likelihood” and “likelihood function” are absent from the index of almost all. When likelihood appears it is almost always in the context of the Neyman-Pearson lemma, which is the least interesting thing about likelihood that I can think of. Which textbooks do you have in mind?

        • Shravan Vasishth says:

          Hi,

          There’s an amazing book called “In All Likelihood” by Yudi Pawitan, which takes Fisher’s “third way”. This book derives CIs using likelihood.

        • Andrew says:

          Michael:

          You write, “Which statistical textbooks derive classical confidence intervals from likelihood functions?”

          I don’t have any classical statistics textbooks handy but to respond to your question I just googled *maximum likelihood confidence interval* and found this set of slides which derives classical confidence intervals from likelihood functions. I’m not saying these slides are exemplary—indeed, just glancing at them quickly I noticed several serious errors. My point is just that it is common practice to teach estimation and confidence intervals based on the likelihood function without reference to other aspects of the data-generating process.

          • Shravan Vasishth says:

            That link didn’t work for me, Andrew.

            • Andrew says:

              Link fixed. But please remember that these slides are just something I found in a google. My intent is not to pick on their author, I’m just using them as an example to quickly answer Michael Lew’s question above.

              • Shravan Vasishth says:

                Wow, if even an associate professor of statistics has “several serious errors” in their slides on the topic of likelihood based confidence intervals, that is pretty worrying.

              • Andrew says:

                I think people have a habit of saying things that sound good without thinking clearly about what they’re really saying.

              • K? O'Rourke says:

                Andrew:

                I do agree with Michael’s concern. For instance, of the 20 or so recent Stats PhDs that attended the SAMSI 2008 summer program, their understanding of likelihood functions was minimal, with few aware of the difficulties that arise in other than simple applications (more than one parameter or under Normal assumptions). Many commented to me afterwards that they had never heard of Neyman-Scott problems before.

                So in many courses and textbooks, coverage of likelihood functions, and of what Stephen Stigler has argued is their failure to actually provide a general and successful approach in statistics, is/was inadequate.

              • Chris G says:

                > I think people have a habit of saying things that sound good without thinking clearly about what they’re really saying.

                +1.

                (BTW, I never do that. Ever.)

          • Shravan Vasishth says:

            Andrew wrote: “My intent is not to pick on their author, “

            Why not? Why are Jessica Tracy and Alec Beall not exempt from being picked on but this guy is? I’m just trying to understand the rationale.

            • Andrew says:

              Shravan:

              If someone puts their work out in a public venue, whether it be the New York Times or Psychological Science (or, for that matter, Slate or this blog), they can expect for their errors to be challenged—that’s fair and appropriate.

              But when someone puts unpublished slides up on a website that happens to be googlable, that to me seems like a different story. I have no idea who made those slides and no particular interest in finding out. I just used them as an example of some mistakes that people make when not thinking too hard about statistics. It would seem a bit harsh to me to single someone out based on errors in unpublished slides.

      • Christian Hennig says:

        “When Bayesians go around telling people that they couldn’t and shouldn’t check their models (based on the bizarre (to me) motivation that if a model is subjective it should be sacrosanct)”
        Probably the wrong place to discuss this here but I think that the rationale is not that subjective models are sacrosanct, but that they are not modelling the data generating process but the subjective prior belief of the individual and can therefore not be checked against data but only against manifestations of the prior belief (to what extent this is done or should be done more is a different story).
        Not that I’m trying to advocate subjective Bayes here; I just want to say that it is not quite as bizarre as you make it look.

        • K? O'Rourke says:

          Christian:

          Some would argue (e.g. CS Peirce) that mental constructions that can “not be checked against data” [or ever face up to brute force reality or surprise] are bizarre or at least of no value in scientific discourse. Andrew is making it out as bizarre as it should be taken.

          • Christian Hennig says:

            In subjective Bayes, all probability assignments including the sampling model are done a priori (and I call the whole lot “prior model”). There are two different issues:
            a) Check whether the prior model models what it is supposed to model, namely the individual’s prior belief,
            b) check how good/worthwhile for scientific discourse this prior belief is.
            My comment was about issue a, yours I’d direct to issue b. Nobody will stop the subjective Bayesian individual from checking their beliefs against data that is already available a priori before the new data comes in, and which therefore can be legitimately used to construct the prior model (which still then models the prior belief that may have changed from checking against such data).
            As a data analyst with frequentist leanings I see why Andrew likes to test what I call prior model even against posterior data, so I’m not against this in principle, but I also see how it obscures the meaning of the prior model and the motivation of Bayesian reasoning (unless it is all interpreted in a frequentist manner and even the prior is meant to model a data generating process) if this can potentially be changed based on posterior data, so I can understand why many Bayesians don’t want that.

            • K? O'Rourke says:

              Thanks for the clarification.

              > even the prior is meant to model a data generating process

              Not sure where it is, but I liked Andrew’s comment somewhere that priors really just represent crappy data analysis of past data [i.e. a data generating process], but understand/agree that many Bayesians might not agree.

              • Christian Hennig says:

                I heard Andrew saying in a presentation that one could think of a “true prior” as the distribution of distributions that occur in studies of a certain type in a certain field, which would point in the direction of a frequentist-type data generating process. Of course “of a certain type in a certain field” leaves much space for interpretation and fine-tuning and Andrew better clarifies this himself if he wants. (Anyway I’m citing from my memory so it’s probably my words and not exactly Andrew’s.)

    • question says:

      “People should stop blaming the tools and learn how they can work well.”

      I want to compare the growth rate of two sets of cells (Type A and Type B) that are supposed to be exactly the same except one is missing a single gene. I only have time/money to run the experiment 3 times. What statistics do you suggest?

      • george says:

        question: while acknowledging that it depends a bit on the circumstances, in most cases three data points isn’t enough to tell you anything about the A:B difference, with any degree of certainty – and hence it won’t be enough to convince anyone else of any A:B difference of actual interest. This holds regardless of the form of statistical analysis you use.

        With the same caveat holding, in some cases you could use such data to rule out absolutely huge A:B differences, or to add to the existing knowledge of A:B differences to some extremely modest extent. But again, the way one formulates statistics doesn’t change this – no system of statistics can or should prevent one getting disappointing results from weak data.

        Statisticians – the good ones – know this and are careful to acknowledge that no reasonable answer may be available. So any suggestions made to you are going to come with major health warnings, that someone in your situation should heed.
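To put a rough number on how weak three runs per group are, here is a minimal sketch. It assumes the classical equal-variance two-sample t interval for a difference in means, which is only one of several analyses that could be applied; the function name is hypothetical and the critical values are standard table entries. It expresses the half-width of the 95% interval in units of the pooled standard deviation, so no data need to be invented.

```python
from math import sqrt

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
# Standard table values; df = 2n - 2 for two groups of n observations each.
T_CRIT_95 = {4: 2.776, 18: 2.101, 58: 2.002}


def ci_half_width_in_sd_units(n_per_group):
    """Half-width of the 95% interval for a difference in means, in pooled-SD units."""
    df = 2 * n_per_group - 2
    return T_CRIT_95[df] * sqrt(2 / n_per_group)


for n in (3, 10, 30):
    print(f"n={n:2d} per group: +/- {ci_half_width_in_sd_units(n):.2f} pooled SDs")
```

With n = 3 per group the interval extends roughly 2.3 pooled standard deviations either side of the observed difference, and the pooled standard deviation is itself estimated from only four degrees of freedom. That is consistent with the point above: such data can rule out only very large differences.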

        • question says:

          George,

          Slight variations on the scenario I described account for > 90% of the molecular biology literature. They do think that three data points is enough.* In the 1980s and 1990s it was even common to consider one data point enough, reading papers from that era gets absurd when a generalized claim is made supported by a picture of a single cell.

          So on the one hand these people are being told they need to use statistics by editors/reviewers, on the other they are being told three data points is weak data and statistics can’t really help them. On the yet other hand they see that everyone else is claiming things based off three data points and this seems to be acceptable.

          *Three data points may be acceptable in cases like those Student was dealing with. The difference is that in testing batches of beer the distribution is known from previous tests, while no one really knows what to expect when knocking out a gene or whatever.

          • question says:

            Another thing about molecular biology results. To this day, it is somehow acceptable to not report the exact sample size. I see this type of stuff all the time: “n >= 3 for all groups”; “n = 3-6 for all groups”; “experiments were done at least three times, 80-200 cells were measured per experiment”.

            Also the infamous “experiments were performed as described previously [ref #X]”. Then in this reference we find the same thing, etc., until the original report, whose method clearly differed from the one used in the current paper. This for a field where in many cases incubating for 30 minutes vs 60 minutes or adding reagent A before reagent B vs reagent B before reagent A can lead to wildly different outcomes.

            I have trouble trusting data that is reported like this, but if I try to filter it out there is almost nothing left. I don’t know, maybe somehow they are getting the right answer more often than not. Keep in mind even then it is only about the direction of the effect, and it is not possible to integrate this information with other info into a coherent whole (you need reliable estimates of the magnitude).

            I’m not sure Mayo realizes the wide swath of research she is targeting when saying:

            “We hear some in statistical forensics declare that QRPs (questionable research practices) cannot be helped if one expects to publish in such and such a field. If true, then I say either design more constrained studies or shut down the field.”

            It seems to me a lot of experiments will need to be redone in the future to ensure we have reliable information. The only way to figure out how bad the problem is would be a widespread effort at replicating previous results.

            I can not assume that somehow the system has been filtering right from wrong conclusions when they are based on thousands of n=3 experiments reported by people uninterested in sample size and methods. If people knew this stuff they wouldn’t accept the treatments suggested by MDs beyond “cut out the mole” and “set the broken arm”. Maybe someone can explain to me why I am being too pessimistic, what is the secret sauce I am missing?

            • george says:

              question: I’m not involved in molecular biology so can’t help much, sorry!

              But perhaps the n=3 work is in areas where there really are huge effect sizes; e.g. when a single gene is knocked out every knockout animal died (n=3) while none of the controls did – in which case, given well-performed studies, it doesn’t take much data to establish that a gene is essential to life, even if we don’t know how. Or for a design with no controls; we gave each animal (n=3) the treatment and each grew a third ear – another huge effect, so not much data is needed to say something interesting/useful.

              Or, of course, it could be that n=3 work really is telling us nothing useful (as it would be in, say, much of social science) and as you say people in some fields may only get away with QRPs because no-one in that field has successfully argued that they represent cargo cult science.

              Without background knowledge, distinguishing one from the other is really hard. So better to check carefully before making any accusations – or before being too pessimistic.

              • question says:

                George,

                FYI, some knockouts are embryonic lethal for only some high percentage of animals (eg 90% of them die but others can escape the lethality). So even in that case n=3 may not be enough to properly characterize a knockout.

                You are correct that we should check carefully. All the evidence I have found currently available indicates that this way of doing research does not lead to reliable data. Also, I suspected based on personal experience that many people were being too credulous before discovering that others were reporting such problems.

                This blog post covers it. I’m too busy at the moment to find each of the papers. You may need to search for the actual publications (rather than WSJ and Yahoo news):
                http://blog.scienceexchange.com/2012/04/the-need-for-reproducibility-in-academic-research/
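The partial-lethality point raised in the comment above can be checked with a short calculation. This is a minimal sketch that assumes each animal dies independently with the same probability; the function names are hypothetical. It asks how often 3 of 3 knockout animals would die if the true lethality were 90%, and what exact (Clopper-Pearson) lower limit on lethality follows from observing 3 deaths out of 3.

```python
def prob_all_die(lethality, n):
    """Probability that all n animals die, given independent deaths."""
    return lethality ** n


def exact_lower_limit_all_die(n, alpha=0.05):
    """Lower end of the two-sided Clopper-Pearson interval when n of n animals die.

    For this all-deaths case the exact interval has a closed form:
    lower limit (alpha / 2) ** (1 / n), upper limit 1.
    """
    return (alpha / 2) ** (1 / n)


n = 3
print(f"P(3 of 3 die | 90% lethality) = {prob_all_die(0.9, n):.3f}")
print(f"95% lower limit on lethality after 3 of 3 deaths = "
      f"{exact_lower_limit_all_die(n):.3f}")
```

Under these assumptions, a knockout that is lethal 90% of the time would still kill all three animals about 73% of the time, and three deaths out of three are compatible with a lethality as low as about 29%. So n = 3 cannot separate "always lethal" from "usually lethal".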

  7. Chris G says:

    > The journal editors sent me Murtaugh’s paper and invited me to write a short comment, which I did, and it was all set to be published when I found out that there was a $300 publication fee. I couldn’t bring myself to pay money to have the journal publish something that I wrote for them for free!

    Obviously not an Elsevier journal – in which case it would’ve been $0 for you to publish and $300 each for us to read it;-)

    • dab says:

      Yeah, ESA is only asking me to pay $20 to read each comment that the authors paid $300 to have published. They seem to have the dubious distinction of out-Elseviering Elsevier.

  8. Jeremy Fox says:

    Re: the Ecological Society of America’s pricing policies: They’re a non-profit scientific society whose financial model is more or less typical of small-to-medium sized non-profit scientific societies. See here for some data and discussion:

    http://dynamicecology.wordpress.com/2013/09/18/follow-the-money-what-really-matters-when-choosing-a-journal/

    I wouldn’t claim (and I’m sure the ESA wouldn’t claim) that their financial model should be beyond discussion or criticism. But with respect, lumping them in with a for-profit publisher like Elsevier doesn’t strike me as particularly accurate or helpful.

