Aargh! Not good from the point of view of a medical consumer!! Consumers and physicians need better information than a single “yes/no” decision, so we/they can make informed decisions in individual cases.

These helpful ladders, once climbed up, _need_ to be kicked aside.

But agreed, the downsides may be worth it, especially if people are gently warned about them.

At a local university, in the intro stats course given by the statistics department, the answer key for the final exam gave this as the correct answer: “the p-value is the probability of the null hypothesis being true.” So did the textbook for the course, though only after stating it less wrongly.

I construed EVERY passage we’ve both quoted to mean, “Just go ask a stats prof if NHST makes sense in psychology. They will say it does not. Statisticians know this. Psychology researchers need to learn it.” Even the passage about “abusive reliance on significance testing,” I thought meant abuse by the psychology field AGAINST the tenets of the statistics field.

Apparently I was wrong, and a lot of stats profs would have said that what the psych profs were doing was just fine. I did not know that. But I don’t think there’s any great mystery about how I could reach that conclusion, since Meehl continually cited standard stats texts to support his critique. If he was unduly mean to Fisher, I’m sorry, but I didn’t think Fisher was still the sum and substance of mainstream statistics in the 1980s.

“Thesis: Owing to the abusive reliance upon significance testing—rather than point or interval estimation, curve shape, or ordination—in the social sciences, the usual article summarizing the state of the evidence on a theory (such as appears in the Psychological Bulletin) is nearly useless…Colleagues think I exaggerate in putting it this way. That’s because they can’t stand to face the scary implications of the thesis taken literally, which is how I mean it.”

He pretty clearly states that he thinks the application of conventional statistics (i.e., NHST) has rendered journal articles in his field nearly useless. I’m not sure how that could be interpreted as “begging psychologists to take conventional statistics seriously”.

I don’t have access to the Hays text, but searching for the reference, I found that Gigerenzer et al. [1] seem to say the book mentioned a “null range” in an appendix, and Amazon reviewers claim the 5th edition is riddled with typos [2]. I guess it is possible Meehl was being sarcastic, but that isn’t really his style… Without the text I don’t know exactly what he was referring to; it is possible there is just little about NHST in that book. Brunswik doesn’t seem to have been a statistician [3], but I am also unfamiliar with his work.

[1] http://faculty.washington.edu/gloftus/Downloads/CPChance.pdf

[2] http://www.amazon.com/Statistics-William-Hays/dp/0030744679

[3] https://en.wikipedia.org/wiki/Egon_Brunswik

Again, my main point is directed towards questions like Dieter’s, which seem to imply that we can get whatever conclusions we want by daring to use priors.

I think that

“if you quantify the strength of the evidence as the ratio of posterior to prior, then this quantity will in some sense and in many cases be less sensitive to “bias” than the posterior will be”

is a useful lesson in this particular regard, regardless of whether it solves all the remaining problems of inference.

Sure.

Again, I’m trying to keep my attempt to clarify the formal question about priors and evidence that I think Dieter was raising – e.g., can we just slap a prior on to get whatever conclusion we want? – distinct from broader methodological issues.

Wrt the latter – “in general systematic modelling, data analysis and domain knowledge will likely trump choice of specific tools” as above.

Not to beat the proverbial horse, but I had the impression that the “real stats” profs taught their students about all the limitations of NHST that we never heard about in Stats for Sociologists (which was, in the 1980s, essentially a course in p hacking). Sad to hear that’s not the case. Thanks for interacting.

My colleagues at Berkeley back in the 1990s were definitely mainstream and they bought into all that null hypothesis testing stuff. But forget about them, just consider almost every intro statistics textbook sold today: Null hypothesis significance testing is right there. Today’s textbooks represent yesterday’s consensus.

It certainly requires a prior over Y0 (which is often given by the predictive distribution using priors over H0 and H1, but doesn’t always need to be).

But this will be the same for both H0 and H1 and hence the relative comparison of evidence is just the likelihood ratio. The absolute evidence evaluation simply depends on how likely your observed data is (and of course this may follow from averaging over H0 and H1 models).

I’m sure you could bias things if you really tried. But the point is that it’s not as easy to bias the *evidence* (the *change* in belief, or relative *change* in belief) for or against H0 and H1 as it is to bias the absolute beliefs (posteriors) themselves.

See Mike Evans’ book/papers for a better discussion than I can provide.
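The point about biasing beliefs versus biasing evidence can be sketched numerically. A minimal toy example (mine, with made-up likelihoods, assuming two simple point hypotheses): the posterior moves wherever the prior pushes it, but the relative evidence (the likelihood ratio) is untouched by the prior.

```python
# Toy sketch: posteriors are prior-sensitive, the likelihood ratio is not.
def posterior_h0(prior_h0, lik_h0, lik_h1):
    """P(H0 | Y) in a two-hypothesis model via Bayes' rule."""
    marginal = prior_h0 * lik_h0 + (1 - prior_h0) * lik_h1
    return prior_h0 * lik_h0 / marginal

lik_h0, lik_h1 = 0.02, 0.10   # P(Y|H0), P(Y|H1): the data favor H1 five to one

for prior_h0 in (0.9, 0.5, 0.1):          # three analysts, three priors
    post = posterior_h0(prior_h0, lik_h0, lik_h1)
    likelihood_ratio = lik_h0 / lik_h1    # relative evidence: prior-free
    print(prior_h0, round(post, 3), likelihood_ratio)
```

The three posteriors range from about 0.64 down to about 0.02, while the evidence ratio is 0.2 in every case.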

http://www.psych.umn.edu/people/meehlp/WebNEW/PUBLICATIONS/128SocScientistsDontUnderstand.pdf

“… Even though it is stated in all good elementary statistics texts, including the excellent and most widely used one by Hays (1973, 415-17), it still does not seem generally recognized that the null hypothesis in the life sciences is almost always false—if taken literally—in designs that involve any sort of self-selection or correlations found in the organisms as they come, that is, where perfect randomization of treatments by the experimenter does not exhaust the manipulations. Hence even “experimental” (rather than statistical or file data) research will exhibit this …. Consequently, whether or not the null hypothesis is rejected is simply and solely a function of statistical power.

“Now this is a mathematical point; it does not hinge upon your preferences in philosophy of science or your belief in this or that kind of theory or instrument…. [T]he region of the independent variable hyperspace in which the levels of a factor are chosen is something **Fisher didn’t have to worry much about in agronomy, for obvious reasons**; but most psychologists have not paid enough attention to Brunswik on representative design….”

I thought he was making the Gelmanian point that the social sciences involve irreducible variation and confounding correlations (and questions for which there is no “true” point estimate) that Fisher simply never anticipated in agronomic research.
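Meehl’s “simply and solely a function of statistical power” claim is easy to check numerically. A minimal sketch (my own illustration, not from any of the cited texts): give the null a tiny, substantively negligible violation and watch the rejection rate climb toward 1 as the sample size grows.

```python
# Sketch: with a slightly-false null, rejection is governed by sample size.
import math
import random

random.seed(1)

def rejection_rate(true_effect, n, trials=2000, crit=1.96):
    """Fraction of two-sided z-tests (known sd = 1) that reject H0: mean = 0."""
    rejections = 0
    for _ in range(trials):
        # sample mean of n N(true_effect, 1) observations
        sample_mean = random.gauss(true_effect, 1.0 / math.sqrt(n))
        z = sample_mean * math.sqrt(n)  # standard error is 1/sqrt(n)
        if abs(z) > crit:
            rejections += 1
    return rejections / trials

# A negligible true effect (d = 0.05) -- the "almost always false" null:
for n in (100, 1000, 10000):
    print(n, rejection_rate(0.05, n))
```

The rejection rate rises from roughly the nominal 5–8% at n = 100 to essentially 100% at n = 10,000, with the underlying effect unchanged.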

P(H0|Y0)/P(H0)
= [P(Y0|H0)P(H0)/P(Y0)]/P(H0)
= P(Y0|H0)/P(Y0).

I.e., it is essentially a more Bayesian way of formulating likelihoodism. The key difference is retaining the need for P(Y0) for the evaluation at H0, rather than using the relative comparison of H0 and H1, in which P(Y0) also cancels and leads to likelihoodism.
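The identity above is a one-line consequence of Bayes’ rule and can be verified numerically. A quick check (mine, with arbitrary made-up probabilities):

```python
# Verify P(H0|Y0)/P(H0) = P(Y0|H0)/P(Y0) for a two-hypothesis model.
p_h0 = 0.3
p_y0_given_h0 = 0.04
p_y0_given_h1 = 0.12

# marginal P(Y0), averaging over H0 and H1
p_y0 = p_h0 * p_y0_given_h0 + (1 - p_h0) * p_y0_given_h1

# Bayes' rule for the posterior
p_h0_given_y0 = p_y0_given_h0 * p_h0 / p_y0

evidence = p_h0_given_y0 / p_h0          # posterior-to-prior ratio
likelihood_form = p_y0_given_h0 / p_y0   # P(Y0|H0)/P(Y0)

assert abs(evidence - likelihood_form) < 1e-12
print(round(evidence, 4))  # below 1: the data count against H0 here
```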

Can you provide a quote that gave you this impression? In his writings Meehl equates (somewhat unfairly imo) conventional statistics with “Fisherian” statistics/reasoning. This is just one example of a theme repeated for basically his entire career:

“I suggest to you that Sir Ronald has befuddled us, mesmerized us, and led us down the primrose path. I believe that the almost universal reliance on merely refuting the null hypothesis as the standard method for corroborating substantive theories in the soft areas is a terrible mistake, is basically unsound, poor scientific strategy, and one of the worst things that ever happened in the history of psychology.”

http://ww3.haverford.edu/psychology/ddavis/psych212h/meehl.1978.html

I agree.

“But that stuff about the elicitation framework seems pretty academic. Are there real studies that use it?

In practice people use priors where there’s no good reason to use this over that. Is there reason to believe results are not sensitive to the different priors from different experts?”

Sure they do elicitation in practice, see:

Turner, R. M., Spiegelhalter, D. J., Smith, G., & Thompson, S. G. (2009). Bias modelling in evidence synthesis. Journal of the Royal Statistical Society: Series A (Statistics in Society), 172(1), 21-47.

Is there reason to believe results are not sensitive to priors? That would depend on how much data you have. When you’re stuck with a problem with little data, experts’ judgements become very important, and in those situations different experts’ priors will probably lead to different results. But that’s just quantifying your uncertainty, through sensitivity analyses, taking into account what you know. That seems a whole lot better than the alternative.
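The data-dependence of prior sensitivity is easy to illustrate. A sketch (mine, not from the Turner et al. paper, using a conjugate normal-normal model with made-up numbers): two experts’ priors give noticeably different posteriors when data are scarce, and nearly identical ones when data are plentiful.

```python
# Conjugate normal-normal model: posterior mean for an unknown mean
# with known sampling variance, under two different expert priors.
def posterior_mean(prior_mean, prior_var, data_mean, data_var, n):
    """Precision-weighted combination of prior and data."""
    precision = 1.0 / prior_var + n / data_var
    return (prior_mean / prior_var + n * data_mean / data_var) / precision

data_mean, data_var = 1.0, 4.0
expert_a = (0.0, 1.0)   # sceptical expert: prior mean 0
expert_b = (2.0, 1.0)   # enthusiastic expert: prior mean 2

for n in (2, 2000):
    pa = posterior_mean(*expert_a, data_mean, data_var, n)
    pb = posterior_mean(*expert_b, data_mean, data_var, n)
    print(n, round(pa, 3), round(pb, 3))
```

At n = 2 the two posteriors sit far apart, pulled toward each expert’s prior; at n = 2000 both are essentially the data mean of 1.0.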

In this case you may have a principal-agent problem, and one improvement could be to connect the people who would pay the costs of a bad decision to the people whose job is to make these modeling assumptions.

Yup, that’s how people learn. The unrealistic prior is more likely to lead to poor decisions. That’s how I’d like to frame it: not in terms of fairness to the product but in terms of the organization wanting to make the best decisions. If you really *do* know that the new product is better than what came before, then, sure, use that information in your decision making. But if that “knowledge” is really a false assumption, you’ll pay—which is what happens in any decision problem.

As I wrote above, I don’t like the false-positive, false-negative framework. In most cases I think it’s meaningless to talk about these hypotheses as true or false. Treatments have effects that vary: a new treatment can be better for some people and worse for others. The discrete idea of hypotheses being true or false just leads to confusion, I think.

In practice people use priors where there’s no good reason to use this over that. Is there reason to believe results are not sensitive to the different priors from different experts?

I'm not defending the p-value here, I'm simply pointing out a potential flaw in the article.

Life as the statistical cheese in the hamburger of marketing folks, scientifically oriented industry researchers, and academic research is full of tasty surprises.

And I doubt very much that most experts would immediately try to game the elicitation process to get their result to come out “significant”. I don’t know many (actually, I can’t think of any) people who do science maliciously in this manner; the mistakes they make are out of genuine ignorance. Even people like Amy Cuddy are not actively gaming the system; she just doesn’t know what she’s doing (and who can blame her, given the bizarre education one gets in statistics).

In the end, the product turned out to be well within the range of controls with old-style confidence intervals. And I am sure that the company never would have used the prior-shrunken range if it had marginally missed the magic 5%.

http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0470029994.html

2. I *do* advocate for a wholesale abandonment of NHST! I thought I’d made that clear.

I didn’t realize Meehl was not in step with the mainstream statisticians of his day, that’s very interesting. I thought he was just begging psychologists to take conventional statistics seriously. In hindsight his philosophical articles read like conventional wisdom.

That is a meaningless number, because it was measured in water without telling us the temperature or anything else. What about the range of conditions actually expected in the stomach (pH, temperature, salinity, presence of food bolus, etc.)? What type of errors may arise during manufacturing that affect this value? What is the effect of storage at the various temperatures/humidities that may be found in practice? Can you devise any conditions where your pill does not consistently float that much longer? What conditions (whether of manufacturing, storage, or patient characteristics) may lead to excessive float time?

Yes, but in the old days the classical statisticians were dominant and could treat the Meehls of the world as eccentric cranks. Now the classical statisticians are on the run, and they realize that researchers outside the Gladwell-Cuddy-Gilbert axis don’t trust p-values anymore.

First, I’d ask the person where is the evidence that he is 90% sure that floating time will be a factor of 1.5 to 2.5 higher. This evidence may come, for example, from a confidence interval from a published study, in which case I’d advise that it’s an overestimate because of the statistical significance filter.

Second, I’d lay out the costs and benefits. What happens if a study is designed and outcome X happens? If the [1.5, 2.5] estimate truly is reasonable, that implies some decision recommendations. It’s not about being fair or unfair to the product, it’s about wanting to use resources efficiently.
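The “statistical significance filter” point can be simulated. A sketch (my own illustration, with invented numbers unrelated to the floating-pill example): when a small true effect is studied noisily and only “significant” results get published, the published estimates systematically overstate the effect, so a prior built from a published CI is too optimistic.

```python
# Sketch of the significance filter: condition on |z| > 1.96 and the
# surviving estimates exaggerate the true effect.
import random

random.seed(7)

true_effect, se = 0.5, 1.0   # small true effect, noisy studies
crit = 1.96

sig_estimates = []
for _ in range(20000):
    est = random.gauss(true_effect, se)
    if abs(est / se) > crit:          # only "significant" results publish
        sig_estimates.append(abs(est))

exaggeration = (sum(sig_estimates) / len(sig_estimates)) / true_effect
print(round(exaggeration, 2))  # well above 1: published effects are inflated
```

With these numbers the surviving estimates overstate the true effect severalfold, which is exactly why I’d discount a prior taken straight from a significant published interval.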
