Archive of posts filed under the Miscellaneous Statistics category.

“Usefully skeptical science journalism”

Dean Eckles writes: I like this Wired piece on the challenges of learning about how technologies are affecting us and children. The journalist introduces a nice analogy (that he had in mind before talking with me — I’m briefly quoted) between the challenges in nutrition (and observational epidemiology more generally) and in studying “addictive” technologies. […]

Response to Rafa: Why I don’t think ROC [receiver operating characteristic] works as a model for science

Someone pointed me to this post from a few years ago where Rafael Irizarry argues that scientific “pessimists” such as myself are, at least in some fields, “missing a critical point: that in practice, there is an inverse relationship between increasing rates of true discoveries and decreasing rates of false discoveries and that true discoveries […]
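The tradeoff Irizarry describes can be illustrated with a small simulation (the mixture of null and real effects, the effect size, and the thresholds below are my own illustrative choices, not from his post): lowering the discovery threshold raises the count of true discoveries and the count of false discoveries together.

```python
import numpy as np

# Illustrative setup (my assumptions): 10% of tested hypotheses are real
# effects with standardized effect size 3; the rest are nulls.
rng = np.random.default_rng(3)
n = 50_000
is_real = rng.random(n) < 0.1
z = rng.normal(np.where(is_real, 3.0, 0.0), 1.0)  # observed z-scores

# Lowering the discovery threshold increases both kinds of discoveries.
for thresh in (3.0, 2.5, 2.0):
    hits = z > thresh
    true_disc = (hits & is_real).sum()
    false_disc = (hits & ~is_real).sum()
    print(f"threshold {thresh}: {true_disc} true, {false_disc} false discoveries")
```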

From no-data to data: The awkward transition

I was going to write a post with the above title, but now I don’t remember what I was going to say!

“The idea of replication is central not just to scientific practice but also to formal statistics . . . Frequentist statistics relies on the reference set of repeated experiments, and Bayesian statistics relies on the prior distribution which represents the population of effects.”

Rolf Zwaan (who we last encountered here in “From zero to Ted talk in 18 simple steps”), Alexander Etz, Richard Lucas, and M. Brent Donnellan wrote an article, “Making replication mainstream,” which begins: Many philosophers of science and methodologists have argued that the ability to repeat studies and obtain similar results is an essential component […]

If you have a measure, it will be gamed (politics edition).

They sometimes call it Campbell’s Law: New York Governor Andrew Cuomo is not exactly known for drumming up grassroots enthusiasm and small donor contributions, so it was quite a surprise on Monday when his reelection campaign reported that more than half of his campaign contributors this year gave $250 or less. But wait—a closer examination […]

The statistical checklist: Could there be a list of guidelines to help analysts do better work?

[image of cat with a checklist] Paul Cuffe writes: Your idea of “researcher degrees of freedom” [actually not my idea; the phrase comes from Simmons, Nelson, and Simonsohn] really resonates with me: I’m continually surprised by how many researchers freestyle their way through a statistical analysis, using whatever tests, and presenting whatever results, strikes their […]

He wants to model a proportion given some predictors that sum to 1

Joël Gombin writes: I’m wondering what your take would be on the following problem. I’d like to model a proportion (e.g., the share of the vote for a given party at some territorial level) in function of some compositional data (e.g., the sociodemographic makeup of the voting population), and this, in a multilevel fashion (allowing […]
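The difficulty in Gombin's question is that predictors which sum to 1 make a raw regression design matrix singular. One standard workaround, sketched below with my own illustrative names and simulated data (this is not Gelman's answer, and it ignores the multilevel part of the question), is an additive log-ratio transform that drops one component of the composition:

```python
import numpy as np

# Hedged sketch: compositional predictors handled via the additive
# log-ratio (alr) transform. All numbers here are simulated.
rng = np.random.default_rng(0)

X = rng.dirichlet([2.0, 3.0, 5.0], size=200)  # rows sum to 1 (compositional)
beta = np.array([1.0, -2.0])
alr = np.log(X[:, :2] / X[:, 2:3])            # drop the last component
logit_p = alr @ beta + rng.normal(scale=0.1, size=200)
y = 1 / (1 + np.exp(-logit_p))                # observed proportions in (0, 1)

# Fit on the logit scale with ordinary least squares.
Z = np.column_stack([np.ones(len(alr)), alr])
coef, *_ = np.linalg.lstsq(Z, np.log(y / (1 - y)), rcond=None)
print(coef)  # intercept near 0, slopes near (1, -2)
```

For real vote-share data one would typically move to a beta or multinomial likelihood with group-level terms, but the log-ratio step is the same.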

Divisibility in statistics: Where is it needed?

The basics of Bayesian inference are p(parameters|data) proportional to p(parameters)*p(data|parameters) and, for predictions, p(predictions|data) = integral over parameters of p(predictions|parameters,data)*p(parameters|data). In these expressions (and the corresponding simpler versions for maximum likelihood), “parameters” and “data” are unitary objects. Yes, it can be helpful to think of the parameter objects as being a list or vector of individual parameters; and […]
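A minimal concrete instance of those two expressions, using a conjugate Beta-Binomial model of my own choosing (not from the post):

```python
import numpy as np

# Prior: theta ~ Beta(a, b); data: y successes in n trials.
a, b = 2.0, 2.0
y, n = 7, 10

# p(theta | data) proportional to p(theta) * p(data | theta)
# -> Beta(a + y, b + n - y), by conjugacy.
post_a, post_b = a + y, b + n - y

# p(prediction | data) = integral over theta of
#   p(prediction | theta) * p(theta | data).
# For a single future trial, this integral reduces to the posterior mean.
p_next_success = post_a / (post_a + post_b)
print(post_a, post_b, p_next_success)  # 9.0 5.0 0.642857...
```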

On this 4th of July, let’s declare independence from “95%”

Plan your experiment, gather your data, do your inference for all effects and interactions of interest. When all is said and done, accept some level of uncertainty in your conclusions: you might not be 97.5% sure that the treatment effect is positive, but that’s fine. For one thing, decisions need to be made. You were […]

Flaws in stupid horrible algorithm revealed because it made numerical predictions

Kaiser Fung points to this news article by David Jackson and Gary Marx: The Illinois Department of Children and Family Services is ending a high-profile program that used computer data mining to identify children at risk for serious injury or death after the agency’s top official called the technology unreliable. . . . Two Florida […]

Problems with surrogate markers

Paul Alper points us to this article in Health News Review—I can’t figure out who wrote it—warning of problems with the use of surrogate outcomes for policy evaluation: “New drug improves bone density by 40%.” At first glance, this sounds like great news. But there’s a problem: We have no idea if this means the […]

In my role as professional singer and ham

Pryor unhooks the deer’s skull from the wall above his still-curled-up companion. Examines it. Not a good specimen – the back half of the lower jaw’s missing, a gap that, with the open cranial cavity, makes room enough for Pryor’s head. He puts it on. – Will Eaves, Murmur So as we roll into the last […]

We’re gonna have a discussion of Deborah Mayo’s new book!

That’s Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars. She’ll send us some pages that we can post here, we’ll get some people to share their thoughts, and there will be lots of opportunity for comments.

What is the role of statistics in a machine-learning world?

I just happened to come across this quote from Dan Simpson: When the signal-to-noise ratio is high, modern machine learning methods trounce classical statistical methods when it comes to prediction. The role of statistics in this case is really to boost the signal-to-noise ratio through the understanding of things like experimental design.

Regression to the mean continues to confuse people and lead to errors in published research

David Allison sends along this paper by Tanya Halliday, Diana Thomas, Cynthia Siu, and himself, “Failing to account for regression to the mean results in unjustified conclusions.” It’s a letter to the editor in the Journal of Women & Aging, responding to the article, “Striving for a healthy weight in an older lesbian population,” by […]
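Regression to the mean is easy to demonstrate by simulation (this toy example is mine, not from the Halliday et al. letter): select subjects with extreme baseline measurements and remeasure them, with no intervention at all.

```python
import numpy as np

# Each person has a stable underlying value; both measurements add noise.
rng = np.random.default_rng(1)
n = 100_000
true_value = rng.normal(100, 10, n)
baseline = true_value + rng.normal(0, 10, n)   # measurement 1
followup = true_value + rng.normal(0, 10, n)   # measurement 2

# Enroll only the extremes, as a weight-loss study enrolling heavy subjects might.
high = baseline > 120
print(baseline[high].mean())  # well above 120 (selected on this variable)
print(followup[high].mean())  # noticeably lower, with no treatment at all
```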

Ways of knowing in computer science and statistics

Brad Groff writes: Thought you might find this post by Ferenc Huszar interesting. Commentary on how we create knowledge in machine learning research and how we resolve benchmark results with (belated) theory. Key passage: You can think of “making a deep learning method work on a dataset” as a statistical test. I would argue […]

Data science teaching position in London

Seth Flaxman sends this along: The Department of Mathematics at Imperial College London wishes to appoint a Senior Strategic Teaching Fellow in Data Science, to be in post by September 2018 or as soon as possible thereafter. The role will involve developing and delivering a suite of new data science modules, initially for the MSc […]

Opportunity for Comment!

(This is Dan) Last September, Jonah, Aki, Michael, Andrew and I wrote a paper on the role of visualization in the Bayesian workflow. This paper is going to be published as a discussion paper in the Journal of the Royal Statistical Society Series A and the associated read paper meeting (where we present the paper and […]

What is the role of qualitative methods in addressing issues of replicability, reproducibility, and rigor?

Kara Weisman writes: I’m a PhD student in psychology, and I attended your talk at the Stanford Graduate School of Business earlier this year. I’m writing to ask you about something I remember you discussing at that talk: The possible role of qualitative methods in addressing issues of replicability, reproducibility, and rigor. In particular, I […]

Power analysis and NIH-style statistical practice: What’s the implicit model?

So. Following up on our discussion of “the 80% power lie,” I was thinking about the implicit model underlying NIH’s 80% power rule. Several commenters pointed out that, to have your study design approved by NIH, it’s not required that you demonstrate that you have 80% power for real; what’s needed is to show 80% […]
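What an “80% power” claim asserts can be checked by simulation (the effect size, standard deviation, and sample size below are illustrative choices of mine, not NIH's numbers): under the assumed effect, roughly 80% of replications come out statistically significant.

```python
import numpy as np

# One-sample two-sided z-test with known sd; n chosen so nominal power ~ 0.80.
rng = np.random.default_rng(2)
effect, sd, n = 0.5, 1.0, 32
sims = 20_000

se = sd / np.sqrt(n)
xbar = rng.normal(effect, se, sims)  # sample means under the assumed effect
z = xbar / se
power = np.mean(np.abs(z) > 1.96)    # fraction of replications with p < 0.05
print(power)  # close to 0.80 for these settings
```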