Archive of posts filed under the Miscellaneous Statistics category.

Anova is great—if you interpret it as a way of structuring a model, not if you focus on F tests

Shravan Vasishth writes: I saw on your blog post that you listed aggregation as one of the desirable things to do. Do you agree with the following argument? I want to point out a problem with repeated measures ANOVA in a talk: In a planned experiment, say a 2×2 design, when we do a repeated measures […]

Waic for time series

Helen Steingroever writes: I’m currently working on a model comparison paper using WAIC, and would like to ask you the following question about the WAIC computation: I have data of one participant that consist of 100 sequential choices (you can think of these data as being a time series). I want to compute the WAIC […]

Six quotes from Kaiser Fung

You may think you have all of the data. You don’t. One of the biggest myths of Big Data is that data alone produce complete answers. Their “data” have done no arguing; it is the humans who are making this claim. Before getting into the methodological issues, one needs to ask the most basic question. […]

One-tailed or two-tailed

This image of a two-tailed lizard (from here, I can’t find the name of the person who took the picture) never fails to amuse me. But let us get to the question at hand . . . Richard Rasiej writes: I’m currently teaching a summer session course in Elementary Statistics. The text that I was […]

My talk at the Simons Foundation this Wed 5pm

“Anti-Abortion Democrats, Jimmy Carter Republicans, and the Missing Leap Day Babies: Living with Uncertainty but Still Learning.” To learn about the human world, we should accept uncertainty and embrace variation. We illustrate this concept with various examples from our recent research (the above examples are with Yair Ghitza and Aki Vehtari) and discuss more generally […]

Likelihood from quantiles?

Michael McLaughlin writes: Many observers, esp. engineers, have a tendency to record their observations as {quantile, CDF} pairs, e.g., (x, CDF(x)) = (3.2, 0.26), (4.7, 0.39), etc. I suspect that their intent is to do some kind of “least-squares” analysis by computing theoretical CDFs from a model, e.g. Gamma(a, b), then regressing the observed CDFs against […]

How does inference for next year’s data differ from inference for unobserved data from the current year?

Juliet Price writes: I recently came across your blog post from 2009 about how statistical analysis differs when analyzing an entire population rather than a sample. I understand the part about conceptualizing the problem as involving a stochastic data generating process, however, I have a query about the paragraph on ‘making predictions about future cases, […]

Confirmationist and falsificationist paradigms of science

Deborah Mayo and I had a recent blog discussion that I think might be of general interest so I’m reproducing some of it here. The general issue is how we think about research hypotheses and statistical evidence. Following Popper etc., I see two basic paradigms: Confirmationist: You gather data and look for evidence in support […]

I disagree with Alan Turing and Daniel Kahneman regarding the strength of statistical evidence

It’s funny. I’m the statistician, but I’m more skeptical about statistics, compared to these renowned scientists. The quotes. Here’s one: “You have no choice but to accept that the major conclusions of these studies are true.” Ahhhh, but we do have a choice! First, the background. We have two quotes from this paper by E. […]

Questions about “Too Good to Be True”

Greg Won writes: I manage a team tasked with, among other things, analyzing data on Air Traffic operations to identify factors that may be associated with elevated risk. I think it’s fair to characterize our work as “data mining” (e.g., using rule induction, Bayesian, and statistical methods). One of my colleagues sent me a link […]