Someone asked me about the distinction between bias and noise and I sent him some links. Then I thought this might interest some of you too, so here it is: Here’s a recent paper on election polling where we try to be explicit about what is bias and what is variance: And here are some […]

**Multilevel Modeling** category.

## “A blog post that can help an industry”

Tim Bock writes: I understood how to address weights in statistical tests by reading Lu and Gelman (2003). Thanks. You may be disappointed to know that this knowledge allowed me to write software, which has been used to compute many billions of p-values. When I read your posts and papers on forking paths, I always […]

## Cage match: Null-hypothesis-significance-testing meets incrementalism. Nobody comes out alive.

[cat picture] It goes like this. Null-hypothesis-significance-testing (NHST) only works when you have enough accuracy that you can confidently reject the null hypothesis. You get this accuracy from a large sample of measurements with low bias and low variance. But you also need a large effect size. Or, at least, a large effect size, compared […]

## Facebook’s Prophet uses Stan

Sean Taylor, a research scientist at Facebook and Stan user, writes: I wanted to tell you about an open source forecasting package we just released called Prophet: I thought the readers of your blog might be interested in both the package and the fact that we built it on top of Stan. Under the hood, […]

## Thanks for attending StanCon 2017!

Thank you all for coming and making the first Stan Conference a success! The organizers were blown away by how many people came to the first conference. We had over 150 registrants this year! StanCon 2017 Video: The organizers managed to get a video stream on YouTube: https://youtu.be/DJ0c7Bm5Djk. We have over 1900 views since StanCon! (We lost […]

## Looking for rigor in all the wrong places

My talk in the upcoming conference on Inference from Non Probability Samples, 16-17 Mar in Paris: Looking for rigor in all the wrong places What do the following ideas and practices have in common: unbiased estimation, statistical significance, insistence on random sampling, and avoidance of prior information? All have been embraced as ways of enforcing […]

## Two unrelated topics in one post: (1) Teaching useful algebra classes, and (2) doing more careful psychological measurements

Kevin Lewis and Paul Alper send me so much material, I think they need their own blogs. In the meantime, I keep posting the stuff they send me, as part of my desperate effort to empty my inbox. 1. From Lewis: “Should Students Assessed as Needing Remedial Mathematics Take College-Level Quantitative Courses Instead? A Randomized […]

## Avoiding selection bias by analyzing all possible forking paths

Ivan Zupic points me to this online discussion of the article, Dwork et al. 2015, The reusable holdout: Preserving validity in adaptive data analysis. The discussants are all talking about the connection between adaptive data analysis and the garden of forking paths; for example, this from one commenter: The idea of adaptive data analysis is […]

## fMRI clusterf******

Several people pointed me to this paper by Anders Eklund, Thomas Nichols, and Hans Knutsson, which begins: Functional MRI (fMRI) is 25 years old, yet surprisingly its most common statistical methods have not been validated using real data. Here, we used resting-state fMRI data from 499 healthy controls to conduct 3 million task group analyses. […]

## What is the chance that your vote will decide the election? Ask Stan!

I was impressed by Pierre-Antoine Kremp’s open-source poll aggregator and election forecaster (all in R and Stan with an automatic data feed!) so I wrote to Kremp: I was thinking it could be fun to compute probability of decisive vote by state, as in this paper. This can be done with some not difficult but […]

## Some modeling and computational ideas to look into

Can we implement these in Stan? Marginally specified priors for non-parametric Bayesian estimation (by David Kessler, Peter Hoff, and David Dunson): Prior specification for non-parametric Bayesian inference involves the difficult task of quantifying prior knowledge about a parameter of high, often infinite, dimension. A statistician is unlikely to have informed opinions about all aspects of […]

## Mister P can solve problems with survey weighting

It’s tough being a blogger who’s expected to respond immediately to topics in his area of expertise. For example, here’s Scott “fraac” Adams posting on 8 Oct 2016, post titled “Why Does This Happen on My Vacation? (The Trump Tapes).” After some careful reflection, Adams wrote, “My prediction of a 98% chance of Trump winning […]

## Trump +1 in Florida; or, a quick comment on that “5 groups analyze the same poll” exercise

Nate Cohn at the New York Times arranged a comparative study on a recent Florida pre-election poll. He sent the raw data to four groups (Charles Franklin; Patrick Ruffini; Margie Omero, Robert Green, Adam Rosenblatt; and Sam Corbett-Davies, David Rothschild, and me) and asked each of us to analyze the data how we’d like to […]

## Q: “Is A 50-State Poll As Good As 50 State Polls?” A: Use Mister P.

Jeff Lax points to this post from Nate Silver and asks for my thoughts. In his post, Nate talks about data quality issues of national and state polls. It’s a good discussion, but the one thing he unfortunately doesn’t talk about is multilevel regression and poststratification (or see here for more). What you want to […]

## Polling in the 21st century: There ain’t no urn

David Rothschild writes: The Washington Post (WaPo) utilized Survey Monkey (SM) to survey 74,886 registered voters in all 50 states on who they would vote for in the upcoming election. I am very excited about the work, because I am a huge proponent of advancing polling methodology, but the methodological explanation and data detail bring […]

## Fast CAR: Two weird tricks for fast conditional autoregressive models in Stan

Max Joseph writes: Conditional autoregressive (CAR) models are popular as prior distributions for spatial random effects with areal spatial data. Historically, MCMC algorithms for CAR models have benefitted from efficient Gibbs sampling via full conditional distributions for the spatial random effects. But, these conditional specifications do not work in Stan, where the joint density needs […]

## Publication bias occurs within as well as between projects

Kent Holsinger points to this post by Kevin Drum entitled, “Publication Bias Is Boring. You Should Care About It Anyway,” and writes: I am an evolutionary biologist, not a psychologist, but this article describes a disturbing scenario concerning oxytocin research that seems plausible. It is also relevant to the reproducibility/publishing issues you have been discussing […]

## Hey pollsters! Poststratify on party ID, or we’re all gonna have to do it for you.

Alan Abramowitz writes: In five days, Clinton’s lead increased from 5 points to 12 points. And Democratic party ID margin increased from 3 points to 10 points. No, I don’t think millions of voters switched to the Democratic party. I think Democrats were just more likely to respond in that second poll. And, remember, […]

## His varying slopes don’t seem to follow a normal distribution

Bruce Doré writes: I have a question about multilevel modeling I’m hoping you can help with. What should one do when random effects coefficients are clearly not normally distributed (i.e., coef(lmer(y~x+(x|id))) )? Is this a sign that the model should be changed? Or can you stick with this model and infer that the assumption of […]