Statistical Challenges of Survey Sampling and Big Data. Andrew Gelman, Department of Statistics and Department of Political Science, Columbia University, New York. Big Data need Big Model. Big Data are typically convenience samples, not random samples; observational comparisons, not controlled experiments; available data, not measurements designed for a particular study. As a result, it is […]

## PhD student fellowship opportunity! in Belgium! to work with us! on the multiverse and other projects on improving the reproducibility of psychological research!!!

Wolf Vanpaemel and Francis Tuerlinckx write: We at the Quantitative Psychology and Individual Differences, KU Leuven, Belgium are looking for a PhD candidate. The goal of the PhD research is to develop and apply novel methodologies to increase the reproducibility of psychological science. More information can […]

## UK election summary

The Conservative party, led by Theresa May, defeated the Labour party, led by Jeremy Corbyn. The Conservative party got 42% of the vote, Labour got 40% of the vote, and all the other parties received 18% between them. The Conservatives ended up with 51.5% of the two-party vote, just a bit less than Hillary Clinton’s […]
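For readers who want the arithmetic behind a two-party share, here is a minimal sketch. The function name `two_party_share` is made up for illustration. Plugging in the rounded shares quoted above (42% and 40%) gives roughly 51.2%; the 51.5% figure in the post presumably comes from unrounded vote shares, which is an assumption here and not something the excerpt states.

```python
def two_party_share(party_a_pct, party_b_pct):
    """Share of the combined two-party vote that goes to party A."""
    return party_a_pct / (party_a_pct + party_b_pct)

# Rounded vote shares as quoted in the excerpt (percent of all votes).
conservative_pct = 42.0
labour_pct = 40.0

share = two_party_share(conservative_pct, labour_pct)
print(f"Conservative two-party share from rounded inputs: {share:.1%}")
```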

## The Publicity Factory: How even serious research gets exaggerated by the process of scientific publication and media exposure

The starting point is that we’ve seen a lot of talk about frivolous science, headline-bait such as the study that said that married women are more likely to vote for Mitt Romney when ovulating, or the study that said that girl-named hurricanes are more deadly than boy-named hurricanes, and at this point some of these […]

## U.K. news article congratulates YouGov on using modern methods in polling inference

Mike Betancourt pointed me to this news article by Alan Travis that is refreshingly positive regarding the use of sophisticated statistical methods in analyzing opinion polls. Here’s Travis: Leading pollsters have described YouGov’s “shock poll” predicting a hung parliament on 8 June as “brave” and the decision by the Times to splash it on its […]

## Another serious error in my published work!

Uh oh, I’m starting to feel like that pizzagate guy . . . Here’s the background. When I talk about my serious published errors, I talk about my false theorem, I talk about my empirical analysis that was invalidated by miscoded data, I talk about my election maps whose flaws were pointed out by an angry […]

## Come to Seattle to work with us on Stan!

Our colleague Jon Wakefield in the Department of Biostatistics at the University of Washington is interested in supervising a 2-year postdoc through this training program. We’re interested in finding someone who would work with Jon and another faculty member (who is assigned on the basis of interests) on exciting projects in spatio-temporal modeling and the environmental […]

## Static sensitivity analysis

After this discussion, I pointed Ryan Giordano, Tamara Broderick, and Michael Jordan to Figure 4 of this paper with Bois and Jiang as an example of “static sensitivity analysis.” I’ve never really followed up on this idea but I think it could be useful for many problems. Giordano replied: Here’s a copy of Basu’s robustness […]

## The Other Side of the Night

Don Green points us to this quantitative/qualitative meta-analysis he did with Betsy Levy Paluck and Seth Green. The paper begins: This paper evaluates the state of contact hypothesis research from a policy perspective. Building on Pettigrew and Tropp’s (2006) influential meta-analysis, we assemble all intergroup contact studies that feature random assignment and delayed outcome measures, […]

## Some natural solutions to the p-value communication problem—and why they won’t work.

John Carlin and I write: It is well known that even experienced scientists routinely misinterpret p-values in all sorts of ways, including confusion of statistical and practical significance, treating non-rejection as acceptance of the null hypothesis, and interpreting the p-value as some sort of replication probability or as the posterior probability that the null hypothesis […]

## A continuous hinge function for statistical modeling

This comes up sometimes in my applied work: I want a continuous “hinge function,” something like the red curve above, connecting two straight lines in a smooth way. Why not include the sharp corner (in this case, the function y=-0.5*x if x<0 or y=0.2*x if x>0)? Two reasons. First, computation: Hamiltonian Monte Carlo can trip […]
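One way to get such a curve is to blend the two slopes with a softplus term. The sketch below is an assumption, not necessarily the formula used in the full post: the function names, the `delta` smoothing parameter, and the default values are illustrative, with the slopes -0.5 and 0.2 taken from the sharp-corner example above.

```python
import math

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def smooth_hinge(x, x0=0.0, a=0.0, b0=-0.5, b1=0.2, delta=0.1):
    """Smooth curve with slope b0 far left of x0 and slope b1 far right.

    delta (hypothetical name) sets the width of the transition; as delta
    shrinks toward 0 the curve approaches the sharp-cornered hinge.
    """
    return a + b0 * (x - x0) + (b1 - b0) * delta * softplus((x - x0) / delta)

for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(x, round(smooth_hinge(x), 4))
```

Far from the corner the curve matches the two straight lines; near `x0` it bends smoothly, so the derivative is continuous everywhere.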

## Causal inference using Bayesian additive regression trees: some questions and answers

Rachael Meager writes: We're working on a policy analysis project. Last year we spoke about individual treatment effects, which is the direction we want to go in. At the time you suggested BART [Bayesian additive regression trees; these are not averages of tree models as are usually set up; rather, the key is […]

## Using Stan for week-by-week updating of estimated soccer team abilities

Milad Kharratzadeh shares this analysis of the English Premier League during last year’s famous season. He fit a Bayesian model using Stan, and the R markdown file is here. The analysis has three interesting features: 1. Team ability is allowed to continuously vary throughout the season; thus, once the season is over, you can see […]

## Accounting for variation and uncertainty

Yesterday I gave a list of the questions they're asking me when I speak at the Journal of Accounting Research Conference. All kidding aside, I think that a conference of accountants is the perfect setting for a discussion of research integrity, as accounting is all about setting up institutions to enable trust. […]

## How to interpret “p = .06” in situations where you really really want the treatment to work?

We’ve spent a lot of time during the past few years discussing the difficulty of interpreting “p less than .05” results from noisy studies. Standard practice is to just take the point estimate and confidence interval, but this is in general wrong in that it overestimates effect size (type M error) and can get the […]
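The overestimation (type M error) can be seen in a small simulation. This is a sketch under made-up assumptions: the true effect, the standard error, and the function name `exaggeration_ratio` are hypothetical and do not come from any study discussed here. It draws many noisy estimates, keeps only those that reach the conventional 1.96-standard-error threshold, and compares their average magnitude with the true effect.

```python
import random

def exaggeration_ratio(true_effect, se, n_sims=100_000, seed=1):
    """Average |estimate| among 'significant' results, relative to the truth."""
    rng = random.Random(seed)
    significant = []
    for _ in range(n_sims):
        estimate = rng.gauss(true_effect, se)  # one noisy study
        if abs(estimate) > 1.96 * se:          # passes the p < .05 filter
            significant.append(abs(estimate))
    if not significant:
        raise ValueError("no simulated estimate reached significance")
    return sum(significant) / len(significant) / true_effect

# Hypothetical noisy study: true effect equal to one standard error.
ratio = exaggeration_ratio(true_effect=1.0, se=1.0)
print(f"Significant estimates overstate the true effect by a factor of {ratio:.2f}")
```

Because every estimate that passes the filter is at least 1.96 standard errors from zero, the conditional average must exceed the true effect whenever the truth is smaller than that threshold.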

## A completely reasonable-sounding statement with which I strongly disagree

From a couple years ago: In the context of a listserv discussion about replication in psychology experiments, someone wrote: The current best estimate of the effect size is somewhere in between the original study and the replication’s reported value. This conciliatory, split-the-difference statement sounds reasonable, and it might well represent good politics in the context […]

## “This is why FDA doesn’t like Bayes—strong prior and few data points and you can get anything”

In the context of a statistical application, someone wrote: Since data is retrospective I had to use informative prior. The fit of urine improved significantly (very good) without really affecting concentration. This is why FDA doesn't like Bayes—strong prior and few data points and you can get anything. Hopefully in this case I […]
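The "strong prior and few data points" concern can be illustrated with the standard normal-normal conjugate update. This is a sketch with invented numbers, not the application described in the quote; the function name and all values are hypothetical. With only three observations, two tight priors centered at different values yield very different posterior means.

```python
def normal_posterior(prior_mean, prior_sd, data_mean, data_sd, n):
    """Posterior mean and sd for a normal mean with known data sd."""
    prior_precision = 1.0 / prior_sd**2
    data_precision = n / data_sd**2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * data_mean)
    return post_mean, post_var**0.5

# Hypothetical data: 3 observations, sample mean 5, known sd 2.
for prior_mean in (0.0, 10.0):
    mean, sd = normal_posterior(prior_mean, prior_sd=0.5,
                                data_mean=5.0, data_sd=2.0, n=3)
    print(f"prior mean {prior_mean:>4}: posterior mean {mean:.2f}, sd {sd:.2f}")
```

The posterior mean is a precision-weighted average of the prior mean and the data mean, so a tight prior dominates when the data are few.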

## Prior *information*, not prior *belief*

From a couple years ago: The prior distribution p(theta) in a Bayesian analysis is often presented as a researcher’s beliefs about theta. I prefer to think of p(theta) as an expression of information about theta. Consider this sort of question that a classically-trained statistician asked me the other day: If two Bayesians are given the […]