My talk Fri 1pm at the University of Chicago

It’s the Data Science and Public Policy colloquium, and they asked me to give my talk, Little Data: How Traditional Statistical Ideas Remain Relevant in a Big-Data World. Here’s the abstract:

“Big Data” is more than a slogan; it is our modern world in which we learn by combining information from diverse sources of varying quality. But traditional statistical questions—how to generalize from sample to population, how to compare groups that differ, and whether a given data pattern can be explained by noise—continue to arise. Often a big-data study will be summarized by a little p-value. Recent developments in psychology and elsewhere make it clear that our usual statistical prescriptions, adapted as they were to a simpler world of agricultural experiments and random-sample surveys, fail badly and repeatedly in the modern world in which millions of research papers are published each year. Can Bayesian inference help us out of this mess? Maybe, but much research will be needed to get to that point.

It’s for the Data Science for Social Good program, so I suppose I’ll alter my talk a bit to discuss how data science can be used for social bad. The talk should be fun, but I do want to touch on some open research questions. Remember, theoretical statistics is the theory of applied statistics, and we have a lot of applied statistics to do, so we have a lot of theoretical statistics to do too.

11 Comments

  1. Anoneuoid says:

    “Little Data: How Traditional Statistical Ideas Remain Relevant in a Big-Data World.
    […]
    Recent developments in psychology and elsewhere make it clear that our usual statistical prescriptions, adapted as they were to a simpler world of agricultural experiments and random-sample surveys, fail badly and repeatedly in the modern world in which millions of research papers are published each year.”

    Doesn’t the description contradict the title? I would have guessed that “Traditional Statistical Ideas” == “usual statistical prescriptions”, but “Remain Relevant” is the opposite of “fail badly and repeatedly in the modern world”.

  2. Paul says:

    Is your talk open to the public?

  3. Keith O'Rourke says:

    > adapted as they were to a simpler world of agricultural experiments

    Actually, Cochran started to disagree in 1937 and proposed a Normal-Normal (not yet Bayesian) hierarchical model, which, even in a 1979 unpublished manuscript, "Estimators for the one-way random effects model with unequal error variances" (published posthumously in 1981), he was still unable to implement in general (likelihood convergence problems).

    From the not so widely available paper (O'Rourke, K., "Meta-analytical themes in the history of statistics: 1700 to 1938," Pakistan Journal of Statistics, S. Ejaz Ahmed Special Issue, 2002):
    "It is apropos that the last words in this note on the meta-analytical themes in the history of statistics be the last sentence from Cochran's last paper: 'It is well to adopt something of the attitude in exploratory analysis and be on the lookout for anything unexpected, since the nature of the tp [time place] interaction [unexplained heterogeneity, or varying treatment effects] is often a hard thing to puzzle out.'"

    So maybe not a failure to recognize the inappropriateness in the _simpler_ world of agricultural experiments but rather unresolved technical challenges to implementing more appropriate methods.
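    For readers curious what the model Cochran was wrestling with looks like, here is a minimal sketch (with made-up numbers, not Cochran's data) of a Normal-Normal one-way random effects model with unequal, known error variances: each group mean y_i ~ N(theta_i, s_i^2) and theta_i ~ N(mu, tau^2), so marginally y_i ~ N(mu, tau^2 + s_i^2). The technical challenge Keith mentions was maximizing this marginal likelihood; with a modern one-dimensional optimizer it is routine:

    ```python
    import numpy as np
    from scipy.optimize import minimize_scalar

    # Hypothetical group estimates and their (unequal) error variances
    y = np.array([2.1, 1.4, 3.0, 0.8, 2.5])
    s2 = np.array([0.5, 1.2, 0.3, 0.9, 0.6])

    def neg_log_marginal(tau2):
        """Negative log marginal likelihood with mu profiled out.
        Marginally, y_i ~ N(mu, tau^2 + s_i^2)."""
        w = 1.0 / (tau2 + s2)                # precision weights
        mu_hat = np.sum(w * y) / np.sum(w)   # weighted mean given tau^2
        return 0.5 * np.sum(np.log(tau2 + s2) + w * (y - mu_hat) ** 2)

    res = minimize_scalar(neg_log_marginal, bounds=(1e-8, 10.0), method="bounded")
    tau2_hat = res.x
    w = 1.0 / (tau2_hat + s2)
    mu_hat = np.sum(w * y) / np.sum(w)
    print(f"tau^2 = {tau2_hat:.3f}, mu = {mu_hat:.3f}")
    ```

    The convergence trouble Cochran hit shows up here too: when the between-group variance is small relative to the error variances, the likelihood maximum can sit on the boundary tau^2 = 0, which is exactly the kind of behavior that made general implementation hard before modern bounded optimizers.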

    • Martha says:

      And while we’re comparing and contrasting with agricultural experiments, and talking about exploratory analysis, here’s something I came across yesterday:

      “The process of designing an industrial experiment is not, however, just a matter of taking a standard design from an agricultural handbook and substituting temperature for variety and run for plot. Indeed, differences do exist between agricultural and industrial experimentation … One is that the agronomist usually has to sow his experimental plots in the spring and harvest them in the fall. Plots and plants are relatively cheap, and the emphasis has to be on designing relatively complete experiments; if anything is omitted it will have to wait until next year. On the other hand, much industrial experimentation can be carried out relatively quickly, in a matter of days rather than months, but experimental runs are often very expensive. The emphasis should, therefore, be on small experiments carried out in sequence, with continual feed back of information, so that in designing each stage the experimenter can make use of the results of the previous experiments.

      The agronomist is more interested in the tests of significance in the analysis of variance than is the industrial experimenter. One of the reasons is that the agronomist is more often concerned with uniformity trials. … He wants to be able to accept the null hypothesis of uniformity. The position of the industrial experimenter often differs from that of the agronomist in two ways: he frequently knows before starting the experiment that his treatments are not all the same and is interested in finding out which ones differ and by how much. His emphasis will be on estimation rather than hypothesis testing. He will sometimes argue that failure to reject the null hypothesis is merely the result of taking too small an experiment. Expecting to reject the null hypothesis, he is more interested in confidence intervals.”
      (p. 3, Peter W. M. John, Statistical Design and Analysis of Experiments, 1998 SIAM edition of 1971 original)

  4. Brad Stiritz says:

    Hi Andrew,

    I’m a long-time Chicagoan, but now spend the winters in OC. I enjoyed the replay of your talk last winter at the Simons Center. Do you know if someone will be shooting & posting video of this upcoming talk? If so, would you please post a link afterwards?

  5. cugrad says:

    I was talking to some grad students the other day; the topic was Bayesian statistics. One of them told me he believes you are the biggest guy in Bayesian inference right now. Just thought you'd want to know.

Leave a Reply