My class this spring on applied Bayesian statistical computing

I had various course titles floating around: my course at Columbia this spring is officially called Applied Statistics, and I had promised people that it would cover Bayesian statistics. At Harvard they asked me to teach Statistical Computing, but I wanted to focus on applied Bayesian methods. So I’m putting it all together in the title given above.

If you’re interested in taking the class, let me know if you have any questions, or just show up to the first few lectures; it meets Wednesdays and Fridays, 9:00-10:30, at Columbia (if you’re in New York), or Mondays, 11:30-2:30, at Harvard (if you’re in Boston).

Motivation:

Statistical computing is to statistics as statistics is to science: necessary, but a distraction from the main event. I hate computing, yet I do it all the time. For those of us in this position, it makes sense to spend some time thinking hard about how to compute efficiently. Learning statistical computation is an investment in becoming a more effective practitioner and researcher.

Overview:

We will cover topics in Bayesian computation, statistical graphics, and software validation, as well as special topics that interest the class.

There will be some homework (writing programs and making graphs in R) and a final project to be done in pairs.

The (tentative) syllabus is below.

Readings:

“BDA”: Gelman, Carlin, Stern, and Rubin (2003), Bayesian Data Analysis, second edition

“ARM”: Gelman and Hill (2007), Data Analysis Using Regression and Multilevel/Hierarchical Models

Various papers that you can download, mostly from my website

Computing:

You should be ready to do some programming in R. Set up your computer with R, Bugs, and RWinEdt, following the instructions at www.stat.columbia.edu/~gelman/bugsR (see Appendix B of ARM for more detail).

If you have time, read through Appendix C of BDA ahead of time and implement all the examples there yourself.

It might help to get a book on R; my current favorite is Fox (2002), An R and S-Plus Companion to Applied Regression. There’s also a lot of material on the R website itself.

Plan:

Week 1: Bayes in 3 hours

A quick overview of Bayesian data analysis, treating it as a generalization of maximum likelihood. Introduction to the Gibbs sampler and Metropolis algorithm.

Readings:
BDA, chapter 1, section 3.7, appendix C
ARM, chapter 18

In class:
Bioassay example from section 3.7 of BDA
My presentation, “Bayesian data analysis: what it is and what it is not”

Homework:
I’ll give you a simple model and you’ll have to program a Metropolis algorithm to take random draws from the posterior distribution.
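For concreteness, here is a minimal sketch in R of the kind of random-walk Metropolis sampler the homework asks for. The target here is a standard normal log density, standing in for whatever model actually gets handed out:

    # Random-walk Metropolis for a one-parameter model.
    # log.post() is a stand-in target; swap in the assigned model's log posterior.
    log.post <- function(theta) dnorm(theta, 0, 1, log=TRUE)

    n.iter <- 5000
    theta <- rep(NA, n.iter)
    theta[1] <- 0                                  # starting value
    jump.sd <- 1                                   # sd of the jumping distribution
    for (t in 2:n.iter){
      theta.star <- rnorm(1, theta[t-1], jump.sd)  # propose a jump
      log.r <- log.post(theta.star) - log.post(theta[t-1])
      theta[t] <- if (log(runif(1)) < log.r) theta.star else theta[t-1]
    }
    mean(theta[1001:n.iter])                       # posterior summary after burn-in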

Week 2: Simulation of random variables and stochastic processes

Simulation consistency and standard errors. Using simulation to summarize posterior inferences. Programming a simulation.
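As a preview of simulation standard errors, here is a hedged sketch in R; the target quantity (E[theta^2] for a standard normal theta, true value 1) is just a placeholder:

    # Monte Carlo estimate of E[g(theta)] with its simulation standard error.
    n.sims <- 1000
    theta <- rnorm(n.sims)
    g <- theta^2
    est <- mean(g)
    se <- sd(g)/sqrt(n.sims)      # simulation se shrinks like 1/sqrt(n.sims)
    c(estimate=est, sim.se=se)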

Readings:
ARM, chapter 7
BDA, chapter 10, sections 11.1-11.5
Some article on simulating from network models

In class:
Simulation example from section 7.2 of ARM
Simulating a simple probability model
Some possible applied project topics (these also involve methods):
– folates and stomach cancer (varying coefficients)
– age, voting, and political coherence (age-period-cohort)
– religion, occupation, and voting (taxonomies)
– speed dating (individual variation)
– impact factors of journals (messy data)
– coalition formation and dissolution (agent modeling)
– smoking in India (deep nesting)
– arsenic in Bangladesh (nonparametric prediction, modeling of behavior)
– how many x’s do you know (networks)
– prediction markets (messy data)
– should the Democrats move to the left? (modeling, simulation)
– representation and spending in subnational units (modeling, graphics)
Some possible methodological project topics (should also be applied):
– empirical distribution of regression coefficients
– numerical linear algebra
– a wacky pattern from a random number generator
– different varying-intercept, varying-slope models
– multiple imputation
– inconsistent Gibbs
– deep interactions
– highest posterior density intervals
– adaptive Metropolis jumping
– displaying regression output
– umacs
– Hal Daume’s program
– Otter research program
– dynamic graphics

Homework:
(1) I’ll describe a simple stochastic process (perhaps the “restaurant turnover problem”) and have you simulate it and summarize your findings.
(2) I’ll ask you to fit a simple regression model and make posterior inferences for some nonlinear quantities of interest.
(3) I’ll ask you to do a simulation study to assess the efficiency of predicting winners as compared to estimating votes.

Week 3: Graphics

Principles of statistical graphics. Many ways to skin a cat. Mockups and the power and limitations of R. Graphics as exploratory data analysis; exploratory data analysis as Bayesian model checking. Where does principle end and taste begin? Dynamic graphics.

Readings:
ARM, appendix A
Gelman (2003), A Bayesian formulation of exploratory data analysis and goodness-of-fit testing
Gelman (2004), Exploratory data analysis for complex models (with discussion)
Andreas Buja, something on dynamic graphics

In class:
Displaying the “bread and peace” model
Graphics game 1
Using introspection to deduce principles of graphics
Connections between graphics and analytical statistics
Graphics game 2
Discussion of dynamic graphics

Homework:
(1) I’ll give you some data and ask you to plot them in a specified way.
(2) I’ll give you some data and ask you to figure out a good way to plot them.

Week 4: Programming

My own principles and real principles. Use looping and functions, don’t repeat code. Use letters, not numbers. Naming conventions. Once you have functionality, it’s easy to reprogram. How R and Bugs work.
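To illustrate the don’t-repeat-code principle, here is a small sketch in R; the state data are simulated stand-ins:

    # Hypothetical data: a named list of data frames, one per state.
    state.list <- list(NY=data.frame(x=rnorm(10), y=rnorm(10)),
                       NJ=data.frame(x=rnorm(10), y=rnorm(10)))

    # Instead of pasting the same plot() call once per state,
    # write the code once as a function and loop over the list.
    plot.state <- function(state.data, state.name){
      plot(state.data$x, state.data$y, main=state.name)
    }
    for (name in names(state.list)){
      plot.state(state.list[[name]], name)
    }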

Readings:
ARM, sections 19.1-19.2
Abelson and Sussman, Structure and Interpretation of Computer Programs [find it by googling SICP], chapters ??
Some advice on programming practice: norvig.com/luv-slides.ps

In class:
Cleaning up my code from week 1
Discuss the program that graphs regression output
Discuss principles of structured programming
Programming activity (in pairs)
My presentation, “Toward an environment for Bayesian data analysis in R”
Discuss the project

Homework:
(1) I’ll ask you to write a program to make a fairly complicated graph.
(2) By now you and your partner will have chosen your group project.

Week 5: Validation

The folk theorem of computing and model fit. Fake-data simulation to check software. Posterior predictive model checking. Cross-validation. Simpler models as scaffolding.
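Here is a minimal sketch of the fake-data idea in R, using lm() on a simple linear regression as a stand-in model; for Bayesian fits, the posterior-quantile version in the Cook, Gelman, and Rubin paper below is the fuller treatment:

    # Fake-data check: simulate data from known parameters, fit, and see
    # whether 50% intervals cover the truth about half the time
    # (over repeated simulations).
    a.true <- 2; b.true <- 3; sigma.true <- 1
    n <- 100
    x <- runif(n)
    y <- rnorm(n, a.true + b.true*x, sigma.true)
    fit <- lm(y ~ x)
    est <- coef(fit)
    se <- sqrt(diag(vcov(fit)))
    abs(est - c(a.true, b.true)) < qnorm(.75)*se   # 50% interval covers truth?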

Readings:
BDA, chapter 6
ARM, chapter 8, section 15.1
Cook, Gelman, and Rubin (2007), Validation of software for Bayesian models using posterior quantiles
Gelman, Fagan, and Kiss (2007), stop-and-frisk article

In class:
My presentation on model checking
Police stop-and-frisk example
More stuff

Homework:
(1) I’ll give you a model and ask you to write a program to fit it and then use fake-data validation to check that the program works.
(2) I’ll give you some data, ask you to fit the model to it and then use posterior predictive checks to assess fit of model to data.

Week 6: Some computationally intensive applications

Agent models and cellular automata. Search engines. Language processing. Imaging. Spatial statistics. Lots of questions, not so many answers.

Readings:
Gelman (2003), Forming voting blocs and coalitions as a prisoner’s dilemma: a possible theoretical explanation for political instability

In class:
My presentation, “Coalitions, voting power, and political instability”
More stuff

Homework:
You’ll have to give your first report on your group project.

Week 7: Open problems in Bayesian simulation

How Bayesian computation can get hard. Multimodality. Posterior correlation. Spikes and funnels. High dimensionality. Intractable likelihoods. Efficient Gibbs samplers and Metropolis jumping rules. 2.4/sqrt(d). Adaptive algorithms.
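To make the 2.4/sqrt(d) rule concrete, here is a sketch in R of random-walk Metropolis on a d-dimensional standard normal target, a stand-in for a real posterior; with this scaling the acceptance rate should land near the efficient range (roughly 20-30% in high dimensions):

    # Jumping sd scaled by 2.4/sqrt(d), relative to the posterior sd (1 here).
    d <- 10
    log.post <- function(theta) sum(dnorm(theta, log=TRUE))
    n.iter <- 5000
    theta <- matrix(NA, n.iter, d)
    theta[1,] <- rep(0, d)
    jump.sd <- 2.4/sqrt(d)
    accept <- 0
    for (t in 2:n.iter){
      theta.star <- rnorm(d, theta[t-1,], jump.sd)
      if (log(runif(1)) < log.post(theta.star) - log.post(theta[t-1,])){
        theta[t,] <- theta.star
        accept <- accept + 1
      } else theta[t,] <- theta[t-1,]
    }
    accept/(n.iter-1)            # acceptance rate: should be roughly 0.2-0.3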

Readings:
BDA, sections 11.6, 11.8-11.10, appendix C
Kass, Carlin, Gelman, and Neal (1998), Markov chain Monte Carlo in practice: a roundtable discussion

In class:
Bioassay example from section 3.7 of BDA
My presentation, “Computation for Bayesian data analysis”
More stuff

Homework:
I’ll give you a tricky model and ask you to try some methods to compute it efficiently.

Week 8: Optimization and regularization

Numerical linear algebra. General optimization algorithms. Bayesian inference as regularization. Weakly informative priors. Using a corpus of datasets to get a prior distribution. The meaning of conservatism in statistics.
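As a small illustration of general-purpose optimization, here is a sketch in R that finds a posterior mode with optim() and reads off a normal approximation from the Hessian; the model (normal likelihood, weak normal prior on the mean) is a stand-in:

    # Posterior mode by minimizing the negative log posterior.
    y <- rnorm(50, 3, 1)                          # fake data
    neg.log.post <- function(mu)
      -sum(dnorm(y, mu, 1, log=TRUE)) - dnorm(mu, 0, 10, log=TRUE)
    fit <- optim(par=0, fn=neg.log.post, method="BFGS", hessian=TRUE)
    c(mode=fit$par, approx.sd=sqrt(1/fit$hessian))  # normal approx at the mode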

Readings:
ARM, Sections 13.1-13.4
Gelman (2006), Prior distributions for variance parameters in hierarchical models
Gelman, Jakulin, Pittau, and Su (unpublished), A default prior distribution for logistic and other regression models
Something on optimization

In class:
My presentation, “Weakly informative priors”
More stuff

Homework:
I’ll give you an optimization problem to solve.

Week 9: Computing for multilevel models

Multilevel models. General principle of modularity, instead of the old idea of crafting distributions. Multivariate models. Missing-data imputation. Hierarchical Bayes compiler.
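For orientation, here is a minimal varying-intercept fit in R using lmer() from the lme4 package (the tool used throughout ARM); the data are simulated stand-ins:

    # Varying-intercept model: intercepts vary by group.
    library(lme4)
    J <- 10; n <- 200
    group <- sample(1:J, n, replace=TRUE)   # group indicator
    a <- rnorm(J, 0, 1)                     # true group-level intercepts
    x <- rnorm(n)
    y <- rnorm(n, a[group] + 2*x, 1)
    fit <- lmer(y ~ x + (1 | group))
    summary(fit)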

Readings:
ARM, sections 19.3-19.6
Gelman (2004), Parameterization and Bayesian modeling
Daume (2007), Hierarchical Bayes compiler, http://www.cs.utah.edu/~hal/HBC/

In class:
My presentation, “Some thoughts on multiple comparisons”
More stuff

Homework:
You’ll try out Daume’s HBC and use fake-data simulation to validate it for your model.

Week 10: Analysis of huge datasets

Data mining. Particle filtering. Analyzing samples of data.
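One simple version of analyzing samples of data, sketched in R with a simulated stand-in for the huge dataset: fit on random subsets of increasing size and watch the estimates stabilize:

    # 'big.data' is a simulated stand-in for a dataset too big to fit whole.
    x <- rnorm(1e6)
    big.data <- data.frame(x=x, y=0.5*x + rnorm(1e6))
    for (n in c(1e3, 1e4, 1e5)){
      sub <- big.data[sample(nrow(big.data), n), ]
      print(coef(lm(y ~ x, data=sub)))      # estimates should stabilize as n grows
    }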

Readings:
???

In class:
Demonstrate scalability with some simple computations
Example of analysis of samples of data

Homework:
I’ll dump a huge dataset in your lap and ask you to explore it and find something interesting.

Week 11: Deep interactions

Structured hierarchical models. Small-area estimation. Parallel time series. Bayesian additive regression trees.

Readings:
Ed George et al. paper on BART
Something by Radford Neal

In class:
My presentation, “Interactions in multilevel models”
More stuff

Homework:
I’ll ask you to use one of these methods to explore that huge dataset in a different way.

Week 12: Machine learning

This is important but I don’t know much about it. I’ll give an overview and discuss how it relates to other statistical ideas.

Readings:
Aleks will recommend something here

In class:
Example of machine learning and comparison with other methods
Link between machine-learning methods and statistical models
Methods for combining and choosing among machine learning algorithms
Possibilities for improving machine learning algorithms through statistical understanding (and vice versa)

Homework:
Get prepared for your presentations and final projects!

Week 13: Student presentations

Readings:
Two-page background sheet from each group

In class:
Student presentations and class discussions

Homework:
Your presentation in html or xml format (to be posted on the web)

7 thoughts on “My class this spring on applied Bayesian statistical computing”

  1. I hadn't thought of videoing the classes but I could see if Harvard or Columbia does that. It would be cool to reach a larger audience. On the other hand, I suspect that my classes are more compelling in person. If you're not actually there in the room, you might be better off just reading the relevant books and articles.

  2. The engineering school at Columbia does record classes for off-campus students subscribed to CVN (Columbia Video Network). There are also some nifty interfaces to the recorded material developed as part of UI research in the CS labs.

    If you're thinking about getting this class recorded, you'll probably need to move it to a room equipped for recording. And presumably ask the right people at CVN.

  3. Sure, you're more compelling in person — as are 99% of people. That's not really an option, though.

    Recording these is NOT going to make you rich, but it's a nice adjunct to your books (enhancing the Gelman "brand", so to speak); worth doing if it can be done at one place or another without much hassle.

  4. If Columbia does not do videos, you should consider http://videolectures.net. This site is a great resource for lectures, and many of them have linked scrolling notes. The vast majority are in CS, and of those, some 50% are in Machine Learning. There are a few statisticians who have presented there (Christian Robert, for example, has over 5 hours of lectures), but these may also be classified under Machine Learning.

    I believe your content would fit in just fine over there if you choose to distribute to more than registered Columbia students (please!).
