There was an interesting editorial in Sunday’s New York Times about the anxiety produced by terrorism and people’s general inability to deal rationally with said anxiety. All kinds of interesting stuff that I didn’t know or hadn’t thought about. Nassim Nicholas Taleb, a professor at UMass Amherst, writes that risk avoidance is governed mainly by […]

## 3 Books

One of the more memorable questions I was asked when on the job market last year was “If you were stranded on a deserted island with only three statistics books, what would they be?” (I’m not making this up.) If I were actually in that incredibly unlikely and bizarre situation, the best thing would probably […]

## Pet Peeve

I was reading an article in the newspaper the other day (I think it was about Medicare fraud in New York state, but it doesn’t really matter) that presented some sort of result obtained from a “computer analysis.” A computer analysis? Regression analysis, even statistical or economic analysis, would give at least some vague notion […]

## Columbia Causal Inference Meeting

On June 20, we had a miniconference on causal inference at the Columbia University Statistics Department. The conference consisted of six talks and lots of discussion. One topic of discussion was the use of propensity scores in causal inference, specifically, discarding data based on propensity scores. Discarding data (e.g., discarding all control units whose propensity […]

## Objective and Subjective Bayes

Turns out I’m less of an objective Bayesian than I thought I was. I’m objective, and I’m Bayesian, but not really an Objective Bayesian. Last week I was at the OBayes 5 (O for objective) meeting in Branson, MO. It turns out that most of the Objective Bayes research is much more theoretical than I am. I like working with data, and I just can’t deal with prior distributions that are three pages long, even if they do have certain properties of objectiveness.

## From George Box

I recently read George Box’s paper “Sampling and Bayes’ Inference in Scientific Modelling and Robustness” (JRSSA, 1980). It’s a discussion paper, and I really liked his rejoinder. It starts like this:

“To clear up some misunderstandings and to set my reply in context, let me first make clear what I regard as the proper role of a statistician. This is not as the analyst of a single set of data, nor even as the designer and analyser of a single experiment, but rather as a colleague working with an investigator throughout the whole course of iterative deductive-inductive investigation. As a general rule he should, I think, not settle for less. In some examples the statistician is a member of a research team. In others the statistician and the investigator are the same person but it is still of value to separate his dual functions. Also I have tended to set the scene in the physical sciences where designed experiments are possible. I would however argue that the scientific process is the same for, say, an investigation in economics or sociology where the investigator is led along a path, unpredictable *a priori*, but leading to (a) the study of a number of different sets of already existing data and/or (b) the devising of appropriate surveys.

## Sabermetricians vs. Gut-metricians

There’s a little debate going on in baseball right now about whether decisions should be made using statistics (a sabermetrician is a person who studies baseball statistics) or instincts. Two books are widely considered illustrative of the two sides of the debate. Moneyball, by Michael Lewis, is about the Oakland A’s and their general manager Billy Beane. Beane, with the second-lowest payroll in baseball in 2002, set out to put together an affordable team of undervalued players, using a lot of scouting and statistics. Three Nights in August, by Buzz Bissinger, is about St. Louis Cardinals manager Tony La Russa, and is seen by some as a counter to Moneyball, with La Russa relying much more on guts when making decisions.

## Another one from the news

There’s a really interesting article in Slate by Steven D. Levitt and Stephen J. Dubner (the authors of Freakonomics) about female births and hepatitis B. The disproportionate number of male births in some Asian countries has been attributed to causes such as selective abortion and infanticide. But, as explained in the paper “Hepatitis B and the Case of the Missing Women”, by Harvard Economics graduate student Emily Oster, hepatitis B infection rates actually explain a lot of the discrepancy. Pregnant women who have hepatitis B are more likely to bear sons than daughters, and hepatitis B is more common in those parts of the world where the proportion of male births is so high. Pretty cool.

Again, though, the reason I’m writing about the article doesn’t have much to do with its subject matter. What struck me more than anything were the article’s opening sentences:

## What This Woman Wants: Covariate Information

The most emailed article on the New York Times website right now is “What Women Want,” by op-ed columnist John Tierney. He’s writing about a working paper, “Do Women Shy Away from Competition?”, by Muriel Niederle and Lise Vesterlund, economists at Stanford University and the University of Pittsburgh, respectively. They conducted an experiment where men and women were first paid to add up numbers in their heads, earning fifty cents for each correct answer (referred to as the “piece-rate” task). The participants were eventually offered the choice to compete in a tournament where the person who has the most correct answers after five minutes receives $2 per correct answer and everyone else receives zero compensation. One of the main points of the article was that, even at similar levels of confidence and ability, men were much more likely to enter the tournament than women, i.e., women are less willing than men to enter competition. The results of this study yield another possible theory for why there are so few women in top-paying jobs: Even in a world of equal abilities and no discrimination, family issues, social pressures, etc., women might be less likely to end up as tenured professors or CEOs because the jobs are so inherently competitive.

## Thoughts on Teaching Regression

I recently finished my first semester of teaching. I was a TA in grad school, but this was my first time being “the professor.” I was teaching a regression course, and there are several things I’d like to do differently should I teach the same class again in the future. I just have to figure out how.

## A Very Delayed Lightbulb Over my Head

Daniel Scharfstein (http://commprojects.jhsph.edu/faculty/bio.cfm?F=Daniel&L=Scharfstein) recently gave a very good talk at the Columbia Biostatistics Department. He presented an application of causal inference using principal stratification. The example was similar to something I’ve heard Don Rubin and others speak about before, but I realized I’d been missing something important about this particular example.

## Data-driven Vague Prior Distributions

I’m not one to go around having philosophical arguments about whether the parameters in statistical models are fixed constants or random variables. I tend to do Bayesian rather than frequentist analyses for practical reasons: It’s often much easier to fit complicated models using Bayesian methods than using frequentist methods. This was the case with a model I recently used as part of an analysis for a clinical trial. The details aren’t really important, but basically I was fitting a hierarchical, nonlinear regression model that would be used to impute missing blood measurements for people who dropped out of the trial. Because the analysis was for an FDA submission, it might have been preferable to do a frequentist analysis; however, this was one of those cases where fitting the model was much easier to do Bayesianly. The compromise was to fit a Bayesian model with a vague prior distribution.

Sounded easy enough, until I noticed that making small changes in the parameters of what I thought (read: hoped) was a vague prior distribution resulted in substantial changes in the resulting posterior distribution. When using proper prior distributions (which there are all kinds of good reasons to do), even if the prior variance is really large there’s a chance that the prior density is decreasing exponentially in a region of high likelihood, resulting in parameter estimates based more on the prior distribution than on the data. Our attempt to fix this potential problem (it’s not necessarily a problem if you really believe your prior distribution, but sometimes you don’t) is to perform preliminary analyses to estimate where the mass of the likelihood is. A vague prior distribution is then one that is centered near the likelihood with much larger spread.
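To make the idea concrete, here is a toy sketch in Python. It is not the clinical-trial model: it assumes a single normal mean with a known standard error and a conjugate normal prior, and the function name `normal_posterior` and all the numbers are made up for illustration. It compares a "vague" prior centered far from the likelihood with a prior centered near a preliminary estimate and given a much larger spread.

```python
import math


def normal_posterior(prior_mean, prior_sd, data_mean, data_se):
    """Conjugate normal update for a single mean with known standard error.

    Returns the posterior mean and posterior standard deviation.
    """
    prior_prec = 1.0 / prior_sd ** 2   # precision = 1 / variance
    data_prec = 1.0 / data_se ** 2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * data_mean)
    return post_mean, math.sqrt(post_var)


# Hypothetical likelihood summary: estimate 500 with standard error 50.
data_mean, data_se = 500.0, 50.0

# "Vague" priors centered at 0, far from the likelihood. Doubling the
# prior sd moves the posterior mean from roughly 400 to roughly 470.
for prior_sd in (100.0, 200.0):
    mean, sd = normal_posterior(0.0, prior_sd, data_mean, data_se)
    print(f"prior N(0, {prior_sd:.0f}^2): posterior mean {mean:.1f}, sd {sd:.1f}")

# Data-driven vague priors: centered near a preliminary estimate (480 here,
# an assumed value) with spread 10 or 20 times the standard error.
for prior_sd in (500.0, 1000.0):
    mean, sd = normal_posterior(480.0, prior_sd, data_mean, data_se)
    print(f"prior N(480, {prior_sd:.0f}^2): posterior mean {mean:.1f}, sd {sd:.1f}")
```

In this toy setup the posterior mean under the priors centered at 0 depends heavily on the prior standard deviation, while under the priors centered near the likelihood it stays close to 500 whichever spread is used. The real model was hierarchical and nonlinear, so the same check would have to be done by refitting rather than by a closed-form update.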

## Blog, take two

I’m sure most of you noticed that our blog disappeared for a while last week. Some f&*^ing kid in Michigan of all places hacked into my stat.columbia.edu account through the Wiki. I think the Wiki security problems are now fixed, and I have also learned the hard way not to rely on anyone else to back […]

## Bayesian Software Validation

A wise statistician once told me that to succeed in statistics, one could either be really smart, or be Bayesian. He was joking (I think), but maybe an appropriate corollary to that sentiment is that to succeed in Bayesian statistics, one should either be really smart, or be a good programmer. There’s been an explosion […]