Business statistician Kaiser Fung just came out with another book, this one full of stories about how organizations use data:

1. Why do law school deans send each other junk mail?

2. Can a new statistic make us less fat?

3. How can sellouts ruin a business?

4. Will personalizing deals save Groupon?

5. Why do marketers send you mixed messages?

6. Are they new jobs if no one can apply?

7. How much did you pay for the eggs?

8. Are you a better coach or manager?

Unlike most books of this sort, there’s no hero: These are not stories about a fabulous businessman who made millions of dollars by following his dream and taking the customer seriously, nor are they Gladwellian sagas of brilliant scientists, nor are they auto-Gladwellian tales of the Ariely variety. In some ways, the stories in Fung’s book have the form of opened-up business reporting, in which you get to see the statistical models underlying various assumptions and conclusions. In that sense, this book could be a good way for people to learn some fundamental statistical concepts. The next step will be to fold such ideas into statistics (or “data science”) classes. As it is, our examples seem to oscillate between happytalk success stories (with a pretty regression model or a comfortably statistically-significant conclusion) and stories of selection bias, with no good integration into the how-to-do-it material. I tried to put this all together the last time I taught intro stat, but it didn’t work so well.

I was highly disappointed by the book as there seemed to be no leading theme. It is farther from statistics than Kaiser’s previous book, so I wish you well for the intended linkage with statistics…

I liked the book.

I agree with xi’an that there’s no leading theme (other than “kick the tires” on any statistical argument). But I feel happier with that lack of theme than with, say, the theme of Freakonomics (gosh, Steve Levitt is a genius!) or SuperFreakonomics (gosh, and so are his friends!).

In particular, the chapters debunking the Groupon business model and putting some scale and context around the “Target predicts pregnant women” story rang true (these are the chapters dealing with issues I knew more about).

“Auto-Gladwellian” is a great description!

As to integrating this into teaching, it’s very hard. I haven’t taught a class that used “big data” but I have tried to get students to cope with real data, often by using datasets I’ve gotten from research. These are messy, and I try to make assignments that involve coping with that mess but in a controlled way. A number of classic datasets have interesting features if you look at them closely. A good example is the anorexia dataset taken from the Handbook of Small Datasets, which has some notable outliers, was most likely subject to listwise deletion (judging by the rather uneven group Ns when you compare cognitive-behavior therapy and family therapy), and has no obvious “right” model.

http://stat.ethz.ch/R-manual/R-patched/library/MASS/html/anorexia.html

My new book is certainly more topical in nature than the first one. The theme is how to interpret statistical (data) analyses. All of the conversation around Big Data so far has been about the supply side; I think the most important consequence will be the need for citizens to be smarter consumers of data analyses.

There are statistical concepts for sure. For example, the marketing chapters discuss the power and limitations of models. The economics chapters concern how and why data need to be adjusted and processed into statistics. The first chapter looks at rankings, that is, the construction of subjective statistics (done a lot more than you want in practice). Fantasy football is used to demonstrate how counterfactual data can help in understanding causal effects. And so on.

I feel the pain that Andrew is pointing out. It is very hard to teach “numbersense”. You just have to show and tell.

Great new (and the pls as well) book(s); superb job, Kaiser!

The absence of a supposedly leading theme is a blessing.