
Judy Garland (4) vs. Al Sharpton; Derrida advances

WB calls yesterday’s contest in the comments:

Among French intellectuals, I’d rather hear from a corpse than an active public figure. My vote goes to Derrida.


And, today, the woman who defined Hollywood stardom, up against a religious leader who dabbles in slander. How fabulous is that??

P.S. As always, here’s the background, and here are the rules.

Defaults, once set, are hard to change.

Farewell then
Rainbow color scheme.

You reigned in Matlab
Far too long.

But now that
You are no longer
The default,

Will we
miss you?

We can only

E. J. Thribb (17 1/2)

Here’s the background.  Brad Stiritz writes:

I know you’re a creator and big proponent of open-source tools. Given your strong interest in statistical visualization, I thought you might still be interested in Matlab’s new default color map, “parula”, which replaces their rainbow-spectrum map called “jet”. This blog post presents a series of quiz-questions (with answers) that you might enjoy & this white paper presents background research justifying the big change.
One reason I find this exciting is that the new color map is based on the Lab color space, which was my favorite for natural-looking photo adjustments back when I was seriously into Photoshop. Lab space is just extremely cool both theoretically & practically.
p.s. regarding open-source vs. closed-source environments, the rising popularity of R & Python has forced Mathworks to finally start offering low-cost entry into their world. I wonder though how much this can really bend the adoption curve, especially given that they’re trying not to cannibalize full-price sales too much ..?
My reply:   Yes, rainbow is well known to be problematic.  I’m surprised Matlab stuck with it for so long.  I guess the message is that defaults, once set, are hard to change.  For example, R base graphics has some notorious default problems, such as tick marks that are too big and axis labels that are too far from the axes, to the extent that graphs are full of whitespace (a problem that is exacerbated when making grids of graphs).  Everybody has to know by now how bad these defaults are, but nobody changes them.  So good for Matlab for changing its default colors, even if they made the choice 20 years or so too late!
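For what it’s worth, those base-graphics defaults are easy enough to override; here’s a sketch (the replacement values are a matter of taste, not anything official):

```r
# Base-graphics defaults: tcl = -0.5 (long tick marks), mgp = c(3, 1, 0)
# (axis titles and labels far from the axis), mar = c(5, 4, 4, 2) + 0.1
# (wide margins) -- together they produce the whitespace complained about above.
par(tcl = -0.3,                 # shorter tick marks
    mgp = c(1.8, 0.5, 0),       # pull axis titles and labels in closer
    mar = c(3, 3, 1, 1))        # trim the margins
plot(cars, xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
```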

My talk tomorrow (Thurs) at MIT political science: Recent challenges and developments in Bayesian modeling and computation (from a political and social science perspective)

It’s 1pm in room E53-482.

I’ll talk about the usual stuff (and some of this too, I guess).

Bernard-Henri Lévy (3) vs. Jacques Derrida; Carlin advances

There wasn’t much enthusiasm yesterday, but I do have to pick a winner, so I’ll go with Zbicyclist’s comment: “Carlin. Are there 7 words you can’t say in a seminar? Let’s find out.”


And today we have two more modern French intellectuals! I don’t have much of anything to say about either of these guys so I’ll pass this one on to all of you.

P.S. As always, here’s the background, and here are the rules.

These are the statistics papers you just have to read

Here. And here.

Just kidding. Here’s the real story. Susanna Makela writes:

A few of us want to start a journal club for the statistics PhD students. The idea is to read important papers that we might not otherwise read, maybe because they’re not directly related to our area of research/we don’t have time/etc.

What would you say are the top ten (or five) statistics papers that you think statistics PhD students should read?

What do all of you think? We’ve listed the most-cited statistics papers here. This list includes some classics but I don’t think they’re so readable.

For my recommendations I’d probably like a few papers that demonstrate good practice. Not necessarily the papers that introduce new methods, but the papers that show the use and development of good ideas.

And then there are the papers that introduce bad methods but are readable and thought-provoking. A lot of Tukey’s papers fit this category.

Anyway, let me think . . . I don’t really know where to start . . . let’s look at the index to my books . . .

Angrist, J. D. (1990). Lifetime earnings and the Vietnam era draft lottery: evidence from Social Security administrative records. American Economic Review 80, 313-336. I’m not saying this paper is perfect—what paper is?—but it’s a good one to read and discuss, treating it as an interesting study to criticize, not something to just be admired.

Rubin, D. B. (1980). Using empirical Bayes techniques in the law school validity studies (with discussion). Journal of the American Statistical Association 75, 801–827. This is good because there’s a lively discussion, also the paper itself is very solid.

Neal, R. M. (2011). MCMC using Hamiltonian dynamics. In Handbook of Markov Chain Monte Carlo, ed. S. Brooks, A. Gelman, G. L. Jones, and X. L. Meng, 113–162. New York: Chapman & Hall. This is not an applied paper at all, it’s an expository paper on a computational method. But it’s just full of interesting ideas and excellent throwaway lines.

OK, three recommendations is enough from me.

What recommendations can the rest of you give?

George Carlin (2) vs. Barbara Kruger

To decide yesterday’s contest, I’ll have to point to Jeremy’s comment:

Rembrandt in a walk:

-He believes that “God is in every leaf on every tree”. Most of his greatest paintings are portraits of himself or regular people (as opposed to portraits of kings or Popes, or mythical battles, or etc.) Same for his etchings.

-He believes in embracing variation. Check out especially his later work, which is famously unpolished and is all the more evocative for it. In contrast, Russell spent his whole career trying, and failing, to impose more precision on the foundations of mathematics and language than is possible.

-As a painter, he knows a thing or two about the importance of one’s “model”.

Normally I’d go for any comment that points to my obsessions, but this comment by Jeremy is so clearly doing so that I’ll have to disqualify it. Also, he didn’t mention Stan. So I’m calling this one for Russell.


Now for today’s bout. I don’t know enough Carlins for that to be an entire category, but George made it in the Comedians category, and he’s up against conceptual artist Barbara Kruger.

If it were up to my friends from high school, Harvey would go for George, and Kenny would go for Barbara. But it’s up to you. Whaddya think?

My first thought is that Carlin should win easily—but, there’s just one thing. Many years ago when I was sick and home from school, I turned on a daytime TV talk show and, who should I see but George Carlin! He was doing a set that was perfectly adapted to his audience. I don’t remember the details but it was things like: Y’know how, when you’re in the supermarket, the cart just spins and spins around? etc. He was doing bits about shopping and doing the laundry and whatever else he thought would work with that audience. What was weird about it was that it was so clearly non-Carlin material, yet it was given the standard Carlin delivery.

At some level this is admirable professionalism—but it also struck me as a bit creepy, almost as if someone released a video of Newt Gingrich giving a stirring soak-the-rich speech to the American Socialists organization, or, umm, I dunno, seeing Ed Wegman give a lecture on research integrity. Put it this way: After seeing his performance on that talk show, I have no doubt that Carlin could give a set that’s perfectly adapted to the Columbia audience—but would we care?

Say what you want about Barbara Kruger—call her a talentless self-promoter with a one-note shtick, whatever—at least you have to admit she won’t compromise.

P.S. As always, here’s the background, and here are the rules.

One simple trick to make Stan run faster

Did you know that Stan automatically runs in parallel (and caches compiled models) from R if you do this:
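A sketch of the settings described (parallel chains plus cached compiled models), assuming the standard rstan options:

```r
library(rstan)

rstan_options(auto_write = TRUE)             # cache compiled models on disk
options(mc.cores = parallel::detectCores())  # run the chains in parallel
```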


It’s from Stan core developer Ben Goodrich.

This simple line of code has changed my life. A factor-of-4 speedup might not sound like much, but, believe me, it is!

Introducing shinyStan


As a project for Andrew’s Statistical Communication and Graphics graduate course at Columbia, a few of us (Michael Andreae, Yuanjun Gao, Dongying Song, and I) had the goal of giving RStan’s print and plot functions a makeover. We ended up getting a bit carried away and instead we designed a graphical user interface for interactively exploring virtually any Bayesian model fit using a Markov chain Monte Carlo algorithm.

The result is shinyStan, a package for R and an app powered by Shiny. The full version of shinyStan v1.0.0 can be downloaded as an R package from the Stan Development Team GitHub page here, and we have a demo up online here.  If you’re not an R user, we’re working on a full online version of shinyStan too.



For me, there are two primary motivations behind shinyStan:

1) Interactive visual model exploration
  • Immediate, informative, customizable visual and numerical summaries of model parameters and convergence diagnostics for MCMC simulations.
  • Good defaults with many opportunities for customization.
2) Convenient saving and sharing
  • Store the basic components of an entire project (code, posterior samples, graphs, tables, notes) in a single object.
  • Export graphics into R session as ggplot2 objects for further customization and easy integration in reports or post-processing for publication.

There’s also a third thing that has me excited at the moment. That online demo I mentioned above… well, since you’ll be able to upload your own data soon enough and even add your own plots if we haven’t included something you want, imagine an interactive library of your models hosted online. I’m imagining something like this except, you know, finite, useful, and for statistical models instead of books. (Quite possibly with fewer paradoxes too.) So it won’t be anything like Borges’ library, but I couldn’t resist the chance to give him a shout-out.

Finally, for those of you who haven’t converted to Stan quite yet, shinyStan is agnostic when it comes to inputs, which is to say that you don’t need to use Stan to use shinyStan (though we like it when you do). If you’re a Jags or Bugs user, or if you write your own MCMC algorithms, as long as you have some simulations in an array, matrix, mcmc.list, etc., you can take advantage of shinyStan.
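As a sketch of what that looks like for home-grown MCMC output (the function name is from shinyStan v1.0.0; check your version’s documentation):

```r
library(shinyStan)

# Stand-in for output from your own sampler:
# an iterations x chains x parameters array
sims <- array(rnorm(500 * 4 * 2), dim = c(500, 4, 2))
dimnames(sims) <- list(NULL, NULL, c("alpha", "beta"))

# launch_shinystan also accepts stanfit objects and mcmc.lists
launch_shinystan(sims)
```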


Rembrandt van Rijn (2) vs. Bertrand Russell

For yesterday, the most perceptive comment came from Slugger:

Rabbit Angstrom is a perfect example of the life that the Buddha warns against. He is a creature of animal passions who never gains any enlightenment.

In any case, I think we can all agree that Buddha is a far more interesting person than Updike. But, following the rules of the contest, we’re going with the best comment, which comes from Ethan:

Updike. We could ask him to talk to the title “Stan fans spark Bayes craze.” Buddha might just meditate silently for the whole hour.

Bonus points for bringing in Stan and baseball.


Today, the ultimate Dutch master is up against the ultimate rationalist. Rembrandt will paint the portrait of anyone who doesn’t paint himself.

I gotta say, this is one rough pairing. Who wouldn’t want to see Rembrandt do a quick painting demonstration? But, Russell must have been a great lecturer, witty and deep and he could even do math! I have a feeling that Rembrandt was a nicer guy (it would be hard not to be a nicer guy than Bertrand Russell, right?), but I don’t know how relevant that is in choosing a speaker.

P.S. As always, here’s the background, and here are the rules.

What hypothesis testing is all about. (Hint: It’s not what you think.)


I’ve said it before but it’s worth saying again.

The conventional view:

Hyp testing is all about rejection. The idea is that if you reject the null hyp at the 5% level, you have a win, you have learned that a certain null model is false and science has progressed, either in the glamorous “scientific revolution” sense that you’ve rejected a central pillar of science-as-we-know-it and are forcing a radical re-evaluation of how we think about the world (those are the accomplishments of Kepler, Curie, Einstein, and . . . Daryl Bem), or in the more usual “normal science” sense in which a statistically significant finding is a small brick in the grand cathedral of science (or a stall in the scientific bazaar, whatever, I don’t give a damn what you call it), a three-yards-and-a-cloud-of-dust, all-in-a-day’s-work kind of thing, a “necessary murder” as Auden notoriously put it (and for which he was slammed by Orwell, a lesser poet but a greater political scientist), a small bit of solid knowledge in our otherwise uncertain world.

But (to continue the conventional view) often our tests don’t reject. When a test does not reject, don’t count this as “accepting” the null hyp; rather, you just don’t have the power to reject. You need a bigger study, or more precise measurements, or whatever.

My view:

My view is (nearly) the opposite of the conventional view. The conventional view is that you can learn from a rejection but not from a non-rejection. I say the opposite: you can’t learn much from a rejection, but a non-rejection tells you something.

A rejection is, like, ok, fine, maybe you’ve found something, maybe not, maybe you’ll have to join Bem, Kanazawa, and the Psychological Science crew in the “yeah, right” corner—and, if you’re lucky, you’ll understand the “power = .06” point and not get so excited about the noise you’ve been staring at. Maybe not, maybe you’ve found something real—but, if so, you’re not learning it from the p-value or from the hypothesis tests.

A non-rejection, though: this tells you something. It tells you that your study is noisy, that you don’t have enough information in your study to identify what you care about—even if the study is done perfectly, even if measurements are unbiased and your sample is representative of your population, etc. That can be some useful knowledge, it means you’re off the hook trying to explain some pattern that might just be noise.

It doesn’t mean your theory is wrong—maybe subliminal smiley faces really do “punch a hole in democratic theory” by having a big influence on political attitudes; maybe people really do react differently to himmicanes than to hurricanes; maybe people really do prefer the smell of people with similar political ideologies. Indeed, any of these theories could have been true even before the studies were conducted on these topics—and there’s nothing wrong with doing some research to understand a hypothesis better. My point here is that the large standard errors tell us that these theories are not well tested by these studies; the measurements (speaking very generally of an entire study as a measuring instrument) are too crude for their intended purposes. That’s fine, it can motivate future research.

Anyway, my point is that standard errors, statistical significance, confidence intervals, and hypothesis tests are far from useless. In many settings they can give us a clue that our measurements are too noisy to learn much from. That’s a good thing to know. A key part of science is to learn what we don’t know.
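To make the “power = .06” point concrete, here’s a quick simulation sketch; the numbers are made up for illustration (a true effect of 2 measured by a study whose standard error is 8):

```r
set.seed(1)
true_effect <- 2
se <- 8                               # a noisy study
est <- rnorm(1e5, true_effect, se)    # many replications of the estimate
reject <- abs(est) > 1.96 * se        # "statistically significant" at 5%

mean(reject)                          # power: only about 6%
mean(abs(est[reject])) / true_effect  # significant estimates exaggerate a lot
mean(est[reject] < 0)                 # some are even in the wrong direction
```

The non-rejections here are doing their job: they’re telling you the measurement is too noisy to learn much about an effect of this size.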

Hey, kids: Embrace variation and accept uncertainty.

P.S. I just remembered an example that demonstrates this point, it’s in chapter 2 of ARM and is briefly summarized on page 70 of this paper.

In that example (looking at possible election fraud), a rejection of the null hypothesis would not imply fraud, not at all. But we do learn from the non-rejection of the null hyp; we learn that there’s no evidence for fraud in the particular data pattern being questioned.