
Introducing shinyStan






As a project for Andrew’s Statistical Communication and Graphics graduate course at Columbia, a few of us (Michael Andreae, Yuanjun Gao, Dongying Song, and I) had the goal of giving RStan’s print and plot functions a makeover. We ended up getting a bit carried away and instead we designed a graphical user interface for interactively exploring virtually any Bayesian model fit using a Markov chain Monte Carlo algorithm.

The result is shinyStan, a package for R and an app powered by Shiny. The full version of shinyStan v1.0.0 can be downloaded as an R package from the Stan Development Team GitHub page here, and we have a demo up online here. If you’re not an R user, we’re working on a full online version of shinyStan too.



For me, there are two primary motivations behind shinyStan:

1) Interactive visual model exploration
  • Immediate, informative, customizable visual and numerical summaries of model parameters and convergence diagnostics for MCMC simulations.
  • Good defaults with many opportunities for customization.
2) Convenient saving and sharing
  • Store the basic components of an entire project (code, posterior samples, graphs, tables, notes) in a single object.
  • Export graphics into R session as ggplot2 objects for further customization and easy integration in reports or post-processing for publication.

There’s also a third thing that has me excited at the moment. That online demo I mentioned above… well, since you’ll be able to upload your own data soon enough and even add your own plots if we haven’t included something you want, imagine an interactive library of your models hosted online. I’m imagining something like this except, you know, finite, useful, and for statistical models instead of books. (Quite possibly with fewer paradoxes too.) So it won’t be anything like Borges’ library, but I couldn’t resist the chance to give him a shout-out.

Finally, for those of you who haven’t converted to Stan quite yet, shinyStan is agnostic when it comes to inputs, which is to say that you don’t need to use Stan to use shinyStan (though we like it when you do). If you’re a Jags or Bugs user, or if you write your own MCMC algorithms, as long as you have some simulations in an array, matrix, mcmc.list, etc., you can take advantage of shinyStan.


Rembrandt van Rijn (2) vs. Bertrand Russell

For yesterday, the most perceptive comment came from Slugger:

Rabbit Angstrom is a perfect example of the life that the Buddha warns against. He is a creature of animal passions who never gains any enlightenment.

In any case, I think we can all agree that Buddha is a far more interesting person than Updike. But, following the rules of the contest, we’re going with the best comment, which comes from Ethan:

Updike. We could ask him to talk to the title “Stan fans spark Bayes craze.” Buddha might just meditate silently for the whole hour.

Bonus points for bringing in Stan and baseball.


Today, the ultimate Dutch master is up against the ultimate rationalist. Rembrandt will paint the portrait of anyone who doesn’t paint himself.

I gotta say, this is one rough pairing. Who wouldn’t want to see Rembrandt do a quick painting demonstration? But Russell must have been a great lecturer, witty and deep, and he could even do math! I have a feeling that Rembrandt was a nicer guy (it would be hard not to be a nicer guy than Bertrand Russell, right?), but I don’t know how relevant that is in choosing a speaker.

P.S. As always, here’s the background, and here are the rules.

What hypothesis testing is all about. (Hint: It’s not what you think.)


I’ve said it before but it’s worth saying again.

The conventional view:

Hyp testing is all about rejection. The idea is that if you reject the null hyp at the 5% level, you have a win, you have learned that a certain null model is false and science has progressed, either in the glamorous “scientific revolution” sense that you’ve rejected a central pillar of science-as-we-know-it and are forcing a radical re-evaluation of how we think about the world (those are the accomplishments of Kepler, Curie, Einstein, and . . . Daryl Bem), or in the more usual “normal science” sense in which a statistically significant finding is a small brick in the grand cathedral of science (or a stall in the scientific bazaar, whatever, I don’t give a damn what you call it), a three-yards-and-a-cloud-of-dust, all-in-a-day’s-work kind of thing, a “necessary murder” as Auden notoriously put it (and for which he was slammed by Orwell, a lesser poet but a greater political scientist), a small bit of solid knowledge in our otherwise uncertain world.

But (to continue the conventional view) often our tests don’t reject. When a test does not reject, don’t count this as “accepting” the null hyp; rather, you just don’t have the power to reject. You need a bigger study, or more precise measurements, or whatever.

My view:

My view is (nearly) the opposite of the conventional view. The conventional view is that you can learn from a rejection but not from a non-rejection. I say the opposite: you can’t learn much from a rejection, but a non-rejection tells you something.

A rejection is, like, ok, fine, maybe you’ve found something, maybe not, maybe you’ll have to join Bem, Kanazawa, and the Psychological Science crew in the “yeah, right” corner (and, if you’re lucky, you’ll understand the “power = .06” point and not get so excited about the noise you’ve been staring at). Maybe not, maybe you’ve found something real; but, if so, you’re not learning it from the p-value or from the hypothesis tests.
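To see why a rejection from a noisy study tells you so little, here’s a minimal simulation sketch. The numbers are assumptions picked only for illustration (a true effect of 2 measured with a standard error of 8.1, and a two-sided test at the 5% level); they are not taken from any of the studies mentioned here. With these inputs the power comes out close to .06, and the estimates that do reach significance are necessarily far larger than the true effect, and some have the wrong sign.

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# Hypothetical inputs, chosen only for illustration.
true_effect = 2.0   # assumed true effect size
se = 8.1            # assumed standard error of the estimate
z_crit = 1.96       # two-sided 5% threshold

# Analytic power: probability that |estimate| exceeds z_crit * se
# when estimate ~ Normal(true_effect, se).
d = true_effect / se
power = (1.0 - normal_cdf(z_crit - d)) + normal_cdf(-z_crit - d)

# Simulate many replications of the study and keep only the
# estimates that reach statistical significance.
random.seed(0)
n_sims = 100_000
estimates = (random.gauss(true_effect, se) for _ in range(n_sims))
significant = [est for est in estimates if abs(est) > z_crit * se]

# Among significant results: how much do they overstate the true
# effect on average, and how often do they have the wrong sign?
exaggeration = sum(abs(est) for est in significant) / len(significant) / true_effect
wrong_sign = sum(est < 0 for est in significant) / len(significant)

print(f"power: {power:.3f}")
print(f"average exaggeration among significant results: {exaggeration:.1f}x")
print(f"share of significant results with the wrong sign: {wrong_sign:.2f}")
```

The exaggeration is built in: a significant estimate has to be at least 1.96 standard errors from zero, which here is many times the assumed true effect.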

A non-rejection, though: this tells you something. It tells you that your study is noisy, that you don’t have enough information in your study to identify what you care about—even if the study is done perfectly, even if measurements are unbiased and your sample is representative of your population, etc. That can be some useful knowledge, it means you’re off the hook trying to explain some pattern that might just be noise.

It doesn’t mean your theory is wrong: maybe subliminal smiley faces really do “punch a hole in democratic theory” by having a big influence on political attitudes; maybe people really do react differently to himmicanes than to hurricanes; maybe people really do prefer the smell of people with similar political ideologies. Indeed, any of these theories could have been true even before the studies were conducted on these topics, and there’s nothing wrong with doing some research to understand a hypothesis better. My point here is that the large standard errors tell us that these theories are not well tested by these studies; the measurements (speaking very generally of an entire study as a measuring instrument) are too crude for their intended purposes. That’s fine, it can motivate future research.

Anyway, my point is that standard errors, statistical significance, confidence intervals, and hypothesis tests are far from useless. In many settings they can give us a clue that our measurements are too noisy to learn much from. That’s a good thing to know. A key part of science is to learn what we don’t know.

Hey, kids: Embrace variation and accept uncertainty.

P.S. I just remembered an example that demonstrates this point, it’s in chapter 2 of ARM and is briefly summarized on page 70 of this paper.

In that example (looking at possible election fraud), a rejection of the null hypothesis would not imply fraud, not at all. But we do learn from the non-rejection of the null hyp; we learn that there’s no evidence for fraud in the particular data pattern being questioned.

On deck this week

Mon: What hypothesis testing is all about. (Hint: It’s not what you think.)

Rembrandt van Rijn (2) vs. Bertrand Russell

Tues: One simple trick to make Stan run faster

George Carlin (2) vs. Barbara Kruger

Wed: I actually think this infographic is ok

Bernard-Henri Levy (3) vs. Jacques Derrida

Thurs: Defaults, once set, are hard to change.

Judy Garland (4) vs. Al Sharpton

Fri: “The Saturated Fat Studies: Set Up to Fail”

John Waters (1) vs. Bono

Sat: “With that assurance, a scientist can report his or her work to the public, and the public can trust the work.”

Plato (1) vs. Mark Twain (4)

Sun: Causal Impact from Google

Mary Baker Eddy vs. Mohammad (2)

No “On deck this month” this month because I don’t know what all the seminar-speaker matchups are gonna be. I’ll tell you, though, we have some excellent posts in the regular series. So stay tuned!

Buddha (3) vs. John Updike

Yesterday’s winner is Friedrich Nietzsche. I don’t really have much to say here: there was lots of enthusiasm about the philosopher and none at all for the cozy comedian. Maybe Jonathan Miller would’ve been a better choice.


Now for today’s battle. Buddha is seeded #3 among founders of religions. Updike is the unseeded author of the classic Rabbit, Run, and dozens of memorable short stories, but is detested by Helen DeWitt and various commenters on this blog.

Who’d be a better speaker? Updike is more of a Harvard guy but I guess he’d give a talk at Columbia if we asked, right?

P.S. As always, here’s the background, and here are the rules.

“Precise Answers to the Wrong Questions”

Our friend K? (not to be confused with X) seeks pre-feedback on this talk:

Can we get a mathematical framework for applying statistics that better facilitates communication with non-statisticians as well as helps statisticians avoid getting “precise answers to the wrong questions*”?

Applying statistics involves communicating with non-statisticians so that we grasp their applied problems and they understand how the methods we propose address our (incomplete) grasp of their problems. Statistical theory, on the other hand, involves communicating with oneself and other qualified statisticians about statistical models that embody theoretical abstractions, and one would be foolish to limit mathematical approaches in this task. However, as put in Kass, R. (2011), Statistical Inference: The Big Picture – “Statistical procedures are abstractly defined in terms of mathematics but are used, in conjunction with scientific models and methods, to explain observable phenomena. … When we use a statistical model to make a statistical inference [address applied problems] we implicitly assert … the theoretical world corresponds reasonably well to the real world.” Drawing on clever constructions by Francis Galton and insights into science and mathematical reasoning by C.S. Peirce, this talk will discuss an arguably mathematical framework (in Peirce’s sense of diagrammatic reasoning) that might be better.

*“An approximate answer to the right question is worth a great deal more than a precise answer to the wrong question.” – John Tukey.

P.S. from Andrew: Here’s my article from 2011, Bayesian Statistical Pragmatism, a discussion of Rob Kass’s article on statistical pragmatism.

Key quote from my article:

In the Neyman–Pearson theory of inference, confidence and statistical significance are two sides of the same coin, with a confidence interval being the set of parameter values not rejected by a significance test. Unfortunately, this approach falls apart (or, at the very least, is extremely difficult) in problems with high-dimensional parameter spaces that are characteristic of my own applied work in social science and environmental health.

In a modern Bayesian approach, confidence intervals and hypothesis testing are both important but are not isomorphic [emphasis added]; they represent two different steps of inference. Confidence statements, or posterior intervals, are summaries of inference about parameters conditional on an assumed model. Hypothesis testing—or, more generally, model checking—is the process of comparing observed data to replications under the model if it were true.
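To make the two steps concrete, here’s a minimal sketch under assumptions that are purely illustrative: made-up data with one outlier, a normal model with known standard deviation, and a flat prior on the mean. Step 1 computes a posterior interval conditional on the model; step 2 checks the model by comparing a test statistic (the largest absolute value, a hypothetical choice) to its distribution in replicated data. The interval looks perfectly tidy even though the check shows the model can’t reproduce the data, which is one way to see that the two steps answer different questions.

```python
import math
import random
import statistics

random.seed(1)

# Hypothetical data: mostly unremarkable values plus one outlier.
y = [0.3, -1.1, 0.8, 1.9, -0.4, 0.2, 1.2, -0.7, 0.5, 9.0]
n = len(y)
sigma = 1.0  # assumed known data standard deviation

# Step 1: inference conditional on the model y_i ~ Normal(mu, sigma)
# with a flat prior on mu. The posterior for mu is then
# Normal(mean(y), sigma / sqrt(n)).
post_mean = statistics.fmean(y)
post_sd = sigma / math.sqrt(n)
interval = (post_mean - 1.96 * post_sd, post_mean + 1.96 * post_sd)


def largest_abs(data):
    """Test statistic for the check: the largest absolute value."""
    return max(abs(v) for v in data)


# Step 2: model checking. Draw mu from its posterior, simulate a
# replicated data set of the same size, and record how often the
# replicated statistic is at least as large as the observed one.
t_obs = largest_abs(y)
n_rep = 2000
exceed = 0
for _ in range(n_rep):
    mu = random.gauss(post_mean, post_sd)
    y_rep = [random.gauss(mu, sigma) for _ in range(n)]
    if largest_abs(y_rep) >= t_obs:
        exceed += 1
p_value = exceed / n_rep

print(f"95% posterior interval for mu: ({interval[0]:.2f}, {interval[1]:.2f})")
print(f"predictive p-value for the largest absolute value: {p_value:.3f}")
```

A predictive p-value near zero says that data like the observed outlier essentially never show up in replications, so the interval from step 1 is a summary under a model that doesn’t fit.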

Friedrich Nietzsche (4) vs. Alan Bennett

William Shakespeare had the most support yesterday; for example, from David: “I vote for Shakespeare just to see who actually shows up.” The best argument of the serious variety came from Babar, who wrote, “I would vote for WS. Very little is known about the man. I care very little about Marx’s mannerisms but I’d like to know if WS had modern day actor mannerisms. I’d like to see how he moved – is he a physical actor, or just a writer?” That’s an excellent point. Of all the seminar speaker candidates we’ve considered, Shakespeare’s the only one with a physical dimension in that way. It would be great to see the movements of an old-time actor.

But the funniest argument came from Jonathan:

As near as I can figure, Shakespeare was nothing more than a guy who could string a bunch of famous phrases together and make a play out of them. It’s a talent, to be sure, but a fairly minor one. Plus, if he’s in love with Gwyneth Paltrow, I’m out.

Ouch! Willie got zinged, so Karl’s in.

And, today it’s “God is dead” vs. the ultimate cozy comedian. Amazingly enough, it’s been more than 40 years since the first performance of “Forty Years On.”

P.S. As always, here’s the background, and here are the rules.

Bertrand Russell goes to the IRB

Jonathan Falk points me to this genius idea from Eric Crampton:

Here’s a fun one for those of you still based at a university.

All of you put together a Human Ethics Review proposal for a field experiment on Human Ethics Review proposals.

Here is the proposal within my proposal.

Each of you would propose putting together a panel of researchers at different universities. You would propose that each of your panel members – from diverse fields, seniority levels, ethnicities and such – would submit a proposal to his or her ethics review board or Institutional Review Board for approval, and each of the panellists would track the time it took to get the proposal approved, which legitimate ethical issues were flagged, which red herring issues also held things up, and how long and onerous the whole ordeal was.

Still in your proposal, you would then propose gathering the data from your panellists and drawing some conclusions about what sorts of schools have better or worse processes. Specific hypotheses to be tested would be whether universities with medical schools were worse than others because medical ethicists would be on the panel, and whether universities with faculty-based rather than centralised IRBs would have better approval processes.

You would note that members of your panels could ask their University’s HR advisers to get data on the people who are on the IRBs – race, gender, ethnicity, area of study, rank, age, experience, time on panel, number of children, marital status, and sexual orientation (though not all of those would be in each place’s HR database); you’d propose using these as control variables but also to test whether a panel’s experience made any difference and whether having a panel member from your home Department made any difference. It would also be interesting to note whether the gender, seniority, ethnicity and home department of the submitter made any difference to the application.

End of the proposal-within-the-proposal.

Now for the fun part: each one of you reading this is a potential member of a panel for a study for which nobody has ever sought ethical approval, but which will be self-approving in a particularly distributed fashion: The IRB proposal to be tested is the one I’ve just outlined. Whichever of you first gets ethical approval is the lead author on the paper, is a data point, and already has the necessary ethics approval. Everybody else, successful or not, is a data point.

This is just the greatest. You can only do this sort of study if you have IRB approval, but the only way to get IRB approval is . . . to do the study!

This is related to other paradoxes such as: I can do nice (or mean) things to people and write about what happens, but call it “research” and all of a sudden we’re in big trouble if we don’t get permission. Crampton’s idea is beautiful because it wraps the problem in itself. Russell, Cantor, and Gödel would be proud.

William Shakespeare (1) vs. Karl Marx

For yesterday’s winner, I’ll follow the reasoning of Manuel in comments:

Popper. We would learn more from falsifying the hypothesis that Popper’s talk is boring than what we would learn from falsifying the hypothesis that Richard Pryor’s talk is uninteresting.

And today we have the consensus choice for greatest writer vs. the notorious political philosopher. Marx is unseeded in the Founders of Religions category but he’s had lots of influence on the world. Both these guys are pretty quotable. So who’s it gonna be, the actor or the radical?

P.S. No Groucho jokes, please. And no need for reminders that lots of bad things were done in the name of Marxism. We’re choosing a seminar speaker here, that’s all. We’re not endorsing a philosophy.

P.P.S. As always, here’s the background, and here are the rules.

“The harm done by tests of significance” (article from 1994 in the journal, “Accident Analysis and Prevention”)

Ezra Hauer writes:

In your January 2013 Commentary (Epidemiology) you say that “…misunderstanding persists even in high-stakes settings.” Attached is an older paper illustrating some such.

“It is like trying to sink a battleship by firing lead shot at it for a long time”—well put!