
How do companies use Bayesian methods?

Jason May writes:

I’m in Northwestern’s Predictive Analytics grad program. I’m working on a project providing Case Studies of how companies use certain analytic processes and want to use Bayesian Analysis as my focus.

The problem: I can find tons of work on how one might apply Bayesian Statistics to different industries but very little on how companies actually do so except as blurbs in larger pieces.

I was wondering if you might have ideas of where to look for cases of real life companies using Bayesian principles as an overall strategy.

Some examples that come to mind are pharmaceutical companies that use hierarchical pharmacokinetic/pharmacodynamic modeling, as well as people on the Stan users list who are using Bayes in various business settings. And I know that some companies do formal decision analysis, which I think is typically done in a Bayesian framework. And I’ve given some short courses at companies, which implies that they’re interested in Bayesian methods, though I don’t really know if they ended up following my particular recommendations.

Perhaps readers can supply other examples?

Prediction Market Project for the Reproducibility of Psychological Science

Anna Dreber Almenberg writes:

The second prediction market project for the reproducibility project will soon be up and running – please participate!

There will be around 25 prediction markets, each representing a particular study that is currently being replicated. Each study (and thus market) can be summarized by a key hypothesis that is being tested, which you will get to bet on.

In each market you participate in, you will bet on a binary outcome: whether the effect in the replication study is in the same direction as in the original study and is statistically significant, with a p-value smaller than 0.05.

Everybody is eligible to participate in the prediction markets: it is open to all members of the Open Science Collaboration discussion group – you do not need to be part of a replication for the Reproducibility Project. However, you cannot bet on your own replications.

Each study/market will have a prospectus with all available information so that you can make informed decisions.

The prediction markets are subsidized. All participants will get about $50 in their prediction account to trade with. How much money you make depends on how you bet on different hypotheses; on average, participants will earn about $50, paid out on a Mastercard (or equivalent) gift card that can be used anywhere Mastercard is accepted.

The prediction markets will open on October 21, 2014 and close on November 4.

If you are willing to participate in the prediction markets, please send an email to Siri Isaksson by October 19 and we will set up an account for you. Before we open up the prediction markets, we will send you a short survey.

The prediction markets are run in collaboration with Consensus Point.

If you have any questions, please do not hesitate to email Siri Isaksson.

Statistical Communication and Graphics Manifesto

Statistical communication includes graphing data and fitted models, programming, writing for specialized and general audiences, lecturing, working with students, and combining words and pictures in different ways.

The common theme of all these interactions is that we need to consider our statistical tools in the context of our goals.

Communication is not just about conveying prepared ideas to others: often our most important audience is ourselves, and the same principles that suggest good ways of communicating with others also apply to the methods we use to learn from data.

See also the description of my statistical communication and graphics course, where we try to implement the above principles.

[I'll be regularly updating this post, which I sketched out (with the help of the students in my statistical communication and graphics course this semester) and put here so we can link to it from the official course description.]

My course on Statistical Communication and Graphics


We will study and practice many different aspects of statistical communication, including graphing data and fitted models, programming in R, writing for specialized and general audiences, lecturing, working with students and colleagues, and combining words and pictures in different ways.

You learn by doing: each week we have two classes that are full of student participation, and before each class you have a pile of readings, a homework assignment, and jitts.

You learn by teaching: you spend a lot of time in class explaining things to your neighbor.

You learn by collaborating: you’ll do a team project which you’ll present at the end of the semester.

The course will take a lot of effort on your part, effort which should be aligned with your own research and professional goals. And you will get the opportunity to ask questions of guest stars who will illustrate diverse perspectives in statistical communication and graphics.

See also the statistical communication and graphics manifesto.

[I'll be regularly updating this post, which I sketched out (with the help of the students in my statistical communication and graphics course this semester) and put here so we can link to it from the official course description.]

The Fault in Our Stars: It’s even worse than they say

[Photo: étoiles (stars)]

In our recent discussion of publication bias, a commenter linked to a recent paper, “Star Wars: The Empirics Strike Back,” by Abel Brodeur, Mathias Le, Marc Sangnier, and Yanos Zylberberg, who point to the notorious overrepresentation in scientific publications of p-values that are just below 0.05 (that is, just barely statistically significant at the conventional level) and the corresponding underrepresentation of p-values that are just above the 0.05 cutoff.

Brodeur et al. correctly (in my view) attribute this pattern not just to selection (the much-talked-about “file drawer”) but also to data-contingent analyses (what Simmons, Nelson, and Simonsohn call “p-hacking” and what Loken and I call “the garden of forking paths”). They write:

We have identified a misallocation in the distribution of the test statistics in some of the most respected academic journals in economics. Our analysis suggests that the pattern of this misallocation is consistent with what we dubbed an inflation bias: researchers might be tempted to inflate the value of those almost-rejected tests by choosing a “significant” specification. We have also quantified this inflation bias: among the tests that are marginally significant, 10% to 20% are misreported.

They continue with “These figures are likely to be lower bounds of the true misallocation as we use very conservative collecting and estimating processes”—but I would go much further. One way to put it is that there are (at least) three selection processes going on here:

1. (“the file drawer”) Significant results (traditionally presented in a table with asterisks or “stars,” hence the photo above) are more likely to get published.

2. (“inflation”) Near-significant results get jiggled a bit until they fall into the box.

3. (“the garden of forking paths”) The direction of an analysis is continually adjusted in light of the data.

Brodeur et al. point out that item 1 doesn’t tell the whole story, and they come up with an analysis (featuring a “lemma” and a “corollary”!) explaining things based on item 2. But I think item 3 is important too.
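To make items 2 and 3 concrete, here is a minimal simulation sketch in R (my own toy example, not the model in Brodeur et al.). The assumptions are arbitrary and only for illustration: every study has a true null effect, and any analyst whose first p-value lands in a hypothetical “near miss” window of 0.05 to 0.15 tries five alternative specifications and reports the most favorable one.

```r
# Toy simulation of "inflation" / forking paths (illustration only).
set.seed(123)
n_studies <- 10000
n <- 100                                   # observations per study

one_study <- function() {
  y <- rnorm(n, mean = 0, sd = 1)          # the null is true in every study
  p <- t.test(y)$p.value
  if (p > 0.05 && p < 0.15) {              # "near miss": go looking for significance
    for (k in 1:5) {
      y_alt <- y[-sample(n, 5)]            # one of many possible forking paths
      p <- min(p, t.test(y_alt)$p.value)   # keep the most favorable result
    }
  }
  p
}

p_values <- replicate(n_studies, one_study())
hist(p_values[p_values < 0.2], breaks = seq(0, 0.2, by = 0.01),
     xlab = "reported p-value",
     main = "Bump just below 0.05, dip just above it")
```

Even this crude rule shifts some of the reported p-values from just above the 0.05 threshold to just below it, which is the qualitative pattern that Brodeur et al. document in published test statistics.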

The point is that the analysis is a moving target. Or, to put it another way, there’s a one-to-many mapping from scientific theories to statistical analyses.

So I’m wary of any general model explaining scientific publication based on a fixed set of findings that are then selected or altered. In many research projects, there is either no baseline analysis or else the final analysis is so far away from the starting point that the concept of a baseline is not so relevant.

Although maybe things are different in certain branches of economics, in that people are arguing over an agreed-upon set of research questions.

P.S. I only wish I’d known about these people when I was still in Paris; we could’ve met and talked.

I didn’t say that! Part 2

Uh oh, this is getting kinda embarrassing.

The Garden of Forking Paths paper, by Eric Loken and myself, just appeared in American Scientist. Here’s our manuscript version (“The garden of forking paths: Why multiple comparisons can be a problem, even when there is no ‘fishing expedition’ or ‘p-hacking’ and the research hypothesis was posited ahead of time”), and here’s the final, trimmed and edited version (“The Statistical Crisis in Science”) that came out in the magazine.

Russ Lyons read the published version and noticed the following sentence, actually the second sentence of the article:

Researchers typically express the confidence in their data in terms of p-value: the probability that a perceived result is actually the result of random variation.

How horrible! Russ correctly noted that the above statement is completely wrong, on two counts:

1. To the extent the p-value measures “confidence” at all, it would be confidence in the null hypothesis, not confidence in the data.

2. In any case, the p-value is not not not not not “the probability that a perceived result is actually the result of random variation.” The p-value is the probability of seeing something at least as extreme as the data, if the model (in statistics jargon, the “null hypothesis”) were true.
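To make point 2 concrete, here is a quick simulation sketch in R (my own example, not from the article): simulate the test statistic under the null model and count how often it comes out at least as extreme as the observed one.

```r
# The p-value as the probability, under the null model, of a result at least
# as extreme as the observed one.
set.seed(2014)
y <- rnorm(20, mean = 0.4, sd = 1)                    # some observed data
t_stat <- function(x) abs(mean(x)) / (sd(x) / sqrt(length(x)))
t_obs <- t_stat(y)

# Distribution of the statistic if the null model (mean 0) were true
t_null <- replicate(1e5, t_stat(rnorm(length(y), mean = 0, sd = 1)))

mean(t_null >= t_obs)      # simulated p-value
t.test(y)$p.value          # agrees, up to simulation error, with the t-test
```

Nothing in this calculation refers to “the probability that the result is due to random variation”; it is a statement about hypothetical replications under the null model.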

How did this happen?

The editors at American Scientist liked our manuscript, but it was too long, and parts of it needed explaining for a nontechnical audience. So they cleaned up our article and added bits here and there. This is standard practice at magazines. It’s not just Raymond Carver and Gordon Lish.

Then they sent us the revised version and asked us to take a look. They didn’t give us much time. That too is standard with magazines. They have production schedules.

We went through the revised manuscript but not carefully enough. Really not carefully enough, given that we missed a glaring mistake—two glaring mistakes—in the very first paragraph of the article.

This is ultimately not the fault of the editors. The paper is our responsibility and it’s our fault for not checking the paper line by line. If it was worth writing and worth publishing, it was worth checking.

P.S. Russ also points out that the examples in our paper are all pretty silly and not of great practical importance, and he wouldn’t want readers of our article to get the impression that “the garden of forking paths” is only an issue in silly studies.

That’s a good point. The problems of nonreplication etc. affect all sorts of science involving human variation. For example, there is a lot of controversy about something called “stereotype threat,” a phenomenon that is important if real. For another example, these problems have arisen in studies of early childhood intervention and the effects of air pollution. I’ve mentioned all these examples in talks I’ve given on this general subject; they just didn’t happen to make it into this particular paper. I agree that our paper would’ve been stronger had we mentioned some of these unquestionably important examples.

In one of life’s horrible ironies, I wrote a paper “Why we (usually) don’t have to worry about multiple comparisons” but now I spend lots of time worrying about multiple comparisons

On deck this week

Tues: In one of life’s horrible ironies, I wrote a paper “Why we (usually) don’t have to worry about multiple comparisons” but now I spend lots of time worrying about multiple comparisons

Wed: The Fault in Our Stars: It’s even worse than they say

Thurs: Buggy-whip update

Fri: The inclination to deny all variation

Sat: Hoe noem je?

Sun: “Your Paper Makes SSRN Top Ten List”

10th anniversary of “Statistical Modeling, Causal Inference, and Social Science”

Richard Morey pointed out the other day that this blog is 10 years old!

During this time, we’ve had 5688 posts, 48799 comments, and who knows how many readers.

On this tenth anniversary, I’d like to thank my collaborators on all the work I’ve blogged, my co-bloggers (“This post is by Phil”), our commenters, Alex Tabarrok for linking to us way back when, and also the many many people who’ve pointed us to interesting research, interesting graphs, bad research, bad graphs, and links to the latest stylings of David Brooks and Satoshi Kanazawa.

It’s been fun, and I think this blog has been (and I hope will remain) an excellent communication channel on all sorts of topics, statistical and otherwise. Through the blog I’ve met friends, colleagues, and collaborators (including some, such as Basbøll and Palko, whom I’ve still not met!); I’ve been motivated to think hard about ideas that I otherwise wouldn’t have encountered; and I’m pretty sure I’ve motivated many people to examine ideas that they otherwise would not have thought seriously about.

The blog has been enlivened with a large and continuing cast of characters, including lots of “bad guys” such as . . . well, no need to list these people here. It’s enough to say they’ve provided us with plenty of entertainment and food for thought.

We’ve had some epic comment threads and enough repeating topics that we had to introduce the Zombies category. We’ve had comments or reactions from culture heroes including Gerd Gigerenzer, Judea Pearl, Helen DeWitt, and maybe even Scott Adams (but we can’t be sure about that last one). We’ve had fruitful exchanges with other researchers such as Christian Robert, Deborah Mayo, and Dan Kahan who have blogs of their own, and, several years back, we launched the internet career of the late Seth Roberts.

Here are the titles of the first five posts from our blog (in order):

A weblog for research in statistical modeling and applications, especially in social sciences

The Electoral College favors voters in small states

Why it’s rational to vote

Bayes and Popper

Overrepresentation of small states/provinces, and the USA Today effect

As you can see, some of our recurrent themes showed up early on.

Here are the next five:

Sensitivity Analysis of Joanna Shepherd’s DP paper

Unequal representation: comments from David Samuels

Problems with Heterogeneous Choice Models

Morris Fiorina on C-SPAN

A fun demo for statistics class

And the ten after that:

Red State/Blue State Paradox

Statistical issues in modeling social space

2 Stage Least Squares Regression for Death Penalty Analysis

Partial pooling of interactions

Bayesian Methods for Variable Selection

Reference for variable selection

The blessing of dimensionality

Why poll numbers keep hopping around by Philip Meyer

Matching, regression, interactions, and robustness

Homer Simpson and mixture models

(Not all these posts are by me.)

See you again in 2024!

“Illinois chancellor who fired Salaita accused of serial self-plagiarism.”

I came across a couple of stories today that made me wonder how much we can learn from a scholar’s professional misconduct.

The first was a review by Kimberle Crenshaw of a book by Joan Biskupic about Supreme Court justice Sonia Sotomayor. Crenshaw makes the interesting point that Sotomayor, like many political appointees of the past, was chosen in part because of her ethnic background, but that unlike various other past choices (for example, Antonin Scalia, the first Italian American on the court), “Sotomayor’s ethnicity is still viewed [by many] with skepticism.”

I was reminded of Laurence “ten-strike” Tribe’s statement that Sotomayor is “not nearly as smart as she seems to think she is,” a delightfully paradoxical sentence that one could imagine being said by Humpty Dumpty or some other Lewis Carroll character. More to the point, Tribe got caught plagiarizing a few years ago.

So here’s the question. Based on the letter where the above quote appears, Tribe seems to consider himself to be pretty smart (smarter than Sotomayor, that’s for sure). But, from my perspective, what kind of smart person plagiarizes? Not a very smart person, right?

But maybe I’m completely missing the point. If some of the world’s best athletes are doping, maybe some of the world’s best scholars are plagiarizing? It’s hard for me to wrap my head around this one. Also, in fairness to Tribe, he’s over 70 years old. Maybe he used to be smart when he was younger.

The second story came to me via an email from John Transue who pointed me to a post by Ali Abunimah, “Illinois chancellor who fired Salaita accused of serial self-plagiarism.” I had to follow some links to see what was going on here: apparently there was a professor who got fired after pressure on the university from a donor.

I hadn’t heard of Stephen Salaita (the prof who got fired) or Phyllis Wise (the University of Illinois administrator who apparently was in charge of the process), but apparently there’s some controversy about her publication record from her earlier career as a medical researcher.

It looks like a simple case of Arrow’s theorem, that any result can only be published at most five times. Wise seems to have published the particular controversial paper only three different times, so she has two freebies to go.

As I discussed a couple years ago (click here and scroll down to “It’s 1995”), in some places Arrow’s theorem is such a strong expectation that you’re penalized if you don’t publish several versions of the same paper.

But, to get back to the main thread here: to what extent does Wise’s unscholarly behavior (and it is definitely unscholarly and uncool to copy your old papers without making the source clear; even if it’s not as bad as many other academic violations, it’s something you shouldn’t do, and it demonstrates an ethical lapse or a level of sloppiness extreme enough to raise questions about one’s scholarship) lead us to mistrust her other decisions, in this case in her role as university administrator?

In some sense this doesn’t matter at all: Wise could’ve been the most upstanding, rule-following scientist of all time and the supporters of Salaita would still be strongly disagreeing with her decision and the process used to make it (just as we can all give a hearty laugh at Laurence Tribe’s obnoxiousness, even if he’d never in his life put his name on someone else’s writing).

Or maybe it is relevant, in that Wise’s disregard for the rules in science might be matched by her disregard for the rules in administration. And Tribe’s diminished capacities as a scholar, as revealed by his plagiarism, might lead one to doubt his judgment of the intellectual capacities of his colleagues.

P.S. A vocal segment of our readership gets annoyed when I write about plagiarism. I continue to insist that my thoughts in this area have scholarly value (see here and here, for example; that latter article even appeared in a peer-reviewed journal!), but I am influenced by the judgments of others, and I do feel a little bad about these posts. So I’ve done you all a favor by posting this one late at night on a weekend when nobody will be reading. So there’s that.