
Constructing an informative prior using meta-analysis

Chris Guure writes:

I am trying to construct an informative prior by synthesizing or collecting some information from literature (meta-analysis) and then to apply that to a real data set (it is longitudinal data) for over 20 years follow-up.

In constructing the prior using the meta-analysis data, the issue of publication bias came up. I have tried looking to see if there is any literature on this but it seems almost all the articles on Bayesian meta-analysis do not actually account for this issue apart from one (Givens, Smith and Tweedie 1997).

My thinking was that I could assume a data augmentation approach by fitting a joint model with the assumption that the observed data are normally distributed and that the unobserved studies probably exist but are not included in my studies and can be thought of as missing data (missing not at random or non-ignorable missingness). This way a Bernoulli distribution could be used to account for the missingness.

But according to Lesaffre and Lawson (2012, p. 196), in hierarchical models the data augmentation approach enters in a quite natural way via the latent (unobserved) random effects. This statement to me implies that my earlier idea may not be necessary and may even bias the posterior estimates.

My reply: You could certainly do this, build a model in which there are a bunch of latent unreported studies and then go from there. I don’t know how well this would work, though, for two reasons:

1. Estimating what’s missing based on the shape of the distribution—that’s tough. Inferences will be so sensitive to all sorts of measurement and selection issues, and I’d be skeptical of whatever comes out.

2. You’re trying to adjust for unreported studies in a meta-analysis. But I’d be much more worried about choices in data processing and analysis in each of the studies you have. As I’ve written many times, I think the file-drawer problem is overrated and it’s nothing compared to the garden of forking paths.
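To see the size of the problem being discussed, here’s a toy simulation of the significance filter in a meta-analysis. All the numbers (true effect, standard error, selection rule) are made up for illustration; this is not the reader’s actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

true_effect = 0.2   # hypothetical common effect across studies
se = 0.15           # assumed within-study standard error
n_studies = 10_000

# Simulate study estimates, then "publish" only those with z > 1.96.
estimates = rng.normal(true_effect, se, n_studies)
published = estimates[estimates / se > 1.96]

# A naive meta-analysis of the published studies overshoots the truth.
naive_mean = published.mean()
```

Here the published-only average lands well above the true effect of 0.2, which is why a model for the missing studies is tempting in the first place—and also why inferences from any such model lean so heavily on the assumed selection rule.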

Uri Simonsohn warns us not to be falsely reassured


I agree with Uri Simonsohn that you don’t learn much by looking at the distribution of all the p-values that have appeared in some literature. Uri explains:

Most p-values reported in most papers are irrelevant for the strategic behavior of interest.

Covariates, manipulation checks, main effects in studies testing interactions, etc. Including them we underestimate p-hacking and we overestimate the evidential value of data. Analyzing all p-values asks a different question, a less sensible one. Instead of “Do researchers p-hack what they study?” we ask “Do researchers p-hack everything?”

He demonstrates with an example and summarizes:

Looking at all p-values is falsely reassuring.
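The dilution Simonsohn describes is easy to mimic with a quick simulation. The mixture proportions and the uniform model for the irrelevant tests below are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Focal, possibly p-hacked tests: p-values piled up just under .05
focal = rng.uniform(0.01, 0.05, 1_000)
# Irrelevant tests (covariates, manipulation checks, ...): roughly uniform
irrelevant = rng.uniform(0, 1, 9_000)

def share_just_under_05(p):
    """Share of p-values in the (.04, .05) window, a crude p-hacking signal."""
    return ((p > 0.04) & (p < 0.05)).mean()

focal_share = share_just_under_05(focal)
all_share = share_just_under_05(np.concatenate([focal, irrelevant]))
```

Pooling everything swamps the telltale pile-up just under .05, so the combined distribution looks reassuring even though the focal tests are badly behaved.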

I agree and will just add two comments:

1. I prefer the phrase “garden of forking paths” because I think the term “p-hacking” suggests intentionality or even cheating. Indeed, in the quoted passage above, Simonsohn refers to “strategic behavior.” I have no doubt that some strategic behavior and even outright cheating goes on, but I like to emphasize that the garden of forking paths can occur even when a researcher does only one analysis of the data at hand and does not directly “fish” for statistical significance.

The idea is that analyses are contingent on data, and researchers can and do make choices in data coding, data exclusion, and data analysis in light of the data they see, setting various degrees of freedom in reasonable-seeming ways that support their model of the world, thus being able to obtain statistical significance at a high rate, merely by capitalizing on chance patterns in data. It’s the forking paths, but it doesn’t feel like “hacking,” nor is it necessarily “strategic behavior” in the usual sense of the term.

2. If p-values are what we have, it makes sense to learn what we can from them, as in the justly influential work of Uri Simonsohn, Greg Francis, and others. But, looking at the big picture, once we move to the goal of learning about underlying effects, I think we want to be analyzing raw data (and in the context of prior information), not merely pushing these p’s around. P-values are crude data summaries, and a lot of information can be lost by moving from raw data to p-values. Doing science using published p-values is like trying to paint a picture using salad tongs.

On deck this week

Mon: Constructing an informative prior using meta-analysis

Tues: Stan attribution

Wed: Cannabis/IQ follow-up: Same old story

Thurs: Defining conditional probability

Fri: In defense of endless arguments

Sat: Emails I never finished reading

Sun: BREAKING . . . Sepp Blatter accepted $2M payoff from Dennis Hastert

“Another bad chart for you to criticize”

Stuart Buck sends in this Onion-worthy delight:


Performing design calculations (type M and type S errors) on a routine basis?

Somebody writes:

I am conducting a survival analysis (median follow up ~10 years) of subjects who enrolled on a prospective, non-randomized clinical trial for newly diagnosed multiple myeloma. The data were originally collected for research purposes and specifically to determine PFS and OS of the investigational regimen versus historic controls. The trial has been closed to new enrollment for many years; however, we are monitoring for disease progression and all cause mortality.

Here is the crux of the issue. Although data were prospectively collected for research purposes, my investigational variable was collected but not reported as a variable. The results of the prospective trial (PFS and OS) have been previously published in Blood. I am updating the original report with the long-term follow up, but am also exploring the potential impact of my new variable on PFS and OS. I have not yet analyzed the data and do not know the potential impact, or magnitude of impact, on PFS or OS. If I am interpreting your paper correctly, I believe that I should treat the power calculation on a post-hoc basis and utilize Type S and Type M analysis.

I know this is brief, if you would offer a comment or a direction I would be deeply grateful. I am sure it is obvious that I don’t study statistics, I focus on the biology of multiple myeloma.

Fair enough. I’m no expert on myeloma. As a matter of fact, I don’t even know what myeloma is! (Yes, I could google it, but that would be cheating.) Based on the above paragraphs, I assume it is a blood-related disease.

Anyway, my response is, yes, I think it would be a good idea to do some design analysis, using your best scientific understanding to hypothesize an effect size and then going from there, to see what “statistical significance” really implies in such a case, given your sample size and error variance. The key is to hypothesize a reasonable effect size—don’t just use the point estimate from a recent study, as this can be contaminated by the statistical significance filter.
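A minimal simulation-based sketch of such a design analysis, in the spirit of the type S / type M idea. The hypothesized effect and standard error below are placeholders, to be replaced by values from your own scientific understanding:

```python
import numpy as np

def retrodesign(true_effect, se, n_sims=1_000_000, seed=2):
    """Given a hypothesized true effect and standard error, estimate power,
    the type S error rate (wrong sign among significant results), and the
    type M exaggeration ratio, for a two-sided test at the 5% level."""
    rng = np.random.default_rng(seed)
    est = rng.normal(true_effect, se, n_sims)
    signif = np.abs(est) > 1.96 * se
    power = signif.mean()
    type_s = (np.sign(est[signif]) != np.sign(true_effect)).mean()
    type_m = np.abs(est[signif]).mean() / abs(true_effect)
    return power, type_s, type_m

# A small hypothesized effect measured noisily:
power, type_s, type_m = retrodesign(true_effect=0.1, se=0.3)
```

With numbers like these, power is low, significant estimates exaggerate the true effect several-fold, and a nontrivial share of them have the wrong sign—exactly the warning such a calculation is meant to deliver.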

New paper on psychology replication


The Open Science Collaboration, a team led by psychology researcher Brian Nosek, organized the replication of 100 published psychology experiments. They report:

A large portion of replications produced weaker evidence for the original findings despite using materials provided by the original authors, review in advance for methodological fidelity, and high statistical power to detect the original effect sizes.

“Despite” is a funny way to put it. Given the statistical significance filter, we’d expect published estimates to be overestimates. And then there’s the garden of forking paths, which just makes things more so. It would be meaningless to try to obtain a general value for the “Edlin factor” but it’s gotta be less than 1, so of course exact replications should produce weaker evidence than claimed from the original studies.
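The “of course” here can be checked with a toy simulation of the significance filter; the distribution of true effects and the common standard error are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

true_effects = rng.normal(0.15, 0.1, 50_000)  # varying effects across studies
se = 0.15                                      # common standard error, for simplicity

original = rng.normal(true_effects, se)
signif = original > 1.96 * se                  # only "significant" originals get published
replication = rng.normal(true_effects[signif], se)  # exact replications, no filter

ratio = replication.mean() / original[signif].mean()
```

In this simulation the replication estimates average roughly half the published originals, purely from selection on significance, with no forking paths needed.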

Things may change if and when it becomes standard to report Bayesian inferences with informative priors, but as long as researchers are reporting selected statistically-significant comparisons—and, no, I don’t think that’s about to change, even with the publication and publicity attached to this new paper—we can expect published estimates to be overestimates.

That said, even though these results are no surprise, I still think they’re valuable.

As I told Monya Baker in an interview for a news article, “this new work is different from many previous papers on replication (including my own) because the team actually replicated such a large swathe of experiments. In the past, some researchers dismissed indications of widespread problems because they involved small replication efforts or were based on statistical simulations. But they will have a harder time shrugging off the latest study, the value of this project is that hopefully people will be less confident about their claims.”

Nosek et al. provide some details in their abstract:

The mean effect size of the replication effects was half the magnitude of the mean effect size of the original effects, representing a substantial decline. Ninety-seven percent of original studies had significant results. Thirty-six percent of replications had significant results; 47% of original effect sizes were in the 95% confidence interval of the replication effect size; 39% of effects were subjectively rated to have replicated the original result; and if no bias in original results is assumed, combining original and replication results left 68% with statistically significant effects.

This is all fine, again the general results are no surprise but it’s good to see some hard numbers with real experiments. The only thing that bothers me in the above sentence is the phrase, “if no bias in original results is assumed . . .” Of course there is bias in the original results (see discussion above), so this just seems like a silly assumption to make. I think I know where the authors are coming from—they’re saying, even if there was no bias, there’d be problems—but really the no-bias assumption makes no sense given the statistical significance filter, so this seems unnecessary.

Anyway, great job! This was a big effort and it deserves all the publicity it’s getting.

Disclaimer: I am affiliated with the Open Science Collaboration. I’m on the email list, and at one point I was one of the zillion authors of the article. At some point I asked to be removed from the author list, as I felt I hadn’t done enough—I didn’t do any replication, nor did I do any data analysis, all I did was participate in some of the online discussions. But I do feel generally supportive of the project and am happy to be associated with it in whatever way that is.

A political sociological course on statistics for high school students

Ben Frisch writes:

I am designing a semester long non-AP Statistics course for high school juniors and seniors. I am wondering if you had some advice for the design of my class. My current thinking for the design of the class includes:

0) Brief introduction to R/ R Studio and descriptive statistics and data sheet structure.

1) Great Migration in 20th Century US. Students will read sections of “The Warmth of Other Suns”. Each student will explore the size of the Great Migration from the South in an industrial city of their choice. We will use the IPUMS micro census data to estimate white and black migration from Southern states and use the income figures to compare migrants and non-migrant residents over the years 1910–1980. The old teaching software package Fathom used to do the sampling from IPUMS easily, but the Census sampling feature now no longer works with the newer operating systems. I will have the students sample directly from the University of Minnesota site and then decode their samples in Excel and R Studio. A final part of the project will be visits with retired people who were a part of the migration.

2) I plan to have the students divide into working groups to prepare statistical information for lobbying elected officials on a social problem of their choice. We have access to the AFSC’s Criminal Justice program near our school, and immigration rights might be a fruitful topic to study after our examination of migration.

3) It will be primary season again next Spring and I would love to have the students look at geographical effects in political elections. We will, of course, study polling and survey design and explore sampling distributions.

I have just picked up copies of your texts “A Quantitative Tour…” and “Teaching Statistics…” and I plan to mine them for other activities to explore. I also will be catching up on reading your blog!

This sounds great! My only tip is to do as much of the data analysis yourself first so you can be sure your students can handle it. I did some IPUMS work recently and there were lots of little details with the data that were difficult to handle at first.

Perhaps readers of this blog will have other suggestions.

Vizzy vizzy vizzy viz


Nadia Hassan points me to this post by Matthew Yglesias, who writes:

Here’s a very cool data visualization that took me a minute to figure out because it’s a little bit unorthodox. The way it works is that it visualizes the entire world’s economic output as a circle. That circle is then subdivided into a bunch of blobs representing the economy of each major country. And then each country-blob is sliced into three chunks — one for manufacturing, one for services, and one for agriculture.


What do I like about this image and what don’t I like?

Paradoxically, the best thing about this graph may also be its worst: Its tricky, puzzle-like characteristic (it even looks like some sort of hi-tech jigsaw puzzle) makes it hard to read, hard to follow, but at the same time gratifying for the reader who goes to the trouble of figuring it out.

It’s the Chris Rock effect: Some graphs give the pleasant feature of visualizing things we already knew, shown so well that we get a shock of recognition, the joy of relearning what we already know, but seeing it in a new way that makes us think more deeply about all sorts of related topics.

As a statistician, I can tell you a whole heap of things I don’t like about this graph, starting with the general disorganization—there’s no particular way to find any country you might be looking for, and there seems to be no logic to the spatial positions—I have no idea what Australia is doing in the middle of the circle, or why South Korea and Switzerland are long and thin while Mexico and India are more circular. The breakdown of economy into services/industry/agriculture is particularly confusing because of all the different shapes, and for heaven’s sake, why are the numbers given to a hyper-precise two decimal places?? (You might wonder what it means to say that Russia is “2.49%” of the world economy, given that, last time I checked, readily-available estimates of Russia’s GDP per capita varied by more than a factor of five!)

Yglesias’s post is headlined, “This striking diagram will change how you look at the world economy,” and I can believe it will change people’s understanding, not because the data are presented clearly or because the relevant comparisons are easily available, but because the display is unusual enough that it might motivate people to stare at these numbers that they otherwise might ignore.

Some of the problems with this graph can be seen by carefully considering this note from Yglesias:

You can see some cool things here.

For example, compare the US and China. Our economy is much larger than theirs, but our industrial sectors are comparable in size, and China’s agriculture sector looks to be a little bit larger. Services are what drive the entire gap.

The UK and France have similarly sized overall economies, but agriculture is a much bigger slice of the French pie.

For all that Russia gets played up as some kind of global menace, its economy produces less than Italy. Put all the different European countries together, and Russia looks pathetic.

You often hear the phrase “China and India,” but you can see here that the two Asian giants are in very different shape economically.

The only African nation on this list, South Africa, has a smaller economy than Colombia.

What struck me about all these items is how difficult it actually is to find them in the graph. Comparing the U.S. with China on their industry sector, that’s tough: you have to figure out which color is which—it’s particularly confusing here because the color codes for the two countries are different—and then compare two quite different shapes, a task that would make Jean Piaget flip out. The U.K. and France can be compared without too much difficulty but only because they happen to be next to each other, through some quirk of the algorithm. Comparing China and India is not so easy—it took me a while to find India on this picture. And finding South Africa was even trickier.

My point is not that the graph is “bad”—I’d say it’s excellent for its #1 purpose which is to draw attention to these numbers. It’s just an instructive example for what one might want in a data display.

The click-through solution

As always, I recommend what I call the “click-through solution”: Start with a visually grabby graphic like this one, something that takes advantage of the Chris Rock effect to suck the viewer in. Then click and get a suite of statistical graphs that allow more direct visual comparisons of the different countries and different sectors of the economy. Then click again to get a spreadsheet with all the numbers and a list of sources.

Stan’s 3rd birthday!

Stan v1.0.0 was released on August 30, 2012. We’ve come a long way since.

If you’re around and want to celebrate with some Stan developers and users, feel free to join us:

Monday, August 31.
6 – 9 pm
Untamed Sandwiches
43 W 39th St
New York, NY

If you didn’t know, we also have a Stan Users NYC group that meets every few months.

Thanks and hope to see some of you there.

“Can you change your Bayesian prior?”

Deborah Mayo writes:

I’m very curious as to how you would answer this for subjective Bayesians, at least. I found this section of my book showed various positions, not in agreement.

I responded on her blog:

As we discuss in BDA and elsewhere, one can think of one’s statistical model, at any point in time, as a placeholder, an approximation or compromise given constraints of computation and of expressing one’s model. In many settings the following iterative procedure makes sense:

1. Set up a placeholder model (that is, whatever statistical model you might fit).

2. Perform inference (no problem, now that we have Stan!).

3. Look at the posterior inferences. If some of the inferences don’t “make sense,” this implies that you have additional information that has not been incorporated into the model. Improve the model and return to step 1.

If you look carefully you’ll see I said nothing about “prior,” just “model.” So my answer to your question is: Yes, you can change your statistical model. Nothing special about the “prior.” You can change your “likelihood” too.
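A stripped-down illustration of that loop, using a deliberately wrong placeholder model and a posterior predictive check standing in for step 3. Everything here (data, model, test statistic) is invented for the sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.lognormal(0, 1, 200)  # skewed data the placeholder model can't capture

# Steps 1-2: placeholder model y ~ Normal(mu, sigma), sigma fixed at the sample sd,
# flat prior on mu, so the posterior for mu is Normal(ybar, sigma/sqrt(n)).
sigma = y.std()
mu_draws = rng.normal(y.mean(), sigma / np.sqrt(len(y)), 1_000)

def skew(x):
    """Sample skewness, the statistic we check in the posterior predictive step."""
    return ((x - x.mean()) ** 3).mean() / x.std() ** 3

# Step 3: does replicated data under the model look like the data we saw?
rep_skews = np.array([skew(rng.normal(m, sigma, len(y))) for m in mu_draws])
p_value = (rep_skews >= skew(y)).mean()
```

A posterior predictive p-value near 0 or 1 is the “doesn’t make sense” signal: the inferences reveal information (here, skewness) not captured by the model, so you improve the model and return to step 1.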

And Mayo responded:

Thanks. But surely you think it’s problematic for a subjective Bayesian who purports to be coherent?

I wrote back: No, subjective Bayesianism is inherently incoherent. As I’ve written, if you could in general express your knowledge in a subjective prior, you wouldn’t need Bayesian Data Analysis or Stan or anything else: you could just look at your data and write your subjective posterior distribution. The prior and the data models are just models, they’re not in practice correct or complete.

More here on noninformative priors.

And here’s an example of the difficulty of throwing around ideas like “prior probability” without fully thinking them through.

“The belief was so strong that it trumped the evidence before them.”

I was reading Palko on the 5 cent cup of coffee and spotted this:

We’ve previously talked about bloggers trying to live on a food stamp budget for a week (yeah, that’s a thing). One of the many odd recurring elements of these posts is a litany of complaints about life without caffeine because

I had already understood that coffee, pistachios and granola, staples in my normal diet, would easily blow the weekly budget.

Which is really weird because coffee isn’t all that expensive.

Palko then goes into detail about how easy it is to buy a can of ground coffee at the supermarket for the cost of 5 or 10 cents a cup.

He continues:

On the other end, if you go to $0.15 or $0.20 a cup and you know how to shop, you can move up into some surprisingly high-quality whole bean coffee . . . you can do better than the typical cup of diner coffee for a dime and better than what you’d get from most coffee houses for a quarter.

To be clear, I’m not recommending that everyone rush out to Wal-Mart for a big ol’ barrel of Great Value Classic Roast. If your weekly food budget is more than fifty dollars a week, bargain coffee should be near the bottom of your concerns.

But here’s the important point—that is, important in general, not just for coffee drinkers (of which I am not one):

What we’re interested in here are perceptions. The people we discussed earlier suffered through a week of headaches and other caffeine-withdrawal pains, not because they couldn’t afford it but because the belief that they couldn’t afford it was so strong that it trumped the evidence before them.

This comes up a lot. People condition on information that isn’t true.

On deck this week

Mon: “The belief was so strong that it trumped the evidence before them.”

Tues: “Can you change your Bayesian prior?”

Wed: How to analyze hierarchical survey data with post-stratification?

Thurs: A political sociological course on statistics for high school students

Fri: Questions about data transplanted in kidney study

Sat: Performing design calculations (type M and type S errors) on a routine basis?

Sun: “Another bad chart for you to criticize”

We provide a service

A friend writes:

I got the attached solicitation [see below], and Google found me your blog post on the topic. Thank you for quickly explaining what’s going on here!

As far as I can see, they’ve removed the mention of payment from this first contact message – so they’re learning!

But also they have enough past clients to be able to include some nice clips. Ah, the pathological results of making academics feel obliged to self-promote.

This time the email didn’t come from “Nick Bagnall,” it came from “Josh Carpanini.” Still spam. But, as I wrote last time, it’s better than mugging old ladies for spare change or selling Herbalife dealerships.

P.S. Here’s the solicitation:

From: Josh Carpanini
Date: Friday, June 5, 2015
Subject: International Innovation – Highlighting Impacts of Technology Research

Dear Dr **,

I hope this message finds you well.

I was hoping to speak with you at some point in the next few days about an upcoming Technology edition of International Innovation. I have come across some of your research and I am very interested to discuss with you the possibility of highlighting your work within the forthcoming July edition.

I would like to create an article about your work within our next edition; this would be similar in format to some of the attached example articles from previous editions. As you can see, the end result would be a piece looking at the wider implications and impact of your current research. . . .

Plaig! (non-Wegman edition)

Mark Vallen writes (link from here):

What initially disturbed me about the art of Shepard Fairey is that it displays none of the line, modeling and other idiosyncrasies that reveal an artist’s unique personal style. His imagery appears as though it’s xeroxed or run through some computer graphics program; that is to say, it is machine art that any second-rate art student could produce. . . .

Fairey’s Greetings from Iraq is not a direct scan or tracing of the FAP print, but it does indicate an over reliance on borrowing the design work of others. There was no political point or ironic statement to be made by expropriating the FAP print – it was simply the act of an artist too lazy to come up with an original artwork. . . .

Some supporters of Shepard Fairey like to toss around a long-misunderstood quote by Pablo Picasso, “Good artists copy, great artists steal.” Aside from the ridiculous comparison of Fairey to Picasso, there’s little doubt that Picasso was referring to the “stealing” of aesthetic flourishes and stylings practiced by master artists, and not simply carting off their works and putting his signature to them.

A last ditch defense used by Fairey groupies is to acknowledge that their champion does indeed “borrow” the works of other artists both living and deceased, but it is argued that the plundered works are all in the “public domain”, and therefore the rights of artists have not been violated. There are those who say that artists should have the right to alter and otherwise modify already existing works in order to produce new ones or to make pertinent statements. Despite some reservations I generally agree with that viewpoint – provided that such a process is completely transparent. . . .

I’m reminded of George Orwell’s classic slam on lazy and dishonest writing:

Each of these passages has faults of its own, but, quite apart from avoidable ugliness, two qualities are common to all of them. The first is staleness of imagery; the other is lack of precision. The writer either has a meaning and cannot express it, or he inadvertently says something else, or he is almost indifferent as to whether his words mean anything or not. This mixture of vagueness and sheer incompetence . . .

Laziness and dishonesty go together, and that fits the stories of Shepard Fairey and Ed Wegman as well. You copy from someone else, and you have nothing of your own to add, so you hide your sources, and this sends you into a sort of spiral of lies. In which case, why do any work at all? In Fairey’s case, the work is all about promotion, not about the art itself. In Wegman’s case, the work all goes into lawsuits and backroom maneuvering, not into the statistics.

Once you’re hiding your sources, you might as well cut corners on the product, eh?

That was easy

This came in the email from Tom Kertscher:

Are you available this afternoon or Wednesday to talk about a fact-check article I’m doing on Gov. Scott Walker’s statement that Wisconsin is a “blue” state?

I’m aware, of course, that Wisconsin has voted for the Democratic presidential nominee in each election since 1988.

But I’d like to talk about whether there are other common ways that states are labeled as red or blue (or perhaps purple).

Tues and Wed have already passed, so it’s probably too late, but here’s my response: I would call Wisconsin a 50-50 or “purple” state, in that its vote split has been very close to the national average in recent presidential elections.

Aahhhhh, young people!

Amusingly statistically illiterate headline from Slate: “Apple Notices That Basically Half the Population Menstruates.”

Ummmm, let’s do a quick calculation: 50 – 12 = 38. If you assume the average woman lives to be 80, then the proportion of the population who is menstruating is approximately .52*38/80 = .247.

25% is hardly “basically half”!
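For the record, here’s the back-of-the-envelope calculation with the post’s assumptions (52% female, menstruation from roughly age 12 to 50, average lifespan 80) spelled out:

```python
female_share = 0.52           # assumed share of the population that is female
menstruating_years = 50 - 12  # roughly ages 12 through 50
lifespan = 80                 # assumed average lifespan

share = female_share * menstruating_years / lifespan
print(round(share, 3))  # 0.247
```

All three inputs are rough, but the answer is not close to one half under any reasonable variation of them.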

But if you’re a young adult, I guess you don’t think so much about people who are under 12 or over 50.

I was similarly amused by the mistake of Beall and Tracy, authors of that now-famous ovulation-and-clothing study, who thought that peak fertility started 6 days after menstruation. If you’re young, you’ve probably been reminded by sex-ed classes that you can get pregnant at any time. It’s only when you get older that you learn about which are the most important days if you’re trying to get pregnant.

Data-analysis assignments for BDA class?

In my Bayesian data analysis class this fall, I’m planning on doing some lecturing and class discussion, but the core of the course will be weekly data-analysis assignments where they do applied statistics using Stan (to fit models) and R (to pre-process the data and post-process the inferences).

So, I need a bunch of examples. I’d appreciate your suggestions. Here’s what I’ve got so far:

Classic examples:

8 schools
Arsenic in Bangladesh

Modern classics:

World Cup
Speed dating
Hot hand
Gay rights opinions by age
The effects of early childhood intervention in Jamaica

I’m also not clear on how to set things up: Do I just throw them example after example and have them try their best, or do I start with simple one- and two-parameter models and then go from there?

One idea is to go on two parallel tracks, with open-ended real-data examples that follow no particular order, and fake-data, confidence-building examples that go through the chapters in the book.

Anyway, any suggestions of yours would be appreciated. Thanks.
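For whatever it’s worth, here’s a minimal sketch of the partial-pooling idea behind the 8 schools example, conditioning on the between-school sd rather than fitting the full hierarchical model in Stan as the assignments would:

```python
import numpy as np

# The classic 8 schools data (estimated effects and standard errors, from BDA)
y = np.array([28., 8., -3., 7., -1., 1., 18., 12.])
sigma = np.array([15., 10., 16., 11., 9., 11., 10., 18.])

def partial_pool(y, sigma, tau):
    """Posterior means of the school effects given between-school sd tau
    (conditional on tau; a full analysis averages over tau's posterior)."""
    w = 1 / (sigma**2 + tau**2)
    mu_hat = (w * y).sum() / w.sum()            # precision-weighted pooled mean
    shrink = sigma**2 / (sigma**2 + tau**2)     # shrinkage toward mu_hat
    return shrink * mu_hat + (1 - shrink) * y

no_pool = partial_pool(y, sigma, tau=1e6)    # ~ the raw estimates
complete = partial_pool(y, sigma, tau=1e-6)  # ~ everything pulled to the pooled mean
partial = partial_pool(y, sigma, tau=8.0)    # something in between
```

The two extremes of tau recover the no-pooling and complete-pooling answers, and intermediate values shrink the noisy extreme schools (like school A’s 28) toward the group mean, which is the punch line of the example.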

“Soylent 1.5” < black beans and yoghurt


Mark Palko quotes Justin Fox:

On Monday, software engineer Rob Rhinehart published an account of his new life without alternating electrical current — which he has undertaken because generating that current “produces 32 percent of all greenhouse gases, more than any other economic sector.” Connection to the power grid isn’t all Rhinehart has given up. He also doesn’t drive, wash his clothes (or hire anyone else to wash them) or cook anything but coffee and tea. But he still lives in a big city (Los Angeles) and is chief executive officer of a corporation with $21.5 million in venture capital funding.

That corporation is Rosa Labs, the maker of Soylent, a “macronutritious food beverage” designed to free its buyers from the drudgery of shopping, cooking and chewing. In the 2,900-word post on his personal blog, Rhinehart worked in an extended testimonial for Soylent 2.0, a new, improved version of the drink — algae and soy seem to be the two most important ingredients — that will begin shipping in October.

Fox’s piece is headlined, “Soylent Is Weird, But It’s Good Weird.”

But is it really “good weird”? Or, if so, what kind of “good” is it?

According to Palko, Soylent is not so nutritious.

Here’s the comparison of 115 grams of Soylent:

[Soylent nutrition facts label]

to comparable servings of black beans:


and nonfat Greek yoghurt:

[Nutrition facts label for plain nonfat Greek yoghurt, 5.3 oz]

And I think it’s safe to say it’s not so delicious.

Nor is it so amazingly convenient. Palko writes:

Nor do you have to cook to do better than Soylent. I did a quick check at the grocery store last night and I found lots of frozen entrees that gave you more nutrition for less calories than Rosa Lab’s product.

In summary:

Basically, when you cut through all of the pseudo science and buzzwords and LOOKATME antics, Rhinehart is simply peddling a mediocre protein shake with the same tired miracle food claims that marketers have been using since John Harvey Kellogg gave C.W. Post his first enema.

The paradox . . . or is it?

At first this seems like a paradox . . . Silicon Valley genius, $21 million in venture capital funding . . . how could it be just a scam?

But then you realize that nutrition has nothing to do with it (other than as a marketing concept).

Recall that the goal of the people who invested 21 million dollars in this product is not to give people healthy and satisfying meals, it’s to have the image of something healthy and satisfying.

Is Soylent a scam? Yes and no. It’s a scam to the people who are being sold the product, but maybe not to the investors.

Perhaps the whole Silicon Valley thing is a distraction, and the right analogy is to something like the movie Battleship, which was universally agreed to be crap but still sold jillions of dollars worth of tickets.

So, when business writer Justin Fox writes that Soylent is “good” and that it is “an interesting product,” this would be like a movie reviewer saying that Battleship is a good movie. It was good to its investors, I assume!

And it’s careless for a business writer to credulously take Rhinehart’s word on the health benefits of the “macronutrient balance” and “glycemic index” of the products he’s selling, without just going to the supermarket and comparing his claims to the label on a can of black beans and a tub of yoghurt.

But is Soylent a good model for a business? I guess that depends on whether potential consumers view it as a sugary, fatty, bad-tasting alternative to beans and yoghurt; or as a healthy processed-food alternative to a breakfast of cornflakes and Coca-cola.

And that in turn must depend in part on press coverage. As Palko has written elsewhere on his blog, A Statistician Walks into a Grocery Store, journalists typically don’t seem to have a good framework for writing about food and nutrition, especially when it comes to low budgets.

So, in that sense, the credulous news reports on Soylent (and it’s not just Justin Fox; see, for example, this gee-whiz article by Lizzie Widdicombe in the New Yorker, subtitled, “Has a tech entrepreneur come up with a product to replace our meals?”) are just part of the larger picture.

Food and nutrition reporting have little context. Imagine if entertainment reporting were the same way:

Battleship: The Hamlet for the 21st Century

Daniel on Stan at the NYC Machine Learning Meetup

I (Daniel) will be giving a Stan overview talk on Thursday, August 20, 7 pm.

Bob gave a talk there 3.5 years ago. My talk will be light and include where we’ve been and where we’re going.


P.S. If you make it, find me. I have Stan stickers to give out.

P.P.S. Stan is on twitter.

[Image: Stan sticker]

Macartan Humphreys on the Worm Wars


My Columbia political science colleague shares “What Has Been Learned from the Deworming Replications: A Nonpartisan View”:

Last month there was another battle in a dispute between economists and epidemiologists over the merits of mass deworming. In brief, economists claim there is clear evidence that cheap deworming interventions have large effects on welfare via increased education and ultimately job opportunities. It’s a best buy development intervention. Epidemiologists claim that although worms are widespread and can cause illnesses sometimes, the evidence of important links to health is weak and knock-on effects of deworming to education seem implausible. . . .

So. Deworming: good for educational outcomes or not?

You’ll have to click through to read the details, but here’s Macartan’s quick summary:

The conclusions that I take away though are that (a) the magnitude and significance of spillover effects are in doubt because of the measurement issues and the inference issues; (b) the inferences on the main effects are also in doubt because of the problems with identification and explanation. Neither of the main claims is demonstrably incorrect, but there are good grounds to doubt both of them.

What about policy? Macartan continues:

A number of commentators have argued that the policy implications are more or less unchanged. This includes organizations that focus specifically on the evidence base for policy (such as CGD and GiveWell).

Perhaps the most important point of confusion is what policy conclusions this discussion could affect. Many are defending deworming for non-educational reasons. But the discussion of the MK [Miguel and Kremer] paper really only matters for the education motivation. And perhaps primarily for the short-term school attendance motivation. Like much other literature in this area it finds only weak evidence for direct health benefits (beyond the strong evidence for the removal of worms). It also does not claim to find evidence on actual performance. Although many groups endorse deworming for health reasons, and rank it as a top priority, this, curiously, goes against the weight of evidence as summarized in the Cochrane reports at least. If the consensus for deworming for health reasons still stands it is not because of this paper.

Does the challenge to this paper weaken the case for deworming for educational reasons? I find it hard to see how it cannot.

I have a few comments of my own, not on deworming—I know nothing about that—but on some of the statistical points raised by Macartan’s post.

– The 800-pound gorilla in the room is opportunity cost, or cost-benefit analysis. As you say, who could be against de-worming kids? I’m reminded of Jeff Sachs’s argument that all of these sorts of interventions are worth doing, and that rather than trying so hard to rank the cost-effectiveness of different health and economic interventions, the rich countries should just kick in that 1% of GDP or whatever and do all of them. I’m not saying Sachs is necessarily right on this, I’m just saying that most of the discussion seems to be on traditional statistical grounds (Is there an effect? Is it statistically significant? Has it been proven beyond a reasonable doubt?) and the cost-benefit or opportunity cost calculations are implicit. Once or twice, cost-benefit calculations do get done, but not in a serious way. For example, Macartan points to a “60 to 1” benefit-to-cost ratio for deworming claimed by the Copenhagen Consensus, but apparently those guys just took the point estimate of effectiveness (which is a biased estimate, possibly hugely biased; see more on this below) and ran with it.
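To see why running a cost-benefit analysis straight off the point estimate matters, here is a back-of-the-envelope sketch: if benefits scale roughly linearly with the effect size, then an exaggerated effect estimate inflates the benefit-to-cost ratio by the same factor. The 60:1 figure is the one cited above; the exaggeration factor is a hypothetical number for illustration.

```python
claimed_ratio = 60   # benefit-to-cost ratio cited by the Copenhagen Consensus
exaggeration = 5     # hypothetical type M (exaggeration) factor for the estimate

# If benefits scale linearly with the effect, the adjusted ratio shrinks
# by the same factor as the effect estimate.
adjusted_ratio = claimed_ratio / exaggeration
print(adjusted_ratio)  # still favorable, but far less dramatic than 60:1
```

The point is not the particular numbers but that the ranking exercise is only as good as the effect estimates fed into it.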

– Macartan talks about multiple comparisons, which is fine (though I’d prefer hierarchical modeling rather than classical corrections; see here and here). Macartan also mentions the statistical significance filter: statistically significant estimates tend to overestimate the magnitude of true effects (we call this the type M error, or exaggeration factor, here). This can be a big deal, especially once things get to the decision stage.
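The significance filter is easy to see in a quick simulation. A minimal sketch (the true effect and standard error below are hypothetical numbers chosen to represent a noisy, underpowered study): among the replications that happen to clear the significance threshold, the average estimate is several times the true effect.

```python
import numpy as np

rng = np.random.default_rng(0)
true_effect = 0.1   # hypothetical true effect
se = 0.2            # hypothetical standard error: a noisy study

# Simulate many replications of the same noisy study.
est = rng.normal(true_effect, se, size=1_000_000)

# Keep only the "statistically significant" replications.
signif = est[np.abs(est) > 1.96 * se]

# Type M error: expected exaggeration among significant results.
exaggeration = np.mean(np.abs(signif)) / true_effect
print(f"exaggeration factor: {exaggeration:.1f}")
```

With these numbers the filter inflates the estimate roughly fivefold; the lower the power, the worse the exaggeration.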

– Macartan mentions development economist Paul Gertler. I’ve only encountered his work once, and it was a case where he hyped and exaggerated (unintentionally, I’m sure) an effect size. I contacted him about it and asked him if he was concerned about the statistical significance filter, and he did not reply. Apparently he was happy reporting an overestimate. It was an early-childhood intervention experiment in Jamaica. Again, who could object to helping poor kids?

– I share Macartan’s skepticism about the spillovers. One problem here is that researchers have an incentive to make a “discovery.” De-worming helps kids, ok, that’s fine. But a spillover effect, that’s news. The paradox is that these surprising findings are the most subject to the statistical significance filter: the headline claims can be the biggest overestimates. And this is completely consistent with the calculation in section 3.4.1 of Macartan’s report. It is similar to the calculation that Eric Loken and I did regarding the notorious claim that women in a certain part of their monthly cycle were more likely to wear red. The researchers were proud of making this discovery with such a noisy measuring instrument, but if you back out how large the effect would’ve had to be for the claimed effect to show up through that noise, it would have to be unrealistically huge. And of course this happened with that horrible LaCour study—the claimed effects in the aggregate implied huge effects in the subgroup of the population who would’ve been affected by the treatment.

– I don’t like Macartan’s section 4.2, “Can we be a bit more Bayesian?” I guess I’d like him to be a bit more Bayesian. In particular, I really don’t like the sort of binary thinking in which deworming works or doesn’t work for some purpose. To me, the concern is not that deworming or whatever is a “dud” but rather that it is not as effective as the published record might suggest. For a Bayesian decision analysis I’d prefer to do it straight, with costs, benefits, and a continuous parameter that represents the effectiveness of the treatment. Even setting the decision analysis aside, you can do Bayesian inference: just say there’s a true (population, average) causal effect and that you have a prior for it. Then it’s simple inference, an inverse-variance weighted average of the data and the prior information, no need for tricky probability formulas.

Finally, I appreciate the way that, in his report, Macartan moves back and forth between the details and the big questions. These connections are a key part of any methodological analysis.