How to read (in quantitative social science). And by implication, how to write.


It all started when I was reading Chris Blattman’s blog and noticed this:

One of the most provocative and interesting field experiments I [Blattman] have seen in this year:

Poor people often do not make investments, even when returns are high. One possible explanation is that they have low aspirations and form mental models of their future opportunities which ignore some options for investment.

This paper reports on a field experiment to test this hypothesis in rural Ethiopia. Individuals were randomly invited to watch documentaries about people from similar communities who had succeeded in agriculture or business, without help from government or NGOs. A placebo group watched an Ethiopian entertainment programme and a control group were simply surveyed.

. . . Six months after screening, aspirations had improved among treated individuals and did not change in the placebo or control groups. Treatment effects were larger for those with higher pre-treatment aspirations. We also find treatment effects on savings, use of credit, children’s school enrolment and spending on children’s schooling, suggesting that changes in aspirations can translate into changes in a range of forward-looking behaviours.

What was my reaction? When I saw Chris describe this as “provocative and interesting,” my first thought was—hey, this could be important! I have a lot of respect for Chris Blattman, both regarding his general judgment and his expertise more particularly in research on international development.

My immediate next reaction was a generalized skepticism, the sort of thing I feel when encountering any sort of claim in social science. I read the above paragraphs with a somewhat critical eye and noticed some issues: potential multiple comparisons (“forking paths”), comparisons between significant and non-significant results, and possible issues with “story time.” So now I wanted to see more.
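To make that second worry concrete, here is a minimal numerical sketch of why a significant result and a non-significant result need not be significantly different from each other. The numbers are invented for illustration, not taken from the paper:

    import numpy as np
    from scipy import stats

    # Invented numbers: estimate A is "significant," estimate B is not.
    est_a, se_a = 0.25, 0.10   # z = 2.5, p < .05
    est_b, se_b = 0.10, 0.10   # z = 1.0, not significant

    # But the evidence that A and B actually differ is weak:
    diff = est_a - est_b
    se_diff = np.sqrt(se_a**2 + se_b**2)   # SEs of independent estimates combine in quadrature
    z = diff / se_diff
    p_value = 2 * (1 - stats.norm.cdf(abs(z)))
    print(f"diff = {diff:.2f}, z = {z:.2f}, p = {p_value:.2f}")   # diff = 0.15, z = 1.06, p = 0.29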

Blattman’s post links to an article, “The Future in Mind: Aspirations and Forward-Looking Behaviour in Rural Ethiopia,” by Tanguy Bernard, Stefan Dercon, Kate Orkin, and Alemayehu Seyoum Taffesse. Here’s the final sentence of the abstract:

The result that a one-hour documentary shown six months earlier induces actual behavioural change suggests a challenging, promising avenue for further research and poverty-related interventions.

OK, maybe. But now I’m really getting skeptical. How much effect can we really expect to get from a one-hour movie? And now I’m looking more carefully at what Chris wrote: “provocative and interesting.” Hmmm . . . Chris doesn’t actually say he believes it!

Now it’s time to read the Bernard et al. article. Unfortunately the link only gives the abstract, with no pointer to the actual paper that I can see. So I google the title, *The Future in Mind: Aspirations and Forward-Looking Behaviour in Rural Ethiopia*, and it works! The first link is this pdf, a version of the paper from April 2014, but that should be good enough.

How to read a research paper

But now the real work begins. I go into the paper and look for their comparisons: treatment group minus control group, controlling for pre-treatment information. Where to look? I cruise over to the Results section, which would be section 4.1, “Empirical strategy: direct effects,” and which begins, “We first examine direct effects on individuals from the experiment.” It looks like I’m interested in model (4.3), and the results appear in Tables 6 through 12. And here’s the real punchline:

Overall, despite a relatively soft intervention – a one-hour documentary screening – we find clear evidence of behavioural changes six months after treatment. These results are also in line with our analysis of which components of the aspirations index are affected by treatment.

OK, so let’s take a look at Tables 6-12. We’ll start with Table 6:

[Screenshot of Table 6 from the paper]

I’ll focus on the third and sixth columns of numbers, as this is where they are controlling for pre-treatment predictors. And for now I’ll look separately at outcomes straight after screening and after six months. And it looks like I’m supposed to take the difference between the treatment and placebo groups. But then there’s a problem: of the four results presented (aspirations and expectations, immediate and after 6 months), only one is statistically significant, and that only at p=.05. So now I’m wondering whassup.
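To be concrete about what I’m looking for: the comparison is presumably estimated by something like the ANCOVA-style regression below. This is only a sketch on simulated data, with invented variable names; I have not seen the authors’ code, and their model (4.3) may differ in its details:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Simulated stand-in data; all column names here are hypothetical.
    rng = np.random.default_rng(0)
    n = 300
    df = pd.DataFrame({
        "household": rng.integers(0, 100, n),
        "treat": rng.integers(0, 2, n),
        "baseline_aspirations": rng.normal(size=n),
    })
    df["placebo"] = ((df["treat"] == 0) & (rng.random(n) < 0.5)).astype(int)
    df["aspirations"] = (0.1 * df["treat"] + 0.5 * df["baseline_aspirations"]
                         + rng.normal(size=n))

    # Treatment and placebo dummies (pure control is the omitted category),
    # controlling for the baseline measure, with household-clustered SEs
    # as promised in the paper's table notes.
    fit = smf.ols("aspirations ~ treat + placebo + baseline_aspirations",
                  data=df).fit(cov_type="cluster",
                               cov_kwds={"groups": df["household"]})
    print(fit.summary())
    # The treatment-minus-placebo comparison is
    # fit.params["treat"] - fit.params["placebo"].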

Table 7 considers the participants’ assessment of the films. I don’t care so much about this but I’ll take a quick look:

[Screenshot of Table 7 from the paper]

Huh? Given the sizes of the standard errors, I don’t understand how these comparisons can be statistically significant. Maybe there was some transcription error? Perhaps 0.201 should’ve been 0.0201, and so on?
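The back-of-the-envelope check is just that an estimate needs to be roughly twice its standard error to earn a star at the 5% level. With made-up point estimates (I have not transcribed the table’s actual numbers):

    # An estimate must be about 2 standard errors from zero for p < .05.
    def z_stat(estimate, se):
        return estimate / se

    print(z_stat(0.15, 0.201))    # ~0.75: nowhere near significant
    print(z_stat(0.15, 0.0201))   # ~7.5: highly significant if the SE lost a digit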

Tables 8 and 10, nothing’s statistically significant. This of course does not mean that nothing’s there, it just tells us that the noise is large compared to any signal. No surprise, perhaps, as there’s lots of variation in these survey responses.

Table 9, I’ll ignore, as it’s oriented 90 degrees off and hard to read; also, it’s a bunch of estimates of interactions. And given that I don’t really see much going on in the main effects, it’s hard for me to believe there will be much evidence for interactions.

Table 11 is also rotated 90 degrees, and it’s about a “hypothetical demand for credit.” Could be important, but I’m not gonna knock myself out trying to read a bunch of tiny numbers (868.15, 1245.80, etc.). Quick scan: three comparisons, one is statistically significant.

And Table 12, nothing statistically significant here either.

At this point I’m desperate for a graph but there’s not much here to quench my thirst in that regard. Just a few cumulative distributions of some survey responses at baseline. Nothing wrong with that but it doesn’t really address the main questions.

So where are we? I just don’t see the evidence for the big claims; actually, I don’t even see the evidence for the little claims in the paper. Again, I’m not saying the claims are wrong or even that they have not been demonstrated, I just couldn’t find the relevant information in a quick read.

How to write a research paper

Now let’s flip it around. Given my thought process as described above, how would you write an article so I could more directly get to the point?

You’d want to focus on the path leading from your data and assumptions to your key empirical claims. What would really help would be a graph, the paper’s “Figure 1” (or possibly “Figure 2”), showing the data and the fitted model. Maybe it would be a scatterplot where each dot represents a person, with two colors distinguishing the treated and control groups, plotting the outcome vs. a pre-treatment summary, with fitted regression lines overlain.
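Here is a sketch of the kind of figure I have in mind, on simulated data with made-up variable names:

    import numpy as np
    import matplotlib.pyplot as plt

    # Simulated data for illustration only; not the paper's data.
    rng = np.random.default_rng(1)
    n = 200
    treated = rng.integers(0, 2, n).astype(bool)
    baseline = rng.normal(size=n)                       # pre-treatment summary
    outcome = 0.5 * baseline + 0.3 * treated + rng.normal(scale=0.8, size=n)

    fig, ax = plt.subplots()
    for grp, color, label in [(treated, "C1", "treated"), (~treated, "C0", "control")]:
        ax.scatter(baseline[grp], outcome[grp], s=12, color=color, label=label)
        slope, intercept = np.polyfit(baseline[grp], outcome[grp], 1)
        xs = np.array([baseline.min(), baseline.max()])
        ax.plot(xs, intercept + slope * xs, color=color)   # fitted regression line
    ax.set_xlabel("pre-treatment summary")
    ax.set_ylabel("outcome")
    ax.legend()
    plt.show()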

It shouldn’t take forensics to find the basis for the article’s key claim. And the claims themselves should be presented crisply.

Consider two approaches to writing an article. Both are legitimate:
1. There is a single key finding, a headline result, with everything else being a modification or elaboration of it.
2. There are many little findings, we’re seeing a broad spectrum of results.

Either of these can work, indeed my collaborators and I have published papers of both types.

But I think it’s a good idea to make it clear, right away, where your paper is heading. If it’s the first sort of paper, please state clearly what the key finding is and what the evidence for it is. If it’s the second sort of paper, I’d suggest laying out all the results (positive and negative) in some sort of grid so they can all be visible at once. Otherwise, as a reader, I struggle through the exposition, trying to figure out which results are the most important and what to focus on.

That sort of organization can help the reader and is also relevant when considering questions of multiple comparisons.

Beyond this, it would be helpful to make it clear what you don’t yet know. Not just: The comparison is statistically significant in setting A but not in setting B (or “aspirations had improved among treated individuals and did not change in the placebo or control groups”), but a more direct statement about where the key remaining uncertainties are.

In using the Bernard et al. paper as an opening to talk about how to read and write research articles, I’m not at all trying to say that it’s a particularly bad example; it’s just an example that was at hand. And, in any case, the authors’ primary goal is not to communicate to me. If their style satisfies their aim of communicating to economists and development specialists, that’s what’s most important. They, and other readers, will, I hope, take my advice here in more general terms, as serving the goals of statistical communication.

My role in all this

A couple months ago I got into a dispute with political scientist Larry Bartels, who expressed annoyance that I expressed skepticism about a claim he’d made (“Fleeting exposure to ‘irrelevant stimuli’ powerfully shapes our assessments of policy arguments”), without having fully read the research reports upon which his claim was based. In my response, I argued that it was fully appropriate for me to express skepticism based on partial information; or, to put it another way, that my skepticism based on partial information was as valid as his dramatic positive statements (“Here’s how a cartoon smiley face punched a big hole in democratic theory”) which themselves were only based on partial information.

That said, Bartels had a point, which is that a casual reader of a blog post might just take away the skepticism without the nuance. So let me repeat that I have not investigated this Bernard et al. article in detail; indeed, the comments above represent my entire experience of it.

To put it another way, the purpose of this post is not to present a careful investigation into claims about the effect of watching a movie about rural economic development; rather, this is all about the experience of reading a research article and, by implication, suggestions for how to write such an article to make it more accessible to critical readers.

In the meantime, if any reader wants to supply further information to clarify this particular example, feel free. If there’s something important that I’ve missed, I’d like to know; also, if anything it would make my argument even stronger, by demonstrating the difficulties I’ve had in reading a research paper.

P.S. From a few years back, here’s some other advice on writing research articles.

18 Comments

  1. PBaylis says:

    Having also not read the paper, I wonder if the values in parentheses in Table 7 are sample standard deviations rather than estimated standard errors?

    • Andrew says:

      Pbaylis:

      I was wondering that too. But the note at the bottom of Table 7 explicitly says, “Robust standard errors clustered at household level in parentheses.” So I assumed that they’re standard errors. But I guess you’re right, it could’ve been an error in the compilation of the table, or it could be that the person who wrote the note at the bottom of the table is not the same as the person who computed the numbers.

      • Elin says:

        Maybe they used non-robust standard errors to calculate the p values?

        • Andrew says:

          Elin:

          No, I think Pbaylis and Russ are correct, that someone simply computed sqrt(p*(1-p)) and then put this in parentheses, mistakenly thinking it was the standard error. Just the old, old story of someone using the wrong formula. I am guessing that person A wrote the caption and person B filled in the table, and there was a lack of communication. Given that I was not able to figure out what was going on in the paper, perhaps it’s no surprise that the authors had some confusion too.
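          In code, the mix-up would look something like this (p and n are invented here, chosen only to reproduce the factor-of-ten gap in the table):

              import numpy as np

              # sqrt(p*(1-p)) is the standard deviation of a binary response;
              # the standard error of the estimated proportion divides by sqrt(n).
              p, n = 0.96, 100                 # invented values
              sd = np.sqrt(p * (1 - p))        # ~0.20: what may have been typed into the table
              se = np.sqrt(p * (1 - p) / n)    # ~0.02: what the caption promises
              print(sd, se)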

  2. Fernando says:

    Andrew:

    I think your approach is entirely appropriate. Indeed, the goal should be to write the paper so a skeptical reader can find convincing answers quickly. Of course, this is easier said than done. Simplicity and clarity are fine arts.

    One approach is to start with the findings, in Figure 1 of Section 1, and then explain how we got there. Unfortunately, and I am guilty as charged, most papers start with the road to the finding; postpone the finding to Section 4 or 5; and bury it in a whole bunch of tables to satisfy reviewers.

    So in terms of the writing advice you mention in your P.S., I would say that instead of starting with the conclusions and then working back, I would start with Section 1 on findings and then work forward. Same thing really, but more logical if the goal is to convince skeptics on a quick read. That is, my proposed structure is:

    1. Intro (Exec summary: Research question, why important, what we find, how we find it)
    2. Findings
    3. Materials and methods
    4. Threats to inference / ancillary analyses (some in annex)
    5. Conclusion (How findings advance what we think we know, with reference to literature).

    The current standard:

    1. Intro
    2. Lit review
    3. Methods
    4. Findings
    5. Discussion
    6. Conclusion

  3. jrc says:

    Andrew,

    I agree that the evidence here is only suggestive* and that the idea that a 1 hour documentary would change long-term behavior in majorly important ways is something of a stretch (although – Kony, right?, which at least changed some sorority girls’ Facebook behavior for a while).

    That said, if I were to guess as to why people in my field think this is so important**: We know that peer effects matter and we know that adult earnings and occupational/educational choices are partially (maybe even strongly) determined by our parents. And I think some of us have thought, abstractly and informally, that there is some element of a “choice set” at work in people’s decisions about their profession. And part of that choice set is just the set of options you can conceive of.

    An example from my life: I was working a job I hated. I started asking everyone I knew if they knew of any job openings. One colleague says to me “I work at the soccer channel on the weekend.” To which I said “There is a soccer channel?” It had never in my life occurred to me that there are hundreds of people who sit in front of machines and press buttons to make soccer show up on your TV screen, and that much of this work involves…watching soccer! The choice of being “TV broadcast engineer for soccer channel” was not in my choice set before I talked to that guy, and so I couldn’t aspire to do that (but then I did start aspiring, and then I did that job for a few years, and it was awesome).

    So I would guess that, wandering around rural Bangladesh or Guatemala or wherever, we’ve all looked at kids playing with a tire in the street and thought “these kids don’t even know that there is a world out there in which they could make all these other choices about their future”. We just weren’t imaginative enough to come up with ways of testing the theory that weren’t, say, mess with 20 undergraduates in a lab**. And this seems like a totally reasonable first-pass attempt at getting at this feeling that a “potential career choice set” is a big obstacle faced by poor children in rural areas***.

    *I think the p-values (aside from all epistemological p-value criticism on this blog) are too small – the standard would be to cluster SEs at the level of randomization, but there are too few clusters, so they cluster at the household level, which is likely insufficiently flexible/robust. I would, were I to referee this, suggest both wild cluster-t bootstraps and (forgive me) randomization tests for these p-values; see the sketch at the end of this comment.

    **Maybe this is the Development equivalent of messing with 20 (ok, 18!) undergrads, but as a kind of pilot for an idea, I might be OK with that. Plus they would be 18 low-variance subjects (because each “subject” is a whole village of people).

    ***I actually have no idea whether lots of other people have had this thought, but it seems like a natural reaction, in that it happens to me all the time, which is obviously a totally objective measure of “natural”. It also feels “in the air” to me, given our recent interest in decision making behavior and subjective valuation, etc. – things where people’s subjective beliefs play a prominent role. Merge that with choice theory and you’ve got, well, a “future things I could conceive of doing with my life choice set”.
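    If it helps, here is a minimal sketch of the randomization test I have in mind, on simulated data with village-level assignment (everything below is invented for illustration):

        import numpy as np

        # Simulated data: assignment at the village level, with few clusters.
        rng = np.random.default_rng(2)
        n_villages, per_village = 18, 40
        village = np.repeat(np.arange(n_villages), per_village)
        treated_set = rng.choice(n_villages, n_villages // 2, replace=False)
        treat = np.isin(village, treated_set)
        y = 0.1 * treat + rng.normal(size=village.size)   # simulated outcome

        def diff_in_means(y, treat):
            return y[treat].mean() - y[~treat].mean()

        obs = diff_in_means(y, treat)
        # Re-randomize village labels, not individuals, to respect the design.
        draws = [diff_in_means(y, np.isin(village,
                     rng.choice(n_villages, n_villages // 2, replace=False)))
                 for _ in range(5000)]
        p_value = np.mean(np.abs(draws) >= abs(obs))   # two-sided p-value
        print(obs, p_value)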

    • Steve Sailer says:

      “this feeling that a “potential career choice set” is a big obstacle faced by poor children in rural areas”

      Yes, I think this feeling is quite plausible.

      For example, when I was graduating from college in 1980 and interviewing for a job, I didn’t lack for ambition or self-confidence, the way a lot of poor peasants do. Still, I had a hard time being convincing in interviews because I didn’t really have a picture in my head of what you did all day in any particular job, and I’m not good at faking things.

      So, I went to MBA school, which is kind of a Let’s Pretend experience for trying out different careers, and by 1982 I was far better in interviews because I had done the kind of projects you would have to do in the various jobs I interviewed for. (Unfortunately, they don’t let 21-year-olds go to B-School anymore).

      Whether watching a 1 hour documentary is enough is a different question, however. But in theory I think it’s extremely plausible that your society gives you signals about what kinds of careers are plausible. For example, I witnessed attitudes about women’s careers change with extreme rapidity from, say, 1969-1974. (Not much has changed in the 40 years since then, of course.)

  4. Richard Scott says:

    Good article, but I agree entirely with Fernando’s comment. As a busy policy maker looking for an evidence base in research, I want to know what the findings were (or weren’t) immediately. If that suits my need, then I’ll look to the methodology to satisfy myself of the reliability of the evidence. That ‘reverse pyramid’ approach is how a reporter would write up the research.

    • Andrew says:

      Richard:

      I think what you and Fernando call “findings” is what I am calling “conclusions.” And what you are calling “conclusions” is what I would call “speculations.” That is, I think we’re in agreement: the author should put his or her findings up front, then give the evidence for the claims, etc. One of my difficulties with the paper under discussion above is that (a) it’s not quite clear what they claim are their key findings, and (b) I can’t really see all their findings in one place.

  5. D.O. says:

    IMHO the paper should be written with a mind toward a reader who wants to read it slowly and take in as much detail as possible. It is true that many (a majority?) of readers don’t care about slow reading and want results first. But they will read the paper in any odd order they want anyway, looking for supporting information and explanations wherever needed. No point in trying to present information up front for such readers. If sec. 5 is where the results are, a willing reader will turn to section 5, ignoring the previous 4. Some years ago I was taught that a typical experienced reader reads in the order abstract-conclusion-results, with the possibility of losing interest at any stage. If you clearly mark these sections, it does not matter in what order they are put.

  6. Without looking at the paper but only your description, I guess that the numbers in parentheses in Table 7, which you assumed were SEs, are merely \sqrt{p(1-p)}, where p is the number not in parentheses. This is pretty common to include, even though it does not add any new information for a table of percentages. I know one author who recommends against such decorations.

  7. I agree with most of what’s been said about the organization of the paper. And the results (after six months) are on their face very weak, hard to see why the authors (much less Blattman) would crow about them.

    But, beyond that, the design of the experiment asks us to swallow a lot: the authors provide a nice review of evidence that aspirations would affect behavior (pp. 4-6), and let’s say that’s true in general. But then their intervention (an inspirational film) could just as well (i) introduce an error into the measurement of aspiration and/or (ii) change the relationship between aspiration and behavior for the treated individuals (I think Leamer would call these additive and multiplicative confounders, respectively).

    To put this another way, jrc testifies to a personal case where a small bit of information changed both aspiration and behavior, which is in keeping with the Bernard et al. story; on the other hand, I can come out of a movie with my aspiration to be James Bond or Gandalf greatly enhanced, and though I might, heaven forfend, reveal this to somebody (even to a researcher), it is unlikely to change my behavior. Call these the motivational effect and the Walter Mitty effect. I don’t see that Bernard et al. have any way of distinguishing between them because (in common with so many psychological researchers) they aren’t measuring behavior.


