How to read (in quantitative social science). And by implication, how to write.

I happened to come across this classic from 2014. For convenience I’ll just repeat it all here:


It all started when I was reading Chris Blattman’s blog and noticed this:

One of the most provocative and interesting field experiments I [Blattman] have seen in this year:

Poor people often do not make investments, even when returns are high. One possible explanation is that they have low aspirations and form mental models of their future opportunities which ignore some options for investment.

This paper reports on a field experiment to test this hypothesis in rural Ethiopia. Individuals were randomly invited to watch documentaries about people from similar communities who had succeeded in agriculture or business, without help from government or NGOs. A placebo group watched an Ethiopian entertainment programme and a control group were simply surveyed.

. . . Six months after screening, aspirations had improved among treated individuals and did not change in the placebo or control groups. Treatment effects were larger for those with higher pre-treatment aspirations. We also find treatment effects on savings, use of credit, children’s school enrolment and spending on children’s schooling, suggesting that changes in aspirations can translate into changes in a range of forward-looking behaviours.

What was my reaction? When I saw Chris describe this as “provocative and interesting,” my first thought was—hey, this could be important! I have a lot of respect for Chris Blattman, both regarding his general judgment and his expertise more particularly in research on international development.

My immediate next reaction was a generalized skepticism, the sort of thing I feel when encountering any sort of claim in social science. I read the above paragraphs with a somewhat critical eye and noticed some issues: potential multiple comparisons (“forking paths”) and comparisons between significant and non-significant, also possible issues with “story time.” So now I wanted to see more.

Blattman’s post links to an article, “The Future in Mind: Aspirations and Forward-Looking Behaviour in Rural Ethiopia,” by Bernard Tanguy, Stefan Dercon, Kate Orkin, and Alemayehu Seyoum Taffesse. Here’s the final sentence of the abstract:

The result that a one-hour documentary shown six months earlier induces actual behavioural change suggests a challenging, promising avenue for further research and poverty-related interventions.

OK, maybe. But now I’m really getting skeptical. How much effect can we really expect to get from a one-hour movie? And now I’m looking more carefully at what Chris wrote: “provocative and interesting.” Hmmm . . . Chris doesn’t actually say he believes it!

Now it’s time to read the Tanguy et al. article. Unfortunately the link only gives the abstract, with no pointer to the actual paper that I can see. So I google the title, *The Future in Mind: Aspirations and Forward-Looking Behaviour in Rural Ethiopia*, and it works! The first link is this pdf; it’s a version of the paper from April 2014, but that should be good enough.

How to read a research paper

But now the real work begins. I go into the paper and look for their comparisons: treatment group minus control group, controlling for pre-treatment information. Where to look? I cruise over to the Results section, that would be section 4.1, “Empirical strategy: direct effects,” which begins, “We first examine direct effects on individuals from the experiment.” It looks like I’m interested in model (4.3), and the results appear in tables 6 through 12. And here’s the real punchline:

Overall, despite a relatively soft intervention – a one-hour documentary screening – we find clear evidence of behavioural changes six months after treatment. These results are also in line with our analysis of which components of the aspirations index are affected by treatment.

OK, so let’s take a look at tables 6-12. We’ll start with table 6:

[Image: Table 6 of the paper]

I’ll focus on the third and sixth columns of numbers, as this is where they are controlling for pre-treatment predictors. And for now I’ll look separately at outcomes straight after screening and after six months. And it looks like I’m supposed to take the difference between treatment and placebo groups. But then there’s a problem: of the four results presented (aspirations and expectations, immediate and after 6 months), only one is statistically significant, and that only at p=.05. So now I’m wondering whassup.
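To put a rough number on the multiple-comparisons concern: if each of k comparisons has a 5% chance of coming up statistically significant under pure noise, and if the comparisons are independent (an assumption made here only for illustration; survey outcomes like these are typically correlated), then the chance of at least one significant result is 1 - 0.95^k. A minimal Python sketch, with a hypothetical function name:

```python
def prob_at_least_one_significant(k, alpha=0.05):
    """Chance that at least one of k independent null comparisons has p < alpha.

    Assumes independence across comparisons, which is an idealization.
    """
    return 1.0 - (1.0 - alpha) ** k


for k in (1, 4, 20):
    print(k, round(prob_at_least_one_significant(k), 3))
```

With four comparisons this comes to roughly 19%, so a single result at p=.05 out of four is not much of a surprise under noise alone.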

Table 7 considers the participants’ assessment of the films. I don’t care so much about this but I’ll take a quick look:

[Image: Table 7 of the paper]

Huh? Given the sizes of the standard errors, I don’t understand how these comparisons can be statistically significant. Maybe there was some transcription error? 0.201 should’ve been 0.0201, etc?
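The check being done by eye here is the ratio of an estimate to its standard error. The sketch below makes that arithmetic explicit using a normal approximation. The estimate of 0.10 is hypothetical, and treating 0.201 and 0.0201 as candidate standard errors is an assumption made for illustration, not a reading of the actual table.

```python
import math


def z_and_p(estimate, std_error):
    """Return the z-ratio and the two-sided p-value under a normal approximation."""
    z = estimate / std_error
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical estimate, paired with each candidate standard error.
for se in (0.201, 0.0201):
    z, p = z_and_p(0.10, se)
    print(f"se={se}: z={z:.2f}, p={p:.3g}")
```

Under these made-up inputs, the larger standard error gives an estimate well under one standard error from zero, while the tenfold smaller one gives an estimate several standard errors away, which is why a misplaced decimal point would matter so much.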

Tables 8 and 10, nothing’s statistically significant. This of course does not mean that nothing’s there, it just tells us that the noise is large compared to any signal. No surprise, perhaps, as there’s lots of variation in these survey responses.

Table 9, I’ll ignore, as it’s oriented 90 degrees off and it’s hard to read, also it’s a bunch of estimates of interactions. And given that I don’t really see much going on in the main effects, it’s hard for me to believe there will be much evidence for interactions.

Table 11 is also rotated 90 degrees, also it’s about a “hypothetical demand for credit.” Could be important but I’m not gonna knock myself out trying to read a bunch of tiny numbers (868.15, 1245.80, etc.). Quick scan: three comparisons, one is statistically significant.

And Table 12, nothing statistically significant here either.

At this point I’m desperate for a graph but there’s not much here to quench my thirst in that regard. Just a few cumulative distributions of some survey responses at baseline. Nothing wrong with that but it doesn’t really address the main questions.

So where are we? I just don’t see the evidence for the big claims, actually I don’t even see the evidence for the little claims in the paper. Again, I’m not saying the claims are wrong or even that they have not been demonstrated, I just couldn’t find the relevant information in a quick read.

How to write a research paper

Now let’s flip it around. Given my thought process as described above, how would you write an article so I could more directly get to the point?

You’d want to focus on the path leading from your data and assumptions to your key empirical claims. What would really help would be a graph (“Figure 1” of the paper, or possibly “Figure 2”) showing the data and the fitted model. Maybe it would be a scatterplot where each dot represents a person, with two different colors representing treated and control groups, plotting outcome vs. a pre-treatment summary, with fitted regression lines overlaid.
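As a rough sketch of what such a figure could look like, here is some Python that draws it from simulated (fake) data. Everything in it is an assumption for illustration: the data-generating numbers, the function names, the output filename, and the choice of matplotlib for plotting. None of it comes from the paper.

```python
import random


def simulate(n=200, effect=0.3, seed=1):
    """Fake data: a list of (pre_treatment, treated, outcome) tuples."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        pre = rng.gauss(0.0, 1.0)
        treated = rng.random() < 0.5
        outcome = 0.5 * pre + (effect if treated else 0.0) + rng.gauss(0.0, 1.0)
        rows.append((pre, treated, outcome))
    return rows


def fit_line(xs, ys):
    """Ordinary least-squares (intercept, slope) for y = a + b * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def plot(rows, path="figure1.png"):
    """One dot per person, colored by group, with a fitted line per group."""
    import matplotlib
    matplotlib.use("Agg")  # write to a file; no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    groups = ((False, "tab:blue", "control"), (True, "tab:red", "treated"))
    for flag, color, label in groups:
        xs = [r[0] for r in rows if r[1] == flag]
        ys = [r[2] for r in rows if r[1] == flag]
        a, b = fit_line(xs, ys)
        ax.scatter(xs, ys, s=10, alpha=0.5, color=color, label=label)
        lo, hi = min(xs), max(xs)
        ax.plot([lo, hi], [a + b * lo, a + b * hi], color=color)
    ax.set_xlabel("pre-treatment summary")
    ax.set_ylabel("outcome")
    ax.legend()
    fig.savefig(path)


if __name__ == "__main__":
    plot(simulate())
```

Separate lines per group are fit here so the reader can see both the gap between groups and whether the slopes differ; a single model with a treatment indicator would be another reasonable choice.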

It shouldn’t take forensics to find the basis for the article’s key claim. And the claims themselves should be presented crisply.

Consider two approaches to writing an article. Both are legitimate:
1. There is a single key finding, a headline result, with everything else being a modification or elaboration of it.
2. There are many little findings, we’re seeing a broad spectrum of results.

Either of these can work, indeed my collaborators and I have published papers of both types.

But I think it’s a good idea to make it clear, right away, where your paper is heading. If it’s the first sort of paper, please state clearly what is the key finding and what is the evidence for it. If it’s the second sort of paper, I’d suggest laying out all the results (positive and negative) in some sort of grid so they can all be visible at once. Otherwise, as a reader, I struggle through the exposition, trying to figure out which results are the most important and what to focus on.

That sort of organization can help the reader and is also relevant when considering questions of multiple comparisons.

Beyond this, it would be helpful to make it clear what you don’t yet know. Not just: The comparison is statistically significant in setting A but not in setting B (or “aspirations had improved among treated individuals and did not change in the placebo or control groups”), but a more direct statement about where the key remaining uncertainties are.
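The “significant in A but not in B” pattern is weaker evidence than it sounds, because the difference between the two estimates has its own, larger standard error. A small Python sketch with hypothetical numbers (25 ± 10 in setting A, 10 ± 10 in setting B), assuming the two estimates are independent:

```python
import math


def compare(est_a, se_a, est_b, se_b):
    """Difference between two independent estimates, its standard error, and z-ratio."""
    diff = est_a - est_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)  # valid only under independence
    return diff, se_diff, diff / se_diff


diff, se_diff, z = compare(25.0, 10.0, 10.0, 10.0)
print(diff, round(se_diff, 1), round(z, 2))  # 15.0 14.1 1.06
```

Setting A is 2.5 standard errors from zero and setting B is 1 standard error from zero, yet the difference between them is only about 1 standard error from zero, so the data do not clearly distinguish the two settings.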

In using the Tanguy et al. paper as an opening to talk about how to read and write research articles, I’m not at all trying to say that it’s a particularly bad example; it’s just an example that was at hand. And, in any case, the authors’ primary goal is not to communicate to me. If their style satisfies their aim of communicating to economists and development specialists, that’s what’s most important. They, and other readers, will I hope take my advice here in more general terms, as serving the goals of statistical communication.

My role in all this

A couple months ago I got into a dispute with political scientist Larry Bartels, who expressed annoyance that I expressed skepticism about a claim he’d made (“Fleeting exposure to ‘irrelevant stimuli’ powerfully shapes our assessments of policy arguments”), without having fully read the research reports upon which his claim was based. In my response, I argued that it was fully appropriate for me to express skepticism based on partial information; or, to put it another way, that my skepticism based on partial information was as valid as his dramatic positive statements (“Here’s how a cartoon smiley face punched a big hole in democratic theory”) which themselves were only based on partial information.

That said, Bartels had a point, which is that a casual reader of a blog post might just take away the skepticism without the nuance. So let me repeat that I have not investigated this Tanguy et al. article in detail, indeed the comments above represent my entire experience of it.

To put it another way, the purpose of this post is not to present a careful investigation into claims about the effect of watching a movie about rural economic development; rather, this is all about the experience of reading a research article and, by implication, suggestions of how to write such an article to make it more accessible to critical readers.

In the meantime, if any reader wants to supply further information to clarify this particular example, feel free. If there’s something important that I’ve missed, I’d like to know; also, if anything, it would make my argument even stronger, by demonstrating the difficulties I’ve had in reading a research paper.

P.S. From a few years back, here’s some other advice on writing research articles.

25 thoughts on “How to read (in quantitative social science). And by implication, how to write.”

  1. This is not the main issue, but a contributing factor – which has been bothering me more and more lately – is the general practice of all tables and graphs coming at the end of the paper. I know this is required by publications, but many of the papers I read are not in published form. Given that papers and the accompanying analysis have been getting longer and more complicated, it makes it almost impossible (at least without spending a week of my time) to try to understand what a paper is doing and saying. Frankly, I think the practice of posting working papers like that is inexcusable. Nobody writes their papers with the tables and graphs at the end – we move them to the end when submitting them. So, why do we insist that readers must be tortured unnecessarily?

  2. Some anthropological & psychological writings just don’t have the power of literature & non-fiction essays. As for an hour-long documentary, I have rarely been moved enough by one to change my behavior. But a literary speech or writing has moved me to do so. Personal preference I guess.

  3. How to read a research paper…Table 9, I’ll ignore, as it’s oriented 90 degrees off and it’s hard to read…Table 11 is also rotated 90 degrees…Could be important but I’m not gonna knock myself out trying to read a bunch of tiny numbers

    Yep, this is why if you absolutely must include some inconvenient information, it is important to put it in long rotated footnotes that reference details from pages away. Even better if you could put it in an entirely different document like the appendix.

    Anyway, I would first check their headline claim for an “x is causing y”-like statement. Here it is:

    Poor people often do not make investments, even when returns are high. One possible explanation is that they have low aspirations and form mental models of their future opportunities which ignore some options for investment.
    […]
    changes in aspirations can translate into changes in a range of forward-looking behaviours.

    Next I would look for a scatter plot of “aspiration” vs “forward-looking behavior”. If it’s not present, stop. Things will only head downhill from there.

    You can quickly filter out 90%+ of bs papers this way. While it already only takes 10-20 seconds, in theory this could be automated so you only need to glance at a word or symbol and filter 10x faster. Unfortunately, it’s not clear to me how to accurately identify the presence of these “x is causing y” statements.

    • >Next I would look for a scatter plot of “aspiration” vs “forward-looking behavior”. If it’s not present, stop.

      This is a tremendous heuristic and I fully support it.

      If you study this subject professionally, and can’t show me a graphical representation of fairly raw data, you probably are either hiding something (you did the plot and realize that as soon as I see this plot it fails the intraocular impact test so you need to puff up your results with p values in a table) or, you don’t understand your subject matter enough to figure out how to measure the quantities involved in a useful way and make an appropriate graph, or you don’t care about the subject enough to argue against some editor / reviewer who demands tables etc.

      Whichever way, your paper is pure rent-seeking behavior in an attempt to get yourself tenure/promotion/grants and the world would be a better place if your paper didn’t exist and probably if you didn’t have this job at all.

      It’s a little harsh I admit, but it’s a close approximation to the truth.

      • I think in most cases it is actually just people going through the whole “get grant -> collect data -> publish significant results” ritual without much rational thought. Plotting x vs y is just not part of that ritual. It isn’t that they are stupid or corrupt, just NHST has turned their brain off (Fisher’s “dense fog where their brains used to be”).

        I remember seeing people generate dynamite plots in SPSS without ever actually inspecting the underlying data. I would ask “in that group the average is lower and the error bars are much larger, was there just one outlier point, or two clusters, or what?” and they would quite honestly have no idea.

        The result of that process is then used to make claims about stuff like how “phosphorylation of (α1)2β1δε nAChR Ser-189 affects VSMC ECM adhesion in the presence of 10 mM (S)-nicotine” (how the porousness of blood vessels is affected by the sensitivity of a receptor to insane levels of nicotine).

        I just made that example up, but it’s also a problem that these types of ultra-technical details can’t be sanity checked the same way as the social science results. As a result it’s a lot easier for bs to accumulate.

  4. “It has become difficult to read many social science and science writing.”

    __________________

    well, this is an eternal complaint in most areas of written communication

    effective communication of complex subjects is a difficult, acquired skill

    it escapes most academics and their students

    brevity and clarity are the keys

    (ponderous graphics and intricate verbiage indeed should be isolated far rearward)

  5. Not really related to the intended topic, but the abstract blurb at the beginning reminds me that people who live comfortably are considered the foremost authorities on what it means to live in impoverishment. Part of me is endlessly and darkly amused by this irony.

  6. I had a different reaction when I heard about this work. Ever been to school in America? Ever been to a ‘career’ day where they show you things you can be. Ever heard about programs for minority kids in which successful members of the same minority come in to demonstrate that it’s possible and to give practical advice and encouragement? Do you think these work? If so, then how do you explain the lack of results? If you say something about peers, about culture being resistant, about how hard it is to believe something can be attained when you look around you and see poverty and you get that this fighter pilot didn’t have a dad but you don’t have a dad and it isn’t that easy for me, then you’re saying somehow that Africans are what? Naive tablets on which you can write a few sentences and change lives? They don’t have peers. They don’t look around and see the same thing they’ve always seen. Their culture doesn’t absorb them the way our culture absorbs people. Funny, but I don’t think that’s the way people actually are: they do have peers, they are rooted in an experience, they absorb expectations and limits. So what makes these Africans so malleable mentally that they can be so easily shifted by a video? Have they never seen a video before? Seriously? They’re so ‘blank slate’ that showing them a video has the properties of a sleight of hand, like Cortez being taken for Quetzalcoatl? What is the model of such successful interference that it can overcome the absence of any real business when people in the developed world are regularly told: if you want to be successful, you need to do x and y and z. There are about 10 zillion systems which promise you can change yourself. Apparently, it works better if you’re African – but only if you’re in Africa (and then if …) – and they show you a video. What is the model of human behavior? Are these people presumed to be incredibly open to the new experiences of business? Are African-Americans not?

    And then to get into it: why do people buy Nike Air Jordan. Because they (still) want to be like Mike. And women want to wear Taylor’s hairstyle because they want to be like Taylor. (Though I hope more the natural eyebrow she’s using becomes popular because women torture their eyebrows.) So the study is saying advertising for your future works? Let’s say it does. Within what parameters? When my grandparents came to America, it was ‘about the future’ but that was because the actual present in what is now Belarus and Ukraine really, really sucked. So maybe it works if you spread ads over Europe about cheap fares to America on old steamships but because that appeals to the desperate. Are these people that desperate? Is that an argument we need to increase the desperation level to find the lowest level of incentive to change? I’ll change. I promise. Don’t beat me, master.

    Yeah, that works: if you increase the desperation level, people will in fact do what is ‘necessary’. That’s a weird area. It leads to stuff as varied as fetishes – how far can you take/give control so ‘necessary’ is eroticized – to the Khmer Rouge increasing the suffering of the Khmer to break their will so they can be made into the true Kampuchea (just as Stalin wanted to break the people to make the new Soviet man). So maybe there’s a wiggle of probability when you for some reason include enough ‘desperate’ people in your population. And then of course you have to understand the parameters of desperation for your potential population, etc. We see natural experiments like this all the time. I’ve read about the increase in the number of single mothers, particularly minority women, getting college degrees. Desperation. Need to feed and raise the child means you have to raise yourself.*

    (I hope you’re enjoying this comment. It’s meant to be fun.)

    This highlights the difference between a liberal and a conservative economist (and thinker): the conservative would look at the result and say, the market works and these women changing reflects that. The liberal economist believes women should not have to struggle that hard, that this is an important national social policy and that it’s wrong to focus on the increasing metric without trying to relieve the suffering and difficulties, that the ‘winners’ in society have an obligation to assist the ‘losers’ – and really shouldn’t say losers but rather disadvantaged. Now consider white men losing their jobs in large numbers and then, over time, because they need to feed their families, they retrain, move to other states, etc. The conservative economist says that’s a failure of national economic policy, while the liberal economist says that global wealth is optimized over time and there are winners and losers. The difference: the liberal economist freely labels some people ‘losers’ and considers others ‘disadvantaged’, while the conservative economist labels some other people ‘losers’ and considers others ‘disadvantaged’. And they each want to spend for their disadvantaged while saying spending on the losers is illogical.

    I get that models have trouble with the idea they represent a presumed mechanism. Gravity works, at least for me on this couch as I’m typing, because there’s a mechanism by which the masses within me and the masses of the objects around me are all not only ‘push-pulled’ to the center of the presumed mass of, well, my body weight hanging against the cushion against the springs against the frame against the floor on to against the foundation to the mantle of the earth and then, oops, toward the sun because I’m not just push-pulled to the center of the earth but to a point that doesn’t exist toward which the earth is moving as the sun is moving. We can plot where that will be but it isn’t yet in actual existence because mathematically it still contains many complex terms, which include of course me getting off the couch, which may actually happen in a few moments. You don’t expect a model to say that ‘we presume only that which we can’t understand because it’s beyond the limit of our understanding’. But it would be nice if they thought about whether the model they’re using makes sense in light of every other example of that model that exists. Is it really that unusual to show a video of encouragement? Is the population to whom this video was shown really that different from all other populations to whom encouragement videos are shown? (At that point, I think you should think about the African stuff and wonder: how culturally paternalistic can I be to assume those people are really that fill in the blank as you wish.)

    One additional bit of humor. Ask yourself: which cultures are more resistant to change, the Western developed ones or the ones that have remained undeveloped even though there are abundant examples of development and all use the same patterns of education, investment, and – as the Chinese grasped – a desire to create prosperity for their family, community and country not just for themselves or for their family or clan or their class, as these express in some form of government dedicated to generating material prosperity for the governed as well as for the governors. So the idea is that an undeveloped place is less resistant to change – inspired by watching video – though daily the entire culture demonstrates it is resistant to the positive change that has occurred in many locations in many societies in many cultures? So Americans live in a less resistant culture – one that obviously accepts diversity in more ways than any traditional culture ever has – but are more resistant to a video message? What if we went to all those career days and found that some kids actually were affected? I’m sure they were effective in some measure, but do you think those things have had a massive societal effect? I haven’t seen it. Does this mean highly resistant cultures are actually less resistant to sleight of hand video – oh, we already went down that line of thought. What if we put up video walls – like a mandatory TV in every room – and we send out video all the time with encouraging thoughts. No wait, a mixture of encouraging thoughts and things that motivate. Like there’s a threat of war. Or we’re actually at war. Like in 1984. Like the North Koreans tell their people has happened several times (spoiler alert: they won them all). So the end takeaway is that studies like this give encouragement to Kim? That’s depressing.

    I’m actually interested in inflection point interventions. Like can you change behavior at specific life moments to achieve a better result. Junkies ‘hit bottom’ after all. We hope people in general don’t need to be at the level of homeless drug addict – see Andrew Zimmern – before making different if not better decisions. But you need a model for why. Like you’ve just learned you have cancer: can you do what’s necessary to change your life so you can live through this so we can spend money on you and others more efficiently? There are many inflection points. In Fiddler, it was a pogrom.

  7. “methods/statistics” supplements are a horrible trend. It is part of creeping inferential illiteracy: serious readers know that the “conclusion” means nothing if the study rests on a bad model, internally invalid sampling, etc.

      • From the abstract and references it seems very sensible -“identify and justify target populations for the reported findings”.

        That may be very hard or even too hard but it should be addressed.

        (I guess I’ll have to read the paper.)

        • “That may be very hard or even too hard but it should be addressed. ”

          Yes, that’s one of the reasons i don’t get the paper.

          Usually the method section of papers include (some) information about the sample, procedures, materials. This is useful i reason, but any further theorizing and/or explanation by the authors as to why they think their results will, or will not, hold for certain other samples/populations/procedures/materials is pretty useless.

          Either researchers point to evidence that their findings hold for other samples/populations/procedures/materials if they want to include that kind of information in their paper (i.c. via information coming from actual replications), or they don’t, but i am not really interested in all their theorizing and/or mental gymnastics trying to make clear why their findings probably will, or will not, hold for some arbitrarily chosen other sample/population/procedure/materials.

  8. I am just going by what Andrew printed — but in Table 6, is the result statistically significant in the wrong direction?? Isn’t it the case that the difference between treated and placebo is 0.12 at t = 1 and 0.00 at t = 2, and they showed that the hypothesis of no difference at t = 1 and t = 2 is soundly rejected… but that would only be because the effect disappeared in the 6 months.

  9. Nobody writes their papers with the tables and graphs at the end – we move them to the end when submitting them.

    These days (for values of “these” in at least the high thousands!) there’s no excuse, whatever, for any paper or preprint which is readable on-line not to have hypertext links that enable readers to move easily back and forth between (A) tables, graphs, and other kinds of figures, and (B) references to these in the text. (It also ought to be easier than it is, even with such links, to have both (A) and (B) visible at the same time.) At least with publications on paper, a reader could keep one finger on an (A) page to facilitate rapid comparison with the corresponding (B) page(s).
