
Decline Effect in Linguistics?

Josef Fruehwald writes:

In the past few years, the empirical foundations of the social sciences, especially Psychology, have been coming under increased scrutiny and criticism. For example, there was the New Yorker piece from 2010 called “The Truth Wears Off” about the “decline effect,” or how the effect size of a phenomenon appears to decrease over time. . . .

I [Fruehwald] am a linguist. Do the problems facing psychology face me? To really answer that, I first have to decide which explanation for the decline effect I think is most likely, and I think Andrew Gelman’s proposal is a good candidate:

The short story is that if you screen for statistical significance when estimating small effects, you will necessarily overestimate the magnitudes of effects, sometimes by a huge amount.

I’ve put together some R code to demonstrate this point. Let’s say I’m looking at two populations, and unknown to me as a researcher, there is a small difference between the two, even though they’re highly overlapping. Next, let’s say I randomly sample 10 people from each population . . .

[simulation results follow, including some graphs]
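The point of the simulation can be sketched without the graphs. Here is a minimal version in Python rather than R; the true effect size, per-group n, and number of runs below are illustrative choices, not Fruehwald's:

```python
import random
import statistics

random.seed(1)

TRUE_EFFECT = 0.2   # small true difference between the populations (both SD = 1)
N = 10              # per-group sample size, as in Fruehwald's example
T_CRIT = 2.101      # two-sided .05 critical t value for 18 degrees of freedom
SIMS = 20000

all_estimates, significant_estimates = [], []
for _ in range(SIMS):
    a = [random.gauss(0, 1) for _ in range(N)]
    b = [random.gauss(TRUE_EFFECT, 1) for _ in range(N)]
    diff = statistics.mean(b) - statistics.mean(a)
    # pooled standard error for two equal-size groups
    sp2 = (statistics.variance(a) + statistics.variance(b)) / 2
    se = (2 * sp2 / N) ** 0.5
    all_estimates.append(diff)
    if abs(diff / se) > T_CRIT:          # "significant" at p < .05
        significant_estimates.append(diff)

print("true effect:                 ", TRUE_EFFECT)
print("mean estimate, all runs:     ", round(statistics.mean(all_estimates), 2))
print("mean estimate, p < .05 runs: ", round(statistics.mean(significant_estimates), 2))
print("share of runs significant:   ", round(len(significant_estimates) / SIMS, 2))
```

Averaged over all runs the estimate is unbiased, but the subset of runs that clear the significance filter overestimates the true effect several-fold — exactly the Type M pattern Gelman describes.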

I [Fruehwald] think how much I ought to worry about the decline effect in my research, and linguistic research in general, is inversely proportional to the size of the effects we’re trying to chase down. If the true sizes of the effects we’re investigating are large, then our tests are more likely to be well powered, and we are less likely to make Type M errors.

And in general, I don’t think the field has exhausted all of our sledgehammer effects. For example, Sprouse and Almeida (2012) [pdf] successfully replicated somewhere around 98% of the syntactic judgments from the syntax textbook Core Syntax (Adger 2003) using experimental methods (a pretty good replication rate if you ask me), and in general, the estimated effect sizes were very large.

However, there is one phenomenon that I’ve looked at that I think has been following a decline effect pattern: the exponential pattern in /t d/ deletion. . . .

I’m curious what the linguists in the audience think, especially about the last point (for which Fruehwald supplies a bunch of data that can be found at his linked post).


  1. It was embarrassing to hear Radiolab cover the decline effect as if there’s some sort of universal physical force decreasing the newness in the universe (is this supposed to be science?!) rather than simply “They did their stats wrong.”

  2. Luke says:

    My guess (as a linguist) is that we might expect to see effects shrink over time, not in the field as a whole but rather across different sub-domains and as a function of theoretical maturity. In other words, we might expect theory in a particular area to move over time from broad contrasts (and large effects) toward more subtle contrasts between variables that result in smaller effects.

    I’ll give a quick example from a recent synthesis (Plonsky and Gass, 2011 in Language Learning). We looked at 174 studies in the interactionist tradition of second language acquisition from 1980–2009 and found the average d values for treatment-comparison contrasts to decline over the life of this line of research: 1980s = 1.62, 1990s = .82, 2000s = .52. And of course, as you might expect, theory (and research) in this area has been moving steadily toward increased nuance.

  3. Edward says:

    It is hard for me (a linguist) to see how the studies of /t,d/ deletion constitute an example of the decline effect – in fact you could argue there has been an increase in effect size. Guy (1991) was effectively comparing two models of the rate at which word-final /t/ and /d/ are deleted in pronunciations of regular past tense words like ‘packed’ and ‘semi-weak’ past tense words like ‘kept’ (where the past tense is marked by a vowel change as well as addition of /t,d/): one model in which these two rates are independent, and another in which they are constrained to follow the relationship mentioned by Fruehwald – i.e. a model with one less parameter. Guy concluded that the simpler model fitted as well as the more complex model, which would normally be construed as absence of an effect. Fruehwald doesn’t report direct comparisons of the simpler/more complex models for subsequent data sets, but it appears that the simpler model fits less well in the later studies, so it looks like the effect of past tense type (regular vs. semi-weak) on /t,d/ deletion has grown over time. Or rather, Guy’s model just doesn’t work for similar phenomena in other dialects of English.

    • I think the way you’ve described the situation is what I meant by the decline effect. The history as I understand it was an almost too-good-to-be-true result, rapidly followed by a few enthusiastic replications, and then more and more reports where the model didn’t fit. I guess I was talking more about how the goodness of fit of the particular model has declined over time, rather than any particular effect size.

      And in that sense, I guess the exponential model is another case of a “small” effect, insofar as the goodness of fit of the exponential model compared to really any other model is highly sensitive to slight deviations in the estimated parameters.

      • Edward says:

        I suppose that this kind of decline effect could arise from screening for non-significance of the comparison of a model to a saturated model: Guy might not have published his theory if he had found significant lack of fit of the exponential model. But my guess is that the difference in fit between the simpler and complex models of /t,d/ deletion in past-tense verbs is not inherently small – it just happened to be small in Guy’s data set – so there’s not a general problem of estimating small effects/differences in goodness of fit in this area.
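The comparison Edward describes — independent deletion rates for the two word classes versus the constrained "exponential" model with one less parameter — is a standard nested-model likelihood-ratio test. A sketch with made-up retention counts (the numbers are hypothetical, not Guy's data, and the constraint below is one common statement of the exponential relationship, retention for semi-weak forms equal to the square of retention for regular forms):

```python
import math

def ll(k, n, p):
    """Binomial log-likelihood of k retentions in n tokens (constant dropped)."""
    return k * math.log(p) + (n - k) * math.log(1 - p)

# Hypothetical (retained, total) counts for the two past-tense types.
reg_k, reg_n = 160, 200    # regular past, e.g. 'packed'
sw_k, sw_n = 130, 200      # semi-weak past, e.g. 'kept'

# Free model: each word class gets its own retention rate (2 parameters).
ll_free = ll(reg_k, reg_n, reg_k / reg_n) + ll(sw_k, sw_n, sw_k / sw_n)

# Constrained "exponential" model: semi-weak retention = (regular retention)^2,
# so only 1 parameter. Maximize the likelihood over p by grid search.
ll_con = max(
    ll(reg_k, reg_n, p) + ll(sw_k, sw_n, p ** 2)
    for p in (i / 10000 for i in range(1, 10000))
)

# Likelihood-ratio statistic; compare to a chi-square with 1 df.
G = 2 * (ll_free - ll_con)
print("G =", round(G, 3), "(.05 critical value for 1 df: 3.841)")
```

With these invented counts the constrained model fits essentially as well as the free one (G well below 3.841), which is the "absence of an effect" outcome Edward attributes to Guy (1991); the later data sets he mentions would be ones where G is large and the one-parameter model is rejected.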