Differences between computer science and statistics in the rate of forgetting

John Langford writes (in an entry linked to for other reasons by Aleks),

How many papers do you remember from 2006? 2005? 2002? 1997? 1987? 1967? One way to judge this would be to look at the citations of the papers you write—how many came from which year? For myself [Langford], the answers on recent papers are:

year   2006  2005  2002  1997  1987  1967
count     4    10     5     1     0     0

This spectrum is fairly typical of papers in general. There are many reasons that citations are focused on recent papers. . . .

Another way to study this is to look at the dates of Langford’s own most-cited papers. Doing a Google Scholar search on “John Langford” yields papers from 2000, 2003, 1999, 2004, 2001, 2002, 2003, 2002, 2002, 1998, . . . These are slightly earlier than the years of the papers he cites the most, but this makes sense if you think of time-censoring (older papers have had more years in which to be cited).

Comparing computer science to statistics

Anyway, I was surprised that Langford was citing such recent papers. Looking at some of my own recent papers, about half the papers I’m citing come from before 2000. Similarly, my own most-cited papers are from 1992, 1996, 1998, 1993, 1996, 1998, 1998, 1990, . . .

This is just a sample of size 2, but based on this, let me conjecture that CS is a particularly new field, and thus the pattern John saw may be particularly strong in his field.
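As a rough illustration of the comparison above, here is a minimal Python sketch that summarizes the two lists of most-cited-paper years. The years are copied from the post, where both lists are truncated (". . ."), so these are partial samples. The function name `summarize` and the cutoff year of 2000 are assumptions made for this example only.

```python
from statistics import median

# Publication years of each author's most-cited papers, as listed in the post.
# Both lists are truncated in the post, so these are partial samples.
langford_years = [2000, 2003, 1999, 2004, 2001, 2002, 2003, 2002, 2002, 1998]
gelman_years = [1992, 1996, 1998, 1993, 1996, 1998, 1998, 1990]


def summarize(years, cutoff=2000):
    """Return the median year and the share of papers published before `cutoff`."""
    share_before = sum(y < cutoff for y in years) / len(years)
    return median(years), share_before


langford_median, langford_share = summarize(langford_years)
gelman_median, gelman_share = summarize(gelman_years)

print(f"Langford: median year {langford_median:.0f}, "
      f"{langford_share:.0%} before 2000")
print(f"Gelman:   median year {gelman_median:.0f}, "
      f"{gelman_share:.0%} before 2000")
```

On these partial lists the median years are 2002 and 1996. That gap is in line with the conjecture, though with a sample of size 2 it is only suggestive.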

3 thoughts on “Differences between computer science and statistics in the rate of forgetting”

  1. I haven't checked, but in economics the bulk of citations are to relatively recent papers (from the last ten years). In fact, a rule of thumb I've heard is that if more than half the papers you're citing are ten or more years old, then you're probably wasting your time.

  2. I wonder about the correlation between citation age and the average time in press in different fields. In Operations Research, I hear regular reports from faculty of manuscripts being published that were submitted 4-7 years ago….

  3. I am currently writing a review of "classical" psychometric theory, which is generally viewed as having originated with a paper by Charles Spearman in 1904 and culminated in a book by Lord & Novick in 1968 that is famous within psychometrics. Even though both publications are long-lived in terms of being cited, it seems that most people citing them today haven't really read them. The Lord and Novick book in particular contains a lot of information that comes up now and again as "new".
    I fear that this is (in part) a result of specialization – Lord and Novick have probably had a thorough mathematical/statistical education together with their knowledge of educational/psychological assessment. Nowadays, there are many people working in the area with "only" a psychology background, and the mathematical precision is lost on them – the result is that you will often find simulation studies where some algebra would lead to far more insights.
    Thus, I would not agree that you are wasting your time when citing older papers – sometimes there is a lot of wisdom hidden there, and the really good ideas keep popping up every few decades. The difficulty seems to be to convince the reviewers…
