
Reputational incentives and post-publication review: two (partial) solutions to the misinformation problem

So. There are erroneous analyses published in scientific journals and in the news. Here I’m not talking about outright propaganda, but about mistakes that happen to coincide with the preconceptions of their authors.

We’ve seen lots of examples. Here are just a few:

– Political scientist Larry Bartels is committed to a model of politics in which voters make decisions based on irrelevant information. He’s published claims about shark attacks deciding elections and subliminal smiley faces determining attitudes about immigration. In both cases, second looks by others showed that the evidence wasn’t really there. I think Bartels was sincere; he just did sloppy analyses—statistics is hard!—and jumped to conclusions that supported his existing views.

– New York Times columnist David Brooks has a habit of citing statistics that fall apart under closer inspection. I think Brooks believes these things when he writes them (OK, I guess he never really believed that Red Lobster thing; he must really have been lying, excuse me, exercising poetic license on that one), but what’s important is that these stories work to make his political points, and he doesn’t care when they’re proved wrong.

– David’s namesake and fellow NYT op-ed columnist Arthur Brooks stepped in it once or twice when reporting survey data. He wrote that Tea Party supporters were happier than other voters, but a careful look at the data suggested the opposite. A. Brooks’s conclusions were counterintuitive and supported his political views; they just didn’t happen to line up with reality.

– The familiar menagerie from the published literature in social and behavioral sciences: himmicanes, air rage, ESP, ages ending in 9, power pose, pizzagate, ovulation and voting, ovulation and clothing, beauty and sex ratio, fat arms and voting, etc., etc.

– Gregg Easterbrook writing about politics.

And . . . we have a new one. A colleague emailed me expressing annoyance at a recent NYT op-ed by historian Stephanie Coontz entitled, “Do Millennial Men Want Stay-at-Home Wives?”

Emily Beam does the garbage collection. The short answer is that, no, there’s no evidence that millennial men want stay-at-home wives. Here’s Beam:

You can’t say a lot about millennials based on talking to 66 men.

The GSS surveys are pretty small – about 2,000-3,000 per wave – so once you split by sample, and then split by age, and then exclude the older millennials (age 26-34) who don’t show any negative trend in gender equality, you’re left with cells of about 60-100 men ages 18-25 per wave. . . .

Suppose you want to know whether there is a downward trend in young male disagreement with the women-in-the-kitchen statement. Using all available GSS data, there is a positive, not statistically significant trend in men’s attitudes (more disagreement). Starting in 1988 only, there is a very, very small negative, not statistically significant effect.

Only if we pick 1994 as a starting point, as Coontz does, ignoring the dip just a few years prior, do we see a negative trend: a drop in disagreement of less than half a percentage point per year, significant at the 10-percent level.
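Beam’s two points, that cells of 60 to 100 men are noisy and that the fitted trend depends on the chosen start year, can be made concrete with a little arithmetic. The sketch below is an illustration under stated assumptions, not a reanalysis of the GSS: it uses the textbook standard error of a proportion, sqrt(p(1 − p)/n), assumes simple random sampling (the real GSS design is more complicated), and simulates hypothetical data with no true trend at all. The helper names `proportion_se`, `ols_slope`, and `slopes_by_start` are made up for this example.

```python
import math
import random


def proportion_se(p: float, n: int) -> float:
    """Standard error of a sample proportion, assuming simple random sampling."""
    return math.sqrt(p * (1 - p) / n)


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def slopes_by_start(years, shares, starts):
    """Fitted trend in percentage points per year, using only waves from each start year on."""
    result = {}
    for start in starts:
        pairs = [(y, s) for y, s in zip(years, shares) if y >= start]
        xs, ys = zip(*pairs)
        result[start] = 100 * ols_slope(xs, ys)
    return result


if __name__ == "__main__":
    # Worst-case (p = 0.5) sampling noise for a single wave's cell of young men.
    for n in (60, 66, 100):
        print(f"n = {n}: SE = {100 * proportion_se(0.5, n):.1f} percentage points")

    # Hypothetical data, NOT the GSS: a flat true share of 60% disagreeing,
    # with one cell of 66 respondents every two years.
    random.seed(2017)
    years = list(range(1976, 2016, 2))
    true_share = 0.60
    shares = [sum(random.random() < true_share for _ in range(66)) / 66 for _ in years]

    # The true trend is zero, so every fitted slope below is sampling noise,
    # and its size and sign shift depending on which start year is picked.
    for start, slope in slopes_by_start(years, shares, (1976, 1988, 1994)).items():
        print(f"start {start}: {slope:+.2f} points per year")
```

With n = 66 and p = 0.5 the standard error is about 6 percentage points per wave, more than ten times the half-point-per-year trend under discussion, so a single dip near the chosen start year can tip the fitted slope either way.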

To Coontz’s (or the NYT’s) credit, they followed up with a correction, but it’s half-assed:

The trend still confirms a rise in traditionalism among high school seniors and 18-to-25-year-olds, but the new data shows that this rise is no longer driven mainly by young men, as it was in the General Social Survey results from 1994 through 2014.

And at this point I have no reason to believe anything that Coontz says on this topic, any more than I’d trust what David Brooks has to say about high school test scores or the price of dinner at Red Lobster, or Arthur Brooks on happiness measurements, or Susan Fiske on himmicanes, power pose, and air rage. All these people made natural mistakes but then were overcommitted, in part I suspect because the mistaken analyses supported what they’d like to think is true.

But it’s good enough for the New York Times, or PPNAS, right?

The question is, what to do about it. Peer review can’t be the solution: for scientific journals, the problem with peer review is the peers, and when it comes to articles in the newspaper, there’s no way to do systematic review. The NYT can’t very well send all their demography op-eds to Emily Beam and Jay Livingston, after all. Actually, maybe they could—it’s not like they publish so many op-eds on the topic—but I don’t think this is going to happen.

So here are two solutions:

1. Reputational incentives. Make people own their errors. It’s sometimes considered rude to do this, to remind people that Satoshi Kanazawa Satoshi Kanazawa Satoshi Kanazawa published a series of papers that were dead on arrival because the random variation in his data was so much larger than any possible signal. Or to remind people that Amy Cuddy Amy Cuddy Amy Cuddy still goes around promoting power pose even though the first author on that paper has disowned the entire thing. Or that John Bargh John Bargh John Bargh made a career out of a mistake and now refuses to admit his findings didn’t replicate. Or that David Brooks David Brooks David Brooks reports false numbers and then refuses to correct them. Or that Stephanie Coontz Stephanie Coontz Stephanie Coontz jumped to conclusions based on a sloppy reading of trends from a survey.

But . . . maybe we need these negative incentives. If there’s a positive incentive for getting your name out there, there should be a negative incentive for getting it wrong. I’m not saying the positive and negative incentives should be equal, just that there should be more of a motivation for people to check what they’re doing.

And, yes, don’t keep it a secret that I published a false theorem once, and, another time, had to retract the entire empirical section of a published paper because we’d reverse-coded a key variable in our analysis.

2. Post-publication review.

I’ve talked about this one before. Do it for real, in scientific journals and also the newspapers. Correct your errors. And, when you do so, link to the people who did the better analyses.

Incentives and post-publication review go together. To the extent that David Brooks is known as the guy who reports made-up statistics and then doesn’t correct them (if this is his reputation), this gives the incentives for future Brookses (if not David himself) to prominently correct their mistakes. If Stephanie Coontz and the New York Times don’t want to be mocked on twitter, they’re motivated to follow up with serious corrections, not minimalist damage control.

Some perspective here

Look, I’m not talking about tarring and feathering here. The point is that incentives are real; they already exist. You really do (I assume) get a career bump from publishing in Psychological Science and PPNAS, and your work gets more noticed if you publish an op-ed in the NYT or if you’re featured on NPR or Ted or wherever. If all incentives are positive, that creates problems. It creates a motivation for sloppy work. It’s not that anyone is trying to do sloppy work.
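The claim that all-positive incentives reward sloppy work can be written as a toy expected-value calculation. This is a hypothetical model for illustration only, not one proposed in the post: a submission pays `reward` if it is accepted, and a published paper whose flaws are later exposed costs its author `penalty`, so the expected payoff is p_accept × (reward − p_flaw_found × penalty). The function name and every parameter value below are invented.

```python
def expected_payoff(p_accept: float, reward: float,
                    p_flaw_found: float, penalty: float) -> float:
    """Expected career payoff of submitting one paper (hypothetical toy model).

    p_accept:      chance the paper is accepted at a high-profile journal
    reward:        payoff from an acceptance
    p_flaw_found:  chance that, once published, the paper's flaws are exposed
    penalty:       reputational cost if the flaws are exposed
    """
    return p_accept * (reward - p_flaw_found * penalty)


if __name__ == "__main__":
    # Invented numbers: sloppy work is assumed to get accepted more often
    # (stunning results, easy significance) but to be exposed far more often.
    careful = dict(p_accept=0.1, reward=10.0, p_flaw_found=0.1)
    sloppy = dict(p_accept=0.2, reward=10.0, p_flaw_found=0.8)

    for penalty in (0.0, 20.0):
        c = expected_payoff(penalty=penalty, **careful)
        s = expected_payoff(penalty=penalty, **sloppy)
        print(f"penalty {penalty:>4}: careful {c:+.2f}, sloppy {s:+.2f}")
```

With penalty = 0 the sloppy strategy wins whenever it raises the acceptance probability, which is the "roll the dice" logic; once exposure carries a real cost, work with a high chance of being exposed can turn negative in expectation.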

Econ gets it (pretty much) right

Say what you want about economists, but they’ve got this down. First off, they understand the importance of incentives. Second, they’re harsh, harsh critics of each other. There’s not much of an econ equivalent to quickie papers in Psychological Science or PPNAS. Serious econ papers go through tons of review. Duds still get through, of course (even some duds in PPNAS). But, overall, it seems to me that economists avoid what might be called the “happy talk” problem. When an economist publishes something, he or she tries to get it right (politically-motivated work aside), in awareness that lots of people are on the lookout for errors, and this will rebound back to the author’s reputation.

59 Comments

  1. John Hall says:

    Maybe the difference between Econ and Psychology statistics education is the rigor? I looked up Columbia’s programs for Econ PhD and Psychology MPhil. Econ PhD requires three statistics courses and Psychology requires two. The Psychology syllabi seem more applied than I would expect from the Econ courses.

  2. Dale Lehman says:

    Maybe it is because I am an economist (and super-critical of my own profession), but it is hard for me to let economists bask in the glory you are bestowing on them. Economists are trained well – technically they receive better training than many of their peers in other disciplines (though not biostatisticians, in my view). But economists are no role models when it comes to open data, perverse incentives, sharing, peer review policies, etc. In fact, economists are among the worst offenders when it comes to professional incentives. I’ve always thought that when the stakes are low, the behaviors are the worst. Thus, academic politics can be the most hurtful and random compared with business politics. There is generally more money at stake for economists than many other academics (though not as much as in medicine) so academic economists may have somewhat less random politics than, say, social psychologists. But these are only matters of degree, not of kind.

    Viewed from a 40,000-foot level, I don’t think you can characterize economists as having “this down.” Yes, they are harsh critics of each other, but not in the pursuit of science; rather, it is in pursuit of their own careers. Yes, they are well aware of incentives, but this does not result in better behavior, only more purposeful behavior. There may not be many economic Wansinks (although he is associated with economists, I would note), but there are plenty of instances of economists making errors, not acknowledging them, and, if forced to acknowledge errors, usually just moving the goalposts so they can stand by their original conclusions.

    If you are looking for good role models, I’d suggest you look elsewhere. I’m not sure there is an academic discipline that fulfills this objective, but I’m willing to hear people suggest them (and provide some evidence to that effect).

    • I don’t think there are academic groups that have the incentives down. But I do think there are individuals in every field who are committed to good science.

    • numeric says:

      Hard to disagree with anything Dale says–for confirmation read the following that I’ve linked to previously on this blog:

      https://krugman.blogs.nytimes.com/2012/08/10/culture-of-fraud/

      I do find it amazing that anyone other than an employee of the Ministry of Truth could ever write a headline such as “Econ gets it (pretty much) right”. I think you’re suffering from the cargo cult syndrome–that because economists use complicated mathematical formulations it must be science. Au contraire–these economists are the ones their math professor told them that they shouldn’t go on in mathematics but instead seek another field. The “complicated” mathematical formulations would be rejected as trivial by mathematicians (I’ve seen reviews from mathematics journals when economists submit work to those journals–they look like a third-rate heavyweight whose manager has let them be outmatched), and from the standpoint of an empirical field, are practically orthogonal to empirical reality. From bogus microeconomic models that have individuals exerting superhuman efforts for a chance for epsilon to macroeconomic models which failed utterly miserably–the real business cycle for example (http://www.nytimes.com/2009/09/06/magazine/06Economic-t.html), it is not a science but rather a forum for self-indulgent ideologues.

      • Martha (Smith) says:

        What numeric says fits with my undergraduate experience as a math major: The economics department “courted” what you might call the second tier honors math majors — who were flattered, but decided to stick with math.

      • jrc says:

        “Au contraire–these economists are the ones their math professor told them that they shouldn’t go on in mathematics but instead seek another field.”

        Hey! Watch what you generalize. This Economist was never told not to go on in mathematics. I was told I shouldn’t go on in Economics! (the actual phrase from an Econ professor was “maybe you should do something else with your life”).

        I also both agree with you and Dale that my/our field has major problems, and agree with Andrew that there is something fundamentally different. For starters, people in Econ will replicate your empirical work on publicly available data, and if you are wrong you will get called out. Furthermore, it is quite common for many papers to estimate similar parameters, and you have to convince people that your estimates are the best ones… in so many other fields it feels like people think an estimate is an estimate and no one can judge quality, and so if someone else has estimated something, you just take it for granted and “build” on that result. That doesn’t happen so much in Econ. Part of that is that effect sizes, and not just significance, matter to economists.

        None of this means we don’t have problems. But my impression of other social sciences is that, if you show up at a seminar with a table full of stars, people will believe whatever you say and the comments will mostly be about how to extend that interpretation (or worse, an exercise in wildly over-interpreting the results five steps down the theoretical chain using triple-interaction coefficients). In Econ, if you show up at a seminar, you know that no one believes you right off the bat, and you are expected to convince them that you are interpreting your results reasonably. That changes the dynamic – it shifts the burden of proof to some degree off statistical significance and onto the statistical, logical and rhetorical argument provided by the presenter. I think that is something Economists do more right than other fields.

        • numeric says:

          While not disagreeing with your assessment of the other social sciences, I don’t believe that economists have actually contributed that much to society and may have hurt it by their mathematical formulations that hide larger truths (I have the same disagreements about rational choice analysis of American national elections which hid the larger truth of racial animosities driving American politics, of which the last election should be the death blow to the rational choice model). The real business cycle model has surely been discredited, the benefits of deregulation have been dissipated by oligopolistic concentration, and economics of information are totally overwhelmed by the exigencies of health care. Yet economists continue to shill for deprecated theories that are a classic example of the “thrall”. The larger truths may seem impressionistic and qualitative yet they are, in essence, true, and the mathematical formalism hides them. And these aren’t addressed by economists.

          As a topical example, consider https://newrepublic.com/article/141663/united-states-work. The point of this article is that work has become, for a majority of Americans, an alienating and dehumanizing experience. Yet this would never show up in any mathematical model of the economy. That, combined with the transfer of uncertainty from large institutions to the individual, does a lot to explain why the country is in the state it is. Yet tenured economists, writing from their comfortable, well-paid, interesting (if not especially relevant) jobs, cannot account for this. And those other social sciences, with their stars and all, can to some extent. My point is, the mathematics is hiding the truth and in that sense it is cargo cult, an edifice created that distorts reality by taking on the form of those sciences where such structures do work. The result is an alienated, atomized and anomic society.

          • Martha (Smith) says:

            Both jrc’s and numeric’s comments seem cogent. I’m not sure how to put them together.

            • There are several distinct ways that Academia goes wrong:

              1) Doing things that sound like science but aren’t (himmicanes, power poses, ESP etc)

              2) Doing things that are science but are largely irrelevant (solving deeply mathematical problems about uncertainty in thermal flow through a kind of nuclear reactor all of which have been decommissioned and banned 15 years ago, or something like that, just because you know what math to use…. Engineering and academic computer science does this a bunch)

              3) Intense theorizing without connection to reality (String theory? Economic rational choice models? I am not familiar enough with either to say for sure)

              4)… there are probably several other important failure modes

              I think you can reconcile jrc and numeric. jrc is saying that Econ has an appropriate level of skepticism about claims, an interest in identification, and a narrow interest in empirical results (individual papers want to look at data and estimate something), but numeric is saying that Econ suffers from 2 and 3: either failing to consider real important considerations simply because the theory they have only applies to an irrelevant problem (3), or applying a mathematical technique simply because they know how (lots of regression discontinuity on pollution and the distance from the river in China type stuff), without considering the bigger issue of how to do a good job of actually modeling what matters (for the river-in-China stuff, you’d delve deeply into the history of the development and migration of these cities, the measured pollution levels, the demographics of the population, etc., etc.).

  3. Dzhaughn says:

    #1 I fear the names you repeat benefit from every repetition, favorable or not.

  4. Eric says:

    Andrew,

    Is it necessary to say that someone did sloppy work if they got something wrong? As you say, statistics is hard. If someone got something wrong, that doesn’t necessarily lead me to believe it was sloppy work. See, e.g., the hot hand stuff, which I really don’t think qualifies as sloppy.

    • Andrew says:

      Eric:

      People can definitely make mistakes out of ignorance, not necessarily being sloppy. Also, people can be sloppy. For example, Bartels was sloppy in discussing the smiley-face stuff in that he was presenting a path analysis as if it were a direct causal inference. I don’t think Bartels was sloppy on purpose, but I’d still call it sloppy of him to not fully think things through. In other settings you get work that is, at best, sloppy. For example, when Susan Fiske published those incorrect data summaries, presenting things as statistically significant when they weren’t, I’m assuming she and her colleagues were just sloppy, not that they were trying to cheat. Also, I don’t think sloppiness is the world’s biggest crime. I’ve been sloppy too! Just recently we submitted a paper to a journal and one of the referees caught some really sloppy things we’d done. I’m really glad they caught it for us.

      • Allan Cousins says:

        It’s just really hard to be good, all the time.

        • Andrew says:

          Allan:

          We all make mistakes. That’s why we should welcome criticism and take the opportunity to learn, which is what Bartels, Brooks, etc., did not do.

          • Eric says:

            Andrew:

            I’ve been thinking about this a lot since Tuesday.

            1. I’m not sure that “I’ve been sloppy too” quite covers it. I suspect that essentially all empirical papers miss important issues that they really shouldn’t, including mine and yours. So I don’t think it’s all that useful to say that it’s sloppy of somebody on some paper “to not fully think things through.” Who fully thinks things through? Most applied social scientists really don’t have the necessary math training to be comfortable formalizing everything in a way that makes it easy to “fully” assess their claims. So I don’t think it’s really worth much to say someone didn’t do a job completely and that was sloppy.

            2. Therefore, perhaps we should put more weight on the willingness to admit wrongness when wrong. What are the practical implications here? If someone shows my work has some flaw, am I supposed to retract it? Seems to me the answer is no. Seems like the property of note is really just being open to it when the literature reassesses conclusions from one’s own work: being willing to learn. Does that mean you need to always be on the lookout for having made incorrect claims in the past, and announce them whenever it happens? This seems overly burdensome; most of my scientific claims are probably substantially off the mark, but in time the literature takes care of that itself. I’m not sure what the original author’s responsibility is, if any, beyond being open to scientific advancement of the literature.

  5. Chris says:

    Two economists received the highest prize in their field at the same time not from being right, but for coming to mutually exclusive conclusions in a nice way:
    http://www.npr.org/sections/money/2013/11/01/242351065/episode-493-whats-a-bubble-nobel-edition

    Economists may be better at math, but there’s a lot of dreck out there as well.

    • Nick says:

      They didn’t win the Nobel because of their mutually exclusive conclusions, though… they won because of specific, concrete contributions to the understanding of the movement of asset prices. Fama’s random walk hypothesis was important to subsequent research, including Shiller’s.

      Fama and Shiller don’t represent dueling schools of thought so much as they represent Newton and Einstein — you need to understand Newton (and why it’s probably a pretty good approximation for many situations) before you can understand the general theory of relativity. Fama and Shiller differ in that Fama tends to believe that the efficient markets hypothesis is a better approximation than Shiller does.

  6. Alex Godofsky says:

    Maybe try to fix it from the money end?

    For example, if the problem is not enough replications – reserve X% of federal grant money for replications.

    If the problem is not enough pre-registration – ban or cap federal funding for non-pre-registered studies.

    • Pre registration is a hack that makes p values more informative. Reifying it would be a mistake like reifying the teletype for the deaf instead of letting the world move on to a better technology.

      • Seth says:

        I generally agree in theory, but in many academic disciplines you simply can’t get there from here. Deeply entrenched patterns of thought and practice mean that the null-hypothesis testing framework isn’t going anywhere any time soon. Some of that is because of bad incentives and poor statistical education, but some of it is also that people* seem to naturally reason about the world in a binary, rule-based way — there is a significant effect or there is no effect. They know this is not true if you press them on it but it’s really really hard not to think this way.

        Pre-registration is something large numbers of researchers can intuitively understand and that could be practiced and enforced at scale. And if the burden of pre-registration becomes so great that researchers resort to talking about effect sizes and variability instead of p-values, so much the better.

        *At least the people that end up in academia.

        • I think it’s fair to say that if a given field can’t “get there from here” then it should basically cease to exist and build itself up from scratch on the merits. Die in fire and rebirth from the ashes works for the Phoenix, why not Social Psychology and Medical-pseudo-research and soforth?

          Of course, no one is making me emperor, so my opinion has few actionable consequences. Still, I figure if you aren’t really doing science then you’re just rent seeking on grants, and this is a big part of what Universities do these days. See here: http://andrewgelman.com/2017/04/16/sad-sad-co-dependent-suckers/#comment-466712

          Academics who participate in the current system who do cargo cult science are active tools of this malady.

          We’d be better off without all that.

          • Note also that I make a distinction between Academic organizations and fields, and the individuals within those fields. In every field there are people who care and want to do a good job. But when the social organization surrounding it makes that virtually impossible, the organization needs to die.

          • Seth says:

            This is a very binary, rule-based perspective — a field is either “true science” or pure rent seeking. Maybe you should calculate a p-value for the null hypothesis that a field is pure rent seeking :). Every field for which we cannot reject this hypothesis shall be sent to gulag!

            Every human organization on the planet is engaged in some mixture of rent-seeking and value creation. It is often very difficult to tell one from the other. And in any case, the Phoenix does not exist so I don’t see the life-cycles of mythological creatures as being particularly useful models for academic reform.

            • The goal of provocative blog comments is to provoke, not provide nuance. We should consider the idea that we’d be better off without large grants to rent-seeking universities and their armies of c̶l̶i̶c̶k̶ ̶f̶a̶r̶m̶e̶r̶s̶ principal investigators. Perhaps more real science would get done if parasitism didn’t pay so well.

  7. Eric Loken says:

    I guess there are producers and consumers of this stuff. And everyone is guilty to some degree of just wanting to believe. This is likely why psychology undergrad programs are so popular – the appeal of science that explains our personal and interpersonal lives. It’s no accident that all your egregious examples have a social science element – it’s what interests us most and what’s hardest to formalize. A less hungry audience might also be a disincentive. (and yes, I admit to being in the audience) The economists have the “benefit” of a less emotionally charged, and better quantified, field. But the business school psychology folks on the other hand produce many of your examples.

  8. Glen M. Sizemore says:

    ” If all incentives are positive, that creates problems. It creates a motivation for sloppy work.”

    GS: On what do you base this claim, Andrew?

    • Andrew says:

      Glen:

      I say this because it seems to me that the current publishing system is like a lottery: every paper you submit is like a lottery ticket, and if you get a paper published, it’s a win. If the publication is in Psychological Science or PPNAS, it’s a big win. But if it can only be a win and never a loss, the motivation is to submit as many papers as possible to these journals. What does it take to get published there? Stunning results and statistical significance. You can get stunning results and statistical significance with sloppy work. Doing sloppy work can actually make it easier to get stunning results and statistical significance—consider for example the ovulation and voting paper. If there were negative incentives, then it could make sense to think twice before submitting poor research to a top journal: what if it got accepted and then later the flaws were discovered?? But if all incentives are positive, then why not roll the dice and submit? And that’s what it seems people have been doing.

      • Martha (Smith) says:

        Sounds convincing to me.

      • Glen M. Sizemore says:

        Andrew:

        But that isn’t an issue with “positive incentives” per se – assuming that the term is supposed to mean something like “positive reinforcement.” I see what you are saying but it implies that there is a reasonable chance that “good journals” will sometimes publish sloppy trash. Then you could get a circumstance like what you are talking about. But if “good journals” didn’t publish trash, sloppy researchers would never get a paper published. So…“positive incentives” would work just fine – if journals didn’t publish trash. As behavior analysts (practitioners of the natural science of behavior) say: “Ya’ get what you reinforce.”

        Anyway…it is worth pointing out that there is a science that studies the effects of dependencies between behavior and events on the probability of those classes of responses. Complex human behavior is, like that of many species, a function of past consequences.

        • Martha (Smith) says:

          GS: ” But if “good journals” didn’t publish trash, sloppy researchers would never get a paper published.”

          Isn’t not publishing trash a negative incentive?

          • Glen M. Sizemore says:

            “Isn’t not publishing trash a negative incentive?”

            Well…since “negative incentive” isn’t any kind of technical term, it means whatever you say it does, I suppose. To “not do anything” (i.e., not publishing bad research) might be considered, technically speaking, “extinction” (a bit more should possibly be said to clarify, but let’s leave it there for now). However, the receipt of a rejection notice could decrease the probability of submission (and presumably the work, sloppy though it may be). Such an event, if it decreased the p(submission) might be an example of, again speaking technically, “positive punishment.” Having said all that, though, evaluating the contingencies relevant to some human behavior can be difficult. If I, for example, train a rat to respond in a variable-ratio schedule of food presentation, it will respond at a steady, rather high rate. If I tell a human “pulling that lever will sometimes result in the delivery of a dollar and the faster you respond, the faster dollars will be delivered,” the human’s behavior will closely resemble that of the rat, but the rat responds because of all the contingencies I arranged; the humans behavior, however, will have to do with other contingencies (namely those that are relevant to rule-following). Anyway, it all becomes somewhat complicated…behavior is, after all, among the most complex subject matters ever submitted for scientific analysis.

            My whole point, in case it isn’t clear, is simply that the whole problem goes away if journals always publish only good research. Easier said than done, perhaps…there is ignorance of what constitutes “good research” and the journals make money by publishing junk etc. But we should be clear about whatever behavioral processes and contingencies are operating – all that is what is relevant vis-à-vis meliorating the “junk science” epidemic.

  9. BenK says:

    You may not want to tar and feather, but many biologists, seeing good labs closed down (with numerous careers curtailed) because of showboating by incompetents, may be in a different mood.

  10. Anonymous says:

    1) Behavioral economics experiments replicate at a higher rate than social psychology experiments, but it’s a difference of degree, and both should replicate at higher rates.
    http://science.sciencemag.org/content/early/2016/03/02/science.aaf0918.full

    2) Shouldn’t the answer to substandard work be criticism of the work? Criticism of the researcher seems merited when there’s a consistent track record of substandard work associated with an author. Until there is, the more logical criticism seems to me to be of a given piece of research. And criticizing the research rather than the researcher is certainly sufficient to create reputational incentives.

    Criticizing the research itself also seems to be the more generous approach, more consistent with civil norms of academic discourse. In an argument, I always want to be the last person to make the ad hominem move, not the first.

  11. Sean Mackinnon says:

    So, I think you could use Operant Conditioning terms here. So, “positive incentives” are really “positive reinforcement” and what you’re describing as “negative incentives” are really “punishment.” In a nutshell, operant conditioning theory suggests that people are more likely to do behaviours if they are rewarded, and less likely to do behaviors if they are punished.

    Only reason I’m saying this is because I’m skeptical that “negative reputational incentives” will work. Presumably, you want something like this. Person does bad research. They get relentlessly mocked on social media as punishment (or whatever you want as the punishment). As a result, they retract the research, and/or correct their error. This doesn’t seem to actually happen (e.g., look at all the people you’ve criticized on this blog who haven’t done anything!).

    The punishment should (in theory) make people less likely to do the bad thing again in the future. But … I don’t think it will make them engage in a new, productive behavior (i.e., retracting their article). Retracting an article or publicly admitting error involves embarrassment and reputational harm — more punishment. People do behaviours less if they are punished. It’s a little bit like bullying a kid online to convince them to go to school without pants. Getting bullied sucks, but showing up to school without pants would make the problem worse, so the only logical response is to endure the bullying. It’s using punishment to get someone to willingly submit to more punishment.

    There needs to actually be some sort of positive reinforcement for publicly admitting and fixing errors (or alternatively, a reduction of the punishment for doing so). If you want people to do something, you need to reward it, not punish it.

    • Andrew says:

      Sean:

      That makes a lot of sense; thanks.

    • Glen M. Sizemore says:

      SM: So, I think you could use Operant Conditioning terms here.

      GS: Hey! Me too! Only…ya’ gotta use ‘em correctly.

      SM: So, “positive incentives” are really “positive reinforcement”[…]

      GS: Well…it depends on how the term “positive incentives” is used. I’m guessing that, to Andrew, “positive” means “good.” But the “positive” in “positive reinforcement” means there is a positive correlation between the rate of response (or some other measure of probability) and some event. If the rate of response increases when a positive (in the sense I discussed) contingent relation between responding and some event is arranged, and the increase in probability is due to the contingency, the name “positive reinforcement” is applied to the behavioral process and “reinforcer” is applied to the consequence of responding. Notice that reinforcement is functionally defined; one cannot say “I reinforced behavior but it didn’t increase in rate or otherwise become more probable.” Virtually all lay-people, and probably most psychologists, don’t understand the real meaning of “reinforcement” – or the “positive” in “positive reinforcement” and “positive punishment.”
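The functional definitions in the comment above can be summarized as a small lookup, offered as a hedged sketch: the name depends on whether responding produces the event (positive) or removes or postpones it (negative), and on whether the response then becomes more or less probable. The function `classify` and its argument names are hypothetical, and the sketch simply assumes the further requirement that the rate change be due to the contingency.

```python
def classify(contingency: str, rate_change: float) -> str:
    """Name a behavioral process from observed outcomes (illustrative only).

    contingency: "positive" if responding produces the event,
                 "negative" if responding removes or postpones it.
    rate_change: observed change in response rate after the contingency
                 is arranged (assumed to be caused by the contingency).
    """
    if contingency not in ("positive", "negative"):
        raise ValueError("contingency must be 'positive' or 'negative'")
    if rate_change > 0:
        process = "reinforcement"
    elif rate_change < 0:
        process = "punishment"
    else:
        # No change in rate: by the functional definition, neither term applies.
        return "no reinforcement or punishment"
    return f"{contingency} {process}"


print(classify("positive", +3.2))  # food after each press, pressing rises
print(classify("positive", -1.5))  # shock after each press, pressing falls
print(classify("negative", +2.0))  # a press turns off a noise, pressing rises
```

The zero-change branch encodes the point that one cannot say behavior was "reinforced" if it did not become more probable.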

      SM: […]and what you’re describing as “negative incentives” are really “punishment.”

      GS: Some of what he (Andrew) says could probably be construed as “negative reinforcement.”

      SM: In a nutshell, operant conditioning theory suggests that people are more likely to do behaviours if they are rewarded, and less likely to do behaviors if they are punished.

      GS: It is hard to use behavior analytic terms correctly. First of all, “reward” (which is not a behavior-analytic term) generally does not have the same use or definition as “positive reinforcement.” People (and non-humans) may “be rewarded” in ordinary language and would-be science, but such usage is inconsistent with the term “reinforcement.” In behavior analysis, it is responses that are reinforced, not people. A young graduate student that says, “We reinforced the rat…” is likely to be asked “Did you shove rebar up its ass?”

      SM: Only reason I’m saying this is because I’m skeptical that “negative reputational incentives” will work. Presumably, you want something like this. Person does bad research. They get relentlessly mocked on social media as punishment (or whatever you want as the punishment). As a result, they retract the research, and/or correct their error. This doesn’t seem to actually happen (e.g., look at all the people you’ve criticized on this blog who haven’t done anything!).

      GS: Actually…what you are describing sounds more like “negative reinforcement” – but the issue is complicated, and I’ll leave it alone here.

      SM: The punishment should (in theory) make people less likely to do the bad thing again in the future.

      GS: No…“punishment” (as an “operant conditioning term”) is functionally defined (i.e., it is not a “theory”)…sorry to be so pedantic…but it matters.

      SM: But … I don’t think it will make them engage in a new, productive behavior (i.e., retracting their article). Retracting an article or publicly admitting error involves embarrassment and reputational harm — more punishment. People do behaviours less if they are punished. It’s a little bit like bullying a kid online to convince them to go to school without pants. Getting bullied sucks, but showing up to school without pants would make the problem worse, so the only logical response is to endure the bullying. It’s using punishment to get someone to willingly submit to more punishment.

      GS: Well…you are sort of describing negative reinforcement…and it CAN “add” behavior to a repertoire. In ordinary language, if someone retracts “in order to” avoid more criticism, the process is (possibly) negative reinforcement. [I say “possibly” because the person might be following the rule “retract that paper if you want to avoid further criticism.” Following rules is operant behavior but it is traceable to the contingencies that produce rule-following…it’s complicated…I’ll leave it there.]

      SM: There needs to actually be some sort of positive reinforcement for publicly admitting and fixing errors (or alternatively, a reduction of the punishment for doing so). If you want people to do something, you need to reward it, not punish it.

      GS: Well…it’s complicated and I’ve written quite a bit. I’ll just add that the reason I was pedantic is that behavior analysis, THE natural science of behavior, is poorly understood by people who are otherwise well-informed on a number of topics. That is not good given that all the frightening problems that face us are behavioral problems.

    • Put this in terms of game theory. A person does some action X and receives a “reward” which is a number that is either positive or negative. The assumption of the meaning of a positive number is that it is good, and a negative number is bad. The basic idea is something like “try to maximize the sum of all the rewards.”

      If you want people to do something you set up the game in such a way as to make the outcome of doing the thing a large positive number. If you want people not to do something, you set up the game to make it so that the outcome of that thing is large and negative. If instead you simply make the outcome zero, then doing “bad” things over and over doesn’t hurt you.

      Andrew basically is suggesting that if the only outcome is positive or zero, then people will just do more of everything. The faster they do it, the more their score increases in a given time. All the bad stuff they do has zero effect on them when it doesn’t pay off, and good (positive) effect when it does pay off.

      What you want is for doing “bad stuff” to actually cause a negative outcome. Because if it’s just zero, then doing things very rapidly cannot hurt and sometimes helps. Whereas, when there are negative outcomes, doing things fast and sloppy can hurt you more than help you.
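The contrast between a zero and a negative outcome for sloppy work can be made concrete with a toy expected-payoff calculation. Everything here is an assumption for illustration: a hypothetical "fast" strategy submits more papers per year but each is more likely to be wrong, and the reward and penalty values are invented.

```python
def yearly_payoff(papers: int, p_error: float, reward: float, penalty: float) -> float:
    """Expected yearly score for a hypothetical publishing strategy.

    Each sound paper earns `reward`; each erroneous paper scores `penalty`
    (a penalty of 0 means errors carry no cost at all).
    """
    return papers * ((1 - p_error) * reward + p_error * penalty)


# Hypothetical strategies: (papers per year, probability a paper is wrong).
careful = (4, 0.1)
fast = (12, 0.6)

for penalty in (0.0, -5.0):
    c = yearly_payoff(*careful, reward=1.0, penalty=penalty)
    f = yearly_payoff(*fast, reward=1.0, penalty=penalty)
    print(f"penalty={penalty}: careful={c:.1f}, fast={f:.1f}")
```

With a zero penalty the fast strategy scores higher (4.8 vs 3.6); with a penalty of -5 per error the ranking flips (1.6 vs -31.2), which is the comment's point in miniature.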

      Taken as a whole, the basic idea is sound. Implementing it, on the other hand, is a different story.

      For a very simple version, suppose that every time someone discovered an egregious sloppy error in a publication, a scientist had to pay back all the grant money on that grant, times two. Including liquidating their house and 401k and everything if necessary.

      Now I’m not suggesting this is a good idea. But I think it’s pretty clear that the rate of production of egregious sloppy errors would fall dramatically in that case.

      On the other hand, if every time an egregious sloppy error is found…. nothing happens… whereas every time you publish something you get a food pellet… then pretty clearly you’re going to find scientists publishing sloppy stuff as fast as possible to maximize the food pellet delivery rate with no concern about how sloppy it is.

      • Note that some people theorize that individuals actively try to maximize their own rewards, and this is how strategies come about. Others theorize that strategies “evolve”, that is, there is obvious selective pressure among populations. If you get a lot of grants, you’ll stay in science. If you get few grants, you’ll leave science. People are able to recognize a good strategy when they see it, and they will teach the strategy to their graduate students, and people in their departments will see that it is effective and adopt it… etc.

        I’m in favor of the evolutionary hypothesis; I think it also shows how you can wind up with sort of population-dynamics type results through time (as the funding environment changes for example).
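The "evolutionary" reading can be sketched with discrete-time replicator dynamics, a standard model from evolutionary game theory in which a strategy's population share grows in proportion to its payoff relative to the population average. The two strategies, their payoffs, and the function name `replicator_step` are hypothetical; the sketch only shows how a modest payoff advantage compounds over "generations" of hiring and training.

```python
def replicator_step(shares: dict[str, float], payoff: dict[str, float]) -> dict[str, float]:
    """One generation of discrete-time replicator dynamics.

    Each strategy's new share is its old share times its payoff,
    divided by the population-average payoff (payoffs must be > 0).
    """
    mean = sum(shares[s] * payoff[s] for s in shares)
    return {s: shares[s] * payoff[s] / mean for s in shares}


# Hypothetical: "fast" earns 20% more grants per capita than "careful".
shares = {"careful": 0.9, "fast": 0.1}
payoff = {"careful": 1.0, "fast": 1.2}

for generation in range(30):
    shares = replicator_step(shares, payoff)

print({s: round(x, 3) for s, x in shares.items()})
```

Changing the `payoff` values partway through the loop is one way to mimic a shifting funding environment.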

        In my opinion, we’re seeing a situation in which bogus pseudo-science is winning out population wise as a strategy vs careful purposeful science. The driving force is a grant funding environment in which universities win by having lots of high turnover hucksters who press the bar for food pellets as fast as possible. As they seek out more and more opportunities for rent seeking they create new departments, institutes, etc hiring new hires that focus on people who play the game effectively. The “old school” scientist, like the Meehls or the Einsteins or the Paulings or Pasteurs or Lord Kelvins or whatever of the past are going extinct in the face of a new fitness environment where content doesn’t matter much but rate of button pushing and commitment signaling among peers who make decisions on funding does. Privately, scientists in Biology for example talk among themselves about how they don’t like having to lie to grant agencies but if they don’t they won’t get funding… I’ve heard enough of the cocktail party chatter to know this is true.

        Science was fun while it lasted. But it’s a dinosaur now, and we’re entering the age of weasels.

      • Glen M. Sizemore says:

        “Andrew basically is suggesting that if the only outcome is positive or zero, then people will just do more of everything.”

        GS: Are you suggesting that when you act, in general, and nothing happens, you thereby keep doing it? Say I have a rat happily emitting food-reinforced lever-presses – a “positive” outcome, no? Now, I arrange it so lever-presses have no experimenter-programmed consequences (a “zero” outcome, no?)…what happens?

        • No, but if you act and there’s a frequency F with which it has a positive outcome, and a frequency (1-F) with which it has zero outcome… you maximize dGood/dt (amount of goodness per time) by maximizing dN/dt (number of trials per time)
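The algebra in this reply can be written out, assuming each trial independently pays a positive amount R with probability F and nothing otherwise, and that trials occur at rate dN/dt. The symbols R, q, and C are introduced here for illustration; in the second line a fraction q of trials instead costs C > 0.

```latex
\frac{d\,\mathrm{Good}}{dt} = \bigl[F R + (1-F)\cdot 0\bigr]\frac{dN}{dt} = F R\,\frac{dN}{dt}
\qquad\text{(increasing in } dN/dt \text{ whenever } F R > 0\text{)}

\frac{d\,\mathrm{Good}}{dt} = \bigl[F R - q C\bigr]\frac{dN}{dt}
\qquad\text{(speeding up helps only if } F R > q C\text{)}
```

With only non-negative outcomes, the bracket can never go negative, so more trials per unit time can only help.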

          • Glen M. Sizemore says:

            DL: “…if you act and there’s a frequency F with which it has a positive outcome, and a frequency (1-F) with which it has zero outcome… you maximize dGood/dt (amount of goodness per time) by maximizing dN/dt (number of trials per time)”

            GS: Is this anything more than a somewhat awkward restatement of the “Law of Effect” (https://en.wikipedia.org/wiki/Law_of_effect) cast as “choice” and only 112 years too late? And “choice” with really only one reinforcement schedule?

            Notice that I am not endorsing Thorndike’s language, just the reality and overwhelming importance of operant conditioning. Do you actually have some kind of system of differential equations that is supposed to “handle” the field of schedules of reinforcement (i.e., all of the possible ways to arrange a relation between behavior and other events)?

            Anyway…the link below is a simple (not that you need “simple”) tutorial for applied behavior analysts on the so-called “Matching Law,” which was developed in the basic natural science of behavior (i.e., behavior analysis). Near the end, the authors discuss the generalized matching law and what it means for single schedules of reinforcement.

            https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3357095/

            The single-schedule version of the generalized matching law also casts behavior under single schedules as a choice, not between something and nothing, but between the “main” schedule and a bunch of other non-experimenter specified schedules (e.g., grooming, looking around, sniffing etc.) that are only “slightly positive” (in your usage).
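For readers unfamiliar with the matching law, Herrnstein's hyperbola is the classic single-schedule form (the generalized version mentioned above adds further parameters): response rate B = kR / (R + R_e), where R is reinforcement from the scheduled source, R_e is reinforcement from all other, unscheduled sources, and k is the asymptotic response rate. The parameter values below are made up for illustration.

```python
def herrnstein_rate(r: float, k: float, r_e: float) -> float:
    """Herrnstein's hyperbola: predicted response rate on a single schedule.

    r   -- reinforcement rate from the scheduled source (e.g. per hour)
    k   -- asymptotic response rate
    r_e -- reinforcement rate from all other, unscheduled sources
    """
    return k * r / (r + r_e)


# Illustrative (made-up) parameters: k = 100 responses/min, r_e = 20/hour.
for r in (0, 20, 60, 300):
    print(r, round(herrnstein_rate(r, k=100.0, r_e=20.0), 1))
```

When R equals R_e, responding sits at half of k; a larger R_e (more competing "other" activities such as grooming or sniffing) means the same scheduled reinforcement supports less lever-pressing.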

            BTW, for what it is worth, I think that the matching law predicts some of the same equilibrium points that game theory suggests should obtain in some 2-person iterated games.

            • I’m not claiming priority or anything, just pointing out some algebra. The sum of non-negative things that are sometimes positive is guaranteed to be positive, and the sum increases faster when you do more things per unit time.

              As soon as you have the possibility for a negative outcome, especially a large or a frequent negative outcome, the optimal behavior changes from “press the lever as fast as possible” to “figure out what makes the positive things happen and what makes the negative things happen, and then do mostly the positive things”
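The shift from "press the lever as fast as possible" to "figure out what works" can be illustrated with a toy model. Everything here is an assumption: each output pays off with a fixed probability regardless of quality, while the chance that an output is flawed rises linearly with speed, and each flaw costs a penalty. The function name `score_rate` and all parameter values are hypothetical.

```python
def score_rate(speed: float, hit_prob: float, reward: float,
               penalty: float, max_speed: float) -> float:
    """Expected score per unit time in a hypothetical lever-pressing model.

    Each output pays `reward` with probability `hit_prob`, regardless of
    quality. Separately, an output is flawed with probability
    speed / max_speed, and each flaw costs `penalty` (>= 0).
    """
    p_flaw = speed / max_speed
    return speed * (hit_prob * reward - p_flaw * penalty)


speeds = [s / 10 for s in range(0, 101)]  # 0.0 .. 10.0 outputs per month

for penalty in (0.0, 1.0):
    best = max(speeds, key=lambda s: score_rate(s, 0.2, 5.0, penalty, 10.0))
    print(f"penalty={penalty}: best speed = {best}")
```

With no penalty the best speed is the maximum (10.0); with a penalty the optimum moves inside the range, to hit_prob * reward * max_speed / (2 * penalty) = 5.0 in this toy setting.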

              Right now science is in the “press the lever as fast as possible” regime, and it is reflected in the behavior of PIs, which is to pump out grant applications and publications as fast as possible. Those who stop to think about it too much fall behind those who just press the lever faster. They are very aware of it too, and their colleagues or mentors say things like “there’s a low probability that you will get this grant, so don’t put too much work into it, just submit and move on to the next grant”. This is literally the kind of thing I heard every time any grant opportunity came up in my PhD, and it’s the kind of thing that people recommended constantly to my wife as a junior faculty member. It’s just the way things are.

              • Which suggests potentially that one way to deal with the situation re grants would be to simply have a reasonably steep application fee, maybe waived for anyone who hasn’t yet gotten their first govt grant. If it costs you say $5000 or $10000 to get your grant reviewed, you’ll only put in grants that are actually worth getting reviewed. This also ameliorates the issue of large labs getting big grants through rapid fire reapplication, because they’d be returning money to the pool to give to others if they put in lots of low quality grants.
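The fee proposal can be illustrated with an expected-value check: an applicant who reasons this way submits only when the probability of funding times the award exceeds the total cost of applying. The function name `worth_submitting`, the dollar figures, and the assumption that applicants behave as expected-value maximizers are all hypothetical.

```python
def worth_submitting(p_fund: float, award: float, prep_cost: float, fee: float) -> bool:
    """Return True if a proposal has positive expected value for the applicant.

    Expected gain = p_fund * award - (prep_cost + fee).
    All quantities are hypothetical and in dollars.
    """
    return p_fund * award - (prep_cost + fee) > 0


award = 500_000
prep_cost = 2_000  # assumed cost of a quick, low-effort application
for p_fund in (0.005, 0.02, 0.10):
    no_fee = worth_submitting(p_fund, award, prep_cost, fee=0)
    with_fee = worth_submitting(p_fund, award, prep_cost, fee=10_000)
    print(f"p={p_fund}: submit without fee={no_fee}, with $10,000 fee={with_fee}")
```

In this sketch the long-shot, low-effort applications (0.5% and 2% chances) are worth sending with no fee but not with one, while a proposal with a 10% chance is worth sending either way.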

                Combine this with anonymizing the source of the grant (so that the proposal itself and not the reputation of the lab is being evaluated), and you eliminate one of the major methods of rent-seeking, which is essentially to get yourself a hotshot reputation and a bunch of cheap students from foreign countries and to just grant-mill your way into the tens of millions of dollars per year range.

                (I’m thinking mostly bio-medical field here)

        • Anoneuoid says:

          Say I have a rat happily emitting food-reinforced lever-presses – a “positive” outcome, no? Now, I arrange it so lever-presses have no experimenter-programmed consequences (a “zero” outcome, no?)…what happens?

          Eventually something good or bad is bound to happen soon after a press (eg the lab tech comes in to feed you). So you’ll have an intermittent schedule of reinforcement and the result depends on whatever “random” thing occurs.

  12. Anonymous says:

    That sort of thing is happening in social psychology where prominent researchers like Inzlicht and Finkel have earned praise for criticizing their own older work that didn’t replicate and/or had low sample sizes. Psychology is, in general, I think developing a scientific and collegial alternative approach to the “burn the witches in the town square” model. Though make no mistake, the fear of shame for substandard published work is certainly part of it.

  13. Jonathan (another one) says:

    Butera and List propose to solve the problem by not publishing (officially) papers until they have been independently replicated and making the replicators co-authors of the original paper: https://ideas.repec.org/p/feb/artefa/00608.html

  14. Dan F. says:

    Many of the examples you cite of papers containing bad statistics are distinguishable from papers published on ESP only in that the latter sometimes do interesting statistics (it’s the experiments that are fraudulent or badly designed!) One could conclude that certain “sciences”, particularly of the more social sort, are not easily distinguished from ESP research … Note that there’s nothing inherently wrong with investigating ESP except that the premise is obviously stupid in the sense that there’s no putative mechanism that withstands any kind of serious analysis … of course for much of what is published in the social sciences and etc. the situation is even worse in that there is not even a putative mechanism! At least one knows what the ESP people claim to be doing, how they claim it works, and so forth. Why should shark attacks affect elections? Does anyone even bother explaining such plainly absurd ideas? (The more plainly absurd a claim is, the more burden there is on whoever makes it to provide a convincing mechanism/rationale.)

    On a separate note: be careful with incentives. Much of the bad publishing occurs because of incentives (in some countries evaluating CVs by a paper count weighted by impact factor is even ordained by law). Maybe mathematics is freer (though certainly not free) of this sort of nonsense than psychology because it is easier institutionally to pursue a life of the mind as a mathematician. This is because the evaluation structure is inherently hierarchical. So few people understand any given subarea that the opinions of genuine experts dominate. Outsiders can’t even pretend to judge the quality of something because when they open their mouths they very clearly have no idea. We can all opine about shark attacks and voting procedures but very few of us, even professional mathematicians, can say anything intelligible at all about the use of minimal hypersurfaces in proving positive mass theorems. This sounds elitist, and sometimes it is (there are always politics), but intellectual discovery is elitist in its most basic premises. Structures which leave judgments and evaluation to those competent to make them are the most effective in practice. In some areas the filtration required is much harder to achieve. Identifying the experts is complicated in terrains where raw intellectual power is less decisive than it is in areas like mathematics and physics, or where intellectual progress and popularization are easily confounded.

    • “Be careful with incentives” pretty much sums it all up. The big problem in science right now is in essence regulatory capture. Want to get a job Wansinking? The main task is to build up your reputation through publishing hype, and to then get hired by your friends at University of Big Reputation where you finally complete your fully operational battlestation by leveraging a few small grants into many many big grants by pressing the granting lever over and over again while prominently displaying your large collection of previous grants, publications, and university letterhead.

      It’s Bower Birds all around

      https://www.youtube.com/watch?v=GPbWJPsBPdA

  15. Dale Lehman says:

    If you think economists have it right, you might want to look at this: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2946169. I will admit that I have not read the paper – and it might actually be very good – but it has all the earmarks of the best of social psychology, business school TED talks, etc. It sounds like there will be plenty of forking paths, p values, etc. in this one. But, again, I have not read it so perhaps I am misjudging this book by its cover.

    • Andrew says:

      Dale:

      Yes, on first glance (I too have just read the title and abstract), it looks ridiculous. And economists have been known to publish bad papers in PPNAS too, such as that notorious air pollution in China paper. Overall, though, my impression is that high-profile econ papers get scrutinized. For example, I can only assume that the paper you link to will go through the wringer in the review process before appearing in any econ journal.

      Econ at least has the tradition of robust criticism. In contrast, there are psychologists who seem to consider it “bullying” to publicly criticize a published paper, or to attempt an independent replication without running it by the original authors first. In econ, the norm is that any published claim is fair game for comment, whereas in psychology there often seems a desire to suppress all negative feedback or to restrict such feedback to private channels so that outsiders won’t be aware of the skepticism.

      Say what you want about economists, but if you point out an error in their papers, they won’t call you a terrorist.
