Scientists behaving badly

By “badly,” I don’t just mean unethically or immorally; I’m also including those examples of individual scientists who are not clearly violating any ethical rules but are acting in such a way as to degrade, rather than increase, our understanding of the world. In the latter case I include examples such as the senders of the spam-email study (who, I assume, feel that Institutional Review Board approval has resolved any ethical concerns they might’ve otherwise felt regarding wasting thousands of people’s time and degrading the level of trust between faculty and prospective students) and the authors and promoters of the notorious ovulation-and-voting and ovulation-and-clothing-color studies and the perhaps now-near-forgotten beauty-and-sex-ratio study (who, I assume, feel that statistically significant p-values have resolved any ethical concern they might’ve otherwise felt regarding the presentation of noise as scientific discovery in scholarly journals and the news media). These people are behaving badly (in my view) even if they’re just following the rules they’ve been taught (and which continue to appear, at least implicitly, in statistics textbooks).

I bring this up because people sometimes ask in comments why I spend so much time picking on research and reporting that is dishonest, deceptive, or simply misinformed. People also ask this of Thomas Basbøll (for example, here). I can’t speak for Thomas here, but for me, I have two reasons for harping on published errors and misconduct, besides the very basic motivation to get the story straight (as the saying goes, “someone is wrong on the internet”).

1. Statistics. Certain studies are dead on arrival because of statistical problems. In short: not enough data, too much noise relative to the signal. John Carlin and I discuss this in our forthcoming article in Perspectives on Psychological Science. The other side of this, as you all know so well by now, is that in the absence of a strong signal, statistical significance typically doesn’t mean much. Banging on these crap studies and considering counterarguments has given me insight into these statistical problems and also into how it is that researchers continue to use and defend such problematic methods.

2. Social science. People behave badly all the time, but rates of crime and mischief vary a lot (as any resident of NYC during the past half-century can tell you). As a social scientist, one reason I like to look at misconduct is that I’m interested in how it becomes more or less acceptable in some settings. Karl Weick and Ed Wegman copied without attribution and got caught, but they didn’t lose their jobs or anything close to that. Plagiarizing journos get slammed, but there’s not so much concern when columnists refuse to retract errors. And so on. What’s interesting is not so much that people have the capability to break the rules, but that in some settings they do break the rules.

3. The lure of over-certainty. OK, fine. But why am I lumping together #1 and #2? On one side we have people like David Brooks, Greg Easterbrook, Ian Ayres, and the authors of a few zillion papers in Psychological Science—these people might be a bit innumerate and a bit too reluctant to admit they could make a mistake, but each of them in their own way is trying his or her best to learn the truth, within the inevitable constraints of careers and ideology. On the other side are the out-and-out deceivers: no need to name names here, just consider the various unrepentant plagiarists, data-hiders, and data-falsifiers who we’ve discussed over the years. Why the connection? Why discuss an honest (I assume) but misguided researcher such as Daryl Bem in the same breath as some, ummm, I dunno, some disgraced primatologist?

To me, the connecting thread is the un-statistical demand for certainty.

The legitimate scientists and journalists who are trying their best but stubbornly refuse to admit the possibility of error—they just think the world is a bit simpler than it is. They’ve fallen for what Tversky and Kahneman so memorably called “the law of small numbers” fallacy—the idea that patterns in the population will reliably reproduce themselves in a sample, even if the sample is small, unrepresentative, and measured with lots of noise.
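To make that concrete, here is a toy simulation of my own (made-up numbers, not data from any of the studies discussed here): a small true effect, a lot of measurement noise, a small sample, and a look at what the statistically significant results end up saying.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect = 0.1   # small real difference between two groups
noise_sd = 1.0      # lots of measurement noise
n_per_group = 20    # small sample
n_sims = 10_000

estimates, pvalues = [], []
for _ in range(n_sims):
    a = rng.normal(0.0, noise_sd, n_per_group)
    b = rng.normal(true_effect, noise_sd, n_per_group)
    estimates.append(b.mean() - a.mean())
    pvalues.append(stats.ttest_ind(b, a).pvalue)

estimates = np.array(estimates)
signif = np.array(pvalues) < 0.05
print(f"share of simulations reaching p < 0.05: {signif.mean():.2f}")
print(f"average |estimate| among significant results: {np.abs(estimates[signif]).mean():.2f}")
print(f"true effect: {true_effect}")

Any single “significant” result from a setup like this looks like a clean, reliable pattern; in aggregate, the significant estimates run several times larger than the true effect, and a nontrivial fraction of them even have the wrong sign.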

On the other side, the liars, the cheaters, and the corner-cutters also exhibit this fatal flaw, over-love of certainty. If you really believe hypothesis A, and you really believe that dataset B should cleanly reflect hypothesis A (irrespective of sample size, representativeness, and measurement error), then if B disagrees with A, the natural step is to fix B until it’s correct. Or to just make B up. If you’re communicating qualitatively, it can help to hide the provenance of a story so you can make it say whatever you want it to say.

Some people follow the rules, some people don’t, and I’m not trying to make a statement of moral equivalence here—being mistaken and even stubborn is not the same thing as lying, cheating, and stealing. What I see is a connection at the level of statistical reasoning, and I think a good dose of Tversky and Kahneman could do all these people good: the honest journalists and researchers would recognize that mistakes happen and that careful measurement is important, and the crooks might realize that what they’re doing is playing a game, not doing some purer version of science.

And this helps us understand various intermediate cases such as Ron Unz (who held fast to his false statements about ethnicity and high school achievement, even after they’d been definitively shot down and even after Unz himself admitted that one of his headline claims came from “five minutes of cursory surname analysis”) or Neal Anderson and Deniz Ones (who refused to back down after some other scholars found a perfectly natural coding error in their published article). To me, Unz and Anderson/Ones are on the borderline: they seek the truth, but when their discoveries turn out to be wrong (not to worry, this happens to all of us), they can’t handle it.

This is in the Zombies category for a reason

I’ve written all this before but the question keeps coming up so I thought it was worth saying again. At this point, some of you might comment that you’ve heard it all from me before but you still don’t agree because blah blah blah. And that’s fair enough. If you feel that way, please tell me in the comments. If I’m missing something here, or if I’m just not communicating clearly, either way, I’d like to know.

There’s also the larger question of whether I should be doing this at all, instead of working on Mister P, or working on Stan, or studying public opinion, or helping out my colleagues in public health, or even playing with my kids (which, as you may recall, is Greg Mankiw’s preferred leisure option when taxes get so high that he’s no longer motivated to write newspaper op-eds). These are all good questions, and they all relate to opportunity cost. I’ll save the discussion of opportunity cost for another time. For the present purposes I’d like to think of blogging as part of my portfolio of work activities and to evaluate the posts based on their contributions without comparison to what else I could’ve done during the previous hour and a half (as that is in fact the length of time it took me to write this post, from 1:25 to 2:55 on Monday, 22 Sept 2014, after lunch and before my scheduled 3pm activity).

It’s a matter of life and death

Shortly after writing the above, I happened to read this horrifying news article by Nicholas Schmidle about a guy who’s in jail for decades after being framed by the Chicago cops for a crime he did not commit. This is a case where unwillingness to admit error has clear and direct consequences—an innocent man in jail and a killer on the loose who murdered two more people before he was finally caught!

And, sure, there seem to be some flat-out evil people involved here, but what is relevant to our general theme above is that the system as a whole seems to have a low tolerance for ambiguity and a high tolerance for putting innocent people in jail and letting killers run loose. The demand for certainty is literally killing people.

P.S. After encountering the Chicago-cops example I was going to retitle this post, “The psych department’s just another crew” in homage to the line, “The police department’s just another crew” from the rap, “Who Protects Us From You.” But, just to check, I googled that KRS-One rap and it turns out it does not contain that line! It’s funny because it’s not a memory thing—when the album came out and I heard that rap, I registered that line, which I guess KRS-One never said. It fits the rhythm and sense of the rap and I somehow just stuck the nonexistent line into my head, where it lodges to this day (indeed, I think of it every time I’m on my bike in the city and see a cop car, which isn’t quite every day but it’s often enough).

P.P.S. Just to be clear, I’m not myself saying that the police department’s “just” another crew. Sure, they’re a crew but they do good things too, it’s just a line I (mis)remembered that resonates.

56 thoughts on “Scientists behaving badly”

  1. I think that you are a real-life superhero freeing us from the oppression of p-values generated by the careless, the ignorant, and the ideologues. Pick out a costume! Of course, every superhero has a vulnerability….

      • I am, as far as I know/remember, one of 7 people who have seen Jon Gruber give a talk at a comic book store.

        The others are an econ professor/blogger, a health law specialist, a crank, a girlfriend, a random guy, and the proprietor. Also there was a cat, which the girlfriend claimed all comic book stores have (not being a comic book nerd, I cannot confirm that). I believe Jon sat on a three-legged stool.

        The best part of the comic book is when all the characters have a heart attack on page 5 or so.

  2. I’m not so impressed by the quote from Levitt:

    “Within the field of economics, academics work behind the scenes constantly trying to undermine each other. I’ve seen economists do far worse things than pulling tricks in figures. When economists get mixed up in public policy, things get messier.”

    You know who else gets mixed up in public policy? Politicians. You know who else pulls tricks in their presentations and works behind the scenes constantly trying to undermine each other?

    I could continue, but I’m not sure what Levitt’s point was supposed to be. It’s not as if public policy will be *improved* by removing experts and scientists from the field of debate.

    • Stringph:

      That’s what happens when economists start trying to write like e. e. cummings!

      Seriously, I followed up on that Levitt statement here. He tells a story of a paper of his on the safety of child car seats that got rejected by a medical journal, and he characterizes the rejection as “scientists [who] would try to keep work that disagrees with their findings out of journals.” But, after reviewing the paper in question along with the medical literature, statistician Joseph Delaney made the case that it was reasonable for that paper to have been rejected on legitimate scientific grounds. As he put it, “It is very hard to publish a paper in a medical journal using weaker data than that present elsewhere.”

    • Oh, I could tell you stories of academic conspiracies both pro-Levitt and anti-Levitt …

      Even as a critic of Levitt going back to the previous century, I have to admit he has a point about professors who are out to get him. He’s not being paranoid.

  3. Until people lose their funding/jobs for falsehoods or techniques that will lead to falsehoods, the optimal strategy is to claim everything, concede nothing, and when all else fails allege fraud. There is a self-selection problem, also, or maybe it’s just a Nash equilibrium. Those who actually believe in doing things correctly go into hard science; those who rely on a low animal cunning go into the softer sciences. This phenomenon can have an influence within a single individual–your statistical papers are typically titled “Evidence in Support of…” whereas your political science papers have titles such as “The Myth…”. You should do a textual analysis of just the titles of your statistical versus political science papers and see what signifiers pop up–you can probably find a comparative lit grad student to run it through a standard software package used in that field.

    Anyway, I think you exposing bad analysis is a good thing–just wish you would extend it to political science also–social psychologists are an easy target.

    • Numeric:

      I dunno about this. I went to my published papers and searched on “myth” and found only one article, which was a review of a book by somebody else. Then I searched on “evidence” and found only three articles, two of which were on social science and the other of which was using the term “evidence” somewhat ironically. And of course I do write critically about political science; see, for example, here and here and here. I’ve been known to criticize economists too, on occasion. . . .

      • Your search missed “The Mythical Swing Voter”, but you did qualify with “published” and this may not be published as of yet. You do deserve kudos for actually attempting a search. I did look over your link of published papers–I hadn’t realized how extensive your vita is. Well, fox/hedgehog.

    • “Those who actually believe in doing things correctly go into hard science”….

      well, perhaps, but I think a lot of other people go into hard science too. To me it seems pretty hard to do all of the following in today’s academic environment:

      1) Do work that matters to society, either directly, or indirectly but in a plausible and tangible way.
      2) Do work with good methodology, careful controls, and without relying on typical shortcuts used in your field.
      3) Get consistent funding and be considered productive by your department in the short-term so that you obtain tenure.
      4) Continue to do good and important work after tenure.

      There are plenty of people producing lots of papers about their ultimately irrelevant specialties who are consistently funded because they play into buzzwords or publish noise as if it were meaningful, thereby being considered “high productivity” (by the insanely stupid but standard publication-rate metric). Even people who are good scientists can get sucked into the latest hot topic without thinking much about whether it’s really meaningful.

      My whole life I’ve had a love-hate relationship with academia. There is plenty that’s good in certain groups or sub-fields, but there’s a lot of pure unadulterated hogwash in “hard sciences” too.

        Often this stuff is about pushing some technology, be it quantum dots, personal genomic sequencing, widespread sensor networks, massive computing systems, throwing globs of biological data at machine learning algorithms, or one of the latest buzzwords we hear around here, “Big Data,” and its relationship to pretty much anything: healthcare, traffic management, energy distribution, the military, or whatever. These are topics that attract funding, and if you don’t follow the funding you don’t stay in business as a “hard scientist” (unless you are either lucky or have academic-political clout, typically for having discovered something important early on).

        But my intuition is that a lot of the really valuable work is being done on a shoestring: guys studying how friction works at the micro-scale on surfaces, with one grad student, people who take a non-sexy but widespread medical disorder like asthma and find a new way to understand it in terms of molecular signalling between specific cell types, or like my wife recently heard a talk about how rheumatoid arthritis seems to be really a disorder of the lymphatic system…

        Doing that kind of research typically means you have a slow publication rate, very spotty funding, and ultimately are either going to be the person who actually discovers something of lasting value, or lose your lab due to funding problems and wind up taking a corporate job for Lockheed or Roche or take on an administrative role like undergrad teaching coordinator or something.

        So, I don’t think we can separate into “hard sciences” where “real careful work gets done” and “soft/social sciences” where “people cut corners and do shoddy work” or some such dichotomy.

    • “Those who actually believe in doing things correctly go into hard science”

      Let me rephrase: “Those who actually believe in doing things correctly don’t survive in the social sciences.”

  4. Your misremembered lyric reminded me of the following lyrics from Murs’ “The Night Before”.

    “But it’s Thursday in my hood when they sweep them streets
    A whole fleet of the task that they simply call CRASH
    That’s Community Resources Against Street Hoodlums
    If anyone should ask what the acronym reflects
    Put into effect to try to keep the gangs in check
    Now they’re just another gang out bangin’ they set
    Known for stirring up some shit when your hood is at peace
    The only pig I know dying to create beef”

  5. Andrew,

    How will academics in social science get tenure unless they & many others subscribe to the Law of Small Numbers?

    Few get tenure by taking existing “neat” effects and showing that with more data the effect goes away.

    The entire field of priming experiments seems to live/thrive on the Law of Small Numbers.

    • “Few get tenure by taking existing “neat” effects and showing that with more data the effect goes away.”

      Crap…. that’s half my in-progress research agenda! (like, not totally for real, but kinda for real).

        Ummmm… Andrew – would you be willing to meet me (Jonathan Clark/Jorge Castillo/Jiang Chao/Jamal Curtis) for a meeting about starting a PhD in Statistics at Columbia? Because apparently I am in real trouble here.

      • I think Michael Webster is supposing that it’s the same person discovering the “neat” effect and then discrediting it. It’s quite possible that in Econ a person could get tenure by showing that other economists are chasing noise.

  6. Another factor, in some fields, is the possibility of other financial incentives (beyond tenure). If you get five figures for a keynote on your lab’s years of ground-breaking work into the applications of effect X, plus the revenue from the associated books and self-help courses, how hard are you really going to look for disconfirmation of X, or to listen to multiple independent reports that X cannot be reproduced? This doesn’t have to imply outright fraud — probably you sincerely believed in X at the start, and you got a positive result when you looked — but there comes a point where, as a scientist, you are morally obliged to start taking all those negative results seriously.

  7. >” If I’m missing something here, or if I’m just not communicating clearly, either way, I’d like to know.”

    …well, the possible communication problem may be on my receiver node — but the blog post appears overly long and meandering.

    You seemingly imply a valid main point about human nature in general — but oddly do not say that; instead you dilute that point with a somewhat disjointed mix of sub-points and examples. Perhaps it would help if the main point were distilled into one leading sentence… and the support concisely grafted on in a couple of paragraphs.

    “It is my ambition to say in ten sentences what others say in a whole book.”
    — Friedrich Nietzsche

    • Leonard:

      Fair enough. You (or anyone else) should feel free to re-express what I’ve said more pithily. I’d be happy if in these posts I’ve provided the raw material for someone else to write something clearer.

  8. Okay, but it seems like you are skipping over the most flagrant and far-reaching ethical and moral problem in the social sciences, which is people having their careers wrecked, pour encourager les autres, for telling the truth about statistical data, without enough other social scientists coming to their defense: for example, James D. Watson in 2007, the partial example of Larry Summers, and Jason Richwine in 2013.

    Richwine lost his job because his Harvard Ph.D. dissertation, which had been approved by three heavyweights including Christopher Jencks (who is probably the most distinguished American left-of-center social scientist of my lifetime), told unwelcome truths.

    Relative to this kind of penalizing the honest for being honest, most of the other issues in statistical analysis are minor.

    • Steve:

      I see your point. To the extent that researchers are being silenced or feel intimidated to speak out, this could be viewed as another manifestation of a desire for certainty. It’s complicated because sometimes the purveyors of shoddy research will defend their work on the grounds that it is politically incorrect. They often seem to be engaged in a search for certainty of their own, following some sort of gender or ethnic essentialism that often seems to go far beyond any evidence they might have.

      Where does James Watson fall along this spectrum? I’m not sure. On one hand he’s reporting a mix of unremarkable facts and personal opinions, and a scientist should be able to feel free to report such things, especially given that Watson seems to have been pretty clear about specifying when he’s stating facts and when he’s giving opinions or reporting personal experiences. On the other hand, he said some pretty rude things and it’s not so clear it’s appropriate for him to say this sort of thing as leader of a large organization.

      Was Watson’s career “wrecked”? I don’t know. He was way past the usual retirement age, and so he’s no longer running that lab. If he wants to write a research paper, nobody’s stopping him, and I’m sure lots of journals would be willing to publish what he writes, if they think it’s sound science.

      • Let me try to offer a general perspective of mine on the social sciences, which is that the bigger problem is not what social scientists publish but what they don’t publish.

        Overall, I’m a fan of the social sciences. I’ve been involved in deflating some bad theories that got published over the years but I think most of what gets published (that I hear about) ranges in quality from OK to good.

        Yeah, there are silly ideas that get out into the world of Malcolm Gladwell articles and TED talks, but untrue findings tend to be self-limiting because it’s hard to come up with other examples. For example, the whole world wanted to believe that you could mold yourself or your child into anything via 10,000 hours of practice, but it’s the kind of stupid idea that burns out pretty quickly for lack of evidence.

        In contrast, if somebody comes up with a true insight, it tends to connect to a lot of other truths.

        Mostly I think the bigger problem in today’s social sciences is what doesn’t get studied out of fear of what the truth will turn out to be.

        For example, immigration. It’s an immense phenomenon and a topic of extreme policy importance. And yet strikingly little research gets published on immigration, especially the hugely important topic of pathways over multiple generations.

        Why not? Consider the career of Jason Richwine. He did his Harvard Ph.D. dissertation on this important subject, got it approved by Christopher Jencks, among others, and what happened? He got kicked out of his job at a conservative think tank when political enemies revealed the very existence of his crimethink. He didn’t get in trouble for being wrong, he got in trouble for doing research that everybody fears is right.

        And practically nobody spoke up for Richwine.

        • I just looked at his dissertation. I can’t believe that a university like Harvard could give a PhD for this garbage. It was an eye-opening experience for me to read this dissertation. Andrew, I take back everything I said earlier this morning. You should go after all kinds of bullshit “research” like this. It’s insulting to researchers worldwide to even call this dissertation “research”.

          This dissertation would be acceptable as a master’s thesis from an MSc student. For such an important topic with such huge policy implications and potential for political use and misuse, I would never allow someone to get a PhD for summarizing the results of other people’s studies and fitting a bunch of linear models, without gathering any new data, and most importantly, without providing any estimates of uncertainty.

          On this web page, they quote Richwine:

          http://www.washingtonpost.com/blogs/wonkblog/wp/2013/08/09/jason-richwine-doesnt-understand-why-people-are-mad-at-him/

          “my dissertation shows that recent immigrants score lower than U.S.-born whites on a variety of cognitive tests. Using statistical analysis, it suggests that the test-score differential is due primarily to a real cognitive deficit rather than to culture or language bias.”

          I doubt very much that you can show that “using statistical analysis”. Here’s his PhD:

          http://delong.typepad.com/pdf-1.pdf

          Four datasets apparently show that the mean IQ of Mexicans is in the mid 80s, of other Hispanics in the low 90s, of Europeans in the upper 90s, and of Asians in the low 100s. White natives are the baseline at 100.

          1. The estimates in one paper come from 40 people or more in each data set, and the author coyly says that the data “inevitably vary in test quality, sample representativeness, and year of testing” (p. 26). No kidding. Hell, even the red and sexual availability people were able to do better than that.

          2. The estimates are “normed to white native distribution of intelligence” with a mean of 100 and sd of 15. No estimates of uncertainty are provided anywhere, just means. (For a toy sketch of the kind of uncertainty reporting I mean, see the end of this comment.)

          3. There is a lovely sentence on p. 29 (I add some explanation, but this is nearly verbatim):

          “The reduced form wage equation—log earnings regressed on age and national IQ scores (remember, these are single values for each nation)—lacks controls for education quality, home environment, and neighborhood effects, which are inevitably correlated with IQ. Introducing those controls would attenuate the predictive power of IQ.”

          Well, how about that. I should also remove controls in my studies when they attenuate the predictive power of my variables (a predictive power that is never quantified).

          4. He talks about digit span memory capacity tasks done on different populations (not done by him–his entire dissertation seems to consist of reporting other people’s published work and fitting a bunch of statistical models on the data there). Sample size from each group is 100 or so. This is in table 2.12. (BTW, the table should have been marked as being taken from Wechsler 1991. Someone reading the diss quickly could easily be misled into thinking this was the author’s work.) This seems to be the basis for the claim that it’s deficits in inherent cognitive ability that lead to low IQ. It is pure fantasy to make such a wild claim based on so little evidence.

          5. Appendix A is basically a table of country-wise scores, just averages, no estimates of uncertainty.

          6. There is not a word about model assumptions being checked anywhere that I could find.

          This work is definitely not an example of researchers “telling the truth”, unless telling the truth means summarizing previous findings by other people and getting a PhD for it.
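          To spell out what I mean in point 2 about uncertainty, here is a minimal sketch (simulated, invented numbers, nothing to do with the dissertation’s actual data) of reporting a group mean together with a standard error and an interval instead of a bare average:

          import numpy as np

          rng = np.random.default_rng(1)
          # hypothetical test scores for two groups; sizes and values are invented
          groups = {
              "group A (n=60)": rng.normal(100, 15, 60),
              "group B (n=40)": rng.normal(95, 15, 40),
          }

          for name, scores in groups.items():
              mean = scores.mean()
              se = scores.std(ddof=1) / np.sqrt(len(scores))
              lo, hi = mean - 1.96 * se, mean + 1.96 * se
              print(f"{name}: mean = {mean:.1f}, SE = {se:.1f}, 95% interval = [{lo:.1f}, {hi:.1f}]")

          With groups of this size the intervals come out several points wide, which is exactly the information a reader needs before taking a difference in bare group averages seriously.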

        • Thanks for doing this analysis. Reading the dissertation, I am shocked, too, that this is the sort of work that gets you a Harvard PhD. “Public policy” appears to be expertise in creating a case for your favorite hobby horse that can impress ordinary people with no statistics understanding. Basically, a propaganda degree.

        • Mary:

          When discussing this case last year, I wrote the following:

          Flawed Ph.D. theses get published all the time. I’d say that most Ph.D. theses I’ve seen are flawed: usually the plan is to get the papers into shape later, when submitting them to journals. If a student doesn’t go into academia, the thesis typically just sits there and is rarely followed up on. I don’t know the statistics on this, but I’m guessing it’s a typical pattern for a policy school Ph.D. to go into the policy world, not academia, and so then the details of the thesis won’t be taken so seriously. At some point, the goal is for the student to graduate, it’s not required that the thesis have all its holes plugged. That can be done in the submission-for-publication stage, if that ever happens.

          Maybe such theses should not be approved, but my general impression is that, once a doctoral student has reached a certain point, everyone has a motivation for him to just graduate already.

        • No matter how crappy the work? There’s a difference between having faults and having the core so weak as to vitiate the key conclusions.

          No one is demanding absolute watertight correctness. But I think Andrew’s attitude is nihilistic. If we persist with it, the quality of PhDs is going to end up in the gutter (if it isn’t already).

          Yes, the vetting process may fail at times but let’s acknowledge that as a flaw. No point pretending a bug is a feature.

          Finally, from an entirely pragmatic viewpoint: suppose a grad student has stayed in the PhD program for too long and you want to get him to graduate. Sure, let him, but with a weaksauce thesis whose conclusions are vague or just go along with what’s already the norm. Basically, a thesis no one’s ever going to care about.

          Letting crappy evidence support a PhD thesis with some provocative, strong conclusion is just asking for trouble.

        • My impression (based on serving on Ph.D. committees in several fields) is similar to Andrew’s: Often the Ph.D. thesis has some flaws, but the thesis may never be published. If it is to be published, the problems should be patched up before publication — but often they aren’t, and aren’t caught in the reviewing process either — or the article may be submitted to several journals serially until it is accepted in one.

          My experience with biology is a bit different: There, the chapters of the dissertation are typically different pieces of work, and two or three are at least accepted or even published before the student finishes the Ph.D.

        • Just because this is an Internet forum does not mean it is acceptable to lie about another scholar’s work. It is absurd to describe my dissertation as “summarizing previous findings by other people.” You think that analyzing data sets such as the NLSY and NIS does not constitute original research because I did not personally conduct the surveys? These are large publicly available data sets designed to facilitate original research of the kind that I carried out. Is every paper based on, say, the NLSY, unoriginal in your view? I question the scholarly integrity of anyone who would mock sentences out of context and mischaracterize arguments so recklessly. Do you do this in your academic papers as well?

        • Jason,

          Looking over your dissertation, I agree that Shravan misrepresented your work. There is original analysis, and using large-scale, publicly available datasets is obviously a perfectly respected and useful thing to do in the social sciences. In particular, regarding his comment, point by point:

          1 is a rhetorical argument that lacks much content – acknowledging limitations in your data is important.

          2 is true. The lack of measures of statistical precision (standard errors, confidence intervals, p-values) is a pretty serious problem in that we have no way to judge whether these point estimates reveal meaningful differences other than by saying “well, they seem sort of consistent in direction/sign”. That is a real failure (which, as Andrew pointed out, dissertations are allowed to have).

          3 is a misreading of your work, because you do, later, actually adjust for education in some manner, arguing that it would work against your findings. One could argue with how you do it (by assuming that IQ differences are constant across educational level) but that is not what Shravan said.

          4 is a rhetorical argument, or actually, a rhetorical argument against your rhetorical argument. I haven’t read carefully enough to comment, but a better way of saying what he is saying might be “You make incredibly confident claims about what your data implies, and the actual data analysis does not support such certainty, because you cannot even plausibly rule out a whole host of other arguments that are consistent with the same empirical findings.” Andrew might call this confusing a statistical test with a substantive test of theory. I reiterate – I haven’t read carefully enough to take sides, but nothing you write counters that argument.

          5 is again true. I think it is fine to have those means there, but obviously including a measure of precision would be better. That said, just by itself and as an appendix table, it is fine, but adds very little for me in terms of “evidence”.

          6 I haven’t read the whole thing, so I can’t comment.

          Now – all that said, in this short reading of your work, and the one I did last time it came up on the blog, my basic reaction was that you had over-interpreted your findings, your work lacked fundamental statistical rigor in not even attempting to display information on statistical precision, and you had very weak *causal* arguments regarding deep genetic differences that are not just *correlational* arguments about differences across groups of people with vastly different life and educational experiences.

          I would also like to say that, agreeing with Andrew above, all dissertations have terrible parts, fundamental mistakes, and most chapters are likely to never be published. That does not make them bad dissertations.

          Steve (since Jason did not ever say anything about the fairness/unfairness of his employment termination),

          Jason took a job at a political organization. Political organizations are effective in part by controlling their public image. They fire people who become an embarrassment to them (for whatever reason they become an embarrassment) all the time. This is fair. This is free market capitalism and democracy and all that. Jason became a national distraction and embarrassment to them. For a group dedicated to influencing people in Washington, that is basically the definition of sucking at your job. So they fired him. What exactly is the problem with that?

        • Hi Jason,

          if I were writing a *scholarly response* to your dissertation, I would spend a lot more time on studying your claims, and on trying to repeat your analyses. I would also show you how I would do it. I agree with you that there’s in principle nothing wrong with using existing data. But you are using data summaries (please correct me if I am wrong). Are you using the raw data from these published studies, along with all the uncertainty estimates those would give you? Seeing one number for Mexicans’ IQ doesn’t tell me much.

          It’s irresponsible to report your analyses without any uncertainty estimates, and without discussion about the quality of model fit. jrc points out some irrelevant or incorrect statements in my previous post—it’s likely she/he’s right, as I didn’t spend as much time on the dissertation as I might otherwise have. But a key point, that others have also made here, is that you don’t present estimates of uncertainty, and don’t talk about model assumptions, and you draw strong causal inferences from correlational data.

          You can still fix this in a follow-up journal article; why don’t you go ahead and do it? Do a more rigorous study of existing data, but do it right. People might still disagree with your conclusions, but at least you would have done the best that can be done with the data you looked at.

          PS While you are at it, try to also use the same data to break your own story. Data is a malleable resource; you can often make it say whatever you wanted it to say even before you looked at it (and I don’t mean you—I mean everyone with a stake in a particular research problem). The argument for and the argument against could be fairly presented. You can take an adversarial position against your own claims, and see what the data then say. There must be opposing views out there, with corroboratory evidence. For an important topic such as yours, you should present the evidence for as well as against.

        • thanks for documenting this. I had looked at it and it’s so laughably bad (methodology alone – forget the politics) I wouldn’t even know where to begin.

      • This probably indicates another “bad” of bad science. If harmless stupidity (like dress color dependence on menstrual cycle) can be published with marginal statistical support and accepted as a norm, then all sorts of really dangerous stuff (like ethnic predisposition for this or that social behavior or level of IQ) can be seriously entertained with the same level of carelessness.

    • Was Watson a social scientist? Both Summers & Watson were hardly social science controversies. They were expressing ad hoc personal opinions (whether right or wrong). Not some researchers who were lynched for stating the results of their own painstaking research.

      In any case, for it to be described as the “most flagrant and far-reaching ethical and moral problem,” I hope you have more than three pet examples.

      Just because it fits Sailer’s private agenda does not make it a critical nor widespread problem.

  9. I appreciate Andrew’s emphasis on “the un-statistical demand for certainty”. This is something that needs more attention — not just in discussing research, but also in teaching statistics. I suspect that there are many people who do poor research, but believe that they are indeed “following the rules”, because virtually their entire training in statistics has neglected the importance of recognizing uncertainty when using statistics.

    I don’t regularly teach statistics anymore, but when I did, I tried to emphasize that statistics inherently involves uncertainty. But I don’t think I had the emphasis strong enough (or perhaps not as consistently as needed). In the more informal teaching I sometimes do now, I emphasize it more — perhaps it’s easier because I am not tied down to a “syllabus.” But I probably still don’t emphasize it enough.

    If anyone is interested in examples where I try to emphasize uncertainty, here are some links:

    Lecture notes:
    pp. 2 – 9 of http://www.ma.utexas.edu/users/mks/Cautions2014/cautions2014.html
    pp. 5 – 12 of http://www.ma.utexas.edu/users/mks/CommonMistakes2014/WorkshopSlidesDay1_2014.pdf,

    Web pages :
    http://www.ma.utexas.edu/users/mks/statmistakes/uncertainty.html, http://www.ma.utexas.edu/users/mks/statmistakes/uncertaintyquotes.html, http://www.ma.utexas.edu/users/mks/statmistakes/terminologyrevariability.html, http://www.ma.utexas.edu/users/mks/statmistakes/causality.html

    Blog entries:
    http://www.ma.utexas.edu/blogs/mks/2013/10/14/signing-the-blueprints/, http://www.ma.utexas.edu/blogs/mks/2013/01/13/more-re-simmons-et-al-part-i-uncertainty/.

    I believe I am only scratching the surface in these (but feel free to use any ideas you might see in any of the above.)

    Also, if someone is interested in pushing the topic of paying attention to uncertainty in teaching statistics, it might be worthwhile to organize a panel discussion or special session at JSM 2016; I would guess that the Statistics Education section might be interested in sponsoring one.

  10. Surely *all* articles in Psych Science (or Science, Nature) cannot be bad? I think you imply that, tarring even careful researchers with the same brush you reserve for the abusers of statistics. In the hard sciences, there have been several instances of outright fraud in the past (there was a German superstar physicist at Bell Labs or some such institution who was making up stuff; Peter Medawar talks about the spotted mice fraud in one of his books; etc.).

    There is a lot of abuse in my field (psycholinguistics) as well: low-powered studies drawing bold null result conclusions, decades of publishing papers by an author that always confirm the author’s theory (statistically unlikely that they never found evidence against their own theory—probably they never published it), never checking model assumptions, incorrect conclusions about low p-values giving more evidence for the specific alternative under consideration, running experiments till they hit significance, doing literally hundreds of tests and reporting significance at alpha=0.05. I have personally learnt a lot from your books and critiques on this blog, and much has changed in the way that my lab works. We compute power in advance (a rough sketch of that kind of calculation is at the end of this comment), we run as high-powered a study as we can afford, we check model assumptions, we don’t do post-hoc changes to the analyses after we fail to find an effect, we don’t run experiments till we hit significance, we release all data and code on publication, etc. So your critiques, even of silly studies like the red and sexual availability stuff, are very helpful.

    But the same abuse is happening in medicine. Even today medical research relies on p-values to make binary decisions about outcomes, for example. All the problems I list above are present in medical research. If someone like you were to direct their attention to this line of work, think about how much it would benefit the world (You wrote recently in Chance about how little statistics has changed the world compared to, say, chemistry; this could become a counterexample). By contrast, suppose that tomorrow psychologists and psycholinguists suddenly stopped abusing statistical tools and doing the best they can given their noisy data and experimental methods. Not much would change. (Although I will host a party in the coolest city in the world if psycholinguists stop publishing papers in top journals with titles like “No effect of factor X or Y”, with power of something like 30%).

    So, you should go after research in medicine. It would still help people working in “useless” fields, but it would also help medicine clean up its act.

    PS By the way, I don’t agree with you because bla bla bla (see above).
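    To illustrate the “compute power in advance” step mentioned above, here is a rough sketch using statsmodels’ power calculation for a two-sample t-test; the effect size is a placeholder, not a number from any particular study.

    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()

    # subjects per group needed for 80% power to detect a standardized
    # effect of d = 0.25 at alpha = 0.05 (d = 0.25 is a placeholder)
    n_needed = analysis.solve_power(effect_size=0.25, alpha=0.05, power=0.80)
    print(f"n per group for 80% power at d = 0.25: {n_needed:.0f}")

    # and, conversely, the power a small study (n = 20 per group) would have
    power_small = analysis.solve_power(effect_size=0.25, nobs1=20, alpha=0.05)
    print(f"power with n = 20 per group: {power_small:.2f}")

    The first number comes out in the hundreds per group; the second comes out barely above ten percent, which is the kind of arithmetic that changed how my lab plans experiments.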

    • I agree that there’s a lot of statistically dubious, or at best just murky, stuff in the bio/medical literature, but it’s perhaps asking a lot of one blog to patrol this as well as its usual beat.

      Coincidentally, I’ve been wondering whether the set of “common” problems in bio/medical studies is the same as or different from those in the social sciences. In the former, ‘modeling’ is all the rage, which sadly often involves fitting sparse data sets to models with vast numbers of parameters, sometimes with no apparent realization of why this is a problem. I’m also rather tired of seeing parameter values without error bars, something no one who has an undergraduate degree in any science should ever present.
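      As a toy illustration of the sparse-data, many-parameters problem (invented numbers, not any particular published model): fit eight noisy points that actually lie near a straight line with both a line and a seventh-degree polynomial, then ask each fit to predict just beyond the data.

      import numpy as np

      rng = np.random.default_rng(2)
      x = np.linspace(0, 1, 8)
      y = 2 * x + rng.normal(0, 0.1, x.size)   # truth: a line plus a little noise

      line = np.polyfit(x, y, 1)      # 2 parameters
      wiggly = np.polyfit(x, y, 7)    # 8 parameters, one per data point

      x_new = 1.2   # just beyond the observed range
      print("linear fit at x = 1.2:    ", round(np.polyval(line, x_new), 2))
      print("7th-degree fit at x = 1.2:", round(np.polyval(wiggly, x_new), 2))

      The flexible model matches the eight points essentially exactly, but its behavior outside the data is driven almost entirely by fitted noise, and nothing in its in-sample fit warns you about that.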

      • RP,
        My impression is that there are problems that are common to many areas of application of statistics, but also problems that vary from field to field or even from subfield to subfield. For example, my experience is that just within the biological sciences, the problems can be so different in different subareas as to require different methods, and concomitantly result in different problems.

  11. For my part, the easy rhetoric that we are “harping” about this sort of thing is probably the most annoying. Most of the time, I’m providing constructive advice about how to become a more effective writer (just as you, Andrew, spend most of your time helping people solve problems of statistical analysis). But it’s because, as Uri Simonsohn once said, all that effort becomes meaningless if we just give fraud and incompetence a pass that we must, every now and then, step up and point out a mistake, which, if nothing else, can actually be corrected.

    Do we want offenders to be drawn and quartered? Sure, sometimes they do something bad enough to warrant punishment. But mainly we just want to assure ourselves that the much vaunted “self-correcting” mechanisms of science work. Unfortunately, however, that assurance isn’t always forthcoming, and we’re told we’re ranting on about “bullshit non-issues”.

  12. Your campaign is very timely – because this is the most critical problem for humanity now. Do we seek truth – or do we just want to win? Are we really scientists – or just trying to make a buck? It makes me appreciate people like Judith Rich Harris – someone willing to point out the emperor has no clothes and lead us to a more accurate view. It’s not an accident that she was outside the fold.

    The current research and publishing process is really quite broken. It’s causing more harm than good. I see some talk here against big data and machine learning. But those are just tools that could help to fix things. Being able to collect and process large enough samples to really provide accurate views is essential to changing things for the better.

  13. Pingback: Ancora su indici e classifiche – Ocasapiens - Blog - Repubblica.it

  14. Pingback: Illegal Business Controls America - Statistical Modeling, Causal Inference, and Social Science Statistical Modeling, Causal Inference, and Social Science

  15. Pingback: Somewhere else, part 182 | Freakonometrics

  16. A related issue is the rule that “senior scientists are never wrong,” a rule followed by many at my university hospital. During my time as a PhD student I have encountered this in all my projects. If the data do not support my senior colleagues’ hypothesis, the data are considered bad, or they will blame my analysis. For some peculiar reason my colleagues seem to put a lot of trust in previous studies, even though those studies might have drawn conclusions from small samples with the wrong analysis. My hopes for medical research are very low at this moment.

    This also makes me think about the PhD system that we have in Sweden, and perhaps in other countries as well. A “good” PhD student finishes his work in a short time, proving his supervisor’s hypotheses without any obstacles (≥2 published papers, 2 manuscripts). A bad student seems never to get anything right, and the results fail to come. There is no reward for having a morally tuned ethical compass in this system.
