What are the important issues in ethics and statistics? I’m looking for your input!

I’ve recently started a regular column on ethics, appearing every three months in Chance magazine. My first column, “Open Data and Open Methods,” is here, and my second column, “Statisticians: When we teach, we don’t practice what we preach” (coauthored with Eric Loken) will be appearing in the next issue.

Statistical ethics is a wide-open topic, and I’d be very interested in everyone’s thoughts, questions, and stories. I’d like to get beyond generic questions such as, Is it right to do a randomized trial when you think the treatment is probably better than the control?, and I’d also like to avoid the really easy questions such as, Is it ethical to copy Wikipedia entries and then sell the resulting publication for $2800 a year? [Note to people who are sick of hearing about this particular story: I’ll consider stopping my blogging on it, the moment that the people involved consider apologizing for their behavior.]

Please insert your thoughts, questions, stories, links, etc. in the comments. Or feel free to email me directly if you'd like to tell me something in confidence.

27 thoughts on “What are the important issues in ethics and statistics? I’m looking for your input!”

  1. When working in collaboration with other scientists, a statistician may feel pressure to produce the “expected results” even when the data are inconclusive or actually contradict the favored hypothesis. What are some good strategies for handling this pressure?

  2. One of the big dimensions of research might be called “objectivity versus advocacy”. At the objective end, you are seeking to describe a situation. At the advocacy end, you are seeking facts which support a certain point of view: my product is better, buying advertising on my program or web site is a good value, etc.

    Advocacy research isn’t bad, any more than PR is bad relative to journalism. It’s a different set of customers who have different needs. But it’s particularly important to realize what type of research you are doing. (I think many academics are under the illusion that they are being objective, but are really working the advocacy side. See, for example, the Levitt-Lott debate.)

    There are probably more ethical challenges for those who work on the advocacy side — for an overly simple example, anyone who works for an ad agency. It’s important to realize that, at bottom, you are being paid to advance the agency’s interests. They are not paying you to lose a major account for them.

  3. off-topic.

    Dear Prof. Gelman,

    I learnt about hierarchical models from your ARM book and later on from other Bayesian textbooks, so I know what Bayesian varying-intercept models are. And I know there are frequentist varying-intercept models (so-called random effects, although as you point out the nomenclature is rather confusing). But here is my question: how can parameters be random variables in a frequentist setting, if, for a frequentist, parameters are not random variables? I never understood that, and since I only learnt Bayesian multilevel models I can’t figure out on my own how that is possible. Could you explain it, or at least point to some reference that addresses this point?

    Thanks in advance,

    Manoel

    P.S.: I didn’t know where to ask my question. Is it better to just email you rather than write an off-topic comment on your blog?
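
    [A note for readers with the same question – the following is a minimal sketch, not an answer from the thread; it uses simulated data and Python’s statsmodels (the choice of software is my own assumption). The point it illustrates is that in the frequentist framing the group intercepts are not parameters at all: the estimated parameters are the fixed effects and the between-group variance, while the intercepts are treated as unobserved random draws that are integrated out of the likelihood and can only be predicted after fitting.]

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        # Simulate grouped data with a true varying intercept.
        rng = np.random.default_rng(0)
        n_groups, n_per = 10, 20
        g = np.repeat(np.arange(n_groups), n_per)
        alpha = rng.normal(0.0, 1.0, size=n_groups)        # group-level intercepts
        x = rng.normal(size=n_groups * n_per)
        y = 2.0 + alpha[g] + 0.5 * x + rng.normal(size=n_groups * n_per)
        df = pd.DataFrame({"y": y, "x": x, "g": g})

        # Frequentist varying-intercept ("random effects") model: the summary
        # reports fixed effects and the between-group variance; the group
        # intercepts themselves are integrated out, not estimated as parameters.
        fit = smf.mixedlm("y ~ x", df, groups=df["g"]).fit()
        print(fit.summary())

        # The intercepts can still be *predicted* after fitting (BLUPs), which is
        # the rough frequentist analogue of Bayesian posterior means.
        print(fit.random_effects)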

  4. I really liked the column you linked to, and it raised a couple of questions in my mind – particularly the part about the statistical expert with a master’s degree. I have a master’s in data modeling and am perfectly aware of my limitations (though that doesn’t prevent me from trying to improve my capabilities). However, for some of the people I work with, my statistical capabilities are seen as ‘expert’ – these things are all relative.

    For me this brings up an ethical question concerning the abuse of expertise. If I am uncertain about how to deal with some aspect of the data, I try to ensure that my colleagues understand this, and that I will try to determine the best way of dealing with the data. I try to do this to the best of my abilities.

    However, I imagine that it would be very easy for an unscrupulous scientist to further their own ends by referencing published material that was not necessarily reputable, and this might not be obvious to scientists from other disciplines who were relying on their statistical ‘expert’. One could (for example) reference a published article claiming that gender was influenced by parental beauty and base claims on this, but I’m sure that there is more subtle nonsense out there which could be used to further a particular goal and would not be detectable to a less statistically literate reader.

    This might only be a re-phrasing of what zbyciclist said – which I also agree with.

  5. Sorry Andrew,

    but “Is it ethical to copy Wikipedia entries and then sell the resulting publication for $2800 a year” is uncalled-for in this context – and to be honest, bores me to a great extent by now. Don’t get me wrong, I have no sympathy for plagiarism, and a “mea culpa maxima” by Ed would be desirable, but we definitely won’t get there by constantly pushing on the matter.

    Ethical considerations for statistical analyses are always a matter of responsible interpretation within their context. That said, there are no ethics in statistics, only ethics in medicine, agriculture, manufacturing, … Going one step further, ethics needs some system of values to build upon, and it surely matters whether we use a system defined by Wall Street or one based on the Judeo-Christian tradition.

    Martin

    • Martin:

      I don’t know why you think it’s uncalled-for to bring up plagiarism in a discussion of ethics. In all seriousness, I do think it’s unethical to copy Wikipedia entries and pass them off as original work, and even more unethical to charge for it.

      • Hi Andrew,

        In my view plagiarism is unfair, illegal, and in many cases harmless to others. Ethical considerations probably apply mostly at a higher level, where decisions and policies affect a larger group of people – that’s where statistics usually applies.

        But more specifically I found your comment on the WIREs publication – where Ed happens to be the editor (is he still in charge?) – unfortunate, because:
        – The vast majority of the papers published in WIREs Compstat are original work
        – Ed is certainly not cashing in all of the $2800 subscription fee
        If your intention is to get Ed to come to terms with the matter and consider apologizing for his behavior, you should be correct in your judgement and avoid discrediting other authors or exaggerating the financial benefit for Ed.

        In the end, once Ed came up with a serious apology, it would be up to us to show forgiveness.
        At least in the Christian Occident, this is a fundamental ethic.

        Martin

          • Martin: An ethical violation can still exist when there are two parties to the action. In this case you have one party (Wegman) who plagiarizes and lies about it, and another (Wiley) that profits from it.

          You say that plagiarism is in many cases harmless to others. In this case, the people harmed include anyone who attempted to learn from the articles in question – recall that errors were introduced in the copying process. Also the students who took classes with Wegman collaborator Said, whose hiring was, I assume, based on a CV that included multiple plagiarized papers; also the taxpayers of Virginia and whoever else is paying Wegman’s and Said’s salaries; and the libraries that paid $2800 for a journal that includes plagiarized material; and the other students who have worked with Wegman; and . . .

          I agree with you that it would be a good thing for all the people involved to apologize and make restitution.

  6. The point that Tom mentions above is one that hits close to home for me as well. While I have MA degrees in Statistics and Economics, my PhD is in another field. It makes me extremely hesitant to call myself either a Statistician or an Economist. Those at my level of statistical knowledge should KNOW the distinction between applying statistics in a stats package like R or Stata (closer to what I do) and developing statistical theory (philosophical, or directly addressing issues in the methodology, etc.). I think I’m pretty good with data analysis, but those who consistently publish in journals like Econometrica and the Annals of Statistics are another breed (I mean that in a good, awe-inspiring way :-))

    In a field like mine, it is often the case that someone who can run a logistic regression is the departmental statistics expert. That is not so much the case in the department I am currently in, but is more of a general trend I see at conferences.

    Perhaps this issue, and communicating it to fellow researchers with less background, is one that can be raised in its own article? For example, how to explain shortcomings without offending colleagues?

    • Hmm. My degrees are in biochemistry, chemical engineering, and biomedical engineering, and I don’t hesitate to call myself a statistician…

      • Depends on the context of course. But I am always interested to hear what others think of the distinction and if that matters when referring to oneself as a statistician (usually implies ‘expert in statistics’, which again results in an ‘eyes of the beholder’ issue with respect to expertise).

        • Along these lines, what about mentioning in a letter to a medical journal that one is a Fellow of the Royal Statistical Society? In the case I encountered, I assume this was true, but in a convenience sample of medical researchers, they all thought it indicated a professional qualification, like the C.Stat., or an honor, like Fellow of the ASA.

  7. One question that always fascinates me about any profession or job is: Are there any specific positive ethical responsibilities or practices that one assumes by taking on the role of statistician?

    Lawyers and doctors, for example, have specific ethical obligations that others don’t have. Some financial/investment roles have a fiduciary responsibility that goes beyond merely not stealing the money, but includes the notion of stewardship.

    Does a statistician have a duty to “stop and render aid” if she sees a statistical casualty?

  8. Adjusting prior distributions until the “best fit” or some model selection comes out how you want?

    Showing only the one piece of data for which your model is a good fit?

    Only looking for problems with your data analysis when your results come out different than you expect? (This bias has some name, I suspect.)

    Like with data: Not releasing your *code*.

    Not being as critical as possible about your own results — this is generic to all of science — and may apply to the sex differences you blogged about today.

    How to address *others* who mis-use, over-interpret, or incorrectly re-analyze your results; how much responsibility do we have for that?

    • My view is that looking at different (and reasonable) prior distributions is legitimate if you are studying the robustness of the analysis; it is definitely out of bounds to pick your prior to get the posterior that you want.
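
      [To make the robustness point concrete: below is a minimal, hypothetical sketch in Python – the data and priors are invented for illustration, not taken from the thread. The idea is to fit the same data under several defensible priors and report how much (or how little) the posterior moves, rather than quietly choosing the prior that gives the preferred answer.]

        from scipy import stats

        successes, trials = 27, 80   # hypothetical binomial data

        priors = {
            "flat Beta(1, 1)":      (1, 1),
            "weak Beta(2, 2)":      (2, 2),
            "skeptical Beta(1, 9)": (1, 9),
        }

        # Conjugate update: a Beta(a, b) prior plus binomial data gives a
        # Beta(a + successes, b + failures) posterior.
        for label, (a, b) in priors.items():
            post = stats.beta(a + successes, b + (trials - successes))
            lo, hi = post.ppf([0.025, 0.975])
            print(f"{label:22s} posterior mean {post.mean():.3f}  "
                  f"95% interval ({lo:.3f}, {hi:.3f})")

      If the interval barely moves across these priors, the conclusion is robust; if it swings substantially, that sensitivity belongs in the report.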

  9. I’m not sure I agree with handing over data willy-nilly. There may be some ethical reasons why you wouldn’t want to hand data over, e.g., if a tobacco company wants my data, gets their high school intern to analyse it, and they “show” that smoking isn’t harmful. Now if they get that into the press, it might put smokers off quitting. Those smokers are never going to see that a better analysis showed something different if those results only appeared in some obscure journal.

    ~~~~~~~~~~
    An ethics issue.
    I know a case where a researcher realised that some published results seemed flawed but couldn’t get access to the data, though he managed to get it through “contacts”. He showed that the results were flawed and this was published (interestingly, the original authors got the last word and minimised the error as if it were of no real importance). But do the ends justify the means in this case?

    As a person who relies on people donating their data I wonder if people will donate their data if it’s o.k. for people to steal it even if it is for the greater good? Because how will the donors know it’s for the greater good anyway?

    • mpledger,

      Re. data sharing with the likes of tobacco companies – I’m writing a paper on this at the moment. It’s a tricky question, I think. On the one hand I am all in favour of a norm of very wide sharing of data; as in, I think sharing should be the default behaviour unless there are very good reasons not to.

      On the other hand, you have these mercenary ‘reanalyses’ being performed in the interests of particular groups, ‘published’ by press release or just handed by lobbyists to policymakers. And these tend to be unduly influential – as you say, any other reanalysis would appear only in the academic literature, if indeed a proper reanalysis is ever done, since there are few academic incentives for doing that kind of work. In this situation, it’s hard to see how either the scientific enterprise or the public interest is served by sharing data. But the question is how to design guidelines for encouraging wider sharing while also allowing a sort of ‘bad faith’ exception.

  10. This is a really interesting question to me – I’m an ethicist (PhD in moral philosophy) currently retraining in statistics (Masters in Biostats), and there seems to be a lot of unexplored potential in the intersection between ethics and statistics.

    I agree with William Ockham above that the question of professional role responsibilities is a very important one (not just for statisticians!). In teaching medical ethics (my current bread and butter job) or legal ethics, we emphasise that the power and privileges that come with the possession of expertise also carry responsibilities to use that power for the (broadly construed) benefit of society. In the case of statisticians, this raises questions about the kind of work we agree to do – e.g. should we help design clinical trials that have little medical or scientific rationale but are intended to serve commercial purposes? What responsibility does a statistician have for the results and consequences of research for which they have provided statistical tools?

    There are also more obvious issues to do with responsible practices (running a bad-faith analysis designed to get the desired results, and so on).

    I’m also quite interested (though in a rather inchoate way at present) in the broad question of how very complex analyses come to inform public policy, and in how policymakers and the general public have to make decisions on the basis of research they can’t possibly fully understand. What role should statisticians play in informing the normative issues?

    I think there are also a lot of epistemological questions that shade over into ethical questions, particularly when we start talking about how the results of a statistical analysis are represented.

  11. There was a funny paper in a psychology journal a couple of months ago: “False-Positive Psychology.”

    It highlights one of the things that has vexed me as long as I’ve been doing data analysis — namely, the temptation to keep running analyses until the magic p<.05 light goes on. In psychology we used to call this "capitalizing on chance." The frustrating thing is how wide the scope is — we talked mostly of multiple comparisons (which I know you have other ways of handling, Andrew), but it seemed to me the problem existed in a zillion other undiscussed places — like the game of Covariate Go Fish discussed in the above article, or the game where your stopping rule for recruiting is "p<.05," or the game where you keep tweaking an experiment until you see the expected effects and then only report on that version. All of these have the tendency of adding to the literature information that isn't "really" true, I fear, but the pressure to play at least some of these games is high.

    This might not itself be a huge issue but for two other epistemological problems: 1) the impossibility of publishing null results and 2) the disciplinary culture of avoiding pure replication. It had the effect on me of making me worry my whole field was nonsense, which contributed to my going back to school to study statistics more in-depth, where I hoped more people would be talking about these issues. Surprisingly to me, not many are!

    This may or may not be helpful to you, because I don't know how many readers of Chance will be working in psychology and I don't know how well the problem generalizes to other fields that use statistics — maybe this is why it hasn't come up in my courses yet. I do sort of suspect some of the issues carry over to epidemiology, where AFAICT it's very common to hire stat consultants who might be in your audience (that's the sector I'm working in now, in fact). But my experience beyond those borders is really limited.
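
    [The optional-stopping game described above is easy to demonstrate with a small simulation. The sketch below is a hypothetical Python example, not taken from the False-Positive Psychology paper: even when the true effect is exactly zero, testing after every new batch of observations and stopping at the first p < .05 pushes the false-positive rate well above the nominal 5%.]

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(1)
        n_sims, batch, max_n = 5000, 10, 200
        false_positives = 0

        for _ in range(n_sims):
            data = np.empty(0)
            while data.size < max_n:
                # True effect is zero, so any "significant" result is a false positive.
                data = np.append(data, rng.normal(0.0, 1.0, size=batch))
                if data.size >= 2 * batch:
                    if stats.ttest_1samp(data, 0.0).pvalue < 0.05:  # stop at the first p < .05
                        false_positives += 1
                        break

        print(f"False-positive rate with optional stopping: {false_positives / n_sims:.3f}")

    (Fixing the sample size in advance, or using a sequential design with appropriately adjusted thresholds, brings the rate back to the nominal level.)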

  12. A classic example of ethical issues in statistical analysis is John Lott’s “proving” that “more guns” leads to “less crime.” Tim Lambert’s blog http://scienceblogs.com/deltoid/ lays out the gory details: using lousy data (with a break in the series that was ignored); supposedly losing data; and changing the model when his revised findings didn’t comport with his beliefs.

  13. Something that impressed me back in the 1980s was the “Ethical Guidelines for Statistical Practice” in The American Statistician, 1983, vol. 37, no. 1.
    Preamble:
    “The American Statistical Association is a scientific, professional, and educational organization. As such it recognizes that the professional integrity of statisticians is dependent not only upon their skills and dedication but also upon their adherence to recognized principles of ethical behavior. Wherein statistics as a science strives toward truth, these guidelines are designed to provide a measure by which both individuals and organizations can avoid compromise of truth and can be protected from the misuse of statistics and statistical data.”

    It’s fascinating!

  14. Nice of you to take this on.

    In the first one, you might have missed the organizational issues – they likely would have had to fill out many forms and get lots of signatures, and if your analysis showed something very different it likely would have been bad for their careers. (Many in management seem to think there is a clearly correct statistical analysis, and getting that wrong is not being able to do one’s job.)

    I think ethics in nursing may be worth looking at – statisticians often are not in _authority_, and bad things do happen to them when they question authority, and being right does not help.
    (internet problems, no spell check)

  15. Two ethical issues along the lines of Stephen McKay’s and C.C.Fuss’ comments above:

    1-The responsibility of individual researchers in applied fields to make their findings (and in some cases, their actual data) available and accessible to practitioners, policy makers, etc. It seems there is often little incentive for researchers to do this, and even if they could benefit in some way (financially, professionally), there seems to be a lack of venues for doing so.

    2-The challenge and irony of getting meta-analysts and research synthesists to (a) share their data and, for those studying synthetic research practices, (b) to share those practices in full detail in print or upon request. We might expect greater transparency from this population.

  16. Ethical case study:

    You have been hired as a consultant for a scientific research project. Data have already been collected, and the researchers have run an analysis they believe proves their point; at that point they contacted you to validate their findings. You suggest that certain aspects of the data have not been taken into account but should be, and once you do the analysis you think is right, the results do not come out their way. At which point the researchers thank you for your time.

    What should you do?
    a) If you don’t know but suspect they will ignore your results and continue to use and publish their original (according to you, faulty) analysis;
    b) If you know for sure that they have done a).
