
A (not quite) grand unified theory of plagiarism, as applied to the Wegman case

A common reason for plagiarism is laziness: you want credit for doing something but you don’t really feel like doing it–maybe you’d rather go fishing, or bowling, or blogging, or whatever, so you just steal it, or you hire someone to steal it for you.

Interestingly enough, we see that in many defenses of plagiarism allegations. A common response is: I was sloppy in dealing with my notes, or I let my research assistant (who, incidentally, wasn’t credited in the final version) copy things for me and the research assistant got sloppy. The common theme: The person wanted the credit without doing the work.

As I wrote last year, I like to think that directness and openness are virtues in scientific writing: for example, clearly citing the works we draw from, even when citing secondary sources might make us appear less erudite. But I can see how some scholars might feel pressure to cover their tracks.

Wegman

Which brings us to Ed Wegman, whose defense of plagiarism in that Computational Statistics and Data Analysis paper is as follows (from this report by John Mashey):

(a) In 2005, he and his colleagues needed “some boilerplate background on social networks” for a high-profile report for the U.S. Congress. But instead of getting an expert on social networks for this background, or even simply copying some paragraphs (suitably cited) from a textbook on the topic, he tasked a Ph.D. student, Denise Reeves, to prepare the boilerplate. Reeves was no expert: her knowledge of social networks came from having taken a short course on the topic. Reeves wrote the boilerplate “within a few days,” and Wegman writes, “of course, I took that to be her original work.”

(b) Wegman gave this boilerplate to a second student, Walid Sharabati, who included it in his Ph.D. dissertation “with only minor amendments.” (I think he’s saying Sharabati copied it.)

(c) Sharabati was a coauthor of the Computational Statistics and Data Analysis article. He took the material he’d copied from Reeves’s report and stuck it into the CSDA article.

Now let’s apply our theme of the day, laziness:

(a) Wegman felt that the issue of collaborative networks was important to this congressional report. But rather than try to really figure things out, he asked a student for boilerplate. Lazy.

(b) Wegman was an author of the congressional report containing this boilerplate; he was also supposed to read Sharabati’s Ph.D. dissertation. He didn’t read it carefully enough to realize that an entire portion had been copied. Lazy.

(c) Wegman and Said were authors on the congressional report and also authors on the CSDA paper. They didn’t read the report and the paper carefully enough to realize that an entire portion had been copied. Lazy.

And then there’s the whole repeat-offender thing. (And another case noted here.)

Doing it right is easy

Doing the right thing is easy, easy, easy, easy. All you have to write is something like, “Scholar X wrote a clear summary of topic Y. We paraphrase Scholar X’s summary as follows…”

The only bad thing about this is . . . maybe people who read this will realize you’re not much of an expert, and maybe they’ll ask Scholar X to write that expert report instead. But that’s the honest thing to do. That, or become an expert yourself.

Let me say it again: There’s not much mystery to plagiarism. If you take the work of person X and claim it as yours, you get credit for that work. A common defense of plagiarists is that the work being copied without attribution is not so important. But, if so, how much would it hurt to write, “Scholar X wrote a clear summary of topic Y. We paraphrase Scholar X’s summary as follows…”? The answer is: it could hurt a lot, because it could quickly become obvious that you didn’t do the work, and then the question arises, why should you be considered the expert? Why indeed?

One more time: Wegman has implied that copying “boilerplate” isn’t really plagiarism or, if it is, it’s no big deal, not affecting his scientific conclusions. But that’s not really correct. The work in question is not a theorem that’s true or false, it’s a bunch of statements (an “opinion piece,” in the words of CMU prof Kathleen Carley), and in that case the expertise of the authors is an important contribution.

The motivation for copying without attribution is clear: it allows Said, Wegman et al. to claim expertise without doing the work required to actually be experts.

Summary

There are two ways to claim expertise:

1. Be an actual expert and do the work, or

2. Plagiarize. Or hire non-experts who, being in the exact same position as you, will have no choice but to plagiarize if they want to be viewed as experts.

Strategy 1 takes a lot more work than strategy 2. On the other hand, if you do strategy 2 and you get caught, you’re going to look bad. Especially if you’re a repeat offender.

P.S. In case you’re curious, here’s a bit from the paper in question:

Centrality is one of the oldest concepts in network analysis. Most social networks contain people or organizations that are central. Because of their position, they have better access to information, and better opportunity to spread information. This is known as the ego-centered-approach to centrality. The network is centralized from socio-centered perspective.

Huh? I couldn’t really follow this so I decided to use Google to translate it from English to French to Dutch to Spanish to Slovenian to Finnish to Japanese back to English. Here’s what I ended up with:

Concentration is one of the oldest concepts in network analysis. Most social networks are organized individuals and play a central role. Because of its location makes it easier to access information and better opportunities for the dissemination of information. This is called the ego approach is important. Are managed centrally from the perspective of social networks.

This is a little better than the original but not much.

As to the actual content of the paper, I agree with those who have described it as self-refuting, in that it concludes with concerns that “peer review will be compromised,” yet the article was actually accepted without peer review by an editor who was friends with one of the authors. As noted earlier, I do not feel that direct acceptance by the editor is necessarily bad practice, but in this case the editor clearly made a mistake by not sending the paper to expert referees. I doubt the reviewers would’ve caught the plagiarism, but I expect they would’ve caught the underlying lack of expertise that motivated the copying.

It’s not that the plagiarized work made the paper wrong; it’s that plagiarism is an indication that the authors don’t really know what they’re doing.

P.P.S. Mashey points me to this blog by Steve McIntyre who says that Said and Wegman were right in their social network analysis.

I have a few thoughts on this:

1. No one denies that the Said, Wegman, et al. article in CSDA had plagiarized material, but it’s not clear that Wegman himself knew about or approved the plagiarism. It may be that the plagiarism arose from a miscommunication with one student (Reeves, who copied material into the “boilerplate” report Wegman asked her to do, while Wegman thought she’d produced original material), along with another student (Sharabati) who flat-out included plagiarized material in his Ph.D. dissertation and then in the CSDA paper. Given that Wegman has a history of plagiarism (see the story of the color vision article), it doesn’t look good that there was plagiarized material in another article on which he was a coauthor, and it also doesn’t look good to me that, when Wegman was told about the plagiarism, he acted defensively rather than with outrage. But we can’t be sure what this means.

2. As noted by many, the Said, Wegman, et al. article received very minimal peer review (ironically given its discussion of peer networks in reviewing).

3. As noted in my blog above, one function of the plagiarism is to give the authors an unwarranted air of expertise.

4. Plagiarism aside, the work of Said, Wegman, et al. on social networks can be evaluated on its merits. For the purpose of evaluation, let’s think of it not as a peer-reviewed article in a scientific journal but rather as a blog post. As a blog post, the Said et al. manuscript is pretty impressive: they offer opinions and also some data analysis!

However, I don’t see the work in that paper as persuasive of anything. The major conclusion is that there are different styles of research collaboration; the methodological flaw is that the entire data analysis is based on four snippets of the collaboration network. There’s no evidence or even argument that you can generalize from these four graphs to the general population, nor is there any evidence or justification for their normative recommendations. The trouble is that the authors didn’t seem to know what they were doing; one piece of evidence of this is that they plagiarized part of their paper. It’s not that the plagiarism automatically discredits the social network analysis; rather, the plagiarism is consistent with the general hypothesis that Said, Wegman, et al. didn’t know what they were doing. It’s fine for them to present graphs of four collaboration networks, but I don’t see these graphs as really adding any support to the authors’ normative claims.

42 Comments

  1. Wonks Anonymous says:

    I agree with those who have described it as self-refuting, in that it concludes with concerns that "peer review will be compromised," yet the article was actually accepted without peer review by an editor who was friends with one of the authors.
    Wouldn't that actually make it self-confirming? Or self-descriptive?

  2. lemmy caution says:

    Wikipedia has a lot of plagiarism. It is also super useful. The origins of most facts don't really matter compared to the truth of the facts.

  3. Alex J. says:

    I think you mean "Strategy 1 takes a lot more work than strategy 2."

  4. ZF says:

    I agree with the interpretation here; however, there is one aspect that is a bit underplayed in the whole discussion of this event: the PhD student.

    As I am one, I am frankly a bit shocked that at a level very different from the undergraduate level, one that already reflects a deliberate decision by the individual (to apply and enroll), we encounter this nonchalant "stealing". I would assume that a PhD student involved in this, no matter with whom, or where her name appears on a paper or a report, is basically done. From a rational-choice perspective it strikes me again; it would make sense only if the project coordinator were the only person who could help you get through or later get a job.

    All this is not to say that the other aspects involved are not serious or alarming.

  5. Jonathan Gilligan says:

    In the summary, you transpose numbers: the sentence after the enumerated strategies should read, "Strategy 1 takes a lot more work than strategy 2" or "Strategy 2 takes a lot less work than strategy 1."

  6. james says:

    'But, if so, how much would it hurt to write, "Scholar X wrote a clear summary of topic Y. We paraphrase Scholar X's summary as follows…"?'

    I think the problem is that presenting an extended quotation is frowned upon in academic writing when you're just presenting and not engaging with any of the ideas. I don't think I could get away with "Gelman (2011) performed a comprehensive literature review which I quote in its entirety below:…" and that being my entire review section. So you paraphrase, which basically means taking a very clear exposition and having to do a lot of thankless work degrading it to a state where it's sufficiently your own work. I'm not defending plagiarism, but I don't think either of those is really a good alternative.

    What's your view on statistical boilerplate? The brief description of the form of your data, the test you're using, and key outputs. Like this, which I grabbed from wikipedia.

    "Outcomes of the two treatments were compared using the Wilcoxon–Mann–Whitney two-sample rank-sum test. The treatment effect (difference between treatments) was quantified using the Hodges–Lehmann (HL) estimator, which is consistent with the Wilcoxon test (ref. 5 below). This estimator (HLΔ) is the median of all possible differences in outcomes between a subject in group B and a subject in group A. A non-parametric 0.95 confidence interval for HLΔ accompanies these estimates as does ρ, an estimate of the probability that a randomly chosen subject from population B has a higher weight than a randomly chosen subject from population A. The median [quartiles] weight for subjects on treatment A and B respectively are 147 [121, 177] and 151 [130, 180] Kg. Treatment A decreased weight by HLΔ = 5 Kg. (0.95 CL [2, 9] Kg., 2P = 0.02, ρ = 0.58)."

    There are thousands of these things published a week, often with a couple of paragraphs explaining the method and key formula when they won't be familiar. I'm sure these sections are routinely lifted with the relevant bits swapped out. I'm sure it also happens all the time where you need to give a concise description of your data or experiment, which is pretty much the same as the last hundred people who've done the same thing.

    I just think a lot of technical writing gets so concise, functional, routine and anonymous that it's inevitable that it's reused.

  7. Andrew Gelman says:

    Fixed.

  8. A. Tsai says:

    I think the overall problem is that the penalties for plagiarism are severe, whereas the penalties for being careless and/or stupid ("oops, I was copying my notes into Microsoft Word and intended to cite them as quotes, but in my haste to assemble the manuscript I did not realize that I pasted entire passages as my own work") are less severe. While there may or may not be an explosion of plagiarism today, there certainly is an explosion of the 'carelessness defense' among people who are caught appropriating the text of others. I would wager that if the halls of academe switched the penalties around (i.e., so that the penalties for being careless/stupid were more severe than the penalties for plagiarism), we would see an explosion of people confessing to plagiarism.

  9. John Mashey says:

    "Self-refuting" in this context comes from SSWR, p.150:

    '"Finally, the mentor style of co-authorship, while not entirely free of the possibility of bias, does suggest that younger co-authors are generally not editors or associate editors. And often they are not in a position to become referees, so that the possibility of bias is much reduced. Nonetheless, even here, a widely respected principal author has the possibility of smoothing the path for his or her junior collaborators, while the papers of a high reputation principal author may not be as critically reviewed as might be desirable."

    This paper is "self-refuting." Among other things, Said managed to become an Associate Editor of C&DA less than 2 years after PhD. A poor paper was accepted in 6 days, despite plagiarized text from famous textbooks, weak references and conclusions unsupported by evidence.'
    ==
    The self-refuting part was the idea that the mentor model was less likely to cause trouble … written by Wegman and 3 of his students, filled with plagiarism, poor scholarship, bad SNA, and a leap to unjustified conclusions, one of which was that problems were less likely from their network.

    Of course, as pointed out in SSWR, pp.143-152 on this, they did not seem at all well-informed on SNA.

  10. Andrew Gelman says:

    James:

    Regarding your first question: If you're truly an expert in an area, it shouldn't be hard to introduce the field in your own words. Wegman etc. could easily have introduced social network research with a single paragraph–if they knew enough about the field to write that paragraph.

    Regarding your second paragraph: In a statistics article, it would be enough to say you did the Wilcoxon test or whatever and give a reference; it's not necessary to summarize the test.

  11. andrewt says:

    I'm not sure it fits your theory, but I have a more bizarre example courtesy of Wegman's group. Last year, searching for work on clustering, I came across a 1997 paper by Wegman et al., "Statistical software, siftware and astronomy" (a boring and competent survey), but it contained a passage on database technology which didn't make sense, so I googled a phrase from it: "structured data in zoned fields".
    I discovered that the ~200-word passage was actually taken almost verbatim from a 1995 GMU PhD thesis, where the passage made sense. The PhD student was not a co-author of, or acknowledged in, Wegman et al. 1997.

    It didn't stop there. Google also found a 2006 PhD thesis by one of Wegman's students which contained the same passage and much more – 1000+ words from Wegman's 1997 paper. Exactly how a supervisor can not notice a student has plagiarized pages of material from one of his own papers I don't know.

    And Google had more: a 2009 conference paper by a Howard University professor incorporating 10+ pages from the 2006 GMU PhD thesis almost verbatim, right down to section headings, and including a 3rd-generation copy of the passage from the 1995 GMU thesis.

  12. Nick Cox says:

    The idea that absolutely everything not your own should be referenced is absurd and would lead to unreadable papers — indeed some cross my desk already.

    Current practice shows that people mostly know this, except when they preach about always giving references. In almost all statistical contexts except history of statistics, authors do not cite Legendre and Gauss for using least squares, nor would referees insist on their adding the references. If you did, it might be interpreted as quirky rather than scholarly.

    How many people feel obliged to cite Robert Recorde when they use an equals sign? Or Newton and Leibniz every time they use elementary calculus?

    This is not to defend Wegman and/or co-authors and/or non-co-authors, some or all of whom appear clearly to have crossed the line — but to emphasise that there is a line. There is always common property at any level of discussion.

    At a pedagogic level, the injunction to reword has as its main function an attempt to get students to think about what they are copying, rather than to stop copying, because mechanical "copy and paste" imparts little. Indeed, in some sense we want students to learn and _reproduce_ standard material in many disciplines.

    (As a side-detail, it is bizarre that "cut-and-paste" is a common phrase in discussions of plagiarism, when discussants really mean "copy-and-paste". For people who presumably use computers daily not to know the difference is puzzling.)

  13. Eli Rabett says:

    The common solution is: see Ref. X for details.

  14. Deep Climate says:

    Andrewt's discovery of the earliest known apparent plagiarism involving the Wegman group is summarized here, with all the necessary links:

    http://deepclimate.org/2010/12/02/wegman-et-al-mi

    However, the subsequent Howard University prof's 2009 appropriation of the 2006 GMU PhD is new to me.

    It's also time for statisticians to take a long overdue detailed look at Wegman's appraisal of Michael Mann's "short-centred" PCA and the McIntyre-McKitrick critique thereof.

    Wegman simply reran the M&M "monte carlo" test code, and reproduced the exact same figures as in the GRL 2005 paper, along with a fourth M&M figure that was not published in the original paper. In that fourth figure, the displayed sample "hockey sticks" from so-called "red noise" null proxies were not even recomputed, but simply a random selection from the 1% of most upward turning leading principal components previously archived by M&M (presumably unbeknownst to the Wegman team, who don't seem to have understood what M&M's code actually did).

    Wegman claimed that these "hockey stick" leading principal components were generated by applying "short centred" PCA to null proxy sets generated with an AR1(.2) noise model, whereas in fact they were based on the full autocorrelation structure of the original proxy set (ARFIMA model).
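    To make the "short-centered" terminology concrete, here is a hypothetical sketch (not the Wegman or M&M code; the series length, panel size, seed, and 30-year centering window are all invented). Short-centering subtracts the mean of only a recent sub-period before extracting principal components, which preferentially loads series that happen to drift in that sub-period, so even null proxies from an AR1(.2) noise model can yield "hockey stick" leading principal components:

```python
# Hypothetical illustration of "short-centered" vs. conventional PCA
# on AR(1) noise; all sizes and the seed are made up for the sketch.
import numpy as np

rng = np.random.default_rng(0)
n_years, n_series, phi = 200, 50, 0.2   # AR1(.2) null "proxies"

# simulate AR(1) noise series, one column per null proxy
x = np.zeros((n_years, n_series))
for t in range(1, n_years):
    x[t] = phi * x[t - 1] + rng.standard_normal(n_series)

full = x - x.mean(axis=0)          # conventional (full-period) centering
short = x - x[-30:].mean(axis=0)   # "short" centering on the last 30 years only

# First principal component (leading left-singular vector) of each matrix;
# the short-centered PC1 tends to show a spurious offset in the final 30 years.
pc1_full = np.linalg.svd(full, full_matrices=False)[0][:, 0]
pc1_short = np.linalg.svd(short, full_matrices=False)[0][:, 0]
```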

    There's lots more here:

    http://deepclimate.org/2010/10/25/the-wegman-repo

    … and especially here:

    http://deepclimate.org/2010/11/16/replication-and

  15. Eli Rabett says:

    Nick, cut and paste comes from offset printing, in the days when you would literally cut out some text or graphics from a piece of paper and paste it onto an offset sheet, photograph the sheet, and create the printing plate from the photograph. It literally was cut and paste. I had relatives in the business of creating advertising material. Copying for plagiarism was even then referred to as cut and paste. It is not bizarre; it is, at worst, an anachronism.

  16. lemmy caution says:

    "(As a side-detail, it is bizarre that "cut-and-paste" is a common phrase in discussions of plagiarism, when discussants really mean "copy-and-paste". For people who presumably use computers daily not to know the difference is puzzling.) "

    This is a good point. Maybe it is done to emphasize the damage to the original author. Cutting sounds worse than copying.

  17. John Mashey says:

    Nature weighs in.

  18. John Mashey says:

    Andrewt:
    "Exactly how a supervisor can not notice a student has plagiarized pages of material from one of his own papers I don't know."

    An alternative hypothesis is possible, that he gave the text to al-Shammeri as "reading material."

    As per Strange Tales and Emails, p.7 From Wegman's email to Elsevier:

    "Walid in the meantime was working on his PhD dissertation in the area of social networks. (Denise also was working on her dissertation, but had moved to work on support vector machines❺) and thinking that the page and ½ Denise had given me was original work❻ that had not been formally published, I gave it as reading material to Walid as background material along with a number of other references. Walid included it as background material in his dissertation with only minor amendments."❼

  19. Millsy says:

    Some of this brings me to a question I've been wondering about for a while now. Writing about sports and economics, I often want to give context for a certain result that I find. There really is no better place than Wikipedia to find detailed information on a breadth of teams and players (beyond simple statistics). So, in some work, I will briefly mention some context for Player X or Team Y and cite Wikipedia (usually, I do my best to cross-reference these things).

    However, in a recent paper I received back from a journal, I was scolded for citing Wikipedia (it was an interesting fact about Phil Rizzuto). For resubmission (here or elsewhere) I am left with 3 choices if I cannot find the interesting tidbit elsewhere:

    1) Ignore the critique and cite Wikipedia anyway

    2) Leave it in there and don't cite (are baseball facts common knowledge…for example, do I need to cite that Roger Maris hit 61 HR to break Babe Ruth's record?)

    3) Just leave it out of the paper completely (though, I find it both interesting and relevant to the discussion in the paper).

    As a grad student working in an age where digital resources are growing quickly, I'd like to take advantage of them. But if this sort of citation will hurt publication chances, it seems like a no-win situation. Yes, I even chuckle a bit by citing Wikipedia in a paper. But the truth is that it can be a great resource for these sorts of things. I certainly would not cite any mathematical equations or economic theory from the website.

    Maybe I should cite Britannica instead? They're a bit expensive, though I could just start off with "V" like Joey from Friends and fill out the alphabet later.

  20. Ben Bolker says:

    I think the usual recommendation is that you find the information in the primary reference cited by Wikipedia (which is always supposed to exist and be adequately cited :-) ) and quote it from there.

  21. Nick Cox says:

    I am always interested in the history of expressions, but I don't think that is primary here. The distinction between copy and paste which leaves the original untouched and cut and paste which leaves no trace is something that should be evident to any computer user, and it is apposite to almost all modern plagiarism. I don't think most people who talk about cut and paste are alluding to the history of printing. They are just carelessly echoing someone else. Copy and paste must be a few decades old by now as an expression, but I don't have a date.

  22. Millsy says:

    Ben,

    I agree, but unfortunately that doesn't always seem to be the case. Perhaps then one might say Wikipedia is not reliable, a reasonable criticism, but where does "The only MVP to lead in sacrifice bunts" fall relative to a source that says "Maris broke Babe Ruth's HR record"?

    Is one requiring a citation and one not? Best to cite a primary source for both?

    Luckily with many of these sports ones I can confirm these facts on something like Baseball Reference. But if somewhere else pointed me that way, is it appropriate to just say I looked it up myself and cite BBR as the source? I think these are legitimate questions without an easy answer (short of whoever typed it up there citing their primary source for every statement).

  23. Eli Rabett says:

    The editorial in Nature about this will please Nick Cox, Cut and Paste is the title

  24. Eli Rabett says:

    Sorry, Copy and Paste. It's late

  25. Marcel Kincaid says:

    @Millsy

    Wikipedia is written by people of indeterminate expertise. In fact, by the "no original research" Wikipedia policy, their expertise is irrelevant. If it isn't cited, it doesn't belong in Wikipedia (by Wikipedia policy and by logic) … citing such material is completely unreliable.

    "But if somewhere else pointed me that way, is it appropriate to just say I looked it up myself and cite BBR as the source?" — if you did look it up.

  26. Marcel Kincaid says:

    @Nick Cox

    it is bizarre that "cut-and-paste" is a common phrase in discussions of plagiarism, when discussants really mean "copy-and-paste". For people who presumably use computers daily not to know the difference is puzzling

    When one finds something to be "bizarre" and "puzzling", as well as utterly implausible (as is the claim that people using computers daily don't know the difference between cutting and copying), that may be an indication that one's reasoning is deeply flawed.

    I don't think that is primary here

    Nonetheless, it is; see http://en.wikipedia.org/wiki/Cut,_copy,_and_paste … "Although the mechanism was already in widespread use in early line and character editors, Lawrence G. Tesler (Larry Tesler) popularized "cut and paste" in the context of computer-based text-editing while working at Xerox Corporation Palo Alto Research Center (PARC) in 1974–1975."

    (You can investigate the citation yourself because, hey, I'm lazy … but not so lazy as to depend on nothing more than what I do or don't think).

  27. Millsy says:

    Marcel,

    Fair enough. On the latter portion of your response, I appreciate your feeling toward that, as that has been my strategy thus far. Like I said–as a grad student–it is always good to hear how other people look at these things.

  28. Nick Cox says:

    @Marcel: I don't know why this point calls for such histrionics. I hope your blood pressure is not in danger.

    My contention is factual. Most plagiarism leaves the original as it was and is thus copy and paste. Indeed, it is the ability to find the original and compare that allows plagiarism to be documented. Are you disagreeing?

    As has been pointed out, the recent Nature editorial uses that terminology too, so perhaps it is two of us against you and the rest of the world.

    Anyone who says "cut and paste" is unlikely to be misunderstood in practice, but I prefer to use correct expressions. Anyone is free to call this pedantic, but the word is not a pejorative with me.

  29. Steve McIntyre says:

    Mashey's characterization of my blog post was inaccurate. Prior to a couple of days ago, I was unaware of the existence of Said et al 2008. I didn't say that Said et al 2008 proved their point. Their "hypothesis" was that members of Mann's "clique" had been reviewing one another's papers. In questioning in 2006, Wegman said clearly that, given the anonymity of peer review, this was only a hypothesis.

    I said that the Climategate documents showed what Said et al 2008 hadn't been able to demonstrate – that members of Mann's "clique" had been, in fact, reviewing one another's papers. Examples of such reviews were included in the Climategate documents.

    In my blog article, like many others, I observed the irony of the fact that Said et al 2008 was weakly reviewed, owing, no doubt, to an association between Wegman and the editor. However, rather than "refuting" the article, it seems to me that, if anything, it supports the assertion that pal review is less diligent than independent review.

  30. Eli Rabett says:

    Nick, it's not histrionics; those of us old enough to have experienced literal cut and paste are bemused by the migration (or not) of the term to copy and paste. Language is interesting stuff.

  31. Gneiss says:

    As McIntyre very well knows, much of the plagiarized text in Said 2008 was copied from the Wegman report itself, which started the chain of plagiarism. In addition to copying from Wikipedia and other sources including Bradley, sometimes comically or intentionally changing the meaning, the Wegman report copied McIntyre's code and just re-ran it, down to the same random-number seeds. Then Wegman falsely presented the results as an independent replication without even understanding them well enough to accurately describe what the analyses did, e.g. guessing wrongly (and impossibly) that AR(1) r=.2 (rather than fractional ARIMA) was what McIntyre meant by his cryptic term "persistent red noise."

    But coming oddly back to the failures of peer review … it was bloggers and not Wegman's ASA peers who figured out what had happened.

  32. Nell says:

    @Milsy,

    "But if somewhere else pointed me that way, is it appropriate to just say I looked it up myself and cite BBR as the source?" – I have seen many a grad student get in trouble for trusting that a citation accurately portrayed what the primary source really said.

    It is always a bad idea to trust that others have accurately cited their material – especially on Wikipedia as there is a substantial underground movement to deliberately write ridiculous statements within wiki articles.

    I would encourage anyone to be wary of citing claimed "truths" anywhere but from the primary data. Always look for the primary source. Once you see how often things are misinterpreted or misrepresented, you'll be terrified not to.

    Most importantly, and re your original post, if you can't find any evidence for your Phil Rizzuto fact anywhere else, that is a pretty big red flag that something is amiss. The internet is a big place, people love baseball, and Rizzuto is a well-known Yankee. If it is true, it has to be out there somewhere else… but until you know it's true, you just can't build an argument on it.

  33. John Mashey says:

    Actually, to bring this back from SNA to statistics, recall that DC found this and specifically, the side-by-side comparison of the Wegman Report, pp.15-17, about 2.5 pages on principal components, noise, autoregressive models, Gaussian noise, etc. Unlike SNA, which was clearly not an area of expertise, one would have expected this to be, and in fact, it has a lower percentage of clearly-identified antecedents, from Jolliffe, Wikipedia, etc. That may mean they did more original writing or that antecedents were hard to find.

    I would suggest that this one falls clearly on the laziness side, as opposed to the claim of unearned expertise on the SNA part. Anyone would certainly assume that Wegman and Scott, at least, would know this.

  34. Barry says:

    Nick Cox: "The idea that absolutely everything not your own should be referenced is absurd and would lead to unreadable papers — indeed some cross my desk already.

    Current practice shows that people mostly know this, except when they preach about always giving references. In almost all statistical contexts except history of statistics, authors do not cite Legendre and Gauss for using least squares, nor would referees insist on their adding the references. If you did, it might be interpreted as quirky rather than scholarly. "

    Strawman much? The point of referencing, as I understand it, is two-fold: first, to allow readers to check the source material if they wish, and second, to make it clear which work is the authors' and which was the work of the giants on whose shoulders they stood. If you think that readers would believe that your uncited reference to linear models meant that you originated linear models, then you should indeed cite Gauss et al. And for equal signs.

    I don't see anybody here making such a suggestion.

  35. Nick Cox says:

    @Barry: I think we agree. I must assume that my phrasing was too oblique or unclear as far as you were concerned.

    I suggest that this thread and wider debate on these matters need at some point to be clear and precise on quite what plagiarism is — and naturally that has implications for how to avoid it. I routinely see the advice to give references for everything that is not your own idea and to use quotation marks for what are not your own words.

    That's nearer right than wrong, but I merely made the second-order point that in practice there is common intellectual property familiar to people in a field for which no reference is generally considered necessary. You appear to be agreeing, and I welcome that. The examples were chosen to be absurd.

    This doesn't affect the major point that poor and indeed unacceptable practices appear to be endemic in much of the work of the Wegman group, including plagiarism in anybody's sense. But I am less exercised by recycling text within the works of a small group than by copying chunks verbatim from other sources, and I imagine that distinction is widely shared.

  36. John Mashey says:

    @Nick:

    1) Like you, I'm not overly worried about recycling material in a small group, although it would be nice if the group had some idea of the original provenance. Likewise, particularly for introductory material in one's own area of expertise, rewriting it every time seems a total waste of time.

    I recommend to everyone the 5-page discussion of self-plagiarism by Pam Samuelson, who is a very serious player at the intersection of law and technology, e.g. the Google Books settlement. Pam is a *very* smart and thoughtful person.

    2) As for recycling, what are academics' opinions on this one?

    a) Student X does PhD dissertation.

    b) A year or so later, one of the student's co-supervisors (Y, Z) extracts parts of a section of the dissertation, changes all the "I"s to "we"s, makes minor edits, uses the same graphs, and submits an article to a conference. It gets published in a journal as part of the conference proceedings. Y is the Corresponding Author; oddly, no affiliations are given for X and Z, and there is no reference to X's PhD.

    c) The article is published with authorship:
    Y, Z, X.

    [Needless to say, turning chunks of PhD into articles generally seems a fine thing, but putting 2 names ahead of the student seems a little strange. Maybe this was OK with X.]

  37. jrkrideau says:

    I'm not an academic though I seem to have spent a lot of time in academia.

    It seems to me that Step 3 is completely unacceptable though possibly one could do it by clearly stating that this was all taken from X dissertation. It still strikes me as very dubious.

    Re-using a good bit of the dissertation with X clearly as first author and noting the paper was based on X's dissertation strikes me as fine.

    My suspicion would be that it would not and should not get through peer review with an Y, Z, X authorship.

    One kind of wonders if X even knew that the paper was being prepared.

  38. Irony says:

    Not using spell check + not editing your work, and incorrectly spelling "boilerplate" over and over again. Lazy

  39. John Mashey says:

    re: My May 30 comment
    People asked, but this wasn't a hypothetical.
    The details and side-by-sides are given in SSWR W.5.7, which DC kindly hosted for me.

    For context, that's an expansion of that Appendix from SSWR.

    One erratum: in http://www.faculty.ucr.edu/_hanneman/nettext the _ should be a ~; it's correct later in the precise page URL.
    Y = Said
    Z = Wegman
    X = Sharabati

    Relevantly, the same 3 Federal agencies are Ack'd as in the paper under retraction.

    Eli Kintisch published the actual forthcoming retraction notice a few days ago.

  40. Richard Hooker says:

    I'm not sure I agree about the lazy thesis. In my academic days (Stanford and then WSU), faculty plagiarism was endemic (especially at Stanford). The principal cause is . . . institutional lack of regulation, sanctions, and enforcement. I'm reminded of the Tiger Woods (and Arnold Schwarzenegger and insert-name-here) sex scandal when people asked, "Why did he do that?" where "that" referred to having sex with several dozen supermodels. I would laugh and say, "Because he could."

    In my experience (from a couple decades ago), faculty plagiarism involved no or few penalties for anyone except the person who reported it (usually: end of career — look at what happened to Norm Finkelstein).

    Now, as a matter of sanctions and enforcement, proving plagiarism, as every undergraduate professor can tell you, isn't an enormously difficult undertaking.

    The problem is easily solved: institutions need to define unambiguous and rigid sanctions for faculty plagiarism, provide a safe mechanism for reporting it, conduct fair investigations weighted to the facts rather than the faculty, and then consistently impose unyielding sanctions on the guilty. Simply put, losing your job (guaranteed) is a strong incentive not to take the lazy route and plagiarize, no?

    Unfortunately, I see nothing in your discussions of plagiarism that even hints at this common-sense (and fair) approach. Which is why the Wegmans of the world keep going back to the trough to feed.

  41. Andrew Gelman says:

    Richard:

    Good point. I have no idea why Edward Wegman, Doris Kearns Goodwin, etc. still have their university affiliations. I guess there's no easy way for institutions to get rid of these people.

  42. TCO says:

    McIntyre has had issues pointed out with the Wegman report since 2006 (like the figure claiming AR1 (0.2)) and has just stayed mum on them.

    I think he plays a game of actually knowing things his side is saying are wrong, but refusing to address it. This means he stays chums with his side and also the false claims continue to get traction. He won't actually endorse bad stuff (is pretty cagey there). He basically wants to get the benefit of it, without the downfall when shown wrong.

    The fellow admires Bill Clinton, another equivocator, word parser, and…liar.