## W’man < W’pedia, again

Blogger Deep Climate looks at another paper by the 2002 recipient of the American Statistical Association’s Founders award. This time it’s not funny, it’s just sad.

Here’s Wikipedia on simulated annealing:

By analogy with this physical process, each step of the SA algorithm replaces the current solution by a random “nearby” solution, chosen with a probability that depends on the difference between the corresponding function values and on a global parameter T (called the temperature), that is gradually decreased during the process. The dependency is such that the current solution changes almost randomly when T is large, but increasingly “downhill” as T goes to zero. The allowance for “uphill” moves saves the method from becoming stuck at local minima—which are the bane of greedier methods.

And here’s Wegman:

During each step of the algorithm, the variable that will eventually represent the minimum is replaced by a random solution that is chosen according to a temperature parameter, T. As the temperature of the system decreases, the probability of higher temperature values replacing the minimum decreases, but it is always non-zero. The decrease in probability ensures a gradual decrease in the value of the minimum. However, the non-zero stipulation allows for a higher value to replace the minimum. Though this may sound like a flaw in the algorithm, it makes simulated annealing very useful because it allows for global minimums to be found rather than local ones. If during the course of the implementation of the algorithm a certain location (local minimum) has a lower temperature than its neighbors yet much higher than the overall lowest temperature (global minimum), this non-zero probability stipulation will allow for the value of the minimum to back track in a sense and become unstuck from local minima.

Now I know why Wegman likes to plagiarize! The above passage does not appear to be plagiarized; instead it looks like Wegman read some material and rephrased it in his own words, adding error in the process. It reads like a junior high school book report: “Though this may sound like a flaw in the algorithm, it makes simulated annealing very useful . . .” And how about this: “this non-zero probability stipulation will allow for the value of the minimum to back track in a sense and become unstuck from local minima.” Huh?

It’s almost as if English is not Wegman’s first language, or as if he doesn’t know what he’s talking about and is trying very, very carefully not to make any mistakes. Or as if he didn’t write it at all . . . but that can’t be! His name is on the article (he’s the second author, behind the less-celebrated Yasmin Said), so I can only assume he has fully read the article and takes responsibility for its content.

The lesson here is: if you’re going to publish in an expensive “peer-reviewed” journal on something you know nothing about, you should do the following:

1. Plagiarize. If you try to write it in your own words, you may come across looking like a fool!

2. If you do plagiarize, don’t paste into Word. Use font-preserving software so that “2^n” doesn’t become “2n”.

3. Hope that nobody actually reads your article—if they do, they might notice the mistakes and the plagiarism. And that could make you look bad.

4. If all else fails, don’t apologize! A mixture of blaming others and stonewalling should suffice. Remember: you’re the victim here. Why are you being singled out all of a sudden? After all, everybody does it, right??

P.S. I recognize that intonation is difficult to perceive in typed speech. The four suggestions above are meant ironically. In reality, I think people should not plagiarize, should not pollute the scientific literature by writing about things they know nothing about, and should admit and apologize for their offenses.

P.P.S. I think this is much worse than the Bruno Frey case. First, I think it’s worse to copy others’ work without attribution than to copy one’s own work. (At least there’s no misattribution of credit.) Second, Frey’s papers were not bad. Arguably they were not as great as the journal reviewers thought—Frey seems to have been able to go pretty far based on good writing and novelty value of his topics and ideas—but they were real (if minor) contributions to the literature. In contrast, Wegman’s papers discussed here are not contributions at all. Where they are not actually wrong, they are empty.

Breaking the rules is bad, but breaking the rules and not coming up with anything helpful to anybody, that’s much worse.

P.S. John Mashey points me to this:

Your review will be published alongside other world-class contributions from leading researchers in the field. All WIREs article topics and authors are selected by an internationally renowned Editorial Board, and all content is rigorously peer reviewed by experts.

Despite the above paragraph appearing on the website of a reputable publisher, it appears to be false. I cannot imagine that the statement, “this non-zero probability stipulation will allow for the value of the minimum to back track in a sense and become unstuck from local minima” was rigorously peer reviewed by experts.

I’d loooove to see the record of who were the “experts” who reviewed for Wegman’s various contributions to that journal.

I think you’re getting carried away here…

The Wegman paragraph is OK until the last sentence, where it seems to get confused about “temperature” versus “energy” (well, actually, there’s one earlier place where this happens too). The criticism by Deep Climate that Wegman fails to mention that the proposed changes are to “nearby” points is itself wrong – the simulated annealing framework is not restricted to proposals being nearby, though they often are. Is that Deep Climate criticism based on the assumption that anything in Wikipedia must be right? In fact, the Wikipedia paragraph is also wrong, even apart from implying that proposals are always nearby. The proposals for simulated annealing do not depend on the energy difference – only whether or not they are accepted depends on this.
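To make the distinction concrete, here is a minimal sketch of simulated annealing in Python. This is my own illustrative code, not anything from the Wegman paper or from Wikipedia; the quadratic objective, Gaussian proposal, and geometric cooling schedule are arbitrary choices. The point it demonstrates: the proposal depends only on the current point, the energy difference enters only in the accept/reject step, and the temperature T is a parameter of the algorithm's schedule, not a property of the function being minimized.

```python
import math
import random

def simulated_annealing(f, x0, t0=10.0, cooling=0.99, steps=5000):
    """Minimize f starting from x0. T is a parameter of the algorithm's
    cooling schedule, not a feature of the function being minimized."""
    x, fx = x0, f(x0)
    best, fbest = x, fx
    t = t0
    for _ in range(steps):
        # Proposal: depends only on the current point, not on any energy.
        # (Here it happens to be "nearby", but SA does not require that.)
        y = x + random.gauss(0.0, 1.0)
        fy = f(y)
        # Acceptance: the only place the energy difference enters.
        # Uphill moves (de > 0) are accepted with probability exp(-de/T),
        # which is nonzero for T > 0 -- this is what lets the search
        # escape local minima.
        de = fy - fx
        if de <= 0 or random.random() < math.exp(-de / t):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = x, fx
        t *= cooling  # gradually lower the temperature
    return best, fbest

random.seed(0)
best, fbest = simulated_annealing(lambda x: (x - 3.0) ** 2, x0=-10.0)
```

On this toy objective the run ends up near the minimum at x = 3; the separation of concerns above is the part that matters for the discussion here.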

So Wegman’s article doesn’t present simulated annealing very well. But I think it’s hopeless to attempt to expunge all such deficient explanations from the literature, particularly in contexts like this, where it’s not like someone is going to try to implement simulated annealing based on this paragraph. Better to concentrate on actual misconduct like plagiarism, where there does indeed seem to be a lot to criticise with respect to Wegman.

• Andrew says:

See Phil’s comment below. Regarding your larger point, I’m not trying to expunge anything. I’m just saying that, as a matter of scientific ethics, it’s inappropriate for someone to publish an article on something he doesn’t understand. It would be far more appropriate for the journal to publish a link to the Wikipedia article (which can be improved, unlike Wegman’s article, which will just sit there in its errors forever).

2. Phil says:

I didn’t read Deep Climate’s take on it, so maybe they make this point already, but: It’s not just the weird English that is objectionable, Wegman’s explanation is flat wrong. It is not the temperature that is different at the local minimum than at the global minimum! The “temperature” is a parameter in the fitting process, it is not a feature of the function being minimized. It’s clear that Wegman has no idea what he is talking about. Probably he tried to learn it from the Wikipedia page (which, though correct, is not exactly crystal clear unless you already know what they mean when they talk about the temperature). This is indeed sad.

3. John Mashey says:

2) Wiley WIREs For authors says:
“Your review will be published alongside other world-class contributions from leading researchers in the field.

All WIREs article topics and authors are selected by an internationally renowned Editorial Board, and all content is rigorously peer reviewed by experts.
Your article will have the highest possible visibility and usage.

Your review will attract full scientific and professional credit.

The WIREs are serial publications that qualify for full Abstracting and Indexing and an Impact Factor/ISI Ranking.”

“The WIREs adhere strongly to the guidelines of the Committee on Publication Ethics (COPE). All instances of publishing misconduct, including, but not limited to, plagiarism, data fabrication, image/data manipulation to falsify/enhance results, etc. will result in rejection/retraction of the manuscript in question.”

SO: WIRES IS RIGOROUSLY PEER REVIEWED. The article is written by 2 of the 3 Editors-in-Chief.
Was there any peer-review, and who managed it?

Wegman’s Feb 2010 CV (p.23) listed this as #197, in PUBLICATIONS – BOOKS, SPECIAL ISSUES OF JOURNALS, AND SOFTWARE, not under INVITED PAPERS. (There was no REFEREED PAPERS category.)

3) Even without the plagiarism, the paper was obvious junk, to anyone with the slightest background.
Whether Wikipedia is right or wrong, starting with it and making it worse cannot ever be useful, and for plagiarized-junk articles to be written for a journal claimed to be rigorously peer-reviewed probably says that at some point Wiley will need to get a whole new editorial team to review *everything*, especially before it goes into the Encyclopedia this was supposed to create. Many articles seem fine, by obvious experts, but given these examples, would anybody trust the review process in general?

The best analog might be from Billy Madison(1995):
“Mr. Madison, what you’ve just said … is one of the most insanely idiotic things I have ever heard. At no point in your rambling, incoherent response were you even close to anything that could be considered a rational thought. Everyone in this room is now dumber for having listened to it.”

a) My modest formal O.R. background was long ago, and the article was obviously a weird mishmash on first read, even to me.

b) I showed it to someone whose PhD was in optimization decades ago, whose first reaction was “some poor rehash of Hillier&Lieberman?”

c) I sent it to a real O.R. expert at a leading university, who found so many problems in the first few pages that he stopped there.

4) WHEN WAS WILEY INFORMED?

Wiley was informed of the Wegman&Said(2011) problem in MARCH, 6.5 months ago. DC had already documented that in detail.

Wiley was informed of the Said false title and affiliation (Professor, Oklahoma State University) in APRIL, 5.5 MONTHS AGO.
That finally got fixed in September: Professor OSU to Professor GMU to Assistant Professor GMU.

Wiley was briefly informed in April about Said&Wegman(2009):

“2) PROBLEM: FURTHER PLAGIARISM: WIRES:CS Vol 1, Issue 1, Said and Wegman, “Roadmap for optimization” (SW2009)
http://onlinelibrary.wiley.com/doi/10.1002/wics.16/abstract

Part of this article seemed to have come from Wikipedia, but more has been found since:
http://deepclimate.org/2011/03/26/wegman-and-said-2011-dubious-scholarship-in-full-colour/#comment-8486

I think a thorough comparison document will be prepared by an associate in next week or two, but a few hours’ efforts sufficed to find Wikipedia pages, circa mid-2009, all of which have text with striking similarities, although SW2009 occasionally has extra errors.
http://en.wikipedia.org/w/index.php?title=Karush%E2%80%93Kuhn%E2%80%93Tucker_conditions&oldid=303189545
http://en.wikipedia.org/w/index.php?title=Linear_programming&oldid=302228577
http://en.wikipedia.org/w/index.php?title=Simplex_algorithm&oldid=269565766
http://en.wikipedia.org/w/index.php?title=Karmarkar%27s_algorithm&oldid=292855439
http://en.wikipedia.org/w/index.php?title=Simulated_annealing&oldid=301539847

For example, here is a cut-and-paste with minimal trivial edits, a plagiarism style seen often involving Said:

Said and Wegman: p.9 Simulated annealing (zero citations)
“Simulated annealing is a probabilistic metaheuristic global optimization algorithm for locating a good approximation to the global minimum of a given function in a large search space. For many problems, simulated annealing may be more effective than exhaustive enumeration provided that the goal is to find an acceptably good solution in a fixed amount of time, rather than the best possible solution.”

http://en.wikipedia.org/w/index.php?title=Simulated_annealing&oldid=301539847 (July 2009)
“Simulated annealing (SA) is a generic probabilistic metaheuristic for the global optimization problem of applied mathematics, namely locating a good approximation to the global minimum of a given function in a large search space. … For certain problems, simulated annealing may be more effective than exhaustive enumeration — provided that the goal is merely to find an acceptably good solution in a fixed amount of time, rather than the best possible solution.”

One might ask if anyone actually reviewed this paper, as it has problems beyond plagiarism. The approach seems to take uncited Wikipedia pages, copy a few of the references found in Wikipedia, but often detached as “further reading” or equivalent.”

4. […] Andrew Gelman piles on: […]

5. Even The New Yorker appears to be “borrowing” from Wikipedia:

The New Yorker/Night Life/Rock and Pop, Sept 12, 2011, p 9, listing for Todd Rundgren, playing at BB Kings:
On 1972’s “Something/Anything?,” he wrote, played, sang, and produced every sound on three of that double album’s four sides, including the enduring pop hits “I Saw the Light” and “Hello It’s Me”.

Wikipedia entry for Todd Rundgren (version of July 28, 2011):
By 1972, the Runt persona/band identity had been abandoned, and Rundgren’s next project, the ambitious double LP Something/Anything? (1972) was credited simply to Rundgren, who wrote, played, sang, engineered, and produced everything on three of the four sides of the album. Something/Anything? featured the top 20 U.S. hits “I Saw The Light” (#16; an original song, not the Hank Williams classic), and a remake of the Nazz near-hit “Hello It’s Me”, which reached #5 in the U.S. and is Rundgren’s biggest hit.

6. Bernard J. says:

I’m a little puzzled.

You say that “[t]he Wegman paragraph [plagiarised from Wikipedia] is OK until the last sentence…”, and you then immediately inform us that “[i]n fact, the Wikipedia paragraph is also wrong…”

Are you saying that Said and Wegman are more correct than the Wikipedia entry that they copied and bastardised? If so, I’d really like to know exactly how this works.

As Andrew says, this particular Wegman paragraph is NOT plagiarized. Both Wegman’s paragraph and the Wikipedia entry are wrong, in different respects. Debating which is “more correct” would be pointless.

• Andrew says:

Yeah, what’s sad is that the Wegman paragraph is so bad. Wikipedia articles can be written by anyone, but Wegman purports to be an expert.

7. John Mashey says:

I think Wegman+Said’s plagiarisms were already well-established before this.

The new issues here are:

a) The extent to which WIREs:CS editorial process is broken.
– It has many articles, probably the majority, that look OK, by plausible experts.
It remains to be seen how much peer-review was done. Experts can do good review articles.

– It certainly has at least 2 that ought to lead to retractions.

– It has quite a few by the Editors’ students/associates (~25%), and they may be fine, or they may be marginal.
I know of one where a chunk of a PhD dissertation (which had pages of plagiarism, and which actually cited a startlingly small fraction (<1/3) of its references) somehow became, with minimal editing, an Advanced review article, albeit without citing the dissertation from which it came.
Did that get peer-reviewed? If so, nothing significant changed.

b) Wiley's slowness in dealing with the plagiarism, and the seeming reluctance to remove Said's OSU affiliation and Professor title. As far as I can tell, other WIREs journals are run by senior people, not a Research Asst Professor, and seem to do normal peer review. Offhand, these journals seem like good ideas, so WIREs:CS may be a fixable aberration…. I hope.
The huge range of topics supposedly covered is still a concern.

(If any reader has any experience with any of the review processes on any of the WIREs journals, I'd be interested.)

c) We all know even at the top journals, errors and misconduct get through peer review, but still one trusts the better journals to usually weed out junk. All it takes to wreck the trust in a journal is to find a few awful papers, especially in a review journal where one may be wishing to learn about an unfamiliar area. The usefulness of such a journal is destroyed if you have to really check out every article and its authors to make sure you’re getting good material, since selecting good authors and checking their work is the value-add one expects from an editor.

But still, as per USA Today:

‘”Neither Dr. Wegman nor Dr. Said has ever engaged in plagiarism,” says their attorney, Milton Johns, by e-mail. In a March 16 e-mail to the journal, Wegman blamed a student who “had basically copied and pasted” from others’ work into the 2006 congressional report, and said the text was lifted without acknowledgment and used in the journal study. “We would never knowingly publish plagiarized material” wrote Wegman, a former CSDA journal editor.’

8. You bring up an interesting point, indirectly: Should peer reviewing be non-anonymous, or become non-anonymous after some grace period? It could be important for tracing the intellectual history of accepted and rejected papers. In particular, it would provide evidence on which people could base their various arguments about social construction!

9. John Mashey says:

I doubt there is one right answer. I’d guess that the competence and good will of the editors/reviewers matters more than the specific process.

I know of a case where:
1) There was no Editor-in-Chief, so authors could send articles to a specific associate editor, who then handled everything.

2) An associate editor came on, and over ~7 years, accepted 14 papers from a group of authors that had published zero papers in that journal before. Some seemed OK, some seemed weak, some were …quite dubious.

3) When that Associate Editor left … pretty quickly, so did those authors.

Now, it might be useful to do something short of eliminating anonymity:

a) Publish the reviews, but without names.

b) Give a credible review committee a list of names (say 10-20), and ask for each paper how many of those names showed up.
I.e., that keeps the reviews confidential, as well as the specific reviewers, but also gives insight into any game-playing.

c) In this WIREs:CS case, it might even be enough to really know if there were real reviews at all, and simply report:
0-N real reviews.

For instance, in the CS&DA case, the evidence makes it very likely that the EiC accepted the paper himself after a quick look. The lightning turnaround was an early tipoff about this.

WIREs:CS is harder to calibrate, because:
– Articles are generally by invitation.
– The only date given is the publication date, no Recvd/Revised dates.