
Hanging Chad

It’s been a while since I’ve linked to Laura Wattenberg’s excellent baby name blog. Here’s a fun recent item on how one man launched a generation of baby names.


Wattenberg writes:

Some more highlights from Willson’s roster of custom-named clients:

Rad Fulton
Cal Bolder
Rand Saxon
Race Gentry
Chance Nesbitt
Dack Rambo
Van Williams
Dare Harris
Trax Colton

Today, the Willson names sound like clichés. They’re the kind of formula-hunk names skewered by The Simpsons’ “actor Troy McClure” and mimicked by countless gay porn stars. But it was Willson who defined that formula, and parents responded to its allure.


It’s a gay world. We just live in it.

Bayesian Computing: Adventures on the Efficient Frontier

That’s the title of my forthcoming talk at the NIPS workshop at 9am on 12 Dec.

Pass the popcorn


Rodney Sparapini writes:

I got this in my inbox today. I thought this might be of interest to you and your blog readers.

It’s not at all of interest to me but it might interest some of my readers. I’m posting it here because there’s something amazing about seeing this intense dispute about something I’ve never heard of.

OK, here it is:

From: ELM Exposed
Subject: The ELM Scandal: What You May Not Know about the Extreme
Learning Machines
Date: Tue, 4 Aug 2015 04:09:35 -0700

Dear Researcher,

The objective of launching this homepage
( is to present the evidences
regarding the tainted origins of the extreme learning machines (ELM). As
we would like all readers to verify the facts within a short period of
time (perhaps 10 to 20 minutes), we have uploaded a dozen of PDF files
with highlights and annotations clearly showing the following:

1. The kernel (or constrained-optimization-based) version of ELM
(ELM-Kernel, Huang 2012) is identical to kernel ridge regression (for
regression and single-output classification, Saunders ICML 1998, as well
as the LS-SVM with zero bias; for multiclass multi-output
classification, An CVPR 2007).

2. ELM-SLFN (the single-layer feedforward network version of the ELM,
Huang IJCNN 2004) is identical to the randomized neural network (RNN,
with omission of bias, Schmidt 1992) and another simultaneous work,
i.e., the random vector functional link (RVFL, with omission of direct
input-output links, Pao 1994).

3. ELM-RBF (Huang ICARCV 2004) is identical to the randomized RBF neural
network (Broomhead-Lowe 1988, with a performance-degrading
randomization of RBF radii or impact factors).

4. In all three cases above, Huang got his papers published after
excluding a large volume of very closely related literature.

5. Hence, all 3 “ELM variants” have absolutely no technical originality,
promote unethical research practices among researchers, and steal
citations from original inventors. For easy verifications on the origins
of the ELM, with annotated PDF files, please visit:

Please forward this message to your contacts so that others can also
study the materials presented at this website and take appropriate
actions, if necessary.

ELM: The Sociological Phenomenon

Since the invention of the name “extreme learning machines (ELM)” in
2004, the number of papers and citations on the ELM has been increasing
exponentially. How can this be imaginable for the ELM comprising of 3
decade-old algorithms published by authors other than the ELM inventor?
This phenomenon would not have been possible without the support and
participation of researchers on the fringes of machine learning. Some
(unknowingly and a few knowingly) love the ELM for various reasons:

• Some authors love the ELM, because it is always easy to
publish ELM papers in an ELM conference or an ELM special issue. For
example, one can simply take a decade-old paper on a variant of RVFL,
RBF or kernel ridge regression and re-publish it as a variant of the
ELM, after paying a small price of adding 10s of citations on Huang’s
“classic ELM papers”.

• A couple of editor-in-chiefs (EiCs) love the ELM and offer
multiple special issues/invited papers, because the ELM conference &
special issues will bring a flood of papers, many citations and
therefore high impact factors to their low quality journals. The EiCs
can claim to have faithfully worked within the peer-review system, i.e.
the ELM submissions are all rigorously reviewed by ELM experts.

• A few technical leaders, e.g. some IEEE society officers,
love the ELM, because it rejuvenates the community by bringing in more
activities and subscriptions.

• A couple of funding agencies love the ELM, because they
would rather fund a new sexy name, than any genuine research.

One may ask: how can something loved by so many be wrong?

A leading cause of the current Greek economic crisis was that a previous
government showered its constituents with jobs and lucrative
compensations, in order to gain their votes, thereby raising the debt to
an unsustainable level. At that time, the government behavior was
welcome by many, but led to severe consequences. Another example of
popularity leading to a massive disaster can be found in WW II as Hitler
was elected by popular votes.

The seemingly small price to pay in the case of the ELM is the
diminished publishing ethics, which, in a long run, will fill the
research literature with renamed junk, thereby making the research
community and respected names, such as IEEE, Thomson Reuters, Springer
and Elsevier, laughing stocks. Similar to that previous Greek government
and its supporting constituents, the ELM inventor and his supporters are
“borrowing” from the future of the entire research community for their
present enjoyment! It is time to wake up to your consciousness.

Our beloved peer-review system was grossly abused and failed
spectacularly in the case of the ELM. It is time for the machine
learning experts and leaders to investigate the allegations presented
here and to take corrective actions soon.

5 Easy but Proven Steps to Fame

1. The Brink of Genius: Take a paper published about 20 years ago (so
that the original authors have either passed away, retired, or are too
well-established/generous to publicly object. Unfortunately, pioneers
like Broomhead and Pao have passed away). Introduce a very minor
variation, for example, by fixing one of the tunable parameters at zero
(who cares if this makes the old method worse, as long as you can claim
it is now different and faster). Rewrite the paper in such a way that
plagiarism software cannot detect the similarity, so that you are not in
any of the “IEEE 5 levels of plagiarism”. Give a completely new
sensational name (hint: the word “extreme” sounds extremely sexy).

2. Publication: Submit your paper(s) to a poor quality conference or
journal without citing any related previous works.

3. Salesmanship: After publishing such a paper, now it is time to sell
the stolen goods! Never blush. Don’t worry about ethics. Get your
friends/colleagues to use your “big thing”. Put up your Matlab program
for download. Organize journal special issues, conferences, etc. to
promote these unethical research practices among junior researchers who
would just trust your unethical publications without bothering to read
the original works published in the 1980s or 1990s. Of course, the
pre-requisite for a paper to be accepted in your special
issues/conferences is 10s of citations for your unethically created name
and publications. Invite big names to be associated with your
unethically created name as advisory board members, keynote speakers, or
co-authors. These people may be too busy to check the details (with a
default assumption that your research is ethical) and/or too nice to say
no. But, once “infected” with your unethically created name, they will
be obliged to defend it for you.

4. The Smoke Screen: Should others point out the original work, you
claim not to know the literature while pointing to a minor variation
that you introduced in the first place. Instead of accepting that your
work was almost the same as the literature and reverting back to the
older works, you promote your work by: (1) repeating the tiny variation;
(2) excluding the almost identical works in the list of references or
citing and describing them incorrectly; (3) excluding thorough
experimental comparisons with nearly identical works in the literature
so that worse performance of your minute variations will not be exposed;
(4) making negative statements about competing methods and positive
statements about your unethically created name without solid
experimental results using words like “may” or “analysis”; (5) comparing
with apparently different methods. You can copy the theories and proofs
derived for other methods and apply to your method (with tiny variation
from those in the old literature) claim that your method has got a lot
of theories while others do not have.

5. Fame: Declare yourself as a research leader so that junior
researchers can follow your footsteps. Enjoy your new fortune, i.e.,
high citations, invited speeches, etc. You don’t need to be on the
shoulders of giants, because you are a giant! All you have to do to get
there is to follow these easy steps!

One can call the above steps “IP” (Intelligent Plagiarism), as opposed
to stupid (verbatim) plagiarism specified by the IEEE in “5 levels”. The
machine learning community should feel embarrassed if “IP” (Intelligent
Plagiarism) was originally developed and/or grandiosely promoted by this
community, while the community is supposed to create other (more
ethical) intelligent algorithms to benefit the mankind.

In mid-July 2015, G.-B. Huang posted an email on his emailing list. This email was forwarded to for our responses. As usual, this email was
meaningless and our remarks are attached.

And also this pdf, which you can read if you’re not tired of this yet.

I just have a few comments about the above message:

1. Hitler never received much more than a third of the vote in a fair election.

2. I thought Elsevier was already a laughing stock?

3. I’d hardly call this a path to fame, given that I’d never heard of this Huang character.

4. There’s nothing wrong with putting up a Matlab program for download, right?

5. I’m kinda doubting that invited speeches will lead to fortune. Free flights, sure, but probably not much more than that.
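Claim 2 in the email turns on a simple construction shared by several of the methods it cites: a single hidden layer whose input weights are drawn at random and then frozen, with only the output weights fit by least squares. Here is a minimal Python sketch of that generic idea. The tanh activation, Gaussian random weights, hidden-layer size, and small ridge penalty are illustrative assumptions, not the exact recipe of any paper named in the email, and the sketch is not meant to settle who did what first.

```python
import numpy as np

def fit_random_hidden_layer(X, y, n_hidden=50, ridge=1e-6, seed=0):
    """Single-hidden-layer network: random, frozen input weights;
    output weights estimated by ridge-regularized least squares."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input-to-hidden weights (never trained)
    b = rng.normal(size=n_hidden)                # random hidden biases (never trained)
    H = np.tanh(X @ W + b)                       # hidden-layer activations, shape (n, n_hidden)
    # Output weights solve (H'H + ridge * I) beta = H'y.
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ y)
    return W, b, beta

def predict_random_hidden_layer(model, X):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

# Toy usage: fit a noiseless sine curve on [-3, 3].
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0])
model = fit_random_hidden_layer(X, y)
mse = np.mean((predict_random_hidden_layer(model, X) - y) ** 2)
print(f"training MSE: {mse:.2e}")
```

Because the hidden layer is fixed, fitting reduces to one linear solve, which is why this family of methods is fast.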

Taleb’s Precautionary Principle: Should we be scared of GMOs?

Skyler Johnson writes:

I was wondering if you could (or had already) weigh(ed) in on Nassim Taleb’s Precautionary Principle as it applies to GMOs?

I’ve attached his working paper with Rupert Read, Raphael Douady, Joseph Norman, and Yaneer Bar-Yam. It can also be found at his site,

See also his response to a critique from a biologist.

A search for ‘Taleb’ on your site brought up reviews of his books, but I found no mention of the Precautionary Principle.

My reply: I don’t agree with everything Taleb writes but I’m generally sympathetic to his perspective.

I liked this bit from Taleb’s response linked to above:

Many of the citations you are asking for fall within the “carpenter fallacy” that we present in the text, i.e. that discussions about carpentry are not relevant to and distract from identifying the risks associated with gambling, even though the construction of a roulette wheel involves carpentry.

This is not to say that Trevor Charles is wrong here and that Nassim Taleb is right—I feel unmoored in this whole discussion—but I do like the quote.

Speaking more generally, I suppose that Taleb’s precautionary principle could fruitfully be expressed in terms of tradeoffs. Here’s the principle:

If an action or policy has a suspected risk of causing severe harm to the public domain (affecting general health or the environment globally), the action should not be taken in the absence of scientific near-certainty about its safety. Under these conditions, the burden of proof about absence of harm falls on those proposing an action, not those opposing it.

As a statistician, I tend to be skeptical about arguments based on “the burden of proof” or “scientific near-certainty,” as they have a bit of the flavor of the one-sided bet—but what is relevant here is the idea of correlated risks.

As many observers have noted, the U.S. is in many ways a hyper-individualistic society, and social policies are often evaluated in an individualistic way. But there’s a big difference between risks that are uncorrelated or only weakly correlated in the population (for example, getting killed in a car crash) and highly correlated risks (with the paradigmatic examples being asteroid impacts and global wars).
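The difference between correlated and uncorrelated risks can be made concrete with a toy simulation in Python (numpy). The population size, the 1% risk, and the all-or-nothing catastrophe are hypothetical numbers chosen only for illustration: both scenarios have the same expected number of deaths, but only the correlated one puts real probability on losing half the population at once.

```python
import numpy as np

rng = np.random.default_rng(2015)
n_people, p, n_sims = 1000, 0.01, 100_000

# Uncorrelated: each person independently dies with probability p.
indep = rng.binomial(n_people, p, size=n_sims)

# Perfectly correlated: one shared event, with probability p, kills everyone.
corr = n_people * rng.binomial(1, p, size=n_sims)

for name, deaths in [("independent", indep), ("correlated", corr)]:
    # Mean deaths, standard deviation, and chance that at least half die.
    print(name, deaths.mean(), deaths.std(), np.mean(deaths >= n_people / 2))
```

The expected losses match; the spread and the tail do not, which is why averaging over individual-level risk can miss what matters for a correlated risk.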

As Taleb has written, his own attitudes on extreme events derive in part from his understanding of what happened to Lebanon in the 1970s, when a longstanding apparent equilibrium was revealed as being unstable, and which gave him a general wariness about picking pennies in front of a steamroller.

This is not really an answer to what policy should be on genetically modified organisms, but I do think that it makes sense, for the reasons Taleb and his collaborators give, to consider these global risks associated with GMOs in a different way than we treat the individual-level risks associated with electric power lines and cancer, or whatever.

Death rates have been increasing for middle-aged white women, decreasing for men

Here’s the deal (data from CDC Wonder, age-standardized to a uniform distribution in the age range):


Hoo boy. Looky here, something interesting: From 1999 to 2013, the death rate for middle-aged white women steadily increased. The death rate for middle-aged white men increased through 2005, then decreased.

Since 2005, the death rate has been rising for middle-aged white women and declining for middle-aged white men. Not by a lot—we’re talking a change of 4% over a decade—but this is what we see.

It’s funny. We’re so used to the narrative that things are getting worse for men, it’s so hard to be a guy in the modern era, etc. But in this particular case it’s the middle-aged women who are doing worse (relatively speaking; of course the absolute death rates remain much higher for men than for women, that’s just how things always are).

Background: Why age adjustment is needed

As Anne Case and Angus Deaton noted in a much-talked-about recent paper, the mortality rate among middle-aged white Americans has been roughly constant in recent decades, even while it’s dropped dramatically among other groups and in other countries.

Here’s the graph of the raw data of mortality among 45-54-year-old non-Hispanic whites in the U.S.:


But that curve, which shows a steady increase since 1999, is wrong—or, should I say, misleading. As we discussed recently in this space (see here, here, and here), it can be tricky to interpret raw death rates binned across ages, especially in the U.S. What with the baby boom generation moving through, the average age in the 45-54 group crept up from 49.3 in 1999 to 49.7 in 2013.

An increase of 0.4 years might not sound like much, but mortality rate increases a lot by age—more than doubling between the ages of 45 and 54—so even a small shift in average age can cause a big shift in the observed trends.
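Here is a minimal Python sketch of the adjustment, using hypothetical numbers throughout: age-specific death rates that are held fixed over time and exactly double from age 45 to 54, and two made-up population distributions, one even across single years of age and one tilted toward the older end (mean age up by a bit under 0.4 years). The age-adjusted rate gives each single year of age equal weight, as in the uniform standardization described above; the raw rate weights each age by its population.

```python
import numpy as np

ages = np.arange(45, 55)  # single years of age, 45..54

# Hypothetical age-specific death rates per 100,000, fixed over time,
# exactly doubling across the nine years from 45 to 54.
rates = 400 * 2 ** ((ages - 45) / 9)

# Hypothetical population counts: an even spread vs. one tilted toward older ages.
pop_early = np.full(ages.size, 1000.0)
pop_late = np.linspace(800, 1200, ages.size)

def crude_rate(pop, rates):
    """Raw death rate for the whole bin: population-weighted average."""
    return np.sum(pop * rates) / np.sum(pop)

def age_adjusted_rate(rates):
    """Age-standardized rate with equal weight on each single year of age."""
    return np.mean(rates)

def mean_age(pop):
    return np.sum(pop * ages) / np.sum(pop)

for label, pop in [("early", pop_early), ("late", pop_late)]:
    print(label, round(mean_age(pop), 2),
          round(crude_rate(pop, rates), 1), round(age_adjusted_rate(rates), 1))
```

Because the age-specific rates never change, any rise in the raw rate here is pure composition. As a back-of-envelope check, if the rate exactly doubles over nine years it grows by a factor of about 2^(1/9) ≈ 1.08 per year of age, so a 0.4-year shift in average age by itself moves the raw rate by roughly 2^(0.4/9) - 1 ≈ 3%.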

Here’s what we get after adjusting for age:


The flat pattern after 2005 is the sum of the increasing trend for women and the down slope for men.

What’s the point?

The published curves were biased because they did not correct for the changing age distribution within the 45-54 bin. When we make the adjustment we find something different: no longer a steady increase. And when we look at men and women separately, we find something more.

This update has not yet percolated through the news media.

For example, here’s Paul Krugman in the New York Times:

There has been a lot of comment, and rightly so, over a new paper by the economists Angus Deaton (who just won a Nobel) and Anne Case, showing that mortality among middle-aged white Americans has been rising since 1999.

Ross Douthat in that same newspaper yesterday:

Starting around the turn of the millennium, the United States experienced the most alarming change in mortality rates since the AIDS epidemic. . . . concentrated among less-educated, late-middle-aged whites.

Julia Belluz writes in about “the shocking rise in mortality rates among middle-aged white Americans.”

And Angus Deaton quoted in the Times the other day:

If we want to be more precise about the age range involved, we could say that for all single years of age from 47 to 52, mortality rates are increasing.

All these reports should be corrected to make it clear that the increase stopped in 2005. Since 2005, mortality rates have increased among women in this group but not men.

The age-aggregation bias did come up in this online NYT article, but the focus there was on the comparison between 1999 and 2013, so it was not mentioned that the net increase stopped after 2005, or that men’s and women’s mortality rates have been going in opposite directions since then.

Where does age adjustment make a difference?

First, I followed Deaton’s advice and downloaded death data from the CDC Wonder site. Second, I looked not just at the range 45-54 but also at the age decades before and after. Third, I looked at non-Hispanic whites, also at Hispanic whites, also at African Americans.

Then I computed the raw and age-adjusted death rates for each decade of age for each group, to get a sense of where age adjustment matters.

I plotted death rates since 1999, and here’s what I found:


It turns out that the only place where a lack of age adjustment really changes the story is . . . non-Hispanic whites aged 45-54. Too bad about that! But good that we checked.

Of course I may well have some “gremlins” in my analyses too. Anyone who wants can and should feel free to go to the data and find out what I garbled or missed.

Bring on the data

Finally, I broke down the numbers by sex and single year of age. Here’s what happened from 1999 to 2013 among all three ethnic groups:




And here’s a summary:


That pattern among 45-54-year-olds? It was happening in the younger decade too.

One more time

Let me emphasize that this is all in no way a “debunking” of the Case and Deaton paper. Their main result is the comparison to other countries, and that holds up just fine. The place where everyone is confused is about the trends among middle-aged non-Hispanic white Americans.

The story being told is that there was something special going on, with an increase in mortality in the 45-54 age group. Actually what we see is an increasing mortality among women aged 52 and younger—nothing special about the 45-54 group, and nothing much consistently going on among men. Perhaps someone can inform Douthat and Krugman and they can modify their explanations accordingly. I’m sure they’ll be up to the task.

“Using prediction markets to estimate the reproducibility of scientific research”

A reporter sent me this new paper by Anna Dreber, Thomas Pfeiffer, Johan Almenberg, Siri Isaksson, Brad Wilson, Yiling Chen, Brian Nosek, and Magnus Johannesson, which begins:

Concerns about a lack of reproducibility of statistically significant results have recently been raised in many fields, and it has been argued that this lack comes at substantial economic costs. We here report the results from prediction markets set up to quantify the reproducibility of 44 studies published in prominent psychology journals and replicated in the Reproducibility Project: Psychology. The prediction markets predict the outcomes of the replications well and outperform a survey of market participants’ individual forecasts. This shows that prediction markets are a promising tool for assessing the reproducibility of published scientific results. The prediction markets also allow us to estimate probabilities for the hypotheses being true at different testing stages, which provides valuable information regarding the temporal dynamics of scientific discovery. We find that the hypotheses being tested in psychology typically have low prior probabilities of being true (median, 9%) and that a “statistically significant” finding needs to be confirmed in a well-powered replication to have a high probability of being true.
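The abstract’s last sentence rests on a Bayes’-rule calculation of roughly this form. A minimal Python sketch, assuming a power of 0.8 and a significance threshold of 0.05 for both the original study and the replication (illustrative values, not taken from the paper, whose estimates come from market prices rather than from this formula), and adopting the two-state true/false framing that my reply below takes issue with:

```python
def prob_true_given_significant(prior, power=0.8, alpha=0.05):
    """P(hypothesis 'true' | statistically significant result) by Bayes' rule,
    in a model where each hypothesis is either exactly true or exactly false."""
    p_significant = power * prior + alpha * (1 - prior)
    return power * prior / p_significant

prior = 0.09  # median prior probability reported in the abstract
after_original = prob_true_given_significant(prior)
# Treat a successful replication as a second, independent significant result.
after_replication = prob_true_given_significant(after_original)
print(round(after_original, 2), round(after_replication, 2))  # 0.61 0.96
```

With a lower assumed power, say 0.5, the first number falls to about 0.5.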

I replied: I think the idea is interesting and I have a lot of respect for the research team. But I am not so happy with the framing of these hypotheses as being “true” or “false,” and I think that statements such as “the probability of being true” generally have no real meaning. Consider, for example, one of those notorious social priming studies such as the claim that giving elderly-related words causes people to walk more slowly. Or one of those silly so-called evolutionary psychology studies such as the claim that single women were more likely to support Obama for president during certain times of the month. Yes these claims are silly and were overhyped, but are they “false”? I think it’s pretty meaningless to even ask the question. Certainly the effects in question won’t be exactly zero; more to the point, the effects will vary by person and by scenario. It makes sense to talk about average effects and variation in effects and the probability of a successful replication (if the criteria for “success” are defined clearly and ahead of time), but “the probability the hypothesis is true”? I don’t think so.

In summary I am supportive of this project. I think it’s a good idea and I’m interested in seeing it go further. I think they could do better by moving away from a true/false or even a replicate/not-replicate attitude, and instead think more continuously about uncertainty and variation. I don’t think it would be hard for them to move away from formulations such as “the probability that the research hypothesis is true” into a more sensible framing.

P.S. Robin Hanson offers thoughtful comments. I’m impressed by what Hanson has to say, partly because they are interesting remarks (no surprise given that he’s been thinking hard about this topic for many years), but more because it would be so easy for him to just view this latest project as a vindication of his ideas. But instead of just celebrating his success (as I think I’d do in this situation), he looks at all this with a critical eye. I might disagree with Robin about John Poindexter, but he (Robin) does good here.

Pathological liars I have known


There was this guy in college who just made stuff up. It was weird, then funny, then sad. He was clearly an intelligent guy, but for some reason he felt the need to fabricate. One thing I remember was something about being a student of Carl Sagan at Cornell, at the same time as he was taking 11 classes a semester at MIT. But there were lots of other lies, things that were easily checkable. I never knew his background. He seemed like a nice guy, but perhaps he was never really a student, or maybe he was in the U.S. illegally, I have no idea, but maybe he was already living a lie and so he felt he might as well keep going.

The other guy was a student at an institution where I taught. It turned out he was lying about all sorts of stuff and he got kicked out of the program. The whole thing baffled me, especially when, after it was all over, one of the other grad students told us that they all knew this guy was a pathological liar. Why didn’t they tell the faculty? I have no idea.

Whassup with these pathological liars? I dunno, but maybe it’s some sort of principle of least effort. For me, lying is effortful and work is easy, so I’d rather work. For these guys, I’m guessing that it’s soooo easy to lie, but buckling down and working is tough. And, I guess, once they get in the habit of lying, they just do more and more of it. I’d think that, in order to avoid detection, they’d want to minimize the number of lies they tell. But I guess that’s not how they think.

I’d distinguish pathological liars of this sort from people such as Marc Hauser or Dr. Anil Potti or Ed Wegman or Diederik Stapel or Michael Lacour, whose misrepresentations seem pretty clearly instrumental. What’s characteristic about pathological liars is that they lie about things where they’re not really gaining from the lie, or where whatever gains they might obtain from the lie are trivial compared to the losses from being found out. I’d also distinguish them from people like Hillary Clinton, who has a habit of tweaking her stories to make them a bit more dramatic. Behavior that’s acceptable for David Sedaris but which I don’t like so much in a politician. Unfortunately, I can see the instrumental value in Clinton’s exaggerations, especially given the motivation a politician has to say what she thinks her audience wants to hear. Pathological lying seems different—it’s florid exaggeration just for the hell of it.

We’ll be discussing this in next week’s Perceptions 301 class.

P.S. Just to continue this, I find instrumental liars disturbing but I find pathological liars scary. A few months ago I had some indirect dealings with someone who was on the border of these two categories, a Nixonian type who was lying in a somewhat arbitrary and unnecessary way but using these lies in an aggressive way. Someone who would just make stuff up about me and then use this as a basis for an attack: what would this guy be capable of? I do not want to engage with someone like that. People like David Brooks or even Ed Wegman I can understand: they make mistakes (or, in Wegman’s case, ethically questionable decisions) and then don’t want to back down. And I can understand people like Marc Hauser or Ron Unz who think they have a true model of the world and so don’t want to be bothered with details. I don’t follow this approach but I kinda see where they’re coming from. Or people like all those Psychological Science researchers who in, I assume, all sincerity, are using statistical methods that are the functional equivalent of the proverbial Tarot cards: sure, I’m bothered that they don’t do better but I understand that, by their lights, they’re working hard and following the rules. But the pathological liars, people like Ben Carson who will go to the trouble to make up an entire course at Yale just for the benefit of an already-implausible story, or this other guy I dealt with online, who scared me so much that I don’t even want to mention his name here: that scares me. A lot. It probably shouldn’t, and I’m probably displaying a disgracefully old-fashioned attitude toward mental illness. Given the casually negative attitudes many people have toward Tourette’s syndrome, I’m really the last person who should go around being creeped out by something as innocuous as pathological lying, maybe. So there you have it. Now I’m just tied up in knots.

P.P.S. Commenters have rightly pointed out that I may be overreacting to whatever the news media happen to want to focus on. Mark Palko reminded us of the history of the news media pouncing on the Clintons, and BrianB pointed to a report that Ben Carson’s story was based on an actual experience. In particular, if Carson did not “make up an entire course at Yale just for the benefit of an already-implausible story,” but instead he took a life experience and twisted it a bit, making it more dramatic, it’s not so different from what Hillary is so notorious for doing. In the language of my above post, Carson was acting instrumentally, not pathologically. He was writing a book so he wanted good stories so he exaggerated or made some things up to make the story better, which from a storytelling position makes sense. Just as it makes sense for Hillary Clinton to have expressed the risks she felt in traveling to war zones by saying that her plane was under fire. Or for that matter Joe Biden stealing somebody else’s biographical story because it worked well in a speech.

One reason that I may have been characterizing Carson’s stories as pathological rather than instrumental is that I was forgetting that, when he was writing a book, his goal was to sell books, he wasn’t running for president. And embellishing or even making up stories for an autobiography, that’s pretty standard practice: the goal is to give insight into the person, not to produce a documentary record. Carson’s later lies when running for president (see, for example, here) also fall in the instrumental category in that he’s denying something that looks bad.

You won’t believe these stunning transformations: How to parameterize hyperpriors in hierarchical models?


Isaac Armstrong writes:

I was working through your textbook “Data Analysis Using Regression and Multilevel/Hierarchical Models” but wanted to learn more and started working through your “Bayesian Data Analysis” text. I’ve got a few questions about your rat tumor example that I’d like to ask.

I’ve been trying to understand one of the hierarchical models revolving around rat tumors (Chapter 5). This is where there is a binomial model with p assigned a beta distribution. The Beta distribution has parameters $\alpha$ and $\beta$ which need a distribution for the full hierarchical model.

In order to create a noninformative distribution the book parametrizes the model in terms of $\frac{\alpha}{\alpha+\beta}$ and an approximation of the standard deviation, $(\alpha+\beta)^{-1/2}$ (described here too). I know you mentioned not favoring this approach anymore, but I’d still like to understand the modeling thinking/process that supports it, if possible.

I have a few questions about this:

– Why use an approximation here for the parametrization rather than the actual standard deviation of the Beta distribution? When to use approximations for reparametrization? Computational reasons?

– How did you arrive at this particular approximation?

– What connection, if any, does this have to a Pareto distribution? I tried parametrizing this model with a Pareto(1.5,1) distribution for $\alpha+\beta$ and a uniform distribution on $\alpha/(\alpha+\beta)$ and ended up with $p(\alpha,\beta)\propto (\alpha+\beta)^{-3/2}$ but the book’s approach seems to yield $p(\alpha,\beta)\propto (\alpha+\beta)^{-5/2}$ which disagrees with the gentleman writing into the blog in the link above.

My reply: As I’ve said, I’ve changed my views since writing that book in the early 1990s, but not all my newer perspective has been worked into the later editions of the book. In particular, I’m not so happy with noninformative priors, for two reasons:

1. We often have prior information, so let’s use it. Traditionally we pragmatic Bayesians have been hung up on the difficulty of precisely specifying our prior information, but it seems clear to me now that specifying weak prior information is better than specifying nothing at all.

2. With flat priors and a small number of groups, we can get a broad posterior distribution for the group-level variation, which in turn can lead to under-smoothing of estimates. In some contexts this is ok (for example, when the unpooled, separate estimates are taken as a starting point or default), but in other settings it’s asking for trouble, and the use of flat priors is basically a way to gratuitously add noise to the inference.

Anyway, back to the example. It seemed to make sense to put a prior on the center of the beta distribution and the amount of information in the beta distribution. These can be specified using mean and variance, but in this case the “effective sample size” seemed reasonable too. To put it another way: you ask, Why not parameterize in terms of the mean and variance? But in general that won’t work either, for example what would you do if you had a Cauchy prior, which has no mean and no variance?
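On where $(\alpha+\beta)^{-1/2}$ comes from: the exact standard deviation of a Beta$(\alpha,\beta)$ distribution with mean $m = \alpha/(\alpha+\beta)$ is $\sqrt{m(1-m)/(\alpha+\beta+1)}$, so $(\alpha+\beta)^{-1/2}$ matches that standard deviation up to the factor $\sqrt{m(1-m)}$ and the "+1", which matters little once $\alpha+\beta$ is moderately large. This is one way to see the connection, not necessarily the reasoning used in the book. The short Python check below, with a hypothetical mean of 0.15, compares the two as the effective sample size $\alpha+\beta$ grows.

```python
import math

def beta_from_mean_and_size(mean, size):
    """Convert a prior mean m and an 'effective sample size' alpha + beta
    into the usual (alpha, beta) shape parameters."""
    return mean * size, (1 - mean) * size

def beta_sd(alpha, beta):
    """Exact standard deviation of a Beta(alpha, beta) distribution."""
    s = alpha + beta
    return math.sqrt(alpha * beta / (s ** 2 * (s + 1)))

mean = 0.15               # hypothetical center of the population distribution
sizes = [2, 10, 50, 200]  # hypothetical values of alpha + beta

for size in sizes:
    a, b = beta_from_mean_and_size(mean, size)
    exact = beta_sd(a, b)
    # sqrt(m(1 - m)) * (alpha + beta)^(-1/2): the book's scale parameter times a constant
    approx = math.sqrt(mean * (1 - mean)) * size ** -0.5
    print(size, round(exact, 4), round(approx, 4))
```

The ratio of the two is exactly $\sqrt{(\alpha+\beta)/(\alpha+\beta+1)}$, so the approximation is within about 5% once $\alpha+\beta$ reaches 10.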

A rule such as “parameterize using the mean and variance” is nothing but a guideline. So, when introducing this example into the book, I didn’t want to try to overly formalize this point. In retrospect, I actually think this was pretty mature of me! But maybe I should’ve explained a bit more. There’s a tradeoff here too: Not enough explanation and things are mysterious; too much explanation and the practical material gets lost in the verbiage (a point of which readers of this blog are well aware, I’m afraid).
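For the third question, a change of variables shows where the book’s $(\alpha+\beta)^{-5/2}$ comes from. Write $u = \alpha/(\alpha+\beta)$ and $s = \alpha+\beta$, so that $\alpha = us$ and $\beta = (1-u)s$:

$$\left|\frac{\partial(\alpha,\beta)}{\partial(u,s)}\right| = \left|\det\begin{pmatrix} s & u \\ -s & 1-u \end{pmatrix}\right| = s, \qquad\text{so}\qquad p(\alpha,\beta) = \frac{p(u,s)}{s}.$$

A uniform prior on $(u, s^{-1/2})$ gives $p(u,s) \propto \left|\frac{d}{ds}s^{-1/2}\right| = \tfrac12 s^{-3/2}$, and therefore

$$p(\alpha,\beta) \propto s^{-3/2}\cdot s^{-1} = (\alpha+\beta)^{-5/2}.$$

So $(\alpha+\beta)^{-3/2}$ is the implied density on $(u, s)$, not on $(\alpha, \beta)$; the Jacobian factor $1/(\alpha+\beta)$ is easy to drop and is one plausible source of the discrepancy. (A Pareto density with shape $a$ is proportional to $s^{-(a+1)}$, so, restricted to $s \ge s_0$, this prior on $s$ is Pareto with shape $1/2$.)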

3 new priors you can’t do without, for coefficients and variance parameters in multilevel regression

Partha Lahiri writes, in reference to my 2006 paper:

I am interested in finding out a good prior for the regression coefficients and variance components in a multi-level setting. For concreteness, let’s say we have a model like the following:

Level 1: Y_ijk | theta_ij ~(ind) N( theta_ij, sigma^2)
Level 2: theta_ij| mu_i ~(ind) N( mu_i, tau^2)
Level 3: mu_i | beta ~(ind) N( x_i’beta, delta^2)

One possibility is to assume independent proper uniform priors with large ranges for beta and the standard deviations sigma, tau, and delta. Since all the distributions are proper, the posterior will be proper. I need to check whether anyone has discussed propriety of the posterior if we change these proper uniform priors to improper uniform priors (e.g., delta uniform on (0,\infty)) in order to avoid sensitivity of the results (along the lines of your section 2.2).

Another possibility is to use an improper uniform prior for beta (usually this does not cause problems with impropriety of the posterior, but I need to check) and independent half-Cauchy priors on the standard deviations (with an appropriate scale for the half-Cauchy).

My reply:

If you have enough groups you can get away with just about anything, but in the real world you won’t have enough groups so I think it’s best for you to use an informative prior. I recommend weakly or strongly informative priors on sigma, tau, and delta. Maybe I’d call these sigma_y, sigma_theta, and sigma_mu, actually. For sigma_y you’ll have a lot of data so maybe you don’t even need to bother, but you can do it just for completeness. For sigma_theta and, especially, sigma_mu, you’ll want something real.

Here’s a start: make sure your data y and predictors x are all dimensionless or on a unit scale so that differences of 1 are large. Then put half-normal(0,1) priors on the 3 scale parameters (sigma_y, sigma_theta, sigma_mu) and independent normal(0,1) priors on the betas.

Aki would probably recommend t_7 and he’s probably right, but recently I’ve been just using normals. Of course these priors might not make sense for your problem. They’re the default: use them as an anchor or starting-off point.
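To see what these defaults imply before fitting anything, one option is a prior predictive simulation: draw the scale parameters and coefficients from the suggested priors, then simulate data down through the three levels. The NumPy sketch below does this for the model in the question, using the renamed scales sigma_y, sigma_theta, and sigma_mu. The function name, the group sizes, and the predictor matrix in the example are hypothetical, and the data are assumed to be already on a unit scale as recommended above.

```python
import numpy as np


def simulate_prior_predictive(x, n_j=3, n_k=5, seed=0):
    """Draw one fake dataset from the prior predictive of the three-level model.

    x:   (I, P) array of level-3 predictors, assumed already rescaled so that
         a difference of 1 is large (hypothetical input).
    n_j: number of level-2 units per level-3 group (hypothetical).
    n_k: number of observations per level-2 unit (hypothetical).
    """
    rng = np.random.default_rng(seed)
    n_i, p = x.shape
    # half-normal(0, 1) priors on the three scale parameters
    sigma_y, sigma_theta, sigma_mu = np.abs(rng.normal(0.0, 1.0, size=3))
    # independent normal(0, 1) priors on the regression coefficients
    beta = rng.normal(0.0, 1.0, size=p)
    # Level 3: mu_i ~ N(x_i' beta, sigma_mu^2)
    mu = rng.normal(x @ beta, sigma_mu)
    # Level 2: theta_ij ~ N(mu_i, sigma_theta^2)
    theta = rng.normal(mu[:, None], sigma_theta, size=(n_i, n_j))
    # Level 1: y_ijk ~ N(theta_ij, sigma_y^2)
    y = rng.normal(theta[:, :, None], sigma_y, size=(n_i, n_j, n_k))
    return {"sigma_y": sigma_y, "sigma_theta": sigma_theta,
            "sigma_mu": sigma_mu, "beta": beta, "y": y}


# Hypothetical example: 4 groups, an intercept plus one standardized predictor.
x = np.column_stack([np.ones(4), np.linspace(-1.0, 1.0, 4)])
draw = simulate_prior_predictive(x, seed=1)
print(draw["y"].shape)
```

Repeating this over many seeds and looking at the spread of y gives a rough check of whether the default priors produce data on a plausible scale for a particular problem; if they don't, that suggests adjusting them.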

This is a workshop you can’t miss: DataMeetsViz



This looks like it was a great conference with an all-star lineup of speakers. You can click through and see the talks.

What happened to mortality among 45-54-year-old white non-Hispanics? It declined from 1989 to 1999, increased from 1999 to 2005, and held steady after that.


The raw death rates for the group (which appeared in the Case-Deaton paper) are in red, and the age-adjusted death rates (weighting each year of age equally) are in black.

So . . . the age-adjusted mortality in this group increased by 5% from 1999 to 2005 and has held steady thereafter. But if you look at the raw data you’d be misled into thinking there was a steady increase. That’s the aggregation bias I’ve been talking about here and here.

For some reason it’s not so easy to get the numbers before 1999. But, following Deaton’s tip, I grabbed the 1999-2013 data and made some plots. All are renormalized to be relative to 1999.


Based on my earlier analysis, I’m guessing that age-adjusted mortality in this group dropped pretty dramatically from 1989 to 1999. Hence the title of this post.

The natural next step is to break this one up by men and women, and by ethnic group. And someone should do this. But not me. I got a job, and this ain’t it.

P.S. In the original version of this post I referred to “non-Hispanic white men.” I don’t know why I wrote that. All these graphs are for non-Hispanic whites, both sexes. As noted above, it would be easy enough to do separate calculations for men and women, but I didn’t do that.

Age adjustment mortality update


Earlier today I discussed a paper by Anne Case and Angus Deaton in which they noted an increase in mortality rates among non-Hispanic white Americans from 1989 to 2013, a pattern that stood in sharp contrast to a decrease in several other rich countries and among U.S. Hispanics as well:


Interpretation of this graph is tricky though, because the “45-54” age group was, on average, younger at the beginning of this time series than at the end, what with the big fat baby boomer generation passing through (see image at top of page). Average age increased from 49.1 in 1989 to 49.7 in 2013. Not a huge increase, but not trivial either given the steady increase in mortality rate as a function of age (approximately 8% per year) among the middle-aged.

I did a quick calculation to estimate what we might expect to happen to the mortality rate in the 45-54 age group, just from the changing age distribution, and here’s what I found:


Based on this analysis, the entire increase in mortality among non-Hispanic white Americans aged 45-54 in the Case-Deaton graph can be explained by changing age composition. Sociologist Philip Cohen sliced the data in a somewhat different way and estimated that the change in age composition could explain about half of the increase.

As I wrote in my earlier post, the Case-Deaton result is still interesting because of the comparison to other countries (and to Hispanics within the U.S.): these other groups show declines in mortality rates of around 30%, which is much more than could be explained by any age-aggregation artifacts.

Deaton replies

I asked a colleague to point this post to Deaton, and he (Deaton) replied with the following data from the CDC showing deaths per 100,000 among white non-Hispanics in 1999 (not 1989, which was the beginning of the series shown above, but 1999; apparently the pre-1999 data are harder to grab) and 2013:

Age   1999    2013  Change
45   262.3   260.7    -1.6
46   292.9   289.8    -3.1
47   305.9   323.5    17.6
48   337.2   342.9     5.7
49   359.0   384.5    25.5
50   376.7   422.2    45.5
51   429.0   466.1    37.1
52   444.8   481.2    36.4
53   545.1   526.7   -18.4
54   555.3   572.7    17.4

Deaton pointed out that the mortality rate increased among most age groups. And, indeed, the average increase is about 4%.
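As a quick check of the "about 4%" figure, here is a small Python sketch that recomputes it from the table two ways: the change in the age-adjusted rate with each year of age weighted equally (as in the age-adjusted curves earlier), and the plain average of the ten single-year percentage changes. The rates are copied from the table; the function names are illustrative.

```python
# Deaths per 100,000 among white non-Hispanics, by single year of age,
# copied from the CDC table above.
rates_1999 = {45: 262.3, 46: 292.9, 47: 305.9, 48: 337.2, 49: 359.0,
              50: 376.7, 51: 429.0, 52: 444.8, 53: 545.1, 54: 555.3}
rates_2013 = {45: 260.7, 46: 289.8, 47: 323.5, 48: 342.9, 49: 384.5,
              50: 422.2, 51: 466.1, 52: 481.2, 53: 526.7, 54: 572.7}


def age_adjusted_change(r0, r1):
    """Relative change in the age-adjusted rate, weighting each year of age equally."""
    ages = sorted(r0)
    return sum(r1[a] for a in ages) / sum(r0[a] for a in ages) - 1.0


def mean_relative_change(r0, r1):
    """Average of the single-year relative changes."""
    return sum(r1[a] / r0[a] - 1.0 for a in r0) / len(r0)


increasing = [a for a in rates_1999 if rates_2013[a] > rates_1999[a]]
print(f"age-adjusted change:     {age_adjusted_change(rates_1999, rates_2013):.1%}")
print(f"mean single-year change: {mean_relative_change(rates_1999, rates_2013):.1%}")
print(f"ages with an increase:   {increasing}")
```

Both summaries land at roughly 4%, and 7 of the 10 single years of age show an increase.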

Deaton also sent this analysis to the New York Times, where David Leonhardt reports:

Breaking down the 45-to-54 age group into single years of age, which should avoid Mr. Gelman’s concern, still shows the same pattern.

“If we want to be more precise about the age range involved, we could say that for all single years of age from 47 to 52, mortality rates are increasing,” wrote Mr. Deaton, the most recent winner of the Nobel Prize in economics. “So the overall increase in mortality is not due to failure to age adjust.” . . .

“We stick by our results,” he said.

According to the table above, mortality rates among non-Hispanic whites aged 45-54 increased by an average of about 4% after controlling for age. But if you go to Case and Deaton’s graph above, you’ll find an increase of about 9% in the raw mortality rate for that group from 1999 (again, not 1989 for this comparison) to 2013.

So according to these calculations, if you correct for the age-composition bias, about half of the observed change from 1999 to 2013 goes away. If you look at the top graph above, 1999 appears to be an unusual year, so it might not be the best year to use as a baseline.

Here, then, is a quick summary of our estimates of the bias from age composition in estimating the recent changes in death rate for non-Hispanic white Americans aged 45-54:

After controlling for age, there was a decline in the death rate from 1989 to 1999, then an increase from 1999 to 2005, then it’s been steady since then. See graphs here.

In my post, I estimated no change because I was considering the entire range, 1989-2013, as presented in the original Case and Deaton paper. In his reply Deaton estimated an increase because he was just looking from 1999-2013. Actually, though, all that increase occurred between 1999 and 2005.


So there appears to have been no aggregate increase in age-adjusted mortality in this group in the 1989-2013 period.

Is it then appropriate to say “We stick by our results”?

In this case I say yes: Case and Deaton’s main results seem to stand up just fine.

As noted above (and in my earlier post), their key claim was that death rates among middle-aged non-Hispanic whites in the U.S. slightly increased, even while corresponding death rates in other countries declined by about 30%. Even after you apply a bias correction and find that death rates among middle-aged non-Hispanic whites in the U.S. were actually flat (or maybe even decreased slightly), the key comparison to other countries is barely affected. A bias of 5% is small compared to an observed difference of 30%.

And this is why I emphasized throughout that this statistical bias did not invalidate the Case and Deaton study. As a statistician, I am of course interested in such biases, and it wasn’t clear to me ahead of time how large the correction would be. It turned out that the bias explained the observed increase among 45-54-year-old non-Hispanic whites, and that’s interesting, but the cross-national comparison is still there, and that seems to be the most important thing.

P.S. Deaton also asked why I estimated the bias using the age distribution rather than single-year mortality rates. The answer to this question is that I just used the data I found. I have no great familiarity with demographic data and I did not know that the data by ethnicity and year of age were easily available. I agree that the natural thing to do would be to analyze death rates by year of age. If someone can point me to such a dataset, I’d be glad to fit a model to it; indeed, this would be an excellent project.

P.P.S. The mortality rates by year of age from 1999 to 2013 are at CDC Wonder, so that’s a start. If anyone knows where the 1989-1998 data are, please let me know.

I agree with Case and Deaton on the main point, for sure: if indeed there was a decrease from 1989 to 1999, and an increase from 1999 to 2005, and no change after that, this is largely consistent with their story of there being a reversal, or at least a stalling of improvement, after decades of progress. And, in any case, the change compared to other countries and groups is huge, a point that I emphasized in all my posts. The existence of a bias does not imply that there is no underlying effect. Indeed, that’s why I wanted to quantify the bias, to get a sense of how it changes one’s conclusions.

P.P.P.S. More graphs here, including this:


Correcting statistical biases in “Rising morbidity and mortality in midlife among white non-Hispanic Americans in the 21st century”: We need to adjust for the increase in average age of people in the 45-54 category

In a much-noticed paper, Anne Case and Angus Deaton write:

This paper documents a marked increase in the all-cause mortality of middle-aged white non-Hispanic men and women in the United States between 1999 and 2013. This change reversed decades of progress in mortality and was unique to the United States; no other rich country saw a similar turnaround.

Here’s the key figure:


I have no idea why they label the lines with three-letter abbreviations when there’s room for the whole country names, but maybe that’s some econ street code thing I don’t know about.

Anyway, the graph is pretty stunning. And for obvious reasons I’m very interested in the mortality of white Americans in the 45-54 age range.

But could this pattern be an artifact of the coarseness of the age category? A commenter here raised this possibility a couple days ago, pointing out that, during the period shown in the above graph (1989 to the present), the 45-54 bin has been getting older as the baby boom has been moving through. So you’d expect an increasing death rate in this window, just from the increase in average age.

How large is this effect? We can make a quick calculation. A blog commenter pointed out this page from the Census Bureau, which contains a file with “Estimates of the Resident Population by Single Year of Age, Sex, Race, and Hispanic Origin for the United States: April 1, 2000 to July 1, 2010.” We can take the columns corresponding to white non-Hispanic men and women. For simplicity I just took the data from Apr 2000 and assumed (falsely, but I think an ok approximation for this quick analysis) that this age distribution translates by year. So, for example, if we want people in the 45-54 age range in 1990, we take the people who are 55-64 in 2000.

If you take these numbers, you can compute the average age of people in the 45-54 age group during the period covered by Case and Deaton, and this average age does creep up, starting at 49.1 in 1989 and ending up at 49.7 in 2013. So the increase has been about .6 years of age.
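The shifting trick described above can be written as a short function. This is a sketch under the same simplification stated in the text: the 2000 single-year age distribution is assumed to translate by year, ignoring deaths and migration. `average_age_in_bin` and the toy population in the example are hypothetical; the real input would be the white non-Hispanic columns of the Census file.

```python
def average_age_in_bin(pop_2000, year, lo=45, hi=54, base_year=2000):
    """Average age of people aged lo..hi in `year`, assuming the base-year
    single-year age distribution simply translates by year.

    pop_2000: dict mapping age in base_year -> population count (hypothetical input).
    """
    # Someone aged a in `year` is aged a + shift in base_year; e.g. the 45-54
    # group in 1990 is the 55-64 group in 2000.
    shift = base_year - year
    ages = range(lo, hi + 1)
    counts = [pop_2000.get(a + shift, 0.0) for a in ages]
    total = sum(counts)
    if total == 0:
        raise ValueError("no population in the requested age bin")
    return sum(a * c for a, c in zip(ages, counts)) / total


# Toy population (hypothetical): one unit per single year of age in 2000,
# doubled for ages 36-54 to mimic the baby boom.
toy_2000 = {a: (2.0 if 36 <= a <= 54 else 1.0) for a in range(101)}
for year in (1989, 2000, 2013):
    print(year, round(average_age_in_bin(toy_2000, year), 2))
```

With a flat age distribution the bin average stays at 49.5 every year; it only moves when a large cohort enters or leaves the bin, which is the baby-boom effect in question.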

How does this translate into the risk of death? We can look up the life table at this Social Security website. At age 45, Pr(death) is .003244 for men and .002069 for women. At age 54, it’s .007222 for men and .004301 for women. Ages 45 and 54 are nine years apart, so in one year of age, Pr(death) is multiplied by approximately a factor of (.007222/.003244)^(1/9) = 1.09 for men and (.004301/.002069)^(1/9) = 1.08 for women: that is, an increase in Pr(death) of roughly 8% to 9% per year of age.

The above calculations are only approximate because they’re using life tables for 2011, and for the correct analysis you’d want to use the life table for each year in the study. But I’m guessing it’s close enough.

To continue . . . in the period graphed by Case and Deaton, average age increases by about 0.6 years, so we’d expect Pr(death) to increase by about .6*8%, or about 5%, in the 45-54 age group, just from the increase of average age within the cohort as the baby boom has passed through.
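Here is the life-table arithmetic as a Python sketch, assuming (as the text does) that Pr(death) rises by a constant factor per year of age between 45 and 54. Because ages 45 and 54 are nine years apart, the sketch takes the ninth root of the ratio; a tenth root gives about 1.08 for both sexes. It then compounds the per-year factor over a 0.6-year rise in average age rather than multiplying 0.6 by the percentage; at this size the two agree closely. The function names are illustrative.

```python
# Pr(death) at ages 45 and 54, from the Social Security life table quoted above.
q = {"men": {45: 0.003244, 54: 0.007222},
     "women": {45: 0.002069, 54: 0.004301}}


def per_year_factor(q45, q54, years=54 - 45):
    """Geometric-mean multiplicative increase in Pr(death) per year of age."""
    return (q54 / q45) ** (1.0 / years)


def implied_increase(factor, age_shift=0.6):
    """Relative rise in the bin's death rate if its average age rises by age_shift years."""
    return factor ** age_shift - 1.0


for sex, table in q.items():
    f = per_year_factor(table[45], table[54])
    print(f"{sex}: x{f:.3f} per year of age, implied increase {implied_increase(f):.1%}")
```

For both sexes the implied increase comes out at about 5%, in line with the back-of-the-envelope figure in the text.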

Doing the calculation a bit more carefully using year-by-year mortality rates, we get this estimate of how much we’d expect death rates in the 45-54 age range to increase, just based on the increase in average age as the baby boom passes through:


This is actually not so different from the “US Whites” line in the Case-Deaton graph shown above: a slight decrease followed by a steady increase, with a net increase in death rate of about 5% for this group. Not identical—the low point in the actual data occurs around 1998, whereas the low point is 1993 in my explain-it-all-by-changes-in-age-composition graph—but similar, both in the general pattern and in the size of the increase over time.

But Case and Deaton also see a dramatic drop in death rates for other countries (and for U.S. Hispanics), declines of about 30%. When compared to these 30% drops, a bias of 5% due to increasing average age in the cohort is pretty minor.


According to my quick calculations, the Case and Deaton estimates are biased because they don’t account for the increase in average age of the 45-54 bin during the period they study. After we correct for this bias, we no longer find an increase in mortality among whites in this category. Instead, the curve is flat.

So I don’t really buy the following statement by Case and Deaton:

If the white mortality rate for ages 45−54 had held at their 1998 value, 96,000 deaths would have been avoided from 1999–2013, 7,000 in 2013 alone. If it had continued to decline at its previous (1979‒1998) rate, half a million deaths would have been avoided in the period 1999‒2013.

According to my above calculation, the observed increase in death rate in the 45-54 cohort is roughly consistent with a constant white mortality rate for each year of age. So I think it’s misleading to imply that there were all these extra deaths.

However, Case and Deaton find dramatic decreases in mortality rates in other rich countries, decreases on the order of 30%. So, even after we revise their original claim that death rates for 45-54’s are going up, it’s still noteworthy that they haven’t sharply declined in the U.S., given what’s happened elsewhere.

So, one could rewrite the Case and Deaton abstract to something like this:

This paper documents a marked flattening [was: increase] in the all-cause mortality of middle-aged white non-Hispanic men and women in the United States between 1999 and 2013. This change ended [was: reversed] decades of progress in mortality and was unique to the United States; no other rich country saw a similar stasis [was: turnaround].

Still newsworthy.

P.S. Along similar lines, I’m not quite sure how to interpret Case and Deaton’s comparisons across education categories (no college; some college; college degree), partly because I’m not clear on why they used this particular binning but also because the composition of the categories has changed during the period under study. The group of 45-54-year-olds in 1999 with no college degree is different from the corresponding group in 2013, so it’s not exactly clear to me what is learned by comparing these groups. I’m not saying the comparison is meaningless, just that the interpretation is not so clear.

P.P.S. See here for a response to some comments by Deaton.

P.P.P.S. And still more here.

4 for 4.0 — The Latest JAGS

This post is by Bob Carpenter.

I just saw over on Martyn Plummer’s JAGS News blog that JAGS 4.0 is out. Martyn provided a series of blog posts highlighting the new features:

1. Reproducibility: Examples will now be fully reproducible draw-for-draw and chain-for-chain with the same seed. (Of course, compiler, optimization level, platform, CPU, and OS can also affect numeric computations.) They also added unit testing. (How does anyone develop anything this complex without tests? I’d be lost.)

2. Better Error Messages: Examples with undefined array elements or directed cycles get flagged as such.

3. More R-like Features: This includes some nice variable-argument sum and product functions, but what really caught my eye is allowing integer arrays as indexes R-style (and as loop “bounds” as in R). It makes writing hierarchical models very neat. I don’t like that JAGS now allows the equality sign (=) for assignment: multiple ways to do things can be confusing for people reading the code, though this case is mostly harmless. I’ve always worried about efficiency in using arrays for loop bounds, but my worry’s probably misplaced.

4. Easter Eggs: This is straight from Martyn’s blog post:

One motivation for writing these blog posts was to draw users’ attention to new features that I wanted people to be aware of, even though they are not documented. There are other features – new distributions and samplers – that are currently undocumented and hence hidden. These will miraculously appear as “new” features as they are documented during the JAGS 4.x.y release series.

There’s always the source code!

Why Retraction Watch remains necessary

A few months ago Psych Science issued a press release, “Blue and Seeing Blue: Sadness May Impair Color Perception,” promoting a fatally flawed paper that appeared in their journal. I heard about this paper from Nick Brown, and we slammed it on the blog.

As I wrote at the time, I have nothing against the authors of the paper in question. I expect they’re doing their best. It’s not their fault that (a) statistical methods are what they are, (b) statistical training is what it is, and (c) the editors of Psychological Science don’t know any better. It’s all too bad, but it’s not their fault. I laugh at these studies because I’m too exhausted to cry, that’s all. And, before you feel too sorry for these guys or for the editors of Psychological Science or think I’m picking on them, remember: if they didn’t want the attention, they didn’t need to publish this work in the highest-profile journal of their field. If you put your ideas out there, you have to expect (ideally, hope) that people will point out what you did wrong.

I’m honestly surprised that Psychological Science is still publishing this sort of thing. They’re really living up to their rep, and not in a good way. PPNAS I can expect will publish just about anything, as it’s not peer-reviewed in the usual way. But Psych Science is supposed to be a real journal, and I’d expect, or at least hope, better from them.

The good news and the bad news

The good news comes from a commenter, who reports that Psych Science just retracted the paper:


The authors still express what I view as naive and unrealistic hopes:

We will conduct a revised Experiment 2 that more directly tests the motivational interpretation and improves the assessment of BY accuracy. If this revised experiment yields the same findings as our original Experiment 2, we will seek publication of our original Experiment 1 with the new Experiment 2. We remain confident in the proposition that sadness impairs color perception, but would like to acquire clearer evidence before making this conclusion in a journal the caliber of Psychological Science.

I think they don’t fully understand how difficult it is to learn from noisy data. But I’m glad they retracted. And I can hardly blame them for still holding out hope in their hypothesis.

The bad news is that Psych Science has not yet promoted the retraction at the same level as it promoted the original claim.

What I’d like to see from them is a feature story, titled something like, “Blue and Seeing Blue: Desire for Publication May Impair Research Effectiveness.” Instead, though, this is what I see on their webpage:


They did tweet the retraction, so that’s something:


And they retracted the old press release. But I really think they should give the retraction the same publicity they gave to the original report.

Again, no shame on the researchers involved. They made a mistake, something that happens all the time and is no surprise given the null hypothesis significance testing approach in which researchers are trained. I make statistical mistakes all the time, so I’m not surprised that others do too. Post-publication peer review is a great way to catch such errors, and increased awareness of the problems with noisy studies may be a way to reduce such errors in the future.

P.S. More on the story at Retraction Watch.

Econometrics: Instrument locally, extrapolate globally

Rajeev Dehejia sends along two papers, one with James Bisbee, Cristian Pop-Eleches, and Cyrus Samii on extrapolating estimated local average treatment effects to new settings, and one with Cristian Pop-Eleches and Cyrus Samii on external validity in natural experiments. This is important stuff, and they work it out in real examples.

Hey—looky here! This business wants to hire a Stan expert for decision making.


Kevin Van Horn writes:

I currently work in a business analytics group at Symantec, and we have several positions to fill. I’d like at least one of those positions to be filled by someone who understands Bayesian modeling and is comfortable using R (or Python) and Stan (or other MCMC tools). The team’s purpose is to maximize revenue growth by using data and analytics to advise decision makers in key areas such as sales, marketing, and finance. The position involves identifying and modeling business processes and their corresponding data flows, and recommending improvements.

If you think you might be interested, reply to me directly (Kevin_VanHorn … at … symantec … dot … com).

The tabloids strike again

Under the heading, “Unlearning implicit social biases during sleep,” Nick Brown writes:

What do you make of this?

At first sight I’m unimpressed; it looks like just another glamour journal fluff piece. For example, it seems to me that Figure 1F commits the error described here; and the authors seem to ignore the large increase (regression to the mean) in the second column (of 4) between Figures 1D and 1E. But maybe I’m being too instantly skeptical, in what I suppose may come to be known as “LaCour month”.

I replied: Wow—the tabloids strike again! What made you look at this article in the first place?

And Nick responded:

It was the #3 item in the “World News” section of the BBC app a couple of days ago. Not the Science section, or even the Health section under which they filed it, but apparently the third most important piece of news in the world. “FFS”, as the kids say (or maybe that’s the UK only, and “WTF” is the international English version).

All the tabloid-y discussion was about ethics, brainwashing, brave new world, etc. To me it looks like yet another study which is just “obviously wrong” (insufficient power, etc), even before I read it.

Nick then blogged it, under the heading, “Dream on: Playing pinball in your sleep does not make you a better person.”

But, hey, it was a net win for the journal Science: the BBC listed their article as the third-most-important piece of news in the world. And, unlike with LaCour and Green, the data were real. What more could you ask for??

P.S. I’m thinking that a better title for this post would be “Unlearning common sense during research.”

“Another reminder that David Brooks is very good at being David Brooks”

Outsourcing this one to Palko.

Neuroscience research in Baltimore

Joshua Vogelstein sends along these ads for students, research associates, and postdocs in his lab at Johns Hopkins University: