“P.S. Is anyone working on hierarchical survival models?”

Someone who wishes to remain anonymous writes:

I’m working on building a predictive model (not causal) of the onset of diabetes mellitus using electronic medical records from a semi-panel of HMO patients. The dependent variable is blood glucose level. The unit of analysis is the patient visit to a network doctor or hospitalization in a network hospital aggregated to the month-year level. The time frame is from the early 80s to the present. Since my focus is on the onset of the disease, my approach is agnostic and prospective. I would like to derive data-driven answers to questions of co-morbidity, patient health and wellness based on physical measures such as BMI or BP as well as physician and hospital quality as an inherent part of the model output.

To me, addressing these issues with data of this type would require multiple models for full coverage:

1) A survival model to capture censoring and time to disease onset

– Censoring can have multiple causes: diagnosed with diabetes type 1 or 2, lost to follow-up, death, etc.

2) Multiple hierarchical Bayesian models for massively categorical variables such as patient, diagnosis, doctor, and hospital to capture the differing dependence structures

– Patient within zipcode, community, county, state to capture the social determinants of health

– Patient within a family network, e.g., children, siblings, parents, etc., to reflect familial history of disease

– Patient and diagnoses received — thousands of possible diagnoses which collapse into higher levels

– Patient within HMO doctor and hospital network

– Doctor within specialty — probably 70 or so specialties overall

– Doctor within zipcode, community, county, state

– Hospital within zipcode, community, county, state

3) As available, the impact of programs and interventions designed to promote wellness, mitigate or prevent disease…these could include recommendations regarding exercise, diet, etc.

4) Given the wide time frame, macro-economic indexes to capture the well-known impact of the business cycle on the determinants of medically-related activities

These are preliminary thoughts as I have not yet begun the process of testing the need for specifying all of these hierarchies since I am still in the initial stages of the analysis. Just getting this data lined up and talking together is a significant challenge in and of itself.

My question for you concerns the need for multiple models when the dependence structures overlap and are as messy as in the present case. I’m sure you’re going to advise against such a wide-ranging predictive design, enjoining me to greater research focus and specificity. My preference is to retain an expansive and exploratory stance and not to simplify the in-going hypotheses just for the sake of the modeling. Honestly, I think that there is already too much specificity in the literature which does little or nothing to uncover and identify the broad antecedents of this illness.

What do you think? Am I missing something? Suggestions?

P.S. Is anyone working on hierarchical survival models?

My reply: It does sound kind of appealing to just throw everything into the model and let Stan sort it out. On the other hand, it also seems like the “throw it all in at once” strategy is a recipe for confusion, and it could be hard to interpret the results. So let me give you the generic suggestion that, whatever model you start with, you check it out using fake-data simulation (that is, simulate fake data from the model, then fit the model and check that you can recover the parameters of interest and make good predictions). And I’d suggest starting simple and working up from there. Ultimately I think a more complex model is better and should be more believable, but you might have to work up to it, because of challenges of computation, identification, and understanding of the model.
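The fake-data check described above can be sketched in a few lines. Here is a minimal version for a toy normal model; the model, parameter values, and sample size are all illustrative, not the letter-writer’s EMR setup:

```python
import numpy as np

rng = np.random.default_rng(42)

# Step 1: pick "true" parameter values (toy normal model).
true_mu, true_sigma = 2.0, 1.5
n = 500

# Step 2: simulate fake data from the model.
y = rng.normal(true_mu, true_sigma, size=n)

# Step 3: fit the model. Here, with known sigma and a flat prior,
# the posterior for mu is Normal(ybar, sigma / sqrt(n)).
post_mean = y.mean()
post_sd = true_sigma / np.sqrt(n)

# Step 4: check that the true parameter is recovered, e.g. that it
# falls inside a central 95% posterior interval.
lo, hi = post_mean - 1.96 * post_sd, post_mean + 1.96 * post_sd
print(f"95% interval for mu: ({lo:.3f}, {hi:.3f}); truth = {true_mu}")
```

For a real hierarchical model fit in Stan the same logic applies: simulate data from known parameter values, refit, and check coverage of the posterior intervals, ideally over many replications.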

P.S. Matt Gribble adds:

I wanted to plug some exciting work by Michael Crowther extending generalized gamma regression to have random effects not only on the log-hazards scale (frailty models) but also on the log-relative median survival time scale. The paper’s in press at Statistics in Medicine, and I had nothing to do with it but can’t wait to cite/use it.

Not quite what I plugged (I was basing the plug about survival time random effects on slides I saw of his, not on the actual paper) but I think this ref is still cutting-edge stuff in the theme of hierarchical survival models.
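For readers new to survival analysis, the censoring the letter-writer worries about in point (1) can be illustrated with a hand-rolled Kaplan-Meier estimator. The data here are toy numbers, not the writer’s EMR data:

```python
import numpy as np

# Toy right-censored data: months to disease onset, with event = 1
# meaning onset was observed and event = 0 meaning censored
# (lost to follow-up, death, end of study, etc.).
time = np.array([3, 5, 5, 8, 12, 12, 15, 20])
event = np.array([1, 1, 0, 1, 1, 0, 1, 0])

# Kaplan-Meier: at each observed event time, multiply the running
# survival estimate by (1 - deaths / number still at risk).
surv = 1.0
for t in np.unique(time[event == 1]):
    at_risk = (time >= t).sum()              # still under observation at t
    d = ((time == t) & (event == 1)).sum()   # onsets at exactly t
    surv *= 1 - d / at_risk
    print(f"t={t:>2}: S(t) = {surv:.3f}")
```

Censored patients contribute to the at-risk denominator up to their censoring time and then drop out, which is exactly the information a naive “percent diagnosed” calculation throws away.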

Just wondering


It would be bad news if a student in the class of Laurence Tribe or Alan Dershowitz or Ian Ayres or Edward Wegman or Matthew Whitaker or Karl Weick or Frank Fischer were to hand in an assignment that is obviously plagiarized, copied from another source without attribution. Would the prof have the chutzpah to fail the student, or would he just give the student an A out of fear that the student would raise a ruckus if anything were done about it?

But it would be really bad news if everyone in the class were to do this. For example, suppose the students were to do the work for real—that is, individually write their own, non-plagiarized papers and put them in a file somewhere—and then make alternative, plagiarized papers to hand in. Perhaps, just to make sure the prof or the overworked teaching assistant doesn’t miss it, the students would even cite the wikipedia entries they’re copying from. Then they’d sit back and wait and see what happens. It would be important, though, that the students write actual papers on their own, because otherwise they’d be missing out on the chance to learn the material, which after all is the real purpose of taking the course.

In any case, this would be a bad situation. It’s not clear how the prof would have the moral authority to fail a student for an offense that he, the professor, had committed without suffering any penalty. But I wouldn’t recommend the students try it. They might just get expelled anyway for the combined violations of plagiarism and embarrassing the university.

It’s like smoking crack in Toronto. It’s still illegal even if the boss does it.

“Bayes Data Analysis – Author Needed”

The following item came in over the Bayes email list:


My name is Jo Fitzpatrick and I work as an Acquisition Editor for Packt Publishing ( ). We recently commissioned a book on Bayesian Data Analysis and I’m currently searching for an author to write this book. You need to have good working knowledge of Bayes and a good level of written English. Please email for more details.


Joanne Fitzpatrick

Acquisition Editor
[ Packt Publishing ]

Hey, I think I’m qualified for this! Although maybe not the “good level of written English” bit, as I only speak American. . . . In any case, I am happy to see that the term “Bayesian Data Analysis” has become generic.

On deck this week

Mon: “Bayes Data Analysis – Author Needed”

Tues: Just wondering

Wed: “P.S. Is anyone working on hierarchical survival models?”

Thurs: Open-source tools for running online field experiments

Fri: Hey—this is a new kind of spam!

Sat, Sun: As Chris Hedges would say: That’s the news, and I am outta here!

Visualizing sampling error and dynamic graphics

Robert Grant writes:

What do you think of this visualisation from the NYT [in an article by Neil Irwin and Kevin Quealy but I'm not sure if they're the designers of the visualization]? I’m pretty impressed as a method of showing sampling error to a general audience!

I agree.

P.S. In related news, Antony Unwin writes:

A couple of weeks ago you had a discussion on graphics on your blog and it seemed to me that people had very different ideas about what the term “Interactive Graphics” means. For some it is about interacting with presentation graphics on the web, for others it is about using interactive graphics to do data analysis. You really need to see interactive graphics in action to get a feel for it.

I have made a ten minute film to give the flavour of interactive graphics for data analysis with data on last year’s Tour de France and using Martin Theus’s software Mondrian.

Grand Opening: The Stan Shop

I finally put together a shop so everyone can order Stan t-shirts and mugs:

The art’s by Michael Malecki. The t-shirts and mugs are printed on demand by Spreadshirt. I tried out a sample and the results are great and have held up to machine washing and drying.

There’s a markup of about $4 per item, which is going straight into the Stan slush fund. No promises that it will be spent wisely, but it will go to the developers.

There aren’t a lot of other products from Spreadshirt that we can put logos on — most of the items (hats, tote bags, etc.) are text-only. But if there are other t-shirts or sweatshirts people want, we could easily expand our product line — feel free to drop suggestions in the comment box.

Dimensionless analysis as applied to swimming!

We have no fireworks-related posts for July 4th but at least we have an item that’s appropriate for the summer weather. It comes from Daniel Lakeland, who writes:

Recently in one of your blog posts (“priors I don’t believe”) there was a discussion in which I was advocating the use of dimensional analysis and dimensionless groups to normalize every model, including statistical regression models. People wanted an example that made sense in social sciences, but since I don’t really have any social science examples to hand, I couldn’t provide one. However, I do have an interesting example problem that I just blogged; although it’s a physics problem, perhaps it illustrates the techniques well enough that one of your readers could run with it on a social sciences problem, or alternatively perhaps one of your readers would be interested in collaborating with me to develop a social science example.

Lakeland’s post starts with:

I’ve been working on my front-crawl swim stroke as an effort to improve my fitness.

But it doesn’t take too long to get to this:

[equation screenshot from Lakeland’s post]

and this:

[another equation screenshot from Lakeland’s post]

So you should enjoy it. This indeed is the sort of analysis we don’t see enough of in statistics, I think.
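For a concrete taste of the technique, here is the classic dimensionless group for flow around a swimmer, the Reynolds number. This is a generic illustration, not Lakeland’s calculation, and the swimmer’s speed and body length are made-up round numbers:

```python
# Generic dimensional-analysis illustration (not Lakeland's model):
# the Reynolds number Re = rho * v * L / mu is dimensionless, so it
# comes out the same in any consistent unit system.
rho = 1000.0   # density of water, kg/m^3
mu = 1.0e-3    # dynamic viscosity of water, Pa*s
v = 1.5        # swimmer speed, m/s (illustrative)
L = 1.8        # swimmer body length, m (illustrative)

Re = rho * v * L / mu
print(f"Re = {Re:.2e}")  # order 10^6: swimming is fully turbulent flow
```

The point of normalizing a model by such groups is that the dimensionless number, not any individual variable, is what governs the physics, which collapses many raw predictors into a few meaningful ones.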

P.S. Regarding priors, I wanted also to feature this comment from Chris on that earlier post:

This non-technical description of Bayes’ rule bugged me: “Final opinion on headline = (initial gut feeling) * (study support for headline)”. I think I’d have written something like “educated guess” rather than “gut feeling”. Not all gut feelings are of equal merit. One can have gut feelings about technical matters where they have some experience and are reasonably well informed. (I’d call that an educated guess.) Conversely, one can have gut feelings re matters where they are thoroughly uninformed.

I agree. One problem with the whole “subjective Bayes” slogan is the elevation of subjectivity into a principle, which is all too close to an anything-goes view which is contrary to many of the goals of science.

“The great advantage of the model-based over the ad hoc approach, it seems to me, is that at any given time we know what we are doing.”

The quote is from George Box, 1979.

And this:

Please can Data Analysts get themselves together again and become whole Statisticians before it is too late? Before they, their employers, and their clients forget the other equally important parts of the job statisticians should be doing, such as designing investigations and building models?

I actually think the current term “data scientist” is an improvement over “data analyst” because the scientist can be involved in data collection and decision making, not just analysis.

Box also wrote:

It is widely recognized that the advancement of learning does not proceed by conjecture alone, nor by observation alone, but by an iteration involving both. Certainly, scientific investigation proceeds by such iteration. Examination of empirical data inspires a tentative explanation which, when further exposed to reality, may lead to its modification. . . .

Now, since scientific advance, to which all statisticians must accommodate, takes place by the alternation of two different kinds of reasoning, we would expect also that two different kinds of inferential process would be required to put it into effect.

The first, used in estimating parameters from data conditional on the truth of some tentative model, is appropriately called Estimation. The second, used in checking whether, in the light of the data, any model of the kind proposed is plausible, has been aptly named by Cuthbert Daniel Criticism.

Box continued:

While estimation should, I believe, employ Bayes’ Theorem, or (for the fainthearted) likelihood, criticism needs a different approach. In practice, it is often best done in a rather informal way by examination of residuals or other suitable functions of the data. However, when it is done formally, using tests of goodness of fit, it must, I think, employ sampling theory for its justification.

He was writing in 1978, back before people realized the ways in which model criticism, exploratory data analysis, and sampling theory could be incorporated into Bayesian data analysis (see chapters 6-8 of BDA3 for a review).
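Box’s “criticism” step is what Bayesians now often do with posterior predictive checks. Here is a minimal sketch for a toy normal model; the data, model, and choice of test statistic are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(0, 1, size=100)   # "observed" data (simulated here)
n = y.size

# Posterior for mu under a flat prior with known sigma = 1:
# Normal(ybar, 1 / sqrt(n)).
mu_draws = rng.normal(y.mean(), 1 / np.sqrt(n), size=4000)

# Posterior predictive check on a test statistic (here, the maximum):
# replicate the dataset under each posterior draw and compare.
T_obs = y.max()
T_rep = np.array([rng.normal(mu, 1, size=n).max() for mu in mu_draws])
p_value = (T_rep >= T_obs).mean()   # p near 0 or 1 flags model misfit
```

This is a sampling-theory-flavored calculation, just as Box says, but it lives comfortably inside a Bayesian analysis (see BDA3, chapter 6).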

“Being an informed Bayesian: Assessing prior informativeness and prior–likelihood conflict”

Xiao-Li Meng sends along this paper (coauthored with Matthew Reimherr and Dan Nicolae), which begins:

Dramatically expanded routine adoption of the Bayesian approach has substantially increased the need to assess both the confirmatory and contradictory information in our prior distribution with regard to the information provided by our likelihood function. We propose a diagnostic approach that starts with the familiar posterior matching method. For a given likelihood model, we identify the difference in information needed to form two likelihood functions that, when combined respectively with a given prior and a baseline prior, will lead to the same posterior uncertainty. In cases with independent, identically distributed samples, sample size is the natural measure of information, and this difference can be viewed as the prior data size M(k), with regard to a likelihood function based on k observations. When there is no detectable prior-likelihood conflict relative to the baseline, M(k) is roughly constant over k, a constant that captures the confirmatory information. Otherwise M(k) tends to decrease with k because the contradictory prior detracts information from the likelihood function. In the case of extreme contradiction, M(k)/k will approach its lower bound −1, representing a complete cancelation of prior and likelihood information due to conflict. We also report an intriguing super-informative phenomenon where the prior effectively gains an extra (1+r)^(−1) percent of prior data size relative to its nominal size when the prior mean coincides with the truth, where r is the percentage of the nominal prior data size relative to the total data size underlying the posterior. We demonstrate our method via several examples, including an application exploring the effect of immunoglobulin levels on lupus nephritis. We also provide a theoretical foundation of our method for virtually all likelihood-prior pairs that possess asymptotic conjugacy.

This sounds like it’s potentially very important. As many of you know, I’ve been struggling for a few years with how to generally think about weakly informative priors, and this paper represents a way of looking at the problem from a different direction.

This area also is an example of the complementary nature of applied, methodological, computational, and theoretical research. Our methodological work on weakly informative priors (that is, our papers from 2006, 2008, 2013, and 2014) was motivated by persistent problems that were arising in our applied work on hierarchical models. Then, once we have these methods out there, it is possible for deep thinkers such as XL to make sense of them all. Then people can apply the larger framework to new applications, and so on.
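A toy version of the “prior data size” intuition, in the conjugate normal case. This is a simplification for building intuition, not the paper’s M(k) diagnostic:

```python
import numpy as np

# Conjugate normal model with known sigma: a Normal(mu0, sigma**2 / m)
# prior contributes like m pseudo-observations, so the posterior mean
# is a precision-weighted average of prior mean and sample mean.
sigma = 1.0
mu0, m = 0.0, 10.0   # prior mean and its implied "prior data size"

rng = np.random.default_rng(1)
k = 40               # number of real observations
y = rng.normal(0.5, sigma, size=k)

post_mean = (m * mu0 + k * y.mean()) / (m + k)
post_var = sigma**2 / (m + k)
# The posterior is exactly as precise as a flat-prior analysis of
# m + k observations; conflict between mu0 and ybar does not show up
# in this conjugate variance, which is part of what makes the paper's
# k-dependent diagnostic M(k) interesting.
```

In this conjugate setting the prior sample size is a fixed constant; the paper’s contribution is a measure that changes with k when prior and likelihood conflict.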

P.S. I spoke on weakly informative priors at Harvard a few years ago (see here for a more recent version). When the talk was over, XL stood up and said, “Thank you for a weakly informative talk.” So I’m hoping the paper above gets published and I can write a discussion beginning, “Thank you for a weakly informative paper.”

“Who’s bigger”—the new book that ranks every human on Wikipedia—is more like Bill Simmons than Bill James

I received a copy of “Who’s Bigger?: Where Historical Figures Really Rank,” by Steven Skiena, a computer scientist at Stony Brook University, and Charles Ward, an engineer at Google. Here’s the blurb I gave the publisher:

Skiena and Ward provide a numerical ranking for every Wikipedia resident who’s ever lived. What a great idea! This book is a guaranteed argument-starter. I found something to argue with on nearly every page.

Here’s an argument for you. Their method ranks obscure U.S. president Chester Arthur as the 499th most historically significant figure who has ever lived. William Henry Harrison, who was president for one month, is listed as the 288th most significant person. This seems ridiculous to me. We’re considering all people who have ever lived (who are on Wikipedia), including inventors of drugs, discoverers of physical laws, founders of countries, influential religious leaders, explorers, authors, musicians, newspaper editors, etc etc. Surely there are many thousands of people who are more historically significant than these two minor figures who just happened to be president of the United States for brief periods.

How did this happen? I can’t be sure, but think about this: Right now, as you read this, there’s probably a 7th grader somewhere who’s googling Chester Arthur as part of a class project on American presidents (and is not allowed to write about Washington, Jefferson, or Lincoln). And somewhere else there’s a 5th-grade teacher who has assigned a different president to each of the students in his or her class. These kids are all going on Wikipedia, and lots of other pages link to these presidents’ pages.

In his historical baseball abstract, Bill James ranks approximately 1000 of the best baseball players. The ranking is, of course, ultimately subjective (in that James has to make choices in what data sources to use and how to combine them) but it has a clearly-defined goal: wins. James is rating players based on how many wins they contributed to their team. In his historical basketball book, Bill Simmons ranks approximately 100 of the best pro basketball players. Simmons, like James, is entertaining, intelligent, and thought-provoking, but one thing his ratings don’t have is a clear external goal. One could say that James is ranking based on an (imperfect) estimate of a somewhat well-defined outcome, whereas Simmons is ranking for the sake of ranking. As with the notorious “U.S. News” college rankings, the ratings are not an estimate of anything but themselves. That’s fine, it’s just the way it is; I’m not trying to slam Bill Simmons, I’m just trying to make this distinction.

Skiena and Ward, like Simmons, are producing a ranking that is not an estimate of any externally-defined goal. Again, that’s fine—it’s all one could possibly do here, I think.

But . . . I found it grating how Skiena and Ward kept on referring to their rankings as “long-term historical significance,” “historical greatness,” etc. They write, “Our algorithmic rankings help focus attention where it should properly be focused. Any person interested in livestock production would miss the point if they study how to raise geese and Angora rabbits at the expense of cows, pigs, and chickens. Our rankings help answer the question ‘Where’s the beef?’ in history in a rigorous and effective way.” I don’t think Chester Arthur is very beefy at all, but I do believe that he gets lots of links because the U.S. educational system is so focused on presidents.

The part I like is when they write, “what we are really trying to do here is study what shapes the process of historical recollection.”

In summary, I have mixed feelings about this book. I like the idea of quantitatively studying the process of historical recollection. But it’s hard for me to take seriously any method that represents Grover Cleveland as the 98th most historically significant figure who’s ever lived.