


Logo design by Michael Betancourt and Stephanie Mannheim.

P.S. Some commenters suggested the top of the S above is too large, but I wonder if that’s just because I’ve posted the logo in a large format. On the screen it would typically be smaller, something like this, which appears a bit more tasteful:


A question about race based stratification


Can Candan writes:

I have scraped horse racing data from a web site in Turkey and would like to try some models for predicting the finishing positions of future races. What models would you suggest for that?

There is one recent paper on the subject that seems promising, which claims to modify the SMO algorithm of support vector regression to work with race-based stratification, but no details are given; I don’t understand what to modify in the SMO algorithm.

This builds on the above one and improves it with NDCG-based model tuning of least-squares SVR.

There’s a conditional logistic regression approach, which I tried to implement, but I couldn’t get the claimed improvement over the public odds of winning; maybe I’m doing something wrong here.

I’m quite comfortable with R; any books, pointers, or code snippets are greatly appreciated.
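For what it’s worth, the conditional logistic regression the writer mentions has a simple form: within each race, a horse’s win probability is a softmax of a linear score of its covariates. A minimal sketch in Python (the features and coefficients below are invented for illustration):

```python
# Conditional logit (softmax within a race): P(horse i wins) is
# exp(x_i . beta), normalized over all horses in the same race.
import numpy as np

def win_probabilities(X, beta):
    """X: (n_horses, n_features) covariates for one race; returns win probs."""
    scores = X @ beta
    scores -= scores.max()        # subtract max for numerical stability
    exps = np.exp(scores)
    return exps / exps.sum()

# toy race: 3 horses, 2 made-up features (recent speed figure, weight carried)
X = np.array([[1.2, 0.0],
              [0.8, 0.5],
              [1.0, -0.3]])
beta = np.array([1.0, -0.5])      # hypothetical fitted coefficients
p = win_probabilities(X, beta)    # probabilities sum to 1 across the race
```

In R, the survival package’s `clogit` with a `strata(race)` term fits this model, which is one natural baseline to compare against the public odds.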

My reply: Sorry, this one is too far away from my areas of expertise!

Hey, what’s up with that x-axis??

[screenshot of the CDC graph]

CDC should know better.

P.S. In comments, Zachary David supplies this correctly-scaled version:


It would be better to label the lines directly than to use a legend, and the y-axis is off by a factor of 100, but I can hardly complain given that he just whipped this graph up for us.

The real point is that, once the x-axis is scaled correctly, the shapes of the curves change! So that original graph really was misleading, in that it incorrectly implies a ramping up in the 3-10 year range.

P.P.S. Zachary David sent me an improved version:


Ideally the line labels would be colored so there’d be no need for the legend at all, but at this point I really shouldn’t be complaining.

On deck this week

Mon: Hey, what’s up with that x-axis??

Tues: A question about race based stratification

Wed: Our new column in the Daily Beast

Thurs: Irwin Shaw: “I might mistrust intellectuals, but I’d mistrust nonintellectuals even more.”

Fri: An amusing window into folk genetics

Sat: “Faith means belief in something concerning which doubt is theoretically possible.” — William James (again)

Sun: Interpreting posterior probabilities in the context of weakly informative priors

“When more data steer us wrong: replications with the wrong dependent measure perpetuate erroneous conclusions”

Evan Heit sent in this article with Caren Rotello and Chad Dubé:

There is a replication crisis in science, to which psychological research has not been immune: Many effects have proven uncomfortably difficult to reproduce. Although the reliability of data is a serious concern, we argue that there is a deeper and more insidious problem in the field: the persistent and dramatic misinterpretation of empirical results that replicate easily and consistently. Using a series of four highly studied “textbook” examples from different research domains (eyewitness memory, deductive reasoning, social psychology, and child welfare), we show how simple unrecognized incompatibilities among dependent measures, analysis tools, and the properties of data can lead to fundamental interpretive errors. These errors, which are not reduced by additional data collection, may lead to misguided research efforts and policy recommendations. We conclude with a set of recommended strategies and research tools to reduce the probability of these persistent and largely unrecognized errors. The use of receiver operating characteristic (ROC) curves is highlighted as one such recommendation.

I haven’t had a chance to look at this but it seems like it could be relevant to some of our discussion. Just speaking generally, I like their focus on measurement.

Statistics Be


This modern statistics got me confused,
To tell you friends I’m quite unenthused.
This modern statistics got me confused,
To tell you friends I’m quite unenthused.

I like Pee Wee Fisher or the great Jerzy
But can’t make head nor tail of this Robby Tibsh’rani

With his
Oop-pop-a-dee-de-doom ah-ah!

Robby Tibsh’rani is the creator
Of this new style along in co with Hastie Trevor,
David Donoho, Efron Bradley –
They all indulge in this monstrosity

They take a random sample and a leave-one-out
Two ool-ya-koos and a half no doubt

Oop-pop-a-dee-de-doom ah-ah!

The bootstrappers you see around
They all converse in a special tongue
Oracle and eel-ya-da
One means hello, the other ta ta.

They call a man a cat and a girl a chick,
And they’re up to all kinds of nonparametrics

With their
Oop-pop-a-dee-de-doom ah-ah!

In conclusion I must now say
The Stanford boys they know how to play
But that music is not for me
So take it back Mister Tibsh’rani

You better take it back to Sand Hill Road
With your high speed riffs and staccato code

And your
Oop-pop-a-dee-de-doom ah-ah!

In which a complete stranger offers me a bet

Piotr Mitros wrote to Deb and me:

I read, with pleasure, your article about the impossibility of biasing a coin. I’m curious as to whether researchers believe what they write. Would you be willing to place some form of iterated bet?

For example: I provide a two-sided coin and a table. The table looks like a normal table, and the coin looks like a normal coin (although most likely, someone handling the coin will be able to tell it is doctored — if this is a problem, we could probably even do away with this requirement).

The coin is tossed into the air, a large distance, allowed to spin many times, and lands. You were concerned about fairness of the toss. You are welcome to hire an undergrad or similar to do the tossing, assuming you provide unbiased instructions. Heads, I collect $99. Tails, you collect $101. If the coin is fair — as your article claims it has to be — the expected outcome is you make a buck on each toss. It is a random process, so we can keep going for e.g. 40,000 flips — which should give you confidence of coming out ahead, again, assuming you believe your article. It is also about $2000 at undergrad income assuming 10 seconds per flip.

We both do our best to act in good faith. In other words, we don’t e.g. miscount flips, give bad instructions, welch [hey, that’s a racial slur!—ed.], swap coins, etc. The bet is on whether coins can, indeed, be biased.
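The arithmetic behind the offer is worth making explicit: with a fair coin the bet pays the accepter $1 per flip in expectation, but even a 1% bias in the coin flips the sign. A quick check, assuming independent flips:

```python
# Expected value and spread of the proposed bet: +$101 on tails,
# -$99 on heads, over 40,000 independent flips.
import math

def bet_summary(p_tails, n_flips=40_000, win=101, lose=99):
    ev = p_tails * win - (1 - p_tails) * lose          # per-flip expectation
    var = (p_tails * (win - ev) ** 2
           + (1 - p_tails) * (-lose - ev) ** 2)        # per-flip variance
    return n_flips * ev, math.sqrt(n_flips * var)      # total mean, total sd

fair = bet_summary(0.5)     # ($40,000 expected, sd $20,000)
biased = bet_summary(0.49)  # a 1% bias toward heads: roughly -$40,000
```

So with a fair coin the expected gain is $40,000 with a standard deviation of $20,000 — a profit with probability of roughly 98%, but hardly a sure thing — while a coin biased 51/49 toward heads reverses the expectation entirely.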

I replied:

The problem is that the outcome can depend on how the coin is flipped, conditional on the initial state of the coin (i.e., which side is facing up when it is flipped). I have no doubt that a person can flip it to make it more likely to come up one side or the other, but then the bias is a property of the flip, not the coin.


In the proposed protocol, I offered to have you hire the person doing the flipping, and give them clear, good-faith instructions on how to make the flip maximally unbiased. I assume this would mean you would alternate starting side to remove bias for initial state, toss high up into the air, and spin quickly around the appropriate axis.


Yes, in this case I think the probability is .5 and so I don’t think it would make sense for you to offer $101 for tails and $99 for heads, as this bet would have negative expectation.


I appreciate your advice and concern. As a statistician, you are surely aware that humans are not rational economic agents; I may have any of a range of motives.

If you believe what you just said (and your paper), you should also believe that the experiment overall has a $40,000 expected income for you, and that furthermore, the number of flips is sufficient that it would earn you money with pretty good confidence. Indeed, the odds of any large losses should be exceedingly small. So you should accept the offer.


Hi, don’t take this personally, but I’m not in the habit of trying to take $40,000 on a bet from someone I don’t know!


It’s not a bad habit. You’re likely to lose more money than you are to make money.

If it would help:

I’m confident I could find a number of mutual acquaintances who would be glad to make an introduction and vouch for my character. We are in similar enough circles that I am sure we have many colleagues in common.

Furthermore, I’d be glad to place some funds in a credible escrow service, so you could be confident of pay-out, and have this managed by a credible, independent third party.

If I were to come out ahead, I’d be glad to earmark the money for a good cause. I would be glad to place the money there, so I would not personally benefit either way.

In other words, I am glad to take every action needed for you to have complete confidence in the validity of the bet. Furthermore, I am a sufficiently public figure that if I were to engage in anything unfair (beyond providing a biased coin, which I am completely open about) the cost to my reputation would be much greater than $40,000.

If you do decline, I would appreciate a better excuse, however!


There is this famous passage which pretty much sums up my attitude:

[screenshot of the passage]


By the way, as you can imagine, my expectation was that you would not take the bet (however, if you had, I would have found a way to make good on the offer). I was hunting for something juicy to toss into a followup article.

All good, then.

You can crush us, you can bruise us, yes, even shoot us, but oh—not a pie chart!

Byron Gajewski pointed me to this several-years-old article from the Onion, which begins:

According to a groundbreaking new study published Monday in The Journal Of The American Statistical Association, somewhere on the planet someone is totally doing it at this very moment.

“Of the 6.7 billion inhabitants of Earth, approximately 3.5 billion have reached sexual maturity,” said Dr. Jerome Carver, a mathematics professor at the University of Chicago and lead author of the study. “From a statistical perspective, it simply stands to reason that at least two of these inhabitants are totally going at it right now. Like, as we speak.”

“But it’s probably way more than that,” Carver added. “Like at least a hundred.”

The multidiscipline study, which tapped leading experts in several fields, including reproduction and population sciences, found overwhelming evidence that there is never even a second when someone is not doing it.

An analysis of the data, based on a new statistical model referred to as “Rauchembauer’s Overlap,” indicates that, given the sheer number of people in the world, by the time the first set of people is done doing it, someone else has already begun getting it on.

In addition, the findings suggest that there is a “good, to very good” chance that someone is doing it close by.

“The nearer you get to major metropolitan areas, the more likely you are to be in proximity to those making it,” said California Institute of Technology probability theorist Howard Bergsson, who contributed to the report. “For example, we’re in Chicago, a city of three million people. Someone is probably doing it right down the street, or maybe even somewhere in this building.”

Very well done, but . . . (a) please don’t lead a stats story with a quote from a math professor, and (b) was this really necessary:


Area newspaper mocks statistics, indeed.

Born-open data


Jeff Rouder writes:

Although many researchers agree that scientific data should be open to scrutiny to ferret out poor analyses and outright fraud, most raw data sets are not available on demand. There are many reasons researchers do not open their data, and one is technical. It is often time consuming to prepare and archive data. In response my [Rouder’s] lab has automated the process such that our data are archived the night they are created without any human approval or action. All data are versioned, logged, time stamped, and uploaded including aborted runs and data from pilot subjects. The archive is GitHub, the world’s largest collection of open-source materials. Data archived in this manner are called born open.
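Rouder’s paper has the details of his lab’s pipeline; as a rough sketch, the same born-open behavior can be had from nothing more than a nightly cron job that commits and pushes whatever data accumulated that day (the paths and repository names below are invented):

```shell
# crontab entry (hypothetical): archive the lab's data every night at 23:55
#   55 23 * * *  /lab/scripts/archive_data.sh

#!/bin/sh
# archive_data.sh -- commit and upload today's data with no human action
cd /lab/data || exit 1
git add -A                                    # stage everything, aborted runs included
git commit -m "nightly archive $(date +%F)"   # time-stamped commit message
git push origin main                          # upload to the public GitHub repo
```

Versioning, logging, and time-stamping come free from git; the push makes the data public the night it is created, with no human approval step to skip.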

Rouder continues:

Psychological science is beset by a methodological crisis in which many researchers believe there are widespread and systemic problems in the way researchers produce, evaluate, and report knowledge. . . . This methodological crisis has spurred many proposals for improvement including an increased consideration of replicability (Nosek, Spies, & Motyl, 2012), a focus on the philosophy and statistics underlying inference (Cumming, 2014; Morey, Romeijn, & Rouder, 2013), and an emphasis on what is now termed open science, which can be summarized as the practice of making research as transparent as possible.

And here’s the crux:

Open data, unfortunately, seems to be paradox of sorts. On one hand, many researchers I encounter are committed to the concept of open data. Most of us believe that one of the defining features of science is that all aspects of the research endeavor should be open to peer scrutiny. We live this sentiment almost daily in the context of peer review where our scholarship and the logic of our arguments is under intense scrutiny.

On the other hand, surprisingly, very few of our collective data are open!

Say it again, brother:

Consider all the data that is behind the corpus of research articles in psychology. Now consider the percentage that is available to you right now on demand. It is negligible. This is the open-data paradox—a pervasive intellectual commitment to open data with almost no follow-through whatsoever.

What about current practice?

Many of my colleagues practice what I [Rouder] call data-on-request. They claim that if you drop them a line, they will gladly send you their data. Data-on-request should not be confused with open data, which is the availability of data without any request whatsoever. Many of these same colleagues may argue that data-on-request is sufficient, but they are demonstrably wrong.

No kidding.

Here’s one of my experiences with data-on-request:

Last year, around the time that Eric Loken and I were wrapping up our garden-of-forking-paths paper, I was contacted by Jessica Tracy, one of the authors of that ovulating-women-wear-red study, which was one of several examples discussed in our article. Tracy wanted to let us know about some more research she and her collaborator, Alec Beall, had been doing, and she also wanted us to tell her where our paper would be published so that she and Beall would have a chance to contact the editors of our article before publication. I posted Tracy and Beall’s comments, along with my responses, on this blog. But I did not see the necessity for them to be involved in the editorial process of our article (nor, for that matter, did I see such a role for Daryl Bem or any of the authors of the other work discussed therein). In the context of our back-and-forth, I asked Tracy if she could send us the raw data from her experiments. Or, better still, if she could just post her data on the web for all to see. She replied that, since we would not give her the prepublication information on our article, she would not share her data.

I guess the Solomon-like compromise would’ve been to saw the dataset in half.

Just to clarify: Tracy and Beall are free to do whatever they want. I know of no legal obligation for them to share their data with people who disagree with them regarding the claim that women in certain days of their monthly cycle are three times more likely to wear red or pink shirts. I’m not accusing them of scientific misconduct in not sharing their data. Maybe it was too much trouble for them to put their data online, maybe it is their judgment that science will proceed better without their data being available for all to see. Whatever. It’s their call.

I’m just agreeing with Rouder that data-on-request is not the same as open data. Not even close.

Stan workshops at UCLA (6/23) and UCI (6/24)

While Bob travels to Boston-ish, I’ll be giving two Stan workshops in Southern California. I’m excited to be back on the west coast for a few days — I grew up not too far away. Both workshops are open, but space is limited. Follow the links for registration.

The workshops will cover similar topics. I’m going to focus more on Stan usage and less on MCMC. If you’re attending, please install RStan 2.6.0 before the workshop.


P.S. Congrats, Dub Nation.

The David Brooks files: How many uncorrected mistakes does it take to be discredited?


OK, why am I writing this? We all know that New York Times columnist David Brooks deals in false statistics, he’s willing and able to get factual matters wrong, he doesn’t even fact-check his own reporting, his response when people point out his mistakes is irritation rather than thanks, he won’t run a correction even if the entire basis for one of his columns is destroyed, and he thinks technical knowledge is like the recipes in a cookbook and can be learned by rote. A friend of facts, he’s not.

But we know all that. So I was not surprised when Adam Sales pointed me to this recent article by David Zweig, “The facts vs. David Brooks: Startling inaccuracies raise questions about his latest book.”

Unlike Zweig (or his headline writer), I was hardly startled that Brooks had inaccuracies. Accuracy ain’t Brooks’s game.

And Jonathan Falk pointed me to this review by Mark Liberman of many instances where Brooks got things wrong.

Amazingly enough, the errors pointed out by Sales and Liberman don’t even overlap with the errors that I’d noticed in some Brooks columns—the anti-Semitic education statistics and his completely wrong guess about the social backgrounds of rich people.

Anyway, this is all known, and my first response was a flippant, Yeah, no kidding, David Brooks is like Gregg Easterbrook without the talent.

Just to be clear: this is not meant as a backhand slam on Easterbrook, a columnist who, like Brooks, loves to quote statistics but can get them horribly wrong. Easterbrook is a good writer, a fun football columnist, and sparkles with ideas. He really does have talent.

So here’s my question

Anyway, to continue, here’s my question: How is it that Brooks, who has such a reputation for screwing things up, continues to occupy his high post in journalism? Where did he get his Isiah Thomas-like ability to keep bouncing back from adversity, his Ray Keene-like ability to violate the norms of journalistic ethics?

And it’s not just the New York Times. Here, for example, is a puff piece that appeared on NPR a couple months ago. The reporter didn’t get around to asking, Hey, David Brooks, what about those fake statistics you published??

What will it take for Brooks’s external reputation to catch up to his internal reputation? Lots of things have come out over the years and it hasn’t happened yet. But this new story that came in, maybe it will make a difference. Straw that broke the camel’s back and all that.

For example, that NPR story quoted Brooks quoting a statistic that, according to Zweig’s thorough investigation, got “nearly every detail” wrong. NPR reporters don’t like to be patsies, right? Publishing fake numbers in the NYT is one thing—heck, Brooks has columns to fill every week, he can’t be picky and choosy about his material. But promulgating this in other news outlets, that could annoy people.

And, once Brooks loses the constituency of his fellow journalists, what does he have left?

At that point, he’s Dennis Miller without the jokes.

Michael LaCour in 20 years

[screenshot]

In case you were wondering what “Bruno” Lacour will be doing a couple decades from now . . .

James Delaney pointed me to this CNN news article, “Connecticut’s strict gun law linked to large homicide drop” by Carina Storrs:

The rate of gun-related murders fell sharply in the 10 years after Connecticut implemented a law requiring people buying firearms to have a license, according to a study. . . . To assess the effect of this law, researchers identified states that had levels of gun-related homicide similar to Connecticut before 1995. These include Rhode Island, New Hampshire and Maryland. When the researchers compared these states to Connecticut between 1995 and 2005, they found the level of gun-related homicide in Connecticut dropped below that of comparable states.

Based on the rates in these comparable states, the researchers estimated Connecticut would have had 740 gun murders if the law had not been enacted. Instead, the state had 444, representing a 40% decrease.
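The 40% figure is just the ratio of the quoted counts:

```python
# The study's headline number: actual gun murders versus the
# synthetic-control estimate of what would have happened without the law.
expected = 740   # estimated murders had the law not been enacted
observed = 444   # actual murders, 1995-2005
drop = (expected - observed) / expected
print(f"{drop:.0%}")  # prints 40%
```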

Wow—40%, that’s a lot! And, indeed, Storrs has a quote on it:

“I did expect a reduction [but] 40% is probably a little higher than I would have guessed,” said Daniel Webster, director of the Johns Hopkins Center for Gun Policy and Research who led the study, which was published Friday in the American Journal of Public Health.

A legal expert named Daniel Webster, huh? And I guess it’s good they have someone named Storrs writing articles about Connecticut.

Anyway, that’s a funny quote from the leader of the study! Perhaps the reporter should push a bit, maybe ask something like: Do you really believe the effect is 40%?? Or do you think that 40% is an overestimate coming from the statistical significance filter and the garden of forking paths?

OK, this is all important stuff. But it’s not the subject of today’s post.

Here’s the deal. Storrs continues her article:

Ten states have laws similar to Connecticut’s, including background check requirements. It is hard to know what effect permit-to-purchase laws have without looking in these other states, said John R Lott Jr., president of the Crime Prevention Research Center, a gun rights advocate and columnist for Fox News. “If 10 states passed a law, eight could increase and two could fall, and how do I know that it was because of the gun law?” he said.

Wha??? John Lott? CNN can’t find any real expert to interview? Why not just follow up with a quote from Mary Rosh, endorsing Lott as “the best professor I ever had”???

For those of you who don’t remember, John Lott shares with Michael LaCour the distinction of having announced, with great publicity, controversial data from a survey that he said he conducted, but then for which he could supply no evidence of its existence. Damn! I hate when that happens. As I wrote last month, Lott represents a possible role model for LaCour in that he seems to continue to be employed in some capacity doing research of an advocacy nature. And, like LaCour, Lott never admitted to fabrication nor did he apologize. (I guess that last part makes sense: if there’s nothing to admit, there’s nothing to apologize for.)

Ok, just on the statistics for a moment, Lott’s argument is terrible. First, “Ten states have laws similar to Connecticut’s” is not so relevant, given that the causal identification comes from the change in the law, not the existence of the law. Indeed, Storrs gets a good quote dismissing Lott’s argument:

Although Webster said he would like to study the effect of gun laws in other states, that research is not practical. Most states passed meaningful gun laws, such as laws requiring background checks, long ago, “frankly before I was born,” and it would be hard to know how those laws were enforced back then, and how society responded to them, he explained. In addition, information from death certificates was less readily available from the Centers for Disease Control and Prevention before 1980, he said.

Second, Lott says, “If 10 states passed a law, eight could increase and two could fall.” But that’s just ridiculous. Why suppose that introducing this law, which the data indicated was associated with a drop in homicides, would lead to an increase in 8 states out of 10?

It’s not that the Webster et al. claims are airtight. I’ve already expressed my concern that the estimated effect is too high, and Storrs alludes to evidence from other states that sends mixed messages. And Delaney has a point when he writes, “One concern about the construction of the synthetic control is Connecticut’s proximity to and interconnections with NYC, which experienced a dramatic decrease in overall homicides from 1177 in 1995 to 539 in 2005 (according to Wikipedia). Whereas, from what I can tell, homicide totals, while decreasing across the nation during this period, happened to be closer to constant in New Hampshire, Rhode Island, and Maryland.”

But Lott’s criticisms are uninspiring. Let’s hope that Bruno Lacour can do better in his future career as an advocate and pundit, and let’s hope that news outlets can do better when looking for a quote. I heard John Yoo is available. . . .

How tall is Kit Harington? Stan wants to know.

We interrupt our regularly scheduled programming for a special announcement.

[screenshot]

Madeleine Davies writes: “Here are some photos of Kit Harington. Do you know how tall he is?”

I’m reminded, of course, of our discussion of the height of professional tall person Jon Lee Anderson:

[photo with Jon Lee Anderson]

Full Bayes, please. I can’t promise publication on Gawker, but I’ll do my best.

Because there is no observable certainty other than the existence of thought

Someone who is teaching a college philosophy class writes:

We discussed Descartes’ Meditations on First Philosophy last week — specifically, concerning the existence of God — and I had students write down their best proof for God’s existence in one minute, independent of their beliefs. Attached is a particularly funny response:

[screenshot of the student’s response]

Another good one was the blank sheet of paper that a student handed in…

On deck this week

Mon: Because there is no observable certainty other than the existence of thought

Tues: Michael LaCour in 20 years

Wed: Born-open data

Thurs: You can crush us, you can bruise us, yes, even shoot us, but oh—not a pie chart!

Fri: In which a complete stranger offers me a bet

Sat: Statistics Be

Sun: “When more data steer us wrong: replications with the wrong dependent measure perpetuate erroneous conclusions”

Saturday’s entry is my favorite this week.

Wikipedia is the best


“It is not readily apparent whether Boo-Boo is a juvenile bear with a precocious intellect or simply an adult bear who is short of stature.”

The language of insignificance

Jonathan Falk points me to an amusing post by Matthew Hankins giving synonyms for “not statistically significant.” Hankins writes:

The following list is culled from peer-reviewed journal articles in which (a) the authors set themselves the threshold of 0.05 for significance, (b) failed to achieve that threshold value for p and (c) described it in such a way as to make it seem more interesting.

And here are some examples:

slightly significant (p=0.09)
sufficiently close to significance (p=0.07)
trending towards significance (p>0.15)
trending towards significant (p=0.099)
vaguely significant (p>0.2)
verged on being significant (p=0.11)
verging on significance (p=0.056)
weakly statistically significant (p=0.0557)
well-nigh significant (p=0.11)

Lots more at the link.

This is great, but I do disagree with one thing in the post, which is where Hankins writes: “if you do [play the significance testing game], the rules are simple: the result is either significant or it isn’t.”

I don’t like this; I think the idea that it’s a “game” with wins and losses is a big part of the problem! More on this point in our “power = .06” post.

JuliaCon 2015 (24–27 June, Boston-ish)

JuliaCon is coming to Cambridge, MA, the geek capital of the East Coast: 24–27 June. Here’s the conference site with program.

I (Bob) will be giving a 10 minute “lightning talk” on Stan.jl, the Julia interface to Stan (built by Rob J. Goedman — I’m just pinch hitting because Rob couldn’t make it).

The uptake of Julia has been nothing short of spectacular. I’m really looking forward to learning more about it.

Trivia tidbit: Julia and Stan go way back; they were both developed under the same U.S. Department of Energy grant for high-performance computing (DE-SC0002099).

Does your time as a parent make a difference?

A colleague writes:

Thought you might be interested in this front page data journalism take down of an article. I don’t know the article but this amounts to a journalist talking with someone who didn’t like the piece and ripping it based on a measurement detail. How bad though is this measurement detail? Are you better off having data on parental effort for 2 days on 30 parents or for 2 parents on 30 days? It’s not a cluster question since outcomes are at the parent level not the day level.

The New York Times article in question, by Justin Wolfers, says:

The latest salvo in the mommy wars is that all that time you spend parenting just doesn’t matter. But it’s a claim that, despite the enthusiastic and widespread coverage by news media . . . does not hold water. . . .

The claim that parenting time doesn’t matter . . . largely reflects the failure of the authors to accurately measure parental input. In particular, the study does not measure how much time parents typically spend with their children. Instead, it measures how much time each parent spends with children on only two particular days — one a weekday and the other a weekend day.

The result is that whether you are categorized as an intensive or a distant parent depends largely on which days of the week you happened to be surveyed. . . . Trying to get a sense of the time you spend parenting from a single day’s diary is a bit like trying to measure your income from a single day. If yesterday was payday, you look rich, but if it’s not, you would be reported as dead broke. You get a clearer picture only by looking at your income — or your parenting time — over a more meaningful period.
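Wolfers’s payday analogy is easy to check with a simulation: if each parent has some typical daily parenting time but any given day varies a lot around it, classifying parents as “intensive” or “distant” from a two-day diary misfiles many of them, while a month of diaries does much better. Every number below is invented for illustration:

```python
# Misclassification of parents relative to a 120-minute "intensive" cutoff,
# using diaries of different lengths. Parameters are made up: true typical
# times ~ N(120, 30) minutes, day-to-day noise ~ N(0, 60).
import random
import statistics

random.seed(1)

def misclassification_rate(n_days, n_parents=20_000):
    errors = 0
    for _ in range(n_parents):
        typical = random.gauss(120, 30)                 # parent's true daily minutes
        diary = [random.gauss(typical, 60) for _ in range(n_days)]
        observed = statistics.mean(diary)
        if (typical > 120) != (observed > 120):         # wrong side of the cutoff
            errors += 1
    return errors / n_parents

two_day = misclassification_rate(2)   # a diary like the study's
month = misclassification_rate(30)    # a month of diaries
```

With these made-up numbers, roughly 3 parents in 10 land on the wrong side of the cutoff from a two-day diary, versus about 1 in 10 from a month of diaries. The group averages are fine either way; it is the individual-level classification that is noisy.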

I took a quick look at the paper [an article by Melissa Milkie, Kei Nomaguchi, and Kathleen Denny, in the Journal of Marriage and Family] and I’m concerned that they controlled for “mother’s work hours.” This doesn’t seem quite right, because if the mother works fewer hours she can have more time to spend with kid, right? Anyway, the news article did seem a bit breathless.

My colleague replied:

I spent time puzzling over this today (without reading the original article yet). On the one hand it seems a little odd—like it’s sort of controlling for the treatment. But if (perhaps a big if) you think that working is pretreatment with respect to caring for children then you probably do want to control for this. It’s like staying at home is the encouragement, spending time with kids is a treatment, and the outcome is the outcome. Assume away selection issues and say that there was random assignment to staying at home and also random assignment to spending time with kids conditional on staying at home — i.e., with different propensities for those that stay at home or not. Then you would want to take the average of the spending-time effect in each of the stay-at-home categories. You might worry that staying at home is hugely predictive of spending time and so when you condition on staying at home you have no variation in spending time with kids to use to estimate an effect. But that’s another way of saying that you cannot tell if the effect is due to staying at home or to spending time with kids. If there is an independent effect of staying home with kids it should be observed in both groups. If it is different in both groups then a linear control is not correct (given the probably different assignment propensities to each group); but that just means you have to be careful how you control, not that you shouldn’t control.

As I’ve said before, one of the benefits of blogging is that it’s acceptable to retain some level of uncertainty. In a scientific paper or, in a different way, in a news article, there’s pressure to make a strong conclusion. Here we can discuss and leave some questions open.

Applied regression and multilevel modeling books using Stan

Edo Navot writes:

Are there any plans in the works to update your book with Prof. Hill on hierarchical models to a new edition with example code in Stan?

Yes, we are planning to break it up into 2 books and do all the modeling for both books in Stan. It’s waiting on some new functionality we’re building in Stan to do maximum likelihood, penalized maximum likelihood, and maximum marginal likelihood, and also to fit various standard models such as linear and logistic regression automatically.