
“Why should anyone believe that? Why does it make sense to model a series of astronomical events as though they were spins of a roulette wheel in Vegas?”

Deborah Mayo points us to a post by Stephen Senn discussing various aspects of induction and statistics, including the famous example of estimating the probability the sun will rise tomorrow. Senn correctly slams a journalistic account of the math problem:

The canonical example is to imagine that a precocious newborn observes his first sunset, and wonders whether the sun will rise again or not. He assigns equal prior probabilities to both possible outcomes, and represents this by placing one white and one black marble into a bag. The following day, when the sun rises, the child places another white marble in the bag. The probability that a marble plucked randomly from the bag will be white (ie, the child’s degree of belief in future sunrises) has thus gone from a half to two-thirds. After sunrise the next day, the child adds another white marble, and the probability (and thus the degree of belief) goes from two-thirds to three-quarters. And so on. Gradually, the initial belief that the sun is just as likely as not to rise each morning is modified to become a near-certainty that the sun will always rise.

[The above quote is not by Senn; it’s a quote of something he disagrees with!]

Canonical and wrong. X and I discuss this problem in section 3 of our article on the history of anti-Bayesianism (see also rejoinder to discussion here). We write:

The big, big problem with the Pr(sunrise tomorrow | sunrise in the past) argument is not in the prior but in the likelihood, which assumes a constant probability and independent events. Why should anyone believe that? Why does it make sense to model a series of astronomical events as though they were spins of a roulette wheel in Vegas? Why does stationarity apply to this series? That’s not frequentist, it is not Bayesian, it’s just dumb. Or, to put it more charitably, it is a plain vanilla default model that we should use only if we are ready to abandon it on the slightest pretext.

Strain at the gnat that is the prior and swallow the ungainly camel that is the iid likelihood. Senn’s discussion is good in that he knits his row straight without getting distracted by stray bits of yarn.
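To see how little the marble-bag account actually uses, here is a minimal sketch of its updating rule. The function name is made up for illustration. The thing to notice is that the answer depends only on the count of past sunrises, which is exactly the constant-probability, independent-events assumption criticized above.

```python
from fractions import Fraction


def marble_bag_belief(sunrises_observed: int) -> Fraction:
    """Chance of drawing a white marble after the given number of sunrises.

    The bag starts with one white and one black marble (the "equal prior"),
    and one white marble is added for each observed sunrise.
    """
    white = 1 + sunrises_observed
    total = 2 + sunrises_observed
    return Fraction(white, total)


# Belief after 0, 1, 2, and many observed sunrises.
for n in (0, 1, 2, 10_000):
    print(n, marble_bag_belief(n))
```

The first three values are the half, two-thirds, and three-quarters from the quoted passage. Nothing in the calculation knows anything about astronomy, which is the point.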

Humility needed in decision-making

Brian MacGillivray and Nick Pidgeon write:

Daniel Gilbert maintains that people generally make bad decisions on risk issues, and suggests that communication strategies and education programmes would help (Nature 474, 275–277; 2011). This version of the deficit model pervades policy-making and branches of the social sciences.

In this model, conflicts between expert and public perceptions of risk are put down to the difficulties that laypeople have in reasoning in the face of uncertainties rather than to deficits in knowledge per se.

Indeed, this is the “Nudge” story we hear a lot: the idea is that our well-known cognitive biases are messing us up, and policymakers should be accounting for this.

But MacGillivray and Pidgeon take a more Gigerenzian view:

There are three problems with this stance.

First, it relies on a selective reading of the literature. . . .

Second, it rests on some bold extrapolations. For example, it is not clear how the biases Gilbert identifies in the classic ‘trolley’ experiment play out in the real world. Many such reasoning ‘errors’ are mutually contradictory — for example, people have been accused of both excessive reliance on and neglect of generic ‘base-rate’ information to judge the probability of an event. This casts doubt on the idea that they reflect universal or hard-wired failings in cognition.

The third problem is the presentation of rational choice theory as the only way of deciding how to handle risk issues.

They conclude:

Given that many modern risk crises stem from science’s inability to foresee the dark side of technological progress, a little humility from the rationality project wouldn’t go amiss.

Recently in the sister blog

Where does Mister P draw the line?

Bill Harris writes:

Mr. P is pretty impressive, but I’m not sure how far to push him in particular and MLM [multilevel modeling] in general.

Mr. P and MLM certainly seem to do well with problems such as eight schools, radon, or the Xbox survey. In those cases, one can make reasonable claims that the performance of the eight schools (or the houses or the interviewees, conditional on modeling) are in some sense related.

Then there are totally unrelated settings. Say you’re estimating the effect of silicone spray on enabling your car to get you to work: fixing a squeaky door hinge, covering a bad check you paid against the car loan, and fixing a bald tire. There’s only one case where I can imagine any sort of causal or even correlative connection, and I’d likely need persuading to even consider trying to model the relationship between silicone spray and keeping the car from being repossessed.

If those two cases ring true, where does one draw the line between them? For a specific example, see “New drugs and clinical trial design in advanced sarcoma: have we made any progress?” (linked from here). The discussion covers rare but somewhat related diseases, and the challenge is to do clinical studies with sufficient power from number of participants in aggregate and by disease subtype.

Do you know if people have successfully used MLM or Mr. P in such settings? I’ve done some searching and not found anything I recognized.

I suspect that the real issue is understanding potential causal mechanisms, but MLM and perhaps Mr. P. sound intriguing for such cases. I’m thinking of trying fake data to test the idea.

I have a few quick thoughts here:

– First, on the technical question about what happens if you try to fit a hierarchical model to unrelated topics: if the topics are really unrelated, there should be no reason to expect the true underlying parameter values to be similar, hence the group-level variance will be estimated to be huge, hence essentially no pooling. The example I sometimes give is: suppose you’re estimating 8 parameters: the effects of SAT coaching in 7 schools, and the speed of light. These will be so different that you’re just getting the unpooled estimate. The unpooled estimate is not the best—you’d rather pool the 7 schools together—but it’s the best you can do given your model and your available information.

– To continue this a bit, suppose you are estimating 8 parameters: the effects of a fancy SAT coaching program in 4 schools, and the effects of a crappy SAT coaching program in 4 other schools. Then what you’d want to do is partially pool each group of 4 or, essentially equivalently, to fit a multilevel regression at the school level with a predictor indicating the prior assessment of quality of the coaching program. Without that information, you’re in a tough situation.

– Now consider your silicone spray example. Here you’re estimating unrelated things so you won’t get anything useful from partial pooling. Bayesian inference can still be helpful here, though, in that you should be able to write down informative priors for all your effects of interest. In my books I was too quick to use noninformative priors.
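The "huge group-level variance, hence essentially no pooling" point can be sketched with a crude calculation. This is not a full hierarchical fit: it assumes a normal model with known standard errors and uses a simple method-of-moments guess at the group-level variance. The function name and all the numbers are made up for illustration.

```python
import statistics


def pooling_factors(estimates, std_errors):
    """Crude empirical-Bayes pooling factors, one per group.

    A factor near 1 means the estimate is pulled almost entirely toward
    the common mean; a factor near 0 means essentially no pooling.
    """
    # Method-of-moments guess at the group-level variance tau^2: the spread
    # of the estimates beyond what sampling error alone would produce.
    sampling_var = statistics.mean([se ** 2 for se in std_errors])
    tau2 = max(0.0, statistics.variance(estimates) - sampling_var)
    return [se ** 2 / (se ** 2 + tau2) for se in std_errors]


# Hypothetical coaching effects and standard errors for 7 schools.
schools_y = [28, 8, -3, 7, -1, 1, 18]
schools_se = [15, 10, 16, 11, 9, 11, 10]

# The same 7 schools plus an eighth, unrelated "parameter": the speed of
# light in m/s, with a hypothetical standard error of 1.
mixed_y = schools_y + [299_792_458]
mixed_se = schools_se + [1]

print(pooling_factors(schools_y, schools_se))
print(pooling_factors(mixed_y, mixed_se))
```

With these made-up numbers the seven schools alone get pooled completely, because the estimated group-level variance is zero. Once the speed of light is added, every pooling factor collapses to nearly zero, so the schools lose the pooling they would have had on their own.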

Hey, this is what Michael LaCour should’ve done when they asked him for his data

A note from John Lott

The other day, I wrote:

It’s been nearly 20 years since the last time there was a high-profile report of a social science survey that turned out to be undocumented. I’m referring to the case of John Lott, who said he did a survey on gun use in 1997, but, in the words of Wikipedia, “was unable to produce the data, or any records showing that the survey had been undertaken.” Lott, like LaCour nearly two decades later, mounted an aggressive, if not particularly convincing, defense.

Lott disputes what is written on the Wikipedia page. Here’s what he wrote to me, first on his background:

You probably don’t care, but your commentary is quite wrong about my career and the survey. Since most of the points that you raise are dealt with in the post below, I will just mention that you have the trajectory of my career quite wrong. My politically incorrect work had basically ended my academic career in 2001. After having had positions at Wharton, University of Chicago, and Yale, I was unable to get an academic job in 2001 and spent 5 months being unemployed before ending up at a think tank AEI. If you want an example of what had happened you can see here. A similar story occurred at Yale where some US Senators complained about my research. My career actual improved after that, at least if you judge it by getting academic appointments. For a while universities didn’t want to touch someone who would get these types of complaints from high profile politicians. I later re-entered academia, though eventually I got tired of all the political correctness and left academia.

Regarding the disputed survey, Lott points here and writes:

Your article gives no indication that the survey was replicated nor do you explain why the tax records and those who participated in the survey were not of value to you. Your comparison to Michael LaCour is also quite disingenuous. Compare our academic work. As I understand it, LaCour’s data went to the heart of his claim. In my case we are talking about one paragraph in my book and the survey data was biased against the claim that I was making (see the link above).

I have to admit I never know what to make of it when someone describes me as “disingenuous,” which according to the dictionary, means “not candid or sincere, typically by pretending that one knows less about something than one really does.” I feel like responding, truly, that I was being candid and sincere! But of course once someone accuses you of being insincere, it won’t work to respond in that way. So I can’t really do anything with that one.

Anyway, Lott followed up with some specific responses to the Wikipedia entry:

The Wikipedia statement . . . is completely false (“was unable to produce the data, or any records showing that the survey had been undertaken”). You can contact tax law Professor Joe Olson who went through my tax records. There were also people who have come forward to state that they took the survey.

A number of academics and others have tried to correct the false claims on Wikipedia but they have continually been prevented from doing so, even on obviously false statements. Here are some posts that a computer science professor put up about his experience trying to correct the record at Wikipedia.

I hope that you will correct the obviously false claim that I “was unable to produce the data, or any records showing that the survey had been undertaken.” Now possibly the people who wrote the Wikipedia post want to dismiss my tax records or the statements by those who say that they took the survey, but that is very different than them saying that I was unable to produce “any records.” As to the data, before the ruckus erupted over the data, I had already redone the survey and gotten similar results. There are statements from 10 academics who had contemporaneous knowledge of my hard disk crash where I lost the data for that and all my other projects and from academics who worked with me to replace the various data sets that were lost.

I don’t really have anything to add here. With LaCour there was a pile of raw data and also a collaborator, Don Green, who recommended to the journal that their joint paper be withdrawn. The Lott case happened two decades ago, there’s no data file and no collaborator, so any evidence is indirect. In any case, I thought it only fair to share Lott’s words on the topic.

Introducing StataStan


Thanks to Robert Grant, we now have a Stata interface! For more details, see:

Jonah and Ben have already kicked the tires, and it works. We’ll be working on it more as time goes on as part of our Institute of Education Sciences grant (turns out education researchers use a lot of Stata).

We welcome feedback, either on the Stan users list or on Robert’s blog post. Please don’t leave comments about StataStan here — I don’t want to either close comments for this post or hijack Robert’s traffic.

Thanks, Robert!

P.S. Yes, we know that Stata released its own Bayesian analysis package, which even provides a way to program your own Bayesian models. Their language doesn’t look very flexible, and the MCMC sampler is based on Metropolis and Gibbs, so we’re not too worried about the competition on hard problems.

God is in every leaf of every probability puzzle

Radford shared with us this probability puzzle of his from 1999:

A couple you’ve just met invite you over to dinner, saying “come by around 5pm, and we can talk for a while before our three kids come home from school at 6pm”.

You arrive at the appointed time, and are invited into the house. Walking down the hall, your host points to three closed doors and says, “those are the kids’ bedrooms”. You stumble a bit when passing one of these doors, and accidently push the door open. There you see a dresser with a jewelry box, and a bed on which a dress has been laid out. “Ah”, you think to yourself, “I see that at least one of their three kids is a girl”.

Your hosts sit you down in the kitchen, and leave you there while they go off to get goodies from the stores in the basement. While they’re away, you notice a letter from the principal of the local school tacked up on the refrigerator. “Dear Parent”, it begins, “Each year at this time, I write to all parents, such as yourself, who have a boy or boys in the school, asking you to volunteer your time to help the boys’ hockey team…” “Umm”, you think, “I see that they have at least one boy as well”.

That, of course, leaves only two possibilities: Either they have two boys and one girl, or two girls and one boy. What are the probabilities of these two possibilities?

NOTE: This isn’t a trick puzzle. You should assume all things that it seems you’re meant to assume, and not assume things that you aren’t told to assume. If things can easily be imagined in either of two ways, you should assume that they are equally likely. For example, you may be able to imagine a reason that a family with two boys and a girl would be more likely to have invited you to dinner than one with two girls and a boy. If so, this would affect the probabilities of the two possibilities. But if your imagination is that good, you can probably imagine the opposite as well. You should assume that any such extra information not mentioned in the story is not available.

As a commenter pointed out, there’s something weird about how the puzzle is written, not just the charmingly retro sex roles but also various irrelevant details such as the time of the dinner. (Although I can see why Radford wrote it that way, as it was a way to reveal the number of kids in a natural context.)

The solution at first seems pretty obvious: As Radford says, the two possibilities are:
(a) 2 boys and 1 girl, or
(b) 1 boy and 2 girls.
If it’s possibility (a), the probability of the random bedroom being a girl’s is 1/3, and the probability of getting that note (“I write to all parents . . . who have a boy or boys at the school”) is 1, so the probability of the data is 1/3.
If it’s possibility (b), the probability of the random bedroom being a girl’s is 2/3, and the probability of getting the school note is still 1, so the probability of the data is 2/3.
The likelihood ratio is thus 2:1 in favor of possibility (b).

Case closed . . . but is it?

Two complications arise. First, as commenter J. Cross pointed out, if the kids go to multiple schools, it’s not clear what is the probability of getting that note, but a first guess would be that the probability of you seeing such a note on the fridge is proportional to the number of boys in the family. Actually, even if there’s only one school the kids go to, it might be more likely to see the note prominently on the fridge if there are 2 boys: presumably, the probability that at least one boy is interested in hockey is higher if there are two boys than if there’s only one.

The other complication is the prior odds. Pr(boy birth) is about .512, so the prior odds are .512/.488 in favor of 2 boys and 1 girl, rather than 2 girls and 1 boy.
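Putting the pieces together, here is a minimal sketch of the posterior calculation. The function and its parameter names are invented for illustration. The note probabilities in the second call are one hypothetical way to encode "proportional to the number of boys," not something the puzzle specifies.

```python
from fractions import Fraction


def posterior_two_boys(p_boy, note_prob_two_boys=1, note_prob_one_boy=1):
    """Posterior probability of (a) 2 boys and 1 girl, versus (b) 1 boy and 2 girls."""
    p_girl = 1 - p_boy
    # Prior: three birth orders for each family composition.
    prior_a = 3 * p_boy ** 2 * p_girl
    prior_b = 3 * p_boy * p_girl ** 2
    # Likelihood: the randomly opened bedroom is a girl's, and the note is seen.
    like_a = Fraction(1, 3) * note_prob_two_boys
    like_b = Fraction(2, 3) * note_prob_one_boy
    return prior_a * like_a / (prior_a * like_a + prior_b * like_b)


p = Fraction(512, 1000)

# The note is certain either way: the 2:1 likelihood ratio for (b) dominates.
print(posterior_two_boys(p))

# The chance of seeing the note is proportional to the number of boys
# (hypothetically 1 with two boys, 1/2 with one boy): the likelihood ratio
# becomes 1:1 and only the prior odds are left.
print(posterior_two_boys(p, note_prob_two_boys=1, note_prob_one_boy=Fraction(1, 2)))
```

Under the first set of assumptions the posterior probability of two boys is 32/93, a bit above the 1/3 you get from ignoring the prior odds. Under the second it is .512, so the answer depends a lot on how the note got onto the fridge.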

This is just to demonstrate that, as Feynman could’ve said in one of his mellower moments, God is in every leaf of every tree: Just about every problem is worth looking at carefully. It’s the fractal nature of reality.

On deck this week

Mon: God is in every leaf of every probability puzzle

Tues: Where does Mister P draw the line?

Wed: Recently in the sister blog

Thurs: Humility needed in decision-making

Fri: “Why should anyone believe that? Why does it make sense to model a series of astronomical events as though they were spins of a roulette wheel in Vegas?”

Sat: July 4th

Sun: “Menstrual Cycle Phase Does Not Predict Political Conservatism”

What’s So Fun About Fake Data?

Our first Daily Beast column is here.

Interpreting posterior probabilities in the context of weakly informative priors

Nathan Lemoine writes:

I’m an ecologist, and I typically work with small sample sizes from field experiments, which have highly variable data. I analyze almost all of my data now using hierarchical models, but I’ve been wondering about my interpretation of the posterior distributions. I’ve read your blog, several of your papers (Gelman and Weakliem, Gelman and Carlin), and your excellent BDA book, and I was wondering if I could ask your advice/opinion on my interpretation of posterior probabilities.

I’ve thought of 95% posterior credible intervals as a good way to estimate effect size, but I still see many researchers use them in something akin to null hypothesis testing: “The 95% interval included zero, and therefore the pattern was not significant”. I tend not to do that. Since I work with small sample sizes and variable data, it seems as though I’m unlikely to find a “significant effect” unless I’m vastly overestimating the true effect size (Type M error) or unless the true effect size is enormous (a rarity). More often than not, I find ‘suggestive’, but not ‘significant’ effects.

In such cases, I calculate one-tailed posterior probabilities that the effect is positive (or negative) and report that along with estimates of the effect size. For example, I might say something like

“Foliar damage tended to be slightly higher in ‘Ambient’ treatments, although the difference between treatments was small and variable (Pr(Ambient>Warmed) = 0.86, CI95 = 2.3% less – 6.9% more damage).”

By giving the probability of an effect as well as an estimate of the effect size, I find this to be more informative than simply saying ‘not significant’. This allows researchers to make their own judgements on importance, rather than defining importance for them by p < 0.05. I know that such one-tailed probabilities can be inaccurate when using flat priors, but I place weakly informative priors ( N(0,1) or N(0,2) ) on all parameters in an attempt to avoid such overestimates unless strongly supported by my small sample sizes. I was wondering if you agree with this philosophy of data reporting and interpretation, or if I’m misusing the posterior probabilities. I’ve done some research on this, but I can’t find anyone that’s offered a solid opinion on this. Based on my reading and the few interactions I’ve had with others, it seems that the strength of posterior probabilities compared to p-values is that they allow for such fluid interpretation (what’s the probability the effect is positive? what’s the probability the effect > 5? etc.), whereas p-values simply tell you “if the null hypothesis is true, theres a 70 or 80% chance I could observe an effect as strong as mine by chance alone”. I prefer to give the probability of an effect bounded by the CI of the effect to give the most transparent interpretation possible.

My reply:

My short answer is that this is addressed in this post:

If you believe your prior, then yes, it makes sense to report posterior probabilities as you do. Typically, though, we use flat priors even though we have pretty strong knowledge that parameters are close to 0 (this is consistent with the fact that we see lots of estimates that are 1 or 2 se’s from 0, but very few that are 4 or 6 se’s from 0). So, really, if you want to make such a statement I think you’d want a more informative prior that shrinks to 0. If, for whatever reason, you don’t want to assign such a prior, then you have to be a bit more careful about interpreting those posterior probabilities.

In your case, since you’re using weakly informative priors such as N(0,1), this is less of a concern. Ultimately I guess the way to go is to embed any problem in a hierarchical meta-analysis so that the prior makes sense in the context of the problem. But, yeah, I’ve been using N(0,1) a lot myself lately.
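To make the difference concrete, here is a minimal sketch comparing Pr(effect > 0) under a flat prior and under a normal prior centered at zero. It assumes a normal approximation to the likelihood, reads N(0,1) as a prior standard deviation of 1, and assumes the effect is on a scale where that is sensible. The function name and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prob_positive(estimate, std_error, prior_sd=None):
    """Posterior Pr(effect > 0) for a normal likelihood.

    prior_sd=None means a flat prior; otherwise the prior is N(0, prior_sd).
    """
    if prior_sd is None:
        post_mean, post_sd = estimate, std_error
    else:
        # Conjugate normal-normal update: precisions add, and the posterior
        # mean is the precision-weighted average of the prior mean (0) and the estimate.
        precision = 1 / prior_sd ** 2 + 1 / std_error ** 2
        post_mean = (estimate / std_error ** 2) / precision
        post_sd = sqrt(1 / precision)
    return 1 - NormalDist(post_mean, post_sd).cdf(0.0)


# Hypothetical noisy estimate: one standard error away from zero.
print(prob_positive(1.0, 1.0))                # flat prior
print(prob_positive(1.0, 1.0, prior_sd=1.0))  # N(0,1) prior
```

With these numbers the flat prior gives a probability of about 0.84 that the effect is positive, and the N(0,1) prior pulls it down to about 0.76. The gap shrinks as the data get more informative relative to the prior.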

“Faith means belief in something concerning which doubt is theoretically possible.” — William James (again)

Eric Tassone writes:

So, here’s a Bill James profile from late-ish 2014 that I’d missed until now. It’s baseball focused, which was nice — so many recent articles about him are non-baseball stuff. Here’s an extended excerpt of a part I found refreshing, though it’s probably just that my expectations have gotten pretty low of late w/r/t articles about him. What is going on in this passage? … an evolving maturity for him? … merely exchanging one set of biases for another?

Anyway, surprisingly I enjoyed the article. I hope you enjoy it too. Here’s an excerpt:

But [James] wonders if the generation of baseball fans he inspired have expanded their skepticism to the point where it has crowded out other things like wonder and tolerance and a healthy understanding of our own limited understanding.

Right now, Bill James thinks this sort of arrogance can be dangerous in the sabermetric community. There is more baseball data available now than ever before, and the data grows exponentially. “Understanding cannot keep up with the data,” he says. “It will take many years before we fully understand, say, some of the effects of PITCHf/x (which charts every pitch thrown). It’s important not to skip steps.”

He groans whenever he hears people discount leadership or team chemistry or heart because they cannot find such things in the data. He has done this himself in the past … and regrets it.

“I have to take my share of responsibility for promoting skepticism about things that I didn’t understand as well as I might have,” he says. “What I would say NOW is that skepticism should be directed at things that are actually untrue rather than things that are difficult to measure.

“Leadership is one player having an effect on his teammates. There is nothing about that that should invite skepticism. People have an effect on one another in every area of life. … We all affect another’s work. You just can’t really measure that in an individual-accounting framework.”

The young Bill James rather famously wrote that he could not find any evidence that certain types of players could consistently hit better in the clutch – he still has not found that evidence. But unlike his younger self, he will not dismiss the idea of clutch hitting. He has been a consultant for the Red Sox for more than a decade, and he has watched David Ortiz deliver so many big hits in so many big moments, and he finds himself unwilling to deny that Big Papi does have an ability in those situations others don’t have. He wrote an essay with this thought in mind, suggesting that just because we have not found the evidence is not a convincing argument that the evidence does not exist.

“I think I had limited understanding of these issues and wrote about them — little understanding and too-strong opinions,” he says. “And I think I damaged the discussion in some ways when I did this. … these sorts of effects (leadership and clutch-hitting and how players interact) CAN be studied. You just need to approach the question itself, rather than trying to back into it beginning with the answer.”

I responded: Interesting . . . but I wonder if part of this is that James is such an insider now that he’s buying into all the insider tropes.

Tassone replies:

Yep, exactly . . . especially since one is about his guy, Ortiz!

Sam Smith sings like a dream but he’s as clueless as Nicholas Wade when it comes to genetics

Psychologists speak of “folk psychology” or “folk physics” as the intuitive notions we have about the world, which typically describe some aspects of reality but ultimately are gross oversimplifications.

I encountered a good example of “folk genetics” the other day after following the clickbait link to “22 Things We Learned Hanging Out With Sam Smith”:

1. He wasn’t afraid to speak up for equality at his Catholic school.

“From what I can remember, they believe that you can be homosexual, but you just can’t practice it, which is ridiculous,” he says. “I would just say, ‘I am proof that it’s genetic. It has to be, because it wasn’t a choice.’ And that’s it. That’s my only argument, you know? You love who you love, and I can’t help that I like guys.”

The fallacy, of course, is to think that everything is a choice or is genetic. Just google *identical twins one straight one gay* if you want to learn more. I followed these links myself and found that genetic essentialism on this topic is not restricted to the pro-gay side. From the other direction is a homophobic website that announces “How is it possible that identical twins with identical DNA have different sexual attractions? Simple. No one is born gay. . . . if people are born homosexual, then all identical twin pairs should have identical sexual attractions, 100% of the time . . .” Nope nope nope. A lot can happen between conception and birth!

Anyway, this all gives me some sympathy for the ill-fated genetic essentialism of Nicholas Wade. If a Grammy-winning artist and the savants at that website can get this wrong, what hope is there for a New York Times science writer?

P.S. It’s still OK to like Sam Smith, right? I think he can’t really be overexposed until his second album comes out. The backlash starts then.

P.P.S. I originally titled this post, “An amusing window into folk genetics.” How boring can you get??? So I retitled as above. Like John Updike, I have difficulty coming up with good titles.

Our new column in the Daily Beast

Kaiser Fung and I have a new weekly column for the Daily Beast. After much deliberation, we gave it the title Statbusters (the runner-up choice was Dirty Data; my personal preference was Statboyz in the Hood, but, hey, who ever listens to me on anything?).

The column will appear every Saturday, and Kaiser and I are planning to alternate. Should be fun. The first column will go online this Saturday.

P.S. It didn’t go online until Sunday. It’s here.

When the counterintuitive becomes the norm, arguments get twisted out of shape

I was bothered by a recent post on the sister blog. The post was by political scientist David Fortunato and it was called, Would “concealed carry” have stopped Dylann Roof’s church shooting spree?

What bugged me in particular was this sentence:

On its face, the claim that increasing the number of gun carriers would reduce crime seems logical (at least to an economist). If more people carry guns, then criminals would understand that the likelihood of their victims defending themselves with a gun is higher and would therefore be less likely to commit crime. In simple economic terms, easing concealed carry seeks to increase the cost assailants pay to commit a crime, so they choose not to, we hope.

On its face, I think the above argument is ridiculous in its naive division of the population into “criminals” and “their victims,” with no mention of the idea that the presence of the gun could lead to escalation. Also the silly idea that “a criminal” decides whether to “commit a crime,” with no sense that events develop in unexpected ways. The above argument is “simple” indeed, but I think it does no service to economists to consider it as logical.

This sort of thing bothers me about a lot of op-ed style writing, that the author has a certain flow in mind (in this case, it seems that Fortunato wanted to say that evidence on gun control is weak, and so he decided to lead up to it in this way). Also a problem with social science that everything has to be a “puzzle.” Fortunato writes:

But scientific research on the ability of concealed carry to reduce crime has yielded mixed results. A few studies suggest these policies are effective, but even more suggest that making concealed carry easier does not reduce crime and may even increase instances of firearm injury. Why is this the case?

Why indeed? It hardly seems a mystery that if more people are carrying guns, then they might use them accidentally or on purpose to shoot people.

Again I see a problem with Fortunato’s theoretical framework when he writes: “More people choose to carry firearms . . . Criminals observe (or infer) that more people are carrying firearms,” as if there is some division between “people” and “criminals.”

I don’t mean this as a slam on Fortunato’s research, and I’m not saying that the obvious answers to his question tell the whole story. Of course I have a huge respect for research on topics that laypeople consider to be obvious. But his post really bothers me as representing some of the unfortunate tendencies of journalism and social science to take a counterintuitive idea (giving out guns is a way to reduce gun crime!) and normalize it so much as to make it the default explanation.

From the Monkey Cage perspective, I don’t think much can be done here—in the old days I would’ve done a follow-up post but I know we don’t really do these anymore—but maybe we can all be aware of the pitfalls of taking a counterintuitive framework as the norm. I think the content of Fortunato’s post is excellent but I’m disturbed by how it’s framed, as if it’s some sort of surprise that a wacky counterintuitive bank-shot theory of the world doesn’t actually seem to be borne out by the data.

P.S. I elaborate here in comments:

When someone focuses on an indirect effect instead of the first-order effect, that’s what I call a counterintuitive argument. Indeed, this might be the core of many or even all counterintuitive arguments: The first-order effect is obviously in one direction (increase the number of guns, make it more normal to carry guns as a way of life, and people like Dylann Roof are more likely to go shoot people), but there’s this second-order effect (Dylann Roof might not shoot if he thinks the people in the church are armed too).

The counterintuitive claim is that the second-order effect outweighs the first-order effect. It’s counterintuitive for the same general reason that an echo is typically not as loud as the original sound, or for the same general reason that elasticities are typically less than 1.

Some counterintuitive claims are true. But it hardly seems puzzling when a bank-shot, counterintuitive claim is not supported by the data.



Logo design by Michael Betancourt and Stephanie Mannheim.

P.S. Some commenters suggested the top of the S above is too large, but I wonder if that’s just because I’ve posted the logo in a large format. On the screen it would typically be smaller, something like this, which appears a bit more tasteful:


A question about race based stratification


Can Candan writes:

I have scraped horse racing data from a web site in Turkey and would like to try some models for predicting the finishing positions of future races, what models would you suggest for that?

There is one recent paper on the subject that seems promising, which claims to change the SMO algorithm of support vector regression to work with race based stratification, but no details given, I don’t understand what to modify with SMO algorithm.

This builds on the above one and improves with NDCG based model tuning of least squares SVR.

There’s a conditional logistic regression approach which I tried to implement, but I couldn’t get the claimed improvement over the public odds of winning, may be I’m doing something wrong here.

I’m quite comfortable with R any books, pointers, code snippets are greatly appreciated.

My reply: Sorry, this one is too far away from my areas of expertise!

Hey, what’s up with that x-axis??

[Image: screenshot of the graph in question]

CDC should know better.

P.S. In comments, Zachary David supplies this correctly-scaled version:


It would be better to label the lines directly than to use a legend, and the y-axis is off by a factor of 100, but I can hardly complain given that he just whipped this graph up for us.

The real point is that, once the x-axis is scaled correctly, the shapes of the curves change! So that original graph really was misleading, in that it incorrectly implies a ramping up in the 3-10 year range.
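Here is a minimal sketch of why the shapes change, using made-up numbers rather than the data in the graph. When unevenly spaced time points are plotted as equally spaced categories, the apparent slope between neighboring points is a change per category, not a change per year.

```python
def segment_slopes(xs, ys):
    """Slope of each line segment connecting consecutive (x, y) points."""
    return [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]


# Hypothetical, unevenly spaced time points and measurements.
years = [1, 2, 3, 5, 10]
values = [10, 20, 30, 40, 50]

# Treating the time points as equally spaced categories (positions 0, 1, 2, ...).
as_categories = segment_slopes(list(range(len(years))), values)

# Placing the time points at their actual positions on the axis.
true_scale = segment_slopes(years, values)

print(as_categories)
print(true_scale)
```

In this toy example the category version looks like a steady straight-line climb, while on the true scale the increase per year tapers off after year 3.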

P.P.S. Zachary David sent me an improved version:


Ideally the line labels would be colored so there’d be no need for the legend at all, but at this point I really shouldn’t be complaining.

On deck this week

Mon: Hey, what’s up with that x-axis??

Tues: A question about race based stratification

Wed: Our new column in the Daily Beast

Thurs: Irwin Shaw: “I might mistrust intellectuals, but I’d mistrust nonintellectuals even more.”

Fri: An amusing window into folk genetics

Sat: “Faith means belief in something concerning which doubt is theoretically possible.” — William James (again)

Sun: Interpreting posterior probabilities in the context of weakly informative priors

“When more data steer us wrong: replications with the wrong dependent measure perpetuate erroneous conclusions”

Evan Heit sent in this article with Caren Rotello and Chad Dubé:

There is a replication crisis in science, to which psychological research has not been immune: Many effects have proven uncomfortably difficult to reproduce. Although the reliability of data is a serious concern, we argue that there is a deeper and more insidious problem in the field: the persistent and dramatic misinterpretation of empirical results that replicate easily and consistently. Using a series of four highly studied “textbook” examples from different research domains (eyewitness memory, deductive reasoning, social psychology, and child welfare), we show how simple unrecognized incompatibilities among dependent measures, analysis tools, and the properties of data can lead to fundamental interpretive errors. These errors, which are not reduced by additional data collection, may lead to misguided research efforts and policy recommendations. We conclude with a set of recommended strategies and research tools to reduce the probability of these persistent and largely unrecognized errors. The use of receiver operating characteristic (ROC) curves is highlighted as one such recommendation.

I haven’t had a chance to look at this but it seems like it could be relevant to some of our discussion. Just speaking generally, I like their focus on measurement.