
Which of these classes should he take?

Jake Humphries writes:

For many years I wanted to pursue medicine, but after recently completing a Master of Public Health I caught the statistics bug. I need to complete the usual minimum prerequisites for graduate study in statistics (calculus through multivariable calculus plus linear algebra) but want to take additional math courses, as highly competitive stats and biostats programs either require or highly recommend more than the minimum. I could of course end up earning a whole other bachelor's degree in math if I tried to take all the recommended courses. Could you please rank the following courses according to importance/practical utility in working in statistics and in applying for a competitive stats PhD program? This would greatly assist me in prioritising which courses to complete.

1. Mathematical modeling
2. Real analysis
3. Complex analysis
4. Numerical analysis

My quick advice:

– “Mathematical modeling”: I don’t know what’s in this class. But, from the title, it seems very relevant to statistics.

– “Real analysis”: Not so relevant to real-world statistics but important for PhD applications because it’s a way to demonstrate that you understand math. And understanding math _is_ important to real-world statistics. Thus, the point of a “real analysis” class for a statistician is not so much that you learn real analysis, which is pretty irrelevant for most things, but that it demonstrates that you can do real analysis.

– “Complex analysis”: A fun topic but you’ll probably never ever need it, so no need to take this one.

– “Numerical analysis”: I don’t know what’s in this class. You could take it but it’s not really necessary.

“The general problem I have with noninformatively-derived Bayesian probabilities is that they tend to be too strong.”

We interrupt our usual programming of mockery of buffoons to discuss a bit of statistical theory . . .

Continuing from yesterday's quotation of my 2012 article in Epidemiology:

Like many Bayesians, I have often represented classical confidence intervals as posterior probability intervals and interpreted one-sided p-values as the posterior probability of a positive effect. These are valid conditional on the assumed noninformative prior but typically do not make sense as unconditional probability statements.

The general problem I have with noninformatively-derived Bayesian probabilities is that they tend to be too strong. At first this may sound paradoxical, that a noninformative or weakly informative prior yields posteriors that are too forceful—and let me deepen the paradox by stating that a stronger, more informative prior will tend to yield weaker, more plausible posterior statements.

How can it be that adding prior information weakens the posterior? It has to do with the sort of probability statements we are often interested in making. Here is an example from Gelman and Weakliem (2009). A sociologist examining a publicly available survey discovered a pattern relating attractiveness of parents to the sexes of their children. He found that 56% of the children of the most attractive parents were girls, compared to 48% of the children of the other parents, and the difference was statistically significant at p<0.02. The assessments of attractiveness had been performed many years before these people had children, so the researcher felt he had support for a claim of an underlying biological connection between attractiveness and sex ratio.

The original analysis by Kanazawa (2007) had multiple comparisons issues, and after performing a regression rather than selecting the most significant comparison, we get a p-value closer to 0.2 rather than the stated 0.02. For the purposes of our present discussion, though, in which we are evaluating the connection between p-values and posterior probabilities, it will not matter much which number we use. We shall go with p=0.2 because it seems like a more reasonable analysis given the data.

Let θ be the true (population) difference in sex ratios of attractive and less attractive parents. Then the data under discussion (with a two-sided p-value of 0.2), combined with a uniform prior on θ, yields a 90% posterior probability that θ is positive. Do I believe this? No. Do I even consider this a reasonable data summary? No again. We can derive these No responses in three different ways, first by looking directly at the evidence, second by considering the prior, and third by considering the implications for statistical practice if this sort of probability statement were computed routinely.
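To make the arithmetic behind that 90% figure concrete, here is a minimal sketch. It relies on a normal approximation, which is an assumption for illustration: with a flat prior, the posterior for θ is centered at the estimate with the standard error as its scale, so the posterior probability that θ>0 is just the normal cumulative distribution evaluated at the z-score implied by the two-sided p-value.

```python
from scipy import stats

# Two-sided p-value from the reanalyzed attractiveness-and-sex-ratio data
p_two_sided = 0.2

# Under a normal approximation, the z-score of the estimate
z = stats.norm.ppf(1 - p_two_sided / 2)   # about 1.28

# With a flat prior on theta, the posterior is N(theta_hat, se^2),
# so Pr(theta > 0 | data) is simply Phi(z)
posterior_prob_positive = stats.norm.cdf(z)

print(f"z = {z:.2f}, Pr(theta > 0 | data, flat prior) = {posterior_prob_positive:.2f}")
# roughly: z = 1.28, Pr = 0.90
```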

First off, a claimed 90% probability that θ>0 seems too strong. Given that the p-value (adjusted for multiple comparisons) was only 0.2—that is, a result that strong would occur a full 20% of the time just by chance alone, even with no true difference—it seems absurd to assign a 90% belief to the conclusion. I am not prepared to offer 9 to 1 odds on the basis of a pattern someone happened to see that could plausibly have occurred by chance, nor for that matter would I offer 99 to 1 odds based on the original claim of the 2% significance level.

Second, the prior uniform distribution on θ seems much too weak. There is a large literature on sex ratios, with factors such as ethnicity, maternal age, and season of birth corresponding to differences in the probability of a girl birth of less than 0.5 percentage points. It is a priori implausible that sex-ratio differences corresponding to attractiveness are larger than for these other factors. Assigning an informative prior centered on zero shrinks the posterior toward zero, and the resulting posterior probability that θ>0 moves to a more plausible value in the range of 60%, corresponding to the idea that the result is suggestive but not close to convincing.
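Here is a sketch of that shrinkage using normal-normal conjugate updating. The specific numbers (an 8-percentage-point observed difference, a standard error chosen so the two-sided p-value is about 0.2, and a zero-centered prior with standard deviation of about one percentage point) are my assumptions for illustration, chosen to be in the spirit of the example rather than taken from the article.

```python
import numpy as np
from scipy import stats

# Hypothetical numbers matching the story: an observed difference of
# 8 percentage points whose two-sided p-value is about 0.2
theta_hat = 0.08
se = theta_hat / stats.norm.ppf(1 - 0.2 / 2)          # about 0.062

# Informative prior centered at zero; sd of ~1 percentage point reflects the
# size of known sex-ratio effects (ethnicity, maternal age, and so forth)
prior_sd = 0.01

# Normal-normal conjugate updating with prior mean zero
post_var = 1 / (1 / prior_sd**2 + 1 / se**2)
post_mean = post_var * (theta_hat / se**2)
post_prob_positive = stats.norm.cdf(post_mean / np.sqrt(post_var))

print(f"posterior mean = {post_mean:.4f}, sd = {np.sqrt(post_var):.4f}")
print(f"Pr(theta > 0 | data, informative prior) = {post_prob_positive:.2f}")
# roughly 0.58: suggestive, but far from the 0.90 implied by the flat prior
```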

Third, consider what would happen if we routinely interpreted one-sided p-values as posterior probabilities. In that case, an experimental result that is 1 standard error from zero—that is, exactly what one might expect from chance alone—would imply an 83% posterior probability that the true effect in the population has the same direction as the observed pattern in the data at hand. It does not make sense to me to claim 83% certainty—5 to 1 odds—based on data that not only could occur by chance but in fact represent an expected level of discrepancy. This system-level analysis accords with my criticism of the flat prior: as Greenland and Poole note in their article, the effects being studied in epidemiology typically range from -1 to 1 on the logit scale, hence analyses assuming broader priors will systematically overstate the probabilities of very large effects and will overstate the probability that an estimate from a small sample will agree in sign with the corresponding population quantity.

Rather than relying on noninformative priors, I prefer the suggestion of Greenland and Poole to bound posterior probabilities using real prior information.

OK, I did discuss some buffoonish research here. But, look, no mockery! I was using the silly stuff as a lever to better understand some statistical principles. And that’s ok.

There are 6 ways to get rejected from PLOS: (1) theft, (2) sexual harassment, (3) running an experiment without a control group, (4) keeping a gambling addict away from the casino, (5) chapter 11 bankruptcy proceedings, and (6) having no male co-authors

This story is pretty horrifying/funny. But the strangest thing was this part:

[The author] and her colleague have appealed to the unnamed journal, which belongs to the PLoS family . . .

I thought PLOS published just about everything! This is not a slam on PLOS. arXiv publishes everything too, and arXiv is great.

The funny thing is, I do think there are cases where having both male and female coauthors gives a paper more credibility, sometimes undeserved. For example, if you take a look at those papers on ovulation and voting, and ovulation and clothing, and fat arms and political attitudes, you’ll see these papers have authors of both sexes, which insulates them from the immediate laugh-them-out-of-the-room reaction that they might get were they written by men only. Having authors of both sexes does not of course exempt them from direct criticisms of the work; I just think that a paper on “that time of the month” written by men would, for better or worse, get a more careful review.

P.S. Also, one thing I missed in my first read of this story: the reviewer wrote:

Perhaps it is not so surprising that on average male doctoral students co-author one more paper than female doctoral students, just as, on average, male doctoral students can probably run a mile race a bit faster than female doctoral students . . . And it might well be that on average men publish in better journals . . . perhaps simply because men, perhaps, on average work more hours per week than women, due to marginally better health and stamina.

“Marginally better health and stamina”—that’s a laff and a half! Obviously this reviewer is no actuary and doesn’t realize that men die at a higher rate than women at every age.

On the plus side, it’s pretty cool that James Watson is still reviewing journal articles, giving something back to the community even in retirement. Good on ya, Jim! Don’t let the haters get you down.

Good, mediocre, and bad p-values

From my 2012 article in Epidemiology:

In theory the p-value is a continuous measure of evidence, but in practice it is typically trichotomized approximately into strong evidence, weak evidence, and no evidence (these can also be labeled highly significant, marginally significant, and not statistically significant at conventional levels), with cutoffs roughly at p=0.01 and 0.10.

One big practical problem with p-values is that they cannot easily be compared. The difference between a highly significant p-value and a clearly non-significant p-value is itself not necessarily statistically significant. . . . Consider a simple example of two independent experiments with estimates ± standard error of 25 ± 10 and 10 ± 10. The first experiment is highly statistically significant (two and a half standard errors away from zero, corresponding to a (normal-theory) p-value of about 0.01) while the second is not significant at all. Most disturbingly here, the difference is 15 ± 14, which is not close to significant . . .

In short, the p-value is itself a statistic and can be a noisy measure of evidence. This is a problem not just with p-values but with any mathematically equivalent procedure, such as summarizing results by whether the 95% confidence interval includes zero.
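To spell out the arithmetic in the 25 ± 10 versus 10 ± 10 example above, here is a minimal sketch using normal-theory two-sided p-values; the numbers are exactly those quoted from the article.

```python
import numpy as np
from scipy import stats

# Two independent experiments: estimate +/- standard error
est1, se1 = 25, 10
est2, se2 = 10, 10

def two_sided_p(est, se):
    """Normal-theory two-sided p-value for a null hypothesis of zero."""
    return 2 * stats.norm.sf(abs(est) / se)

print(f"experiment 1: z = {est1 / se1:.1f}, p = {two_sided_p(est1, se1):.3f}")  # about 0.01
print(f"experiment 2: z = {est2 / se2:.1f}, p = {two_sided_p(est2, se2):.2f}")  # about 0.32

# The difference between the two estimates, with its standard error
diff = est1 - est2
se_diff = np.sqrt(se1**2 + se2**2)                    # about 14
print(f"difference: {diff} +/- {se_diff:.0f}, p = {two_sided_p(diff, se_diff):.2f}")
# The 'highly significant' and 'not significant' results differ by only
# about one standard error -- nowhere near significant.
```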

Good, mediocre, and bad p-values

For all their problems, p-values sometimes “work” to convey an important aspect of the relation of data to model. Other times a p-value sends a reasonable message but does not add anything beyond a simple confidence interval. In yet other situations, a p-value can actively mislead. Before going on, I will give examples of each of these three scenarios.

A p-value that worked. Several years ago I was contacted by a person who suspected fraud in a local election (Gelman, 2004). Partial counts had been released throughout the voting process and he thought the proportions for the different candidates looked suspiciously stable, as if they had been rigged ahead of time to aim for a particular result. Excited to possibly be at the center of an explosive news story, I took a look at the data right away. After some preliminary graphs—which indeed showed stability of the vote proportions as they evolved during election day—I set up a hypothesis test comparing the variation in the data to what would be expected from independent binomial sampling. When applied to the entire dataset (27 candidates running for six offices), the result was not statistically significant: there was no less (and, in fact, no more) variance than would be expected by chance. In addition, an analysis of the 27 separate chi-squared statistics revealed no particular patterns. I was left to conclude that the election results were consistent with random voting (even though, in reality, voting was certainly not random—for example, married couples are likely to vote at the same time, and the sorts of people who vote in the middle of the day will differ from those who cast their ballots in the early morning or evening), and I regretfully told my correspondent that he had no case.
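The following is not the actual analysis from Gelman (2004); it is a rough sketch, with simulated partial counts for a single candidate, of the general idea of comparing batch-to-batch variation in vote proportions to what independent binomial sampling would produce.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical partial counts released over election day (simulated data):
# batch sizes and vote counts for one candidate
batch_sizes = rng.integers(200, 800, size=12)
true_prop = 0.4
votes = rng.binomial(batch_sizes, true_prop)

# Chi-squared test: is the batch-to-batch variation consistent with
# independent binomial sampling around a common proportion?
pooled_prop = votes.sum() / batch_sizes.sum()
expected = batch_sizes * pooled_prop
chi2 = np.sum((votes - expected) ** 2 / (expected * (1 - pooled_prop)))
df = len(batch_sizes) - 1
p_value = stats.chi2.sf(chi2, df)

print(f"chi2 = {chi2:.1f} on {df} df, upper-tail p = {p_value:.2f}")
# Suspiciously *low* variation (rigged, pre-set proportions) would show up as
# a chi2 statistic far below its expected value of df, i.e., an upper-tail
# p-value close to 1; excess variation would give a p-value close to 0.
```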

In this example, we did not interpret a non-significant result as a claim that the null hypothesis was true or even as a claimed probability of its truth. Rather, non-significance revealed the data to be compatible with the null hypothesis; thus, my correspondent could not argue that the data indicated fraud.

A p-value that was reasonable but unnecessary. It is common for a research project to culminate in the estimation of one or two parameters, with publication turning on a p-value being less than a conventional level of significance. For example, in our study of the effects of redistricting in state legislatures (Gelman and King, 1994), the key parameters were interactions in regression models for partisan bias and electoral responsiveness. Although we did not actually report p-values, we could have: what made our paper complete was that our findings of interest were more than two standard errors from zero, thus reaching the p<0.05 level. Had our significance level been much greater (for example, estimates that were four or more standard errors from zero), we would doubtless have broken up our analysis (for example, separately studying Democrats and Republicans) in order to broaden the set of claims that we could confidently assert. Conversely, had our regressions not reached statistical significance at the conventional level, we would have performed some sort of pooling or constraining of our model in order to arrive at some weaker assertion that reached the 5% level. (Just to be clear: we are not saying that we would have performed data dredging, fishing for significance; rather, we accept that sample size dictates how much we can learn with confidence; when data are weaker, it can be possible to find reliable patterns by averaging.)

In any case, my point here is that in this example it would have been just fine to summarize our results via p-values even though we did not happen to use that formulation.

A misleading p-value. Finally, in many scenarios p-values can distract or even mislead, either a non-significant result wrongly interpreted as a confidence statement in support of the null hypothesis, or a significant p-value that is taken as proof of an effect. A notorious example of the latter is the recent paper of Bem (2011), which reported statistically significant results from several experiments on ESP. At first glance, it seems impressive to see multiple independent findings that are statistically significant (and combining the p-values using classical rules would yield an even stronger result), but with enough effort it is possible to find statistical significance anywhere (see Simmons, Nelson, and Simonsohn, 2011).
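One such classical rule for combining p-values is Fisher's method; here is a minimal sketch with made-up p-values (the numbers are not from Bem's paper) showing how several individually modest results combine into a very small p-value, which is exactly why selective reporting makes such combinations misleading.

```python
import numpy as np
from scipy import stats

# Hypothetical p-values from several nominally independent experiments
p_values = [0.03, 0.04, 0.02, 0.05]

# Fisher's method: -2 * sum(log p_i) follows a chi-squared distribution
# with 2k degrees of freedom under the joint null hypothesis
statistic = -2 * np.sum(np.log(p_values))
combined_p = stats.chi2.sf(statistic, df=2 * len(p_values))

print(f"combined p-value = {combined_p:.5f}")
# Four p-values in the 0.02-0.05 range combine to roughly p = 0.0006.
```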

The focus on p-values seems to have both weakened the study (by encouraging the researcher to present only some of his data so as to draw attention away from non-significant results) and to have led reviewers to inappropriately view a low p-value (indicating a misfit of the null hypothesis to data) as strong evidence in favor of a specific alternative hypothesis (ESP) rather than other, perhaps more scientifically plausible alternatives such as measurement error and selection bias.

I’ve written on these issues in many other places but the questions keep coming up so I thought it was worth reposting.

Tomorrow I’ll highlight another part of this article, this time dealing with Bayesian inference.

Carl Morris: Man Out of Time [reflections on empirical Bayes]

I wrote the following for the occasion of his recent retirement party, but I thought these thoughts might be of general interest:

When Carl Morris came to our department in 1989, I and my fellow students were so excited. We all took his class. The funny thing is, though, the late 1980s might well have been the worst time to be Carl Morris, from the standpoint of what was being done in statistics at that time—not just at Harvard, but in the field in general. Carl has made great contributions to statistical theory and practice, developing ideas which have become particularly important in statistics in the last two decades. In 1989, though, Carl’s research was not in the mainstream of statistics, or even of Bayesian statistics.

When Carl arrived to teach us at Harvard, he was both a throwback and ahead of his time.

Let me explain. Two central aspects of Carl’s research are the choice of probability distribution for hierarchical models, and frequency evaluations in hierarchical settings where both Bayesian calibration (conditional on inferences) and classical bias and variance (conditional on unknown parameter values) are relevant. In Carl’s terms, these are “NEF-QVF” and “empirical Bayes.” My point is: both of these areas were hot at the beginning of Carl’s career and they are hot now, but somewhere in the 1980s they languished.

In the wake of Charles Stein’s work on admissibility in the late 1950s there was an interest, first theoretical but with clear practical motivations, to produce lower-risk estimates, to get the benefits of partial pooling while maintaining good statistical properties conditional on the true parameter values, to produce the Bayesian omelet without cracking the eggs, so to speak. In this work, the functional form of the hierarchical distribution plays an important role—and in a different way than had been considered in statistics up to that point. In classical distribution theory, distributions are typically motivated by convolution properties (for example, the sum of two gamma distributions with a common shape parameter is itself gamma), or by stable laws such as the central limit theorem, or by some combination or transformation of existing distributions. But in Carl’s work, the choice of distribution for a hierarchical model can be motivated based on the properties of the resulting partially pooled estimates. In this way, Carl’s ideas are truly non-Bayesian because he is considering the distribution of the parameters in a hierarchical model not as a representation of prior belief about the set of unknowns, and not as a model for a population of parameters, but as a device to obtain good estimates.

So, using a Bayesian structure to get good classical estimates. Or, Carl might say, using classical principles to get better Bayesian estimates. I don’t know that they used the term “robust” in the 1950s and 1960s, but that’s how we could think of it now.
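As a concrete illustration of "using a Bayesian structure to get good classical estimates," here is a minimal sketch of the canonical normal-means case, the (positive-part) James-Stein estimator. This is not Carl's specific NEF-QVF work, just the simplest example of a shrinkage rule whose justification is its classical risk properties.

```python
import numpy as np

def james_stein(y, sigma=1.0):
    """Positive-part James-Stein estimator for a vector of normal means
    (known sampling variance). Shrinks the raw estimates toward zero by a
    data-determined factor; with three or more means, it dominates the raw
    MLE in total squared-error risk."""
    k = len(y)
    shrinkage = max(0.0, 1 - (k - 2) * sigma**2 / np.sum(y**2))
    return shrinkage * y

# Simulated example: true means near zero, observed with unit noise
rng = np.random.default_rng(1)
theta = rng.normal(0, 0.5, size=10)
y = theta + rng.normal(0, 1.0, size=10)

theta_js = james_stein(y)
print("raw MSE:        ", np.mean((y - theta) ** 2))
print("James-Stein MSE:", np.mean((theta_js - theta) ** 2))
# Partial pooling toward zero typically reduces the total error.
```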

The interesting thing is, if we take Carl's work seriously (and we should), we now have two principles for choosing a hierarchical model. In the absence of prior information about the functional form of the distribution of group-level parameters, and in the absence of prior information about the values of the hyperparameters that would underlie such a model, we should use some form with good statistical properties. On the other hand, if we do have good prior information, we should of course use it—even R. A. Fisher accepted Bayesian methods in those settings where the prior distribution is known.

But, then, what do we do in those cases in between—the sorts of problems that arose in Carl's applied work in health policy and other areas? I learned from Carl to use our prior information to structure the model, for example to pick regression coefficients, to decide which groups to pool together, to decide which parameters to model as varying, and then use robust hierarchical modeling to handle the remaining, unexplained variation. This general strategy wasn't always so clear in the theoretical papers on empirical Bayes, but it came through in Carl's applied work, as well as that of Art Dempster, Don Rubin, and others, much of which flowered in the late 1970s—not coincidentally, a few years after Carl's classic articles with Brad Efron that put hierarchical modeling on a firm foundation that connected with the edifice of theoretical statistics, gradually transforming these ideas from a parlor trick into a way of life.

In a famous paper, Efron and Morris wrote of “Stein’s paradox in statistics,” but as a wise man once said, once something is understood, it is no longer a paradox. In un-paradoxing shrinkage estimation, Efron and Morris finished the job that Gauss, Laplace, and Galton had begun.

So far, so good. We’ve hit the 1950s, the 1960s, and the 1970s. But what happened next? Why do I say that, as of 1989, Carl’s work was “out of time”? The simplest answer would be that these ideas were a victim of their own success: once understood, no longer mysterious. But it was more than that. Carl’s specific research contribution was not just hierarchical modeling but the particular intricacies involved in the combination of data distribution and group-level model. His advice was not simply “do Bayes” or even “do empirical Bayes” but rather had to do with a subtle examination of this interaction. And, in the late 1980s and early 1990s, there wasn’t so much interest in this in the field of statistics. On one side, the anti-Bayesians were still riding high in their rejection of all things prior, even in some quarters a rejection of probability modeling itself. On the other side, a growing number of Bayesians—inspired by applied successes in fields as diverse as psychometrics, pharmacology, and political science—were content to just fit models and not worry about their statistical properties.

Similarly with empirical Bayes, a term which in the hands of Efron and Morris represented a careful, even precarious, theoretical structure intended to capture classical statistical criteria in a setting where the classical ideas did not quite apply, a setting that mixed estimation and prediction—but which had by then typically devolved into mere shorthand for "Bayesian inference, plugging in point estimates for the hyperparameters." In an era where the purveyors of classical theory didn't care to wrestle with the complexities of empirical Bayes, and where Bayesians had built the modeling and technical infrastructure needed to fit full Bayesian inference, hyperpriors and all, there was not much of a market for Carl's hybrid ideas.

This is why I say that, at the time Carl Morris came to Harvard, his work was honored and recognized as pathbreaking, but his actual research agenda was outside the mainstream.

As noted above, though, I think things have changed. The first clue—although it was not at all clear to me at the time—was Trevor Hastie and Rob Tibshirani's lasso regression, which was developed in the early 1990s and which has of course become increasingly popular in statistics, machine learning, and all sorts of applications. Lasso is important to me partly as the place where Bayesian ideas of shrinkage or partial pooling entered what might be called the Stanford school of statistics. But for the present discussion what is most relevant is the centrality of the functional form. The point of lasso is not just partial pooling, it's partial pooling with a double-exponential (Laplace) prior. As I said, I did not notice the connection with Carl's work and other Stein-inspired work back when lasso was introduced—at that time, much was made of the shrinkage of certain coefficients all the way to zero, which indeed is important (especially in practical problems with large numbers of predictors), but my point here is that the ideas of the late 1950s and early 1960s again became relevant. It's not enough just to say you're doing partial pooling—it matters _how_ this is being done.
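To see the role of the functional form, here is a minimal sketch of the orthonormal-design case, where the lasso (Laplace-prior) solution reduces to soft-thresholding of the least-squares estimates; the coefficient values and penalty are made up for illustration.

```python
import numpy as np

def soft_threshold(beta_ls, lam):
    """Lasso solution in the orthonormal-design case: soft-thresholding
    of the least-squares estimates (lam is the L1 penalty, i.e., the
    negative log of a Laplace prior up to scaling)."""
    return np.sign(beta_ls) * np.maximum(np.abs(beta_ls) - lam, 0.0)

beta_ls = np.array([3.0, 0.8, -0.3, 0.05])   # hypothetical least-squares estimates
lam = 0.5

print(soft_threshold(beta_ls, lam))
# Large coefficients are shrunk by a constant amount; small ones are pulled
# all the way to zero. A Gaussian (ridge) prior would instead shrink every
# coefficient proportionally and never set any exactly to zero.
```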

In recent years there’s been a flood of research on prior distributions for hierarchical models, for example the work by Nick Polson and others on the horseshoe distribution, and the issues raised by Carl in his classic work are all returning. I can illustrate with a story from my own work. A few years ago some colleagues and I published a paper on penalized marginal maximum likelihood estimation for hierarchical models using, for the group-level variance, a gamma prior with shape parameter 2, which has the pleasant feature of keeping the point estimate off of zero while allowing it to be arbitrarily close to zero if demanded by the data (a pair of properties that is not satisfied by the uniform, lognormal, or inverse-gamma distributions, all of which had been proposed as classes of priors for this model). I was (and am) proud of this result, and I linked it to the increasingly popular idea of weakly informative priors. After talking with Carl, I learned that these ideas were not new; indeed, they were closely related to the questions that Carl has been wrestling with for decades in his research, as they relate both to the technical issue of the combination of prior and data distributions and to the larger concerns about default Bayesian (or Bayesian-like) inferences.
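Here is a rough sketch of the penalized marginal maximum likelihood idea. The group estimates and standard errors are the familiar eight-schools-style numbers used purely for illustration, and placing the gamma(shape 2, small rate) penalty on the group-level standard deviation is my reading of the approach, not a reproduction of the paper's specifics.

```python
import numpy as np
from scipy import stats, optimize

# Illustrative group-level estimates and standard errors (eight-schools-style)
y = np.array([28., 8., -3., 7., -1., 1., 18., 12.])
s = np.array([15., 10., 16., 11., 9., 11., 10., 18.])

def neg_penalized_marginal_loglik(params, rate=0.1):
    mu, log_tau = params
    tau = np.exp(log_tau)
    # Marginal likelihood after integrating out the group effects:
    # y_j ~ N(mu, s_j^2 + tau^2)
    loglik = stats.norm.logpdf(y, mu, np.sqrt(s**2 + tau**2)).sum()
    # Gamma(shape=2, rate) penalty on tau: log-density = log(tau) - rate*tau + const.
    # The log(tau) term keeps the maximizer strictly away from zero while
    # still allowing it to be arbitrarily small if the data demand it.
    penalty = np.log(tau) - rate * tau
    return -(loglik + penalty)

result = optimize.minimize(neg_penalized_marginal_loglik,
                           x0=[np.mean(y), np.log(5.0)])
mu_hat, tau_hat = result.x[0], np.exp(result.x[1])
print(f"penalized MML estimates: mu = {mu_hat:.1f}, tau = {tau_hat:.1f}")
```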

In short: in the late 1980s, it was enough to be Bayesian. Or, perhaps I should say, Bayesian data analysis was in its artisanal period, and we tended to be blissfully ignorant about the dependence of our inferences on subtleties of the functional forms of our models. Or, to put a more positive spin on things: when our inferences didn’t make sense, we changed our models, hence the methods we used (in concert with the prior information implicitly encoded in that innocent-sounding phrase, “make sense”) had better statistical properties than one would think based on theoretical analysis alone. Real-world inferences can be superefficient, as Xiao-Li Meng might say, because they make use of tacit knowledge.

In recent years, however, Bayesian methods (or, more generally, regularization, thus including lasso and other methods that are only partly in the Bayesian fold) have become routine, to the extent that we need to think of them as defaults, which means we need to be concerned about . . . their frequency properties. Hence the re-emergence of truly empirical Bayesian ideas such as weakly informative priors, and the re-emergence of research on the systematic properties of inferences based on different classes of priors or regularization. Again, this all represents a big step beyond the traditional classification of distributions: in the robust or empirical Bayesian perspective, the relevant properties of a prior distribution depend crucially on the data model to which it is linked.

So, over 25 years after taking Carl’s class, I’m continuing to see the centrality of his work to modern statistics: ideas from the early 1960s that were in many ways ahead of their time.

Let me conclude with the observation that Carl seemed to us to be a “man out of time” on the personal level as well. In 1989 he seemed ageless to us both physically and in his personal qualities, and indeed I still view him that way. When he came to Harvard he was not young (I suppose he was about the same age as I am now!) but he had, as the saying goes, the enthusiasm of youth, which indeed continues to stay with him. At the same time, he has always been even-tempered, and I expect that, in his youth, people remarked upon his maturity. It has been nearly fifty years since Carl completed his education, and his ideas remain fresh, and I continue to enjoy his warmth, humor, and insights.

What’s the most important thing in statistics that’s not in the textbooks?


As I wrote a couple years ago:

Statistics does not require randomness. The three essential elements of statistics are measurement, comparison, and variation. Randomness is one way to supply variation, and it’s one way to model variation, but it’s not necessary. Nor is it necessary to have “true” randomness (of the dice-throwing or urn-sampling variety) in order to have a useful probability model.

For my money, the #1 neglected topic in statistics is measurement.

In most statistics texts that I’ve seen, there’s a lot on data analysis and some stuff on data collection—sampling, random assignment, and so forth—but nothing at all on measurement. Nothing on reliability and validity but, even more than that, nothing on the concept of measurement, the idea of considering the connection between the data you gather and the underlying object of your study.

It’s funny: the data model (the “likelihood”) is central to much of the theory and practice of statistics, but the steps that are required to make this work—the steps of measurement and assessment of measurements—are hidden.

When it comes to the question of how to take a sample or how to randomize, or the issues that arise (nonresponse, spillovers, selection, etc.) that interfere with the model, statistics textbooks take the practical issues seriously—even an intro statistics book will discuss topics such as blinding in experiments and self-selection in surveys. But when it comes to measurement, there’s silence, just an implicit assumption that the measurement is what it is, that it’s valid and that it’s as reliable as it needs to be.

Bad things happen when we don’t think seriously about measurement

And then what happens? Bad, bad things.

In education—even statistics education—we don’t go to the trouble of accurately measuring what students learn. Why? Part of it is surely that measurement takes effort, and we have other demands on our time. But it’s more than that. I think a large part is that we don’t carefully think about evaluation as a measurement issue and we’re not clear on what we want students to learn and how we can measure this. Sure, we have vague ideas, but nothing precise. In other aspects of statistics we aim for precision, but when it comes to measurement, we turn off our statistics brain. And I think this is happening, in part, because the topic of measurement is tucked away in an obscure corner of statistics and is then forgotten.

And in research too, we see big problems. Consider all those "power = .06" experiments, these "Psychological Science"-style papers we’ve been talking so much about in recent years. A common thread in these studies is sloppy, noisy, biased measurement. Just a lack of seriousness about measurement and, in particular, a resistance to the sort of within-subject designs which much more directly measure the within-person variation that is often of interest in such studies.

Measurement, measurement, measurement. It’s central to statistics. It’s central to how we learn about the world.

Eccentric mathematician

I just read this charming article by Lee Wilkinson’s brother on a mathematician named Yitang Zhang. Zhang recently gained some fame after proving a difficult theorem, and he seems to be quite an unusual, but likable, guy.

What I liked about Wilkinson’s article is how it captured Zhang’s eccentricities with affection but without condescension. Zhang is not like the rest of us, but from reading the article, I get the sense of him as an individual, not defined by his mathematical abilities.

At one level, sure, duh: each of us is an individual. I’m an unusual person myself so maybe it’s a bit rich for me to put the “eccentric” label on some mathematician I’ve never met.

But I think there’s more to it than that. For one thing, I think the usual way to frame an article about someone like this is to present him as a one-of-a-kind genius, to share stories about how brilliant he is. Here, though, you get the idea that Zhang is a top mathematician but not that he has some otherworldly brilliance. Similarly, he solved a really tough problem but we don’t have to hear all about how he’s the greatest of all time. Rather, I get the idea from Wilkinson that Zhang’s life would be worth living even if he hadn’t done this great work. Of course, without that, the idea for the article never would’ve come up in the first place, but still.

Here’s a paragraph. I don’t know if it conveys the feeling I’m trying to share but here goes:

Zhang met his wife, to whom he has been married for twelve years, at a Chinese restaurant on Long Island, where she was a waitress. Her name is Yaling, but she calls herself Helen. A friend who knew them both took Zhang to the restaurant and pointed her out. “He asked, ‘What do you think of this girl?'” Zhang said. Meanwhile, she was considering him. To court her, Zhang went to New York every weekend for several months. The following summer, she came to New Hampshire. She didn’t like the winters, though, and moved to California, where she works at a beauty salon. She and Zhang have a house in San Jose, and he spends school vacations there.

So gentle, both on the part of Zhang and of Wilkinson. New Yorker, E. B. White-style, and I mean that in a good way here. It could’ve come straight out of Charlotte’s Web. And it’s such a relief to read after all the Erdos-Feynman-style hype, not to mention all the latest crap about tech zillionaires. I just wish I could’ve met Stanislaw Ulam.

On deck this week

Mon: Eccentric mathematician

Tues: What’s the most important thing in statistics that’s not in the textbooks?

Wed: Carl Morris: Man Out of Time [reflections on empirical Bayes]

Thurs: “The general problem I have with noninformatively-derived Bayesian probabilities is that they tend to be too strong.”

Fri: Good, mediocre, and bad p-values

Sat: Which of these classes should he take?

Sun: Forget about pdf: this looks much better, it makes all my own papers look like kids’ crayon drawings by comparison.

This year’s Atlantic Causal Inference Conference: 20-21 May

Dylan Small writes:

The conference will take place May 20-21 (with a short course on May 19th) and the web site for the conference is here. The deadline for submitting a poster title for the poster session is this Friday. Junior researchers (graduate students, postdoctoral fellows, and assistant professors) whose poster demonstrates exceptional research will also be considered for the Thomas R. Ten Have Award, which recognizes “exceptionally creative or skillful research on causal inference.” The two award winners will be invited to speak at the 2016 Atlantic Causal Inference Conference.

We held the first conference in this series ten years ago at Columbia, and I’m glad to see it’s still doing well.

Statistical analysis on a dataset that consists of a population

This is an oldie but a goodie.

Donna Towns writes:

I am wondering if you could help me solve an ongoing debate?

My colleagues and I are discussing (disagreeing) on the ability of a researcher to analyze information on a population. My colleagues are sure that a researcher is unable to perform statistical analysis on a dataset that consists of a population, whereas I believe that statistical analysis is appropriate if you are testing future outcomes. For example, a group of inmates in a detention centre receive a new program. As withholding it would contravene ethics, all offenders receive the program. Therefore, a researcher would need to compare with a group of inmates prior to the introduction of the program. Assuming, or after confirming, that these two populations are similar, are we able to apply statistical analysis to compare the outcomes of these two populations (such as time to return to detention)? If so, what would be the methodologies used? Do you happen to know of any articles that discuss this issue?

I replied with a link to this post from 2009, which concludes:

If you set up a model including a probability distribution for these unobserved outcomes, standard errors will emerge.
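As a minimal sketch of what that might look like in the correspondent’s setting, with entirely hypothetical counts: treat each inmate’s return to detention as a draw from an underlying probability model, and a standard error for the before/after comparison emerges from the model even though both cohorts are complete “populations.”

```python
import numpy as np

# Hypothetical complete-population counts: returns to detention within a year,
# before and after the program was introduced
n_before, returns_before = 500, 210
n_after, returns_after = 480, 170

# Model each inmate's return as a Bernoulli draw from an underlying rate;
# the inference is about the process generating current and future cohorts
p_before = returns_before / n_before
p_after = returns_after / n_after
diff = p_after - p_before

# The standard error of the difference comes from the probability model
se = np.sqrt(p_before * (1 - p_before) / n_before +
             p_after * (1 - p_after) / n_after)

print(f"estimated change in return rate: {diff:.3f} +/- {se:.3f}")
# The interval quantifies uncertainty about the underlying process, not about
# sampling from a finite population that was only partially observed.
```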