
One-tailed or two-tailed?


Someone writes:

Suppose I have two groups of people, A and B, which differ on some characteristic of interest to me; and for each person I measure a single real-valued quantity X. I have a theory that group A has a higher mean value of X than group B. I test this theory by using a t-test. Am I entitled to use a *one-tailed* t-test? Or should I use a *two-tailed* one (thereby giving a p-value that is twice as large)?

I know you will probably answer: Forget the t-test; you should use Bayesian methods instead.

But what is the standard frequentist answer to this question?

My reply:

The quick answer is that different people will do different things here. I would say the two-tailed p-value is more standard, but some people will insist on the one-tailed version, and it’s hard to take a firm stand on this one, given all the other problems with p-values in practice:

http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf

http://www.stat.columbia.edu/~gelman/research/published/pvalues3.pdf
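To make the factor-of-two point concrete, here is a minimal sketch in R with simulated data (the group sizes and effect size are made up purely for illustration):

# Simulated data: group A hypothesized to have the higher mean of X
set.seed(42)
A <- rnorm(30, mean = 0.5)
B <- rnorm(30, mean = 0.0)

two_sided <- t.test(A, B, alternative = "two.sided")
one_sided <- t.test(A, B, alternative = "greater")

two_sided$p.value
one_sided$p.value  # half the two-sided p-value when the observed
                   # difference goes in the hypothesized direction

If the observed difference had gone the “wrong” way, the one-sided p-value would instead be larger than the two-sided one, which is part of why the choice feels awkward in practice.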

If you get to the point of asking, just do it. But some difficulties do arise . . .

Nelson Villoria writes:

I find the multilevel approach very useful for a problem I am dealing with, and I was wondering whether you could point me to some references about poolability tests for multilevel models. I am working with time series of cross-sectional data and I want to test whether the data support cross-sectional and/or time pooling. In a standard panel data setting I do this with Chow tests and/or CUSUM. Are these ideas directly transferable to the multilevel setting?

My reply: I think you should do partial pooling. Once the question arises, just do it. Other models are just special cases. I don’t see the need for any test.

That said, if you do a group-level model, you need to consider including group-level averages of individual predictors (see here). And if the number of groups is small, there can be real gains from using an informative prior distribution on the hierarchical variance parameters. This is something that Jennifer and I do not discuss in our book, unfortunately.
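To fix ideas, here is a minimal sketch in R using lme4 on simulated data (all names and numbers below are hypothetical; in a real panel-data problem the groups might be cross-sectional units or time periods):

library(lme4)

# Simulated data, purely for illustration
set.seed(123)
J <- 8                               # number of groups
g <- rep(1:J, each = 20)
x <- rnorm(length(g))
alpha <- rnorm(J, 0, 0.5)            # group-level intercepts
y <- 1 + 0.6 * x + alpha[g] + rnorm(length(g))
d <- data.frame(y, x, g = factor(g))

# Partial pooling via varying intercepts; complete pooling and no pooling
# are the special cases of zero and infinite group-level variance
fit <- lmer(y ~ x + (1 | g), data = d)

# Including the group-level average of the individual predictor,
# as suggested above
d$x_bar <- ave(d$x, d$g)
fit2 <- lmer(y ~ x + x_bar + (1 | g), data = d)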

Looking for Bayesian expertise in India, for the purpose of analysis of sarcoma trials

Prakash Nayak writes:

I work as a musculoskeletal oncologist (surgeon) in Mumbai, India and am keen on sarcoma research.

Sarcomas are rare disorders, and conventional frequentist analysis falls short of providing meaningful results for clinical application.

I am thus keen on applying Bayesian analysis to a lot of trials performed with small numbers in this field.

I need advice from you on a good starting point for someone uninitiated in Bayesian analysis: what to read, what courses to take, and whether there is a way I could collaborate with any local/international statisticians dealing with these methods.

I have attached a recent publication [Optimal timing of pulmonary metastasectomy – is a delayed operation beneficial or counterproductive?, by M. Kruger, J. D. Schmitto, B. Wiegmann, T. K. Rajab, and A. Haverich], which is one among others that I understand would benefit from some Bayesian analyses.

I have no idea who in India works in this area so I’m just putting this one out there in the hope that someone will be able to make the connection.

When you believe in things that you don’t understand

[Image: Stevie Wonder, “The Woman in Red” LP]

This would make Karl Popper cry. And, at the very end:

The present results indicate that under certain, theoretically predictable circumstances, female ovulation—long assumed to be hidden—is in fact associated with a distinct, objectively observable behavioral display.

This statement is correct—if you interpret the word “predictable” to mean “predictable after looking at your data.”

P.S. I’d like to say that April 15 is a good day for this posting because your tax dollars went toward supporting this research. But actually it was supported by the Social Sciences Research Council of Canada, and I assume they do their taxes on their own schedule.

P.P.S. In preemptive response to people who think I’m being mean by picking on these researchers, let me just say: Nobody forced them to publish these articles. If you put your ideas out there, you have to be ready for criticism.

Transitioning to Stan

Kevin Cartier writes:

I’ve been happily using R for a number of years now and recently came across Stan. Looks big and powerful, so I’d like to pick an appropriate project and try it out. I wondered if you could point me to a link or document that goes into the motivation for this tool (aside from the Stan user doc)? What I’d like to understand is, at what point might you look at an emergent R project and advise, “You know, that thing you’re trying to do would be a whole lot easier/simpler/more straightforward to implement with Stan.” (or words to that effect).

My reply: For my collaborators in political science, Stan has been most useful for models where the data set is not huge (e.g., we might have 10,000 data points or 50,000 data points but not 10 million) but where the model is somewhat complex (for example, a model with latent time series structure). The point is that the model has enough parameters and uncertainty that you’ll want to do full Bayes (rather than some sort of point estimate). At that point, Stan is a winner compared to programming one’s own Monte Carlo algorithm.
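As a rough illustration of the kind of model I have in mind, here is a sketch using rstan on simulated data, with a simple varying-intercept regression (the model, data, and names are hypothetical, and the declarations use the array syntax of Stan versions from around that time; a real application would have more structure, such as latent time series components):

library(rstan)

model_code <- "
data {
  int<lower=1> N;
  int<lower=1> J;
  int<lower=1, upper=J> group[N];
  vector[N] x;
  vector[N] y;
}
parameters {
  vector[J] alpha;            // varying intercepts
  real mu_alpha;
  real<lower=0> sigma_alpha;
  real beta;
  real<lower=0> sigma_y;
}
model {
  alpha ~ normal(mu_alpha, sigma_alpha);
  y ~ normal(alpha[group] + beta * x, sigma_y);
}
"

# Simulated data, purely for illustration
set.seed(1)
N <- 500
J <- 10
group <- sample(1:J, N, replace = TRUE)
x <- rnorm(N)
a <- rnorm(J, 1, 0.3)
y <- rnorm(N, a[group] + 0.5 * x, 1)

fit <- stan(model_code = model_code,
            data = list(N = N, J = J, group = group, x = x, y = y),
            chains = 4, iter = 1000)
print(fit, pars = c("beta", "mu_alpha", "sigma_alpha"))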

We (the Stan team) should really prepare a document with a bunch of examples where Stan is a win, in one way or another. But of course preparing such a document takes work, which we’d rather spend on improving Stan (or on blogging…)

On deck this week

Mon: Transitioning to Stan

Tues: When you believe in things that you don’t understand

Wed: Looking for Bayesian expertise in India, for the purpose of analysis of sarcoma trials

Thurs: If you get to the point of asking, just do it. But some difficulties do arise . . .

Fri: One-tailed or two-tailed?

Sat: Index or indicator variables

Sun: Fooled by randomness

“If you are primarily motivated to make money, you . . . certainly don’t want to let people know how confused you are by something, or how shallow your knowledge is in certain areas. You want to project an image of mastery and omniscience.”

A reader writes in:

This op-ed made me think of one of your recent posts. Money quote:

If you are primarily motivated to make money, you just need to get as much information as you need to do your job. You don’t have time for deep dives into abstract matters. You certainly don’t want to let people know how confused you are by something, or how shallow your knowledge is in certain areas. You want to project an image of mastery and omniscience.


“Schools of statistical thoughts are sometimes jokingly likened to religions. This analogy is not perfect—unlike religions, statistical methods have no supernatural content and make essentially no demands on our personal lives. Looking at the comparison from the other direction, it is possible to be agnostic, atheistic, or simply live one’s life without religion, but it is not really possible to do statistics without some philosophy.”

This bit is perhaps worth saying again, especially given the occasional trolling on the internet by people who disparage their ideological opponents by calling them “religious” . . . So here it is:

Sometimes the choice of statistical philosophy is decided by convention or convenience. . . . In many settings, however, we have freedom in deciding how to attack a problem statistically. How then do we decide how to proceed?

Schools of statistical thoughts are sometimes jokingly likened to religions. This analogy is not perfect—unlike religions, statistical methods have no supernatural content and make essentially no demands on our personal lives. Looking at the comparison from the other direction, it is possible to be agnostic, atheistic, or simply live one’s life without religion, but it is not really possible to do statistics without some philosophy. Even if you take a Tukeyesque stance and admit only data and data manipulations without reference to probability models, you still need some criteria to evaluate the methods that you choose.

One way in which schools of statistics are like religions is in how we end up affiliating with them. Based on informal observation, I would say that statisticians typically absorb the ambient philosophy of the institution where they are trained—or else, more rarely, they rebel against their training or pick up a philosophy later in their career or from some other source such as a persuasive book. Similarly, people in modern societies are free to choose their religious affiliation, but it typically is the same as the religion of their parents and extended family. Philosophy, like religion but not (in general) ethnicity, is something we are free to choose on our own, even if we do not usually take the opportunity to make that choice. Rather, it is common to exercise our free will in this setting by forming our own personal accommodation with the religion or philosophy bequeathed to us by our background.

For example, I affiliated as a Bayesian after studying with Don Rubin and, over the decades, have evolved my own philosophy using his as a starting point. I did not go completely willingly into the Bayesian fold—the first statistics course I took (before I came to Harvard) had a classical perspective, and in the first course I took with Don, I continued to try to frame all the inferential problems into a Neyman-Pearson framework. But it didn’t take me or my fellow students long to slip into comfortable conformity. . . .

Beliefs and affiliations are interesting and worth studying, going beyond simple analogies to religion.

P.S. See here for some similar thoughts from a few years ago. The key point is that a belief is not (necessarily) the same thing as a religion, and I don’t think it’s helpful for people to use “religion” as a generalized insult that is applied to beliefs that they disagree with.

“More research from the lunatic fringe”

A linguist sent me an email with the above title and a link to a paper, “The Effect of Language on Economic Behavior: Evidence from Savings Rates, Health Behaviors, and Retirement Assets,” by M. Keith Chen, which begins:

Languages differ widely in the ways they encode time. I test the hypothesis that languages that grammatically associate the future and the present foster future-oriented behavior. This prediction arises naturally when well-documented effects of language structure are merged with models of intertemporal choice. Empirically, I find that speakers of such languages: save more, retire with more wealth, smoke less, practice safer sex, and are less obese. This holds both across countries and within countries when comparing demographically similar native households. The evidence does not support the most obvious forms of common causation. I discuss implications for theories of intertemporal choice.

I ran this by another linguist who confirmed the “lunatic fringe” comment and pointed me to this post from Mark Liberman and this followup from Keith Chen. My friend also wrote:

I think it’d be well-nigh impossible to separate the effect of speaking West Greenlandic from living in West Greenland, or more reasonably, speaking Finnish from living in Finland. Who else speaks Finnish (maybe some Swedes?)

My reply:

B-b-but . . . the paper is scheduled to appear in the American Economic Review! Short of Science, Nature, and Psychological Science, that’s probably the most competitive and prestigious journal in the universe.

More seriously, this is an interesting case because I have no intuition about the substance of the matter (unlike various examples in psychology and political science). The theoretical microeconomic model in the paper seems ridiculous to me, that’s for sure, but I have no good way to think about the cross-country comparisons, one way or another.

Small multiples of lineplots > maps (ok, not always, but yes in this case)

Kaiser Fung shares this graph from Ritchie King:

[Image: Ritchie King’s grid of small-multiple line plots]

Kaiser writes:

What they did right:

- Did not put the data on a map
- Ordered the countries by the most recent data point rather than alphabetically
- Scale labels appear only on the outer edge of the chart area, rather than one set per panel
- Only used three labels for the 11 years on the plot
- Did not overdo the vertical scale either

The nicest feature was the XL scale applied only to South Korea. This destroys the small-multiples principle but draws attention to the top left corner, where the designer wants our eyes to go. I would have used smaller fonts throughout.

I agree with all of Kaiser’s comments. I could even add a few more, like using light gray for the backgrounds and a bright blue for the lines, spacing the graphs well, and using full country names rather than three-letter abbreviations. There are so many standard mistakes that go into default data displays that it is refreshing to see a simple graph done well.
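For anyone who wants to try this style, here is a small sketch in R with ggplot2 and made-up numbers (the countries and values are placeholders), following a few of the principles above: small multiples, panels ordered by the most recent value, and only a few axis labels:

library(ggplot2)

# Made-up per-capita consumption series, for illustration only
set.seed(1)
d <- expand.grid(country = c("South Korea", "Russia", "Brazil", "India",
                             "Portugal", "United States"),
                 year = 2000:2010)
d$liters <- round(runif(nrow(d), 2, 12), 1)

# Order the panels by the most recent value, as in the original chart
last <- d[d$year == max(d$year), ]
d$country <- factor(d$country, levels = last$country[order(-last$liters)])

ggplot(d, aes(year, liters)) +
  geom_line(color = "steelblue") +
  facet_wrap(~ country) +
  scale_x_continuous(breaks = c(2000, 2005, 2010)) +   # only three year labels
  theme_minimal()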

Kaiser continues:

One way to appreciate the greatness of the chart is to look at alternatives.

Here, the Economist tries the lazy approach of using a map: (link)

[Image: The Economist’s map of alcohol consumption]

For one thing, they have to give up the time dimension.

A variation is a cartogram in which the physical size and shape of countries are mapped to the underlying data. Here’s one on Worldmapper (link):

[Image: Worldmapper cartogram of alcohol consumption]

One problem with this transformation is what to do with missing data.

Yup. Also, the big big trouble with the transformed map is that the #1 piece of information it gives you is something we all know already—that China has a lot of people. Sure, if you look carefully you can figure out other things—hey, India has a billion people too but it’s really small on the map, I guess nobody’s drinking much there—but that’s all complicated reasoning involving mental division.

To put it another way, if this distorted map works—and it may well “work,” in the sense of grabbing attention and motivating people to look deeper at these data, which is the #1 goal of an infographic—if it does work, it’s doing so using the Chris Rock effect, in which we enjoy the shock of recognition of a familiar idea presented in an unfamiliar way.

Kaiser continues:

Wikipedia has a better map with variations of one color (link):

[Image: Wikipedia’s world map of per-capita alcohol consumption]

I agree that this one is better than the Economist map above. Wikipedia’s map uses an equal-area projection (I think), so you don’t get so distracted by massive Greenland; it has a sensible color scheme with a natural ordering (unlike the Economist’s, where it’s obvious that red is highest and pink is next, but then you have to go back to the legend to figure out how the other colors are ordered); and the legend has high numbers on top and low on bottom, which again is sensible.

Still and all, the original grid of lines is better for me because (a) it shows the comparisons quantitatively (which in this case makes sense; those differences are huge (actually, so huge that it makes me wonder whether the comparisons are appropriate; is wine drinking in Portugal so much different than downing shots of soju in Korea?)) and, (b) it shows the time trends (most notably, the declines in Russia and Brazil, the increase from a low baseline in India, and Korea’s steady #1 position).

The click-through solution

Let me conclude, as always in this sort of discussion, that displaying patterns in the data is not the only reason for a graph. Another reason is to grab attention. If an unusually-colored map catches people’s eyes, maybe that’s the best way to go. My ideal solution would be click-through: the Economist (or wherever) has the colorful map with instructions to click to see the informative grid of line plots, then you can click again and get a spreadsheet with all the numbers.