
On deck this week

Mon: Ticket to Baaaath

Tues: Ticket to Baaaaarf

Wed: Thinking of doing a list experiment? Here’s a list of reasons why you should think again

Thurs: An open site for researchers to post and share papers

Fri: Questions about “Too Good to Be True”

Sat: Sleazy sock puppet can’t stop spamming our discussion of compressed sensing and promoting the work of Xiteng Liu

Sun: White stripes and dead armadillos

Fooled by randomness

From 2006:

Nassim Taleb’s publisher sent me a copy of “Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets” to review. It’s an important topic, and the book is written in a charming style—I’ll try to respond in kind, with some miscellaneous comments.

On the cover of the book is a blurb, “Named by Fortune one of the smartest books of all time.” But Taleb instructs us on pages 161-162 to ignore book reviews because of selection bias (the mediocre reviews don’t make it to the book cover).

Books vs. articles

I prefer writing books to writing journal articles because books are written for the reader (and also, in the case of textbooks, for the teacher), whereas articles are written for referees. Taleb definitely seems to be writing for the reader, not the referee. There is risk in book-writing, since in some ways referees are the ideal expert audience, but I enjoy the freedom of being able to say what I really think.

Variation and randomness

Taleb’s general points—about variation, randomness, and selection bias—will be familiar to statisticians and also to readers of historians, biologists, and sports analysts such as Niall Ferguson, A.J.P. Taylor, Stephen Jay Gould, and Bill James, who have emphasized the roles of contingency and variation in creating the world we see.

Hyperbole?

On pages xliv-xlv, Taleb compares the “Utopian Vision, associated with Rousseau, Godwin, Condorcet, Thomas Paine, and conventional normative economists,” to the more realistic “Tragic Vision of humankind that believes in the existence of inherent limitations and flaws in the way we think and act,” associated with Karl Popper, Friedrich Hayek and Milton Friedman, Adam Smith, Herbert Simon, Amos Tversky, and others. He writes, “As an empiricist (actually a skeptical empiricist) I despise the moralizers beyond anything on this planet . . .”

Despise “beyond anything on this planet”?? Isn’t this a bit extreme? What about, for example, hit-and-run drivers? I despise them even more.

Correspondences

On page 39, Taleb quotes the maxim, “What is easy to conceive is clear to express / Words to say it would come effortlessly.” This reminds me of the duality in statistics between computation and model fit: better-fitting models tend to be easier to compute, and computational problems often signal modeling problems. (See here for my paper on this topic.)

Turing Test

On page 72, Taleb writes about the Turing test: “A computer can be said to be intelligent if it can (on average) fool a human into mistaking it for another human.” I don’t buy this. At the very least, the computer would have to fool me into thinking it’s another human. I don’t doubt that this can be done (maybe in another 5-20 years, I dunno). But I wouldn’t use the “average person” as a judge. Average people can be fooled all the time. If you think I can be fooled easily, don’t use me as a judge, either. Use some experts.

Evaluations based on luck

I’m looking at my notes. Something in Taleb’s book, but I’m not sure what, reminded me of a pitfall in the analysis of algorithms that forecast elections. People have written books about this (“The Keys to the White House,” etc.). Anyway, the past 50 years have seen four Presidential elections that have been, essentially (from any forecasting standpoint), ties: 1960, 1968, 1976, 2000. Any forecasting method should get no credit for forecasting the winner in any of these elections, and no blame for getting it wrong. Also in the past 50 years, there have been four Presidential elections that were landslides: 1956, 1964, 1972, 1984. (Perhaps you could also throw 1996 in there; obviously the distinction is not precise.) Any forecasting method had better get these right, otherwise it’s not to be taken seriously at all. What is left are 1980, 1988, 1992, 1996, and 2004: only five actual test cases in 50 years! You have a 1/32 chance of getting them all right by guessing at random. This is not to say that forecasts are meaningless, just that a simple count of correct predictions is too crude a summary to be useful.
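To make the arithmetic explicit, here is a quick sketch in R (my example, not from the original post):

# Chance of calling all five genuine test-case elections correctly
# by guessing at random (a coin flip per election):
0.5^5                                  # = 1/32 = 0.03125
# Chance of getting at least 4 of the 5 right by guessing:
sum(dbinom(4:5, size = 5, prob = 0.5)) # = 6/32 = 0.1875

So even a forecaster with no skill at all has a nontrivial chance of a near-perfect record over the test cases.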

Lotteries

I once talked with someone who wanted to write a book called Winners, interviewing a bunch of lottery winners. Actually, Bruce Sacerdote and others have done statistical studies of lottery winners, using the lottery win as a randomly assigned treatment. But my response was to write a book called Losers, interviewing a bunch of randomly selected lottery players, almost all of whom, of course, would be net losers.
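A minimal simulation sketch in R, with all the lottery numbers invented purely for illustration, shows why the Losers book would have no shortage of subjects:

set.seed(123)
n_players <- 1e5    # hypothetical sample of regular players
tickets   <- 500    # assumed: number of $1 tickets bought per player
p_win     <- 1e-6   # assumed: chance that a single ticket wins
prize     <- 5e5    # assumed: prize, i.e., a 50% payout ratio
wins <- rbinom(n_players, size = tickets, prob = p_win)
net  <- wins * prize - tickets
mean(net > 0)       # fraction of net winners: a small fraction of 1%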

Finance and hedging

When I was in college I interviewed for a summer job for an insurance company. The interviewer told me that his boss “basically invented hedging.” He also was getting really excited about a scheme for moving profits around between different companies so that none of the money got taxed. It gave me a sour feeling, but in retrospect maybe he was just testing me out to see what my reaction would be.

Forecasts, uncertainty, and motivations

Taleb describes the overconfidence of many “experts.” Some people have a motivation to display certainty. For example, auto mechanics always seemed to me to be 100% sure of their diagnosis (“It’s the electrical system”), then when they were wrong, it never would bother them a bit. Setting aside possible fraudulence, I think they have a motivation to be certain, because we’re unlikely to follow their advice if they qualify it. In the other direction, academics like me perhaps have a motivation to overstate uncertainty, to avoid the potential loss in reputation from saying something stupid. But in practice, we seem to understate our uncertainty most of the time.

Some experts aren’t experts at all. I was once called by a TV network (one of the benefits of living in New York?) to be interviewed about the lottery. I’m no expert—I referred them to Clotfelter and Cook. Other times, I’ve seen statisticians quoted in the paper on subjects they know nothing about. Once, several years ago, a colleague came into my office and asked me what “sampling probability proportional to size” was. It turned out he was doing some consulting for the U.S. government. I was teaching a sampling class at the time, so I could help him out. But it was a little scary that he had been hired as a sampling expert. (And, yes, I’ve seen horrible statistical consulting in the private sector as well.)

Summary

A thought-provoking and also fun book. The statistics of low-probability events has long interested me, and the stuff about the financial world was all new to me. The related work of Mandelbrot discusses some of these ideas from a more technical perspective. (I became aware of Mandelbrot’s work on finance through this review by Donald MacKenzie.)

P.S.

Taleb is speaking this Friday at the Collective Dynamics Seminar.

Update (2014):

I thought Fooled by Randomness made Taleb into a big star, but then his follow-up effort, The Black Swan, really hit the big time. I reviewed The Black Swan here.

The Collective Dynamics Seminar unfortunately is no more; several years ago, Duncan Watts left Columbia to join Yahoo Research (or, as I think he was contractually required to write, Yahoo! Research). Now he and his colleagues (who are my collaborators too) work at Microsoft Research, still in NYC.

Index or indicator variables

Someone who doesn’t want his name shared (for the perhaps reasonable reason that he’ll “one day not be confused, and would rather my confusion not live on online forever”) writes:

I’m exploring HLMs and Stan, using your book with Jennifer Hill as my field guide to this new territory. I think I have a generally clear grasp of the material, but wanted to be sure I haven’t gone astray.

The problem I’m working on involves a multi-nation survey of students, and I’m especially interested in understanding the effects of country, religion, and sex, and the interactions among those factors (using IRT to estimate individual-level ability, then estimating individual, school, and country effects).

Following the basic approach laid out in chapter 13 for such interactions between levels, I think I need to create a matrix of indicator variables for religion and sex. Elsewhere in the book, you recommend against indicator variables in favor of a single index variable.

Am I right in thinking that this is purely a matter of convenience, and that the matrix formulation of chapter 13 requires indicator variables, but that the matrix of indicators and the vector of indices yield otherwise identical results? I can’t see why they shouldn’t be the same, but my intuition is still developing around multilevel models.

I replied:

Yes, models can be formulated equivalently in terms of index or indicator variables. If a discrete variable can take on a bunch of different possible values (for example, 50 states), it makes sense to use a multilevel model rather than to include indicators as predictors with unmodeled coefficients. If the variable takes on only two or three values, you can still do a multilevel model but really it would be better at that point to use informative priors for any variance parameters. That’s a tactic we do not discuss in our book but which is easy to implement in Stan, and I’m hoping to do more of it in the future.
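To make the equivalence concrete, here is a minimal sketch in R (my toy example, not from the correspondence), comparing the index formulation (a single factor) with the indicator formulation (an explicit matrix of dummies) in a simple regression:

set.seed(1)
g <- factor(sample(c("a", "b", "c"), 100, replace = TRUE)) # index variable
y <- rnorm(100, mean = c(0, 1, 2)[as.integer(g)])
fit_index     <- lm(y ~ g)              # R expands the factor internally
X             <- model.matrix(~ g)      # explicit indicator columns
fit_indicator <- lm(y ~ X - 1)          # same columns, entered by hand
all.equal(fitted(fit_index), fitted(fit_indicator))  # TRUE: identical fits

The two parameterizations describe the same model; only the bookkeeping differs.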

To which my correspondent wrote:

The main difference that occurs to me as I work through implementing this is that the matrix of indicator variables loses information about what the underlying variable was. So, for instance, if the matrix mixes an indicator for sex, n indicators for religion, and m indicators for schools, we’d have Sigma_beta be an (m+n+1) × (m+n+1) matrix, when we really want a 3×3 matrix.

I could set up the basic structure of Sigma_beta, separately estimate the diagonal elements with a series of multilevel loops by sex, religion, and school, and eschew the matrix formulation in the individual model. So instead of y_i ~ N(X_i B_{j[i]}, sigma^2_y) it would be (roughly, I’m doing this on my phone):

y_i ~ N(beta_sex[i] + beta_sex_country[country[i]] + beta_religion[i] + beta_religion_country[i, country[i]] + beta_school[i] + beta_school_country[i, country[i]], sigma^2_y)

And the group-level formulation is unchanged. Sigma_beta becomes a 3×3 matrix rather than an (m+n+1) × (m+n+1) matrix, which seems both more reasonable and more computationally tractable.

My reply:

Now I’m getting tangled in your notation. I’m not sure what Sigma_beta is.

One-tailed or two-tailed?


Someone writes:

Suppose I have two groups of people, A and B, which differ on some characteristic of interest to me; and for each person I measure a single real-valued quantity X. I have a theory that group A has a higher mean value of X than group B. I test this theory by using a t-test. Am I entitled to use a *one-tailed* t-test? Or should I use a *two-tailed* one (thereby giving a p-value that is twice as large)?

I know you will probably answer: Forget the t-test; you should use Bayesian methods instead.

But what is the standard frequentist answer to this question?

My reply:

The quick answer is that different people will do different things here. I would say the two-tailed p-value is more standard, but some people will insist on the one-tailed version, and it’s hard to make a big stand on this one, given all the other problems with p-values in practice:

http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf

http://www.stat.columbia.edu/~gelman/research/published/pvalues3.pdf
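For what it’s worth, the mechanical relationship between the two versions is easy to check in R (my toy example; the alternative argument of t.test does the work):

set.seed(2)
A <- rnorm(30, mean = 0.5)  # group hypothesized to have the higher mean
B <- rnorm(30, mean = 0.0)
p_two <- t.test(A, B, alternative = "two.sided")$p.value
p_one <- t.test(A, B, alternative = "greater")$p.value
c(p_two, p_one)  # the one-tailed p-value is half the two-tailed one,
                 # provided the observed difference goes in the
                 # hypothesized direction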

P.S. In the comments, Sameer Gauria summarizes a key point:

It’s inappropriate to view a low P value (indicating a misfit of the null hypothesis to data) as strong evidence in favor of a specific alternative hypothesis, rather than other, perhaps more scientifically plausible, alternatives.

This is so important. You can take lots and lots of examples (most notably, all those Psychological Science-type papers) with statistically significant p-values, and just say: Sure, the p-value is 0.03 or whatever. I agree that this is evidence against the null hypothesis, which in these settings typically has the following five aspects:
1. The relevant comparison or difference or effect in the population is exactly zero.
2. The sample is representative of the population.
3. The measurement in the data corresponds to the quantities of interest in the population.
4. The researchers looked at exactly one comparison (see the simulation sketch after this list).
5. The data coding and analysis would have been the same had the data been different.
But, as noted above, evidence against the null hypothesis is not, in general, strong evidence in favor of a specific alternative hypothesis, rather than other, perhaps more scientifically plausible, alternatives.
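Aspect 4 is easy to see in simulation. Here is a hedged sketch in R, with the number of available comparisons invented for illustration: if the researchers actually had 20 comparisons to choose from, a p-value below 0.05 is unsurprising even when every null hypothesis is true.

set.seed(3)
n_sims  <- 1000
n_tests <- 20   # assumed number of comparisons available to the researchers
min_p <- replicate(n_sims,
  min(replicate(n_tests, t.test(rnorm(20), rnorm(20))$p.value)))
mean(min_p < 0.05)  # about 0.6: a "significant" result most of the time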

If you get to the point of asking, just do it. But some difficulties do arise . . .

Nelson Villoria writes:

I find the multilevel approach very useful for a problem I am dealing with, and I was wondering whether you could point me to some references about poolability tests for multilevel models. I am working with time series of cross-sectional data, and I want to test whether the data support cross-sectional and/or time pooling. In a standard panel-data setting I do this with Chow tests and/or CUSUM. Are these ideas directly transferable to the multilevel setting?

My reply: I think you should do partial pooling. Once the question arises, just do it. Other models are just special cases. I don’t see the need for any test.

That said, if you do a group-level model, you need to consider including group-level averages of individual predictors (see here). And if the number of groups is small, there can be real gains from using an informative prior distribution on the hierarchical variance parameters. This is something that Jennifer and I do not discuss in our book, unfortunately.
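As a minimal sketch of the group-level-average point, assuming the lme4 package and simulated data (the informative-prior tactic would instead be done in Stan):

library(lme4)
set.seed(4)
g    <- rep(1:20, each = 15)           # 20 groups of 15 units
x    <- rnorm(300)                     # individual-level predictor
y    <- 1 + 0.5 * x + rnorm(20, sd = 0.7)[g] + rnorm(300)
xbar <- ave(x, g)                      # group-level average of x
# Partial pooling of group intercepts, with the group-level mean of x
# included as its own predictor:
fit <- lmer(y ~ x + xbar + (1 | g))
summary(fit)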

Looking for Bayesian expertise in India, for the purpose of analysis of sarcoma trials

Prakash Nayak writes:

I work as a musculoskeletal oncologist (surgeon) in Mumbai, India and am keen on sarcoma research.

Sarcomas are rare disorders, and conventional frequentist analysis falls short of providing meaningful results for clinical application.

I am thus keen on applying Bayesian analysis to a lot of trials performed with small numbers in this field.

I need advice from you on a good starting point for someone uninitiated in Bayesian analysis: what to read, what courses to take, and whether there is a way I could collaborate with any local/international statisticians dealing with these methods.

I have attached a recent publication [Optimal timing of pulmonary metastasectomy – is a delayed operation beneficial or counterproductive?, by M. Kruger, J. D. Schmitto, B. Wiegmann, T. K. Rajab, and A. Haverich], which is one amongst others I understand would benefit from some Bayesian analyses.

I have no idea who in India works in this area so I’m just putting this one out there in the hope that someone will be able to make the connection.

When you believe in things that you don’t understand

[Image: Stevie Wonder, “The Woman in Red” LP cover]

This would make Karl Popper cry. And, at the very end:

The present results indicate that under certain, theoretically predictable circumstances, female ovulation—long assumed to be hidden—is in fact associated with a distinct, objectively observable behavioral display.

This statement is correct—if you interpret the word “predictable” to mean “predictable after looking at your data.”

P.S. I’d like to say that April 15 is a good day for this posting because your tax dollars went toward supporting this research. But actually it was supported by the Social Sciences and Humanities Research Council of Canada, and I assume they do their taxes on their own schedule.

P.P.S. In preemptive response to people who think I’m being mean by picking on these researchers, let me just say: Nobody forced them to publish these articles. If you put your ideas out there, you have to be ready for criticism.

Transitioning to Stan

Kevin Cartier writes:

I’ve been happily using R for a number of years now and recently came across Stan. Looks big and powerful, so I’d like to pick an appropriate project and try it out. I wondered if you could point me to a link or document that goes into the motivation for this tool (aside from the Stan user doc)? What I’d like to understand is, at what point might you look at an emergent R project and advise, “You know, that thing you’re trying to do would be a whole lot easier/simpler/more straightforward to implement with Stan.” (or words to that effect).

My reply: For my collaborators in political science, Stan has been most useful for models where the data set is not huge (e.g., we might have 10,000 data points or 50,000 data points but not 10 million) but where the model is somewhat complex (for example, a model with latent time series structure). The point is that the model has enough parameters and uncertainty that you’ll want to do full Bayes (rather than some sort of point estimate). At that point, Stan is a winner compared to programming one’s own Monte Carlo algorithm.
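As a concrete toy illustration of the kind of model I mean (my sketch, not one of the actual political-science applications): a local-level time-series model, fit with full Bayes by calling Stan from R via the rstan package.

library(rstan)
model_code <- "
data {
  int<lower=1> N;
  vector[N] y;
}
parameters {
  vector[N] mu;                // latent local level
  real<lower=0> sigma_y;
  real<lower=0> sigma_mu;
}
model {
  mu[2:N] ~ normal(mu[1:(N - 1)], sigma_mu);  // random-walk trend
  y ~ normal(mu, sigma_y);
}
"
y <- cumsum(rnorm(100, 0, 0.3)) + rnorm(100)  # simulated noisy trend
fit <- stan(model_code = model_code,
            data = list(N = length(y), y = y),
            chains = 4, iter = 2000)
print(fit, pars = c("sigma_y", "sigma_mu"))

With N latent states plus two variance parameters, a point estimate would miss most of the uncertainty; this is the regime where full Bayes in Stan pays off.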

We (the Stan team) should really prepare a document with a bunch of examples where Stan is a win, in one way or another. But of course preparing such a document takes work, which we’d rather spend on improving Stan (or on blogging…)

On deck this week

Mon: Transitioning to Stan

Tues: When you believe in things that you don’t understand

Wed: Looking for Bayesian expertise in India, for the purpose of analysis of sarcoma trials

Thurs: If you get to the point of asking, just do it. But some difficulties do arise . . .

Fri: One-tailed or two-tailed?

Sat: Index or indicator variables

Sun: Fooled by randomness

“If you are primarily motivated to make money, you . . . certainly don’t want to let people know how confused you are by something, or how shallow your knowledge is in certain areas. You want to project an image of mastery and omniscience.”

A reader writes in:

This op-ed made me think of one of your recent posts. Money quote:

If you are primarily motivated to make money, you just need to get as much information as you need to do your job. You don’t have time for deep dives into abstract matters. You certainly don’t want to let people know how confused you are by something, or how shallow your knowledge is in certain areas. You want to project an image of mastery and omniscience.
