
Friedrich Nietzsche (4) vs. John Updike; Austen advances

I chose yesterday’s winner based on the oddest comment we’ve received so far in the competition, from AC:

I’d love to see what Jane Austen (of the early Regency dress style) would have thought of late Regency dresses, which were basically the exact opposite sensibility. It’s an astonishingly quick reversal, from narrow and prim to a sort of walking wedding cake in twenty years. I imagine she’d have had some interesting thoughts, but she died too early to see it.

You go girl.

As for today, I’m surprised the man from Shillington has survived this far—his opponents were Buddha and Bertrand Russell, neither of whom is a tomato can. He got by on his good turns of phrase. Meanwhile, the angry German philosopher is coming up on the outside. If we could get them both to speak, we could have a spirited debate about God. Again, though, only one can advance.

P.S. As always, here’s the background, and here are the rules.

Regression: What’s it all about? [Bayesian and otherwise]

Regression: What’s it all about?

Regression plays three different roles in applied statistics:

1. A specification of the conditional expectation of y given x;

2. A generative model of the world;

3. A method for adjusting data to generalize from sample to population, or to perform causal inferences.

We could also include prediction, but I prefer to see that as a statistical operation that is implied for all three of the goals above: conditional prediction as a generalization of conditional expectation, prediction as the application of a linear model to new cases, and prediction for unobserved cases in the population or for unobserved potential outcomes in a causal inference.
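As a concrete instance of the first role, and of prediction layered on top of it, here is a minimal simulated sketch (the numbers are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated data in which y's conditional expectation is linear in x.
n = 500
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(0, 1, size=n)

# Role 1: estimate E[y | x] by least squares.
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Prediction as the application of the fitted model to new cases.
x_new = np.array([2.5, 7.0])
y_pred = beta_hat[0] + beta_hat[1] * x_new
print(beta_hat, y_pred)
```

The fitted coefficients recover the conditional expectation, and prediction for new cases is just that same conditional expectation evaluated at new x.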

I was thinking about the different faces of regression modeling after being asked to review the new book, Bayesian and Frequentist Regression Methods, by Jon Wakefield, a statistician who is known for his work on Bayesian modeling in pharmacology, genetics, and public health. . . .

Here is Wakefield’s summary of Bayesian and frequentist regression:

For small samples, the Bayesian approach with thoughtfully well-specified priors is often the only way to go because of the difficulty in obtaining well-calibrated frequentist intervals. . . . For medium to large samples, unless there is strong prior information that one wishes to incorporate, a robust frequentist approach . . . is very appealing since consistency is guaranteed under relatively mild conditions. For highly complex models . . . a Bayesian approach is often the most convenient way to formulate the model . . .

All this is reasonable, and I appreciate Wakefield’s effort to delineate the scenarios where different approaches are particularly effective. Ultimately, I think that any statistical problem that can be solved Bayesianly can be solved using a frequentist approach as well (if nothing else, you can just take the Bayesian inference and from it construct an “estimator” whose properties can then be studied and perhaps improved) and, conversely, effective non-Bayesian approaches can be mimicked and sometimes improved by considering them as approximations to posterior inferences. More generally, I think the most important aspect of a statistical method is not what it does with the data but rather what data it uses. That all said, in practice different methods are easier to apply in different problems.
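To make the “construct an estimator from the Bayesian inference” point concrete, here is a small simulation sketch, with made-up numbers, that treats a conjugate normal-normal posterior mean as an estimator and studies its frequentist properties by repeated sampling:

```python
import numpy as np

rng = np.random.default_rng(0)

# Model: y_i ~ Normal(theta, sigma^2), prior theta ~ Normal(0, tau^2).
sigma, tau, n = 1.0, 0.5, 10
theta_true = 0.3  # hypothetical true value for the simulation

def posterior_mean(y):
    # Conjugate posterior mean: shrinks the sample mean toward the prior mean 0.
    precision = n / sigma**2 + 1 / tau**2
    return (n / sigma**2) * y.mean() / precision

reps = 20000
est_bayes = np.empty(reps)
est_mle = np.empty(reps)
for r in range(reps):
    y = rng.normal(theta_true, sigma, size=n)
    est_bayes[r] = posterior_mean(y)  # the posterior mean, viewed as an "estimator"
    est_mle[r] = y.mean()             # the classical unbiased estimator

rmse_bayes = np.sqrt(np.mean((est_bayes - theta_true) ** 2))
rmse_mle = np.sqrt(np.mean((est_mle - theta_true) ** 2))
print(rmse_bayes, rmse_mle)
```

The shrinkage estimator is biased but, when the true value is consistent with the prior, it has lower root mean squared error: exactly the kind of frequentist evaluation of a Bayesian procedure described above.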

A virtue—and a drawback—of Bayesian inference is that it is all-encompassing. On the plus side, once you have model and data, you can turn the crank, as the saying goes, to get your inference; and, even more importantly, the Bayesian framework allows the inclusion of external information, the “meta-data,” as it were, that come with your official dataset. The difficulty, though, is the requirement of setting up this large model. In addition, along with concerns about model misspecification, I think a vital part of Bayesian data analysis is checking fit to data—a particular concern when setting up complex models—and having systematic ways of improving models to address problems that arise.

I would just like to clarify the first sentence of the quote above, which is expressed in such a dry fashion that I fear it will mislead casual or uninformed readers. When Wakefield speaks of “the difficulty in obtaining well-calibrated frequentist intervals,” this is not just some technical concern, that nominal 95% intervals will only contain the true value 85% of the time, or whatever. The worry is that, when data are weak and there is strong prior information that is not being used, classical methods can give answers that are not just wrong—that’s no dealbreaker, it’s accepted in statistics that any method will occasionally give wrong answers—but clearly wrong, obviously wrong. Wrong not just conditional on the unknown parameter, but conditional on the data. Scientifically inappropriate conclusions. That’s the meaning of “poor calibration.” Even this, in some sense, should not be a problem—after all, if a method gives you a conclusion that you know is wrong, you can just set it aside, right?—but, unfortunately, many users of statistics take p < 0.05 or p < 0.01 comparisons as “statistically significant” and use these as motivation to accept their favored alternative hypotheses. This has led to such farces as recent claims in leading psychology journals that various small experiments have demonstrated the existence of extra-sensory perception, or huge correlations between menstrual cycle and voting, and so on.
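A quick simulation, with hypothetical numbers, shows how the significance filter produces these clearly wrong conclusions: when the true effect is tiny relative to the noise, the estimates that happen to reach p < 0.05 are grossly exaggerated and often have the wrong sign:

```python
import numpy as np

rng = np.random.default_rng(1)

# A small, noisy study: true effect is tiny relative to the standard error.
true_effect, se = 0.1, 1.0  # hypothetical numbers for illustration
est = rng.normal(true_effect, se, size=200000)  # sampling distribution of the estimate
z = est / se
significant = np.abs(z) > 1.96  # the "p < 0.05" filter

# Among the "significant" results, estimates are wildly exaggerated in
# magnitude, and a large fraction even point in the wrong direction.
exaggeration = np.mean(np.abs(est[significant])) / true_effect
wrong_sign = np.mean(est[significant] < 0)
print(exaggeration, wrong_sign)
```

Conditioning on statistical significance here inflates the apparent effect by more than an order of magnitude, which is the real-life meaning of poor calibration in this setting.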

In delivering this brief rant, I am not trying to say that classical statistical methods should be abandoned or that Bayesian approaches are always better; I’m just expanding on Wakefield’s statement to make it clear that this problem of “calibration” is not merely a technical issue; it’s a real-life concern about the widespread exaggeration of the strength of evidence from small noisy datasets supporting scientifically implausible claims based on statistical significance.

Frequentist inference has the virtue and drawback of being multi-focal, of having no single overarching principle of inference. From the user’s point of view, having multiple principles (unbiasedness, asymptotic efficiency, coverage, etc.) gives more flexibility and, in some settings, more robustness, with the downside being that application of the frequentist approach requires the user to choose a method as well as a model. As with Bayesian methods, this flexibility puts some burden on the user to check model fit to decide where to go when building a regression.

Regression is important enough that it deserves a side-by-side treatment of Bayesian and frequentist approaches. The next step is to take the level of care and precision that is taken when considering inference and computation given the model, and apply this same degree of effort to the topics of building, checking, and understanding regressions. There are a number of books on applied regression, but connecting the applied principles to theory is a challenge. A related challenge in exposition is to unify the three goals noted at the beginning of this review. Wakefield’s book is an excellent start.

Stewart Lee vs. Jane Austen; Dick advances

Yesterday’s deciding arguments came from Horselover himself.

As quoted by Dalton:

Any given man sees only a tiny portion of the total truth, and very often, in fact almost . . . perpetually, he deliberately deceives himself about that precious little fragment as well.


We ourselves are information-rich; information enters us, is processed and is then projected outwards once more, now in an altered form. We are not aware that we are doing this, that in fact this is all we are doing.

Wow—Turing-esque (but I can’t picture Dick running around the house).

And, as quoted by X:

“But—let me tell you my cat joke. It’s very short and simple. A hostess is giving a dinner party and she’s got a lovely five-pound T-bone steak sitting on the sideboard in the kitchen waiting to be cooked while she chats with the guests in the living room—has a few drinks and whatnot. But then she excuses herself to go into the kitchen to cook the steak—and it’s gone. And there’s the family cat, in the corner, sedately washing its face.”

“The cat got the steak,” Barney said.

“Did it? The guests are called in; they argue about it. The steak is gone, all five pounds of it; there sits the cat, looking well-fed and cheerful. “Weigh the cat,” someone says. They’ve had a few drinks; it looks like a good idea. So they go into the bathroom and weigh the cat on the scales. It reads exactly five pounds. They all perceive this reading and a guest says, “okay, that’s it. There’s the steak.” They’re satisfied that they know what happened, now; they’ve got empirical proof. Then a qualm comes to one of them and he says, puzzled, “But where’s the cat?”

Fat wins the thread.

Today’s contest matches up two surprisingly strong unseeded speaker candidates. Jane Austen cuts to the bone, but with discretion; Stewart Lee lets it all hang out. So how do we like our social observations: subtle, or like a refrigerator to the side of the head?

P.S. As always, here’s the background, and here are the rules.

The publication of one of my pet ideas: Simulation-efficient shortest probability intervals

In a paper to appear in Statistics and Computing, Ying Liu, Tian Zheng, and I write:

Bayesian highest posterior density (HPD) intervals can be estimated directly from simulations via empirical shortest intervals. Unfortunately, these can be noisy (that is, have a high Monte Carlo error). We derive an optimal weighting strategy using bootstrap and quadratic programming to obtain a more computationally stable HPD, or in general, shortest probability interval (Spin). We prove the consistency of our method. Simulation studies on a range of theoretical and real-data examples, some with symmetric and some with asymmetric posterior densities, show that intervals constructed using Spin have better coverage (relative to the posterior distribution) and lower Monte Carlo error than empirical shortest intervals. We implement the new method in an R package (SPIn) so it can be routinely used in post-processing of Bayesian simulations.

This is one of my pet ideas but it took a long time to get it working. I have to admit I’m still not thrilled with the particular method we’re using—it works well on a whole bunch of examples, but the algorithm itself is a bit clunky. I have a strong intuition that there’s a much cleaner version that does just as well while preserving the basic idea, which is to get a stable estimate of the shortest interval at any given probability level (for example, 0.95) given a bunch of posterior simulations. Once we have this cleaner algorithm, we’ll stick it into Stan, as there are lots of examples (starting with the hierarchical variance parameter in the famous 8-schools model) where a highest probability density interval (or, equivalently, shortest probability interval) makes a lot more sense than the usual central interval.
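For readers who want to see the basic idea, here is a minimal sketch of the raw empirical shortest interval, the noisy estimate that Spin is designed to stabilize (this is not the weighted Spin algorithm itself):

```python
import numpy as np

def empirical_shortest_interval(draws, prob=0.95):
    """Shortest interval containing a fraction `prob` of the posterior draws."""
    x = np.sort(np.asarray(draws))
    n = len(x)
    k = int(np.ceil(prob * n))        # number of draws the interval must contain
    widths = x[k - 1:] - x[: n - k + 1]  # width of every interval spanning k draws
    i = np.argmin(widths)                # pick the narrowest one
    return x[i], x[i + k - 1]

# For a right-skewed posterior (think: a hierarchical variance parameter),
# the shortest interval sits closer to zero than the usual central interval.
rng = np.random.default_rng(8)
draws = rng.gamma(2.0, 1.0, size=50000)
lo, hi = empirical_shortest_interval(draws, 0.95)
central = np.percentile(draws, [2.5, 97.5])
print((lo, hi), central)
```

On a skewed example like this the shortest interval is strictly narrower than the central interval and its lower endpoint is pulled toward zero, which is the behavior that makes it attractive for parameters such as the 8-schools hierarchical variance.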

Mohandas Gandhi (1) vs. Philip K. Dick (2); Hobbes advances

All of yesterday’s best comments were in favor of the political philosopher. Adam writes:

With Hobbes, the seminar would be “nasty, brutish, and short.” And it would degenerate into a “war of all against all.” In other words, the perfect academic seminar.

And Jonathan writes:

Chris Rock would definitely be more entertaining. But the chance to see a speaker who knew Galileo, basing his scientific worldview on him, and could actually find weak points in the proofs of the best mathematicians of the day (even if he couldn’t do any competent math himself) should not be squandered. . . .

I love Chris Rock, but you can see him on HBO. Let Hobbes have the last word against Wallis.

Also, Hobbes could talk about the implications of bullet control.

And, now, both of today’s contestants have a lot to talk about, and they’re both interested in the real world that underlies what we merely think is real. Gandhi was a vegetarian, but Dick was a cat person, which from my perspective is even better. Which of these two culture heroes is ready for prime time??

P.S. As always, here’s the background, and here are the rules.

Imagining p<.05 triggers increased publication

We’ve all had that experience of going purposefully from one hypothesis to another, only to get there and forget why we made the journey. Four years ago, researcher Daryl Bem and his colleagues stripped this effect down, showing that the simple act of obtaining a statistically significant comparison induces publication in a top journal. Now statisticians at Columbia University, USA, have taken things further, demonstrating that merely imagining a statistically significant p-value is enough to trigger increased publication. . . .

The new study shows that this event division effect can occur in our imagination and doesn’t require literally seeing a pattern that reflects the general population. . . .

OK, I guess at this point you’ll want to see the original, a news article called “Imagining walking through a doorway triggers increased forgetting,” by Christian Jarrett in the British Psychological Society Research Digest:

We’ve all had that experience of going purposefully from one room to another, only to get there and forget why we made the journey. Four years ago, researcher Gabriel Radvansky and his colleagues stripped this effect down, showing that the simple act of passing through a doorway induces forgetting. Now psychologists at Knox College, USA, have taken things further, demonstrating that merely imagining walking through a doorway is enough to trigger increased forgetfulness. . . .

The new study shows that this event division effect can occur in our imagination and doesn’t require literally seeing a doorway and passing through it. . . .

Yes, I do find this funny. But, at the same time, I recognize that these are not easy questions. And, in particular, Jarrett is in a difficult position in that to some extent his job involves the promotion of psychology research, not just the evaluation.

I sometimes have a similar problem when blogging for the Monkey Cage political science blog. A bit of criticism of political science research is OK, but too much and I get pushback.

So, back to the research in question: Lawrence, Z., &amp; Peterson, D. (2014). Mentally walking through doorways causes forgetting: The location updating effect and imagination. Memory, 1-9. DOI: 10.1080/09658211.2014.980429.

Based on Jarrett’s description, I see a lot of red flags:

1. Lack of face validity. “Mentally walking through doorways causes forgetting”?? According to Jarrett, “The group who’d imagined passing through a doorway performed worse at the task than the first group who didn’t have to go through a doorway.” This could be true—all things are possible—but it sounds a little weird. And the researchers themselves seem to agree with me on this; see next point.

2. Claims that the effect is both expected and surprising. On one hand, “This effect of an imagined spatial boundary on forgetting is consistent with a related line of research that’s shown forgetting increases after temporal or other boundaries are described in narrative text.” On the other hand, the researchers write, “That walking through a doorway elicits forgetting is surprising because it is such a subtle perceptual feature . . . that simply imagining such a walk yields a similar result is even more surprising . . .”

This is what, following up on some observations from Jeremy Freese, we’ve called the scientific surprise two-step.

3. Lots of different small-n studies but no preregistered replications that I see. Lawrence and Peterson’s finding follows up on a paper by a couple of other researchers, four years ago, which, according to Jarrett, “shows that the simple act of walking through a doorway creates a new memory episode.” The recent paper and that earlier paper have a bunch of studies and comparisons, but it seems like a bit of a ramble (or what Freese calls “Columbian Inquiry”). Each time something interesting shows up, the researchers follow up with a new study that is evaluated in its own way with various idiosyncratic data-analysis choices.

Put it all together, and all I can say is: I’m not convinced. I’m not saying I’m sure these claims are wrong; I just think they’re pretty much at the same status as Nosek, Spies, and Motyl’s “50 shades of gray” findings:

Participants from the political left, right and center (N = 1,979) completed a perceptual judgment task in which words were presented in different shades of gray. Participants had to click along a gradient representing grays from near black to near white to select a shade that matched the shade of the word. We calculated accuracy: How close to the actual shade did participants get? The results were stunning. Moderates perceived the shades of gray more accurately than extremists on the left and right (p = .01)

That is, before Nosek et al. tried their own preregistered replication:

We could not justify skipping replication on the grounds of feasibility or resource constraints. . . . We conducted a direct replication while we prepared the manuscript. We ran 1,300 participants, giving us .995 power to detect an effect of the original effect size at alpha = .05.

And got this:

The effect vanished (p = .59).
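For readers curious how a power claim like “.995 power at alpha = .05” is computed, here is a rough sketch using a normal approximation. The effect size and group sizes below are hypothetical stand-ins, not the values from the actual studies:

```python
from math import sqrt
from statistics import NormalDist

# Normal-approximation power for a two-sided two-sample z test,
# assuming sd = 1 within each group.
def power_two_sided(effect, n_per_group, alpha=0.05):
    se = sqrt(2 / n_per_group)  # standard error of the difference in means
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z = effect / se
    # probability the test statistic lands beyond either critical value
    return (1 - NormalDist().cdf(z_crit - z)) + NormalDist().cdf(-z_crit - z)

# A large replication detects a modest standardized effect almost surely;
# a small original study usually misses the same effect.
print(power_two_sided(0.3, n_per_group=650))
print(power_two_sided(0.3, n_per_group=25))
```

The asymmetry is the point: a large preregistered replication has essentially no excuse for missing a real effect of the originally claimed size, so a null result there is informative in a way that the original small study never was.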

P.S. Just to be clear, I’m not trying to pick on Christian Jarrett. It’s not his job to evaluate the strength of claims that have been published in refereed psychology journals. We all just need to be aware that you can’t believe everything you see in the papers.

Chris Rock (3) vs. Thomas Hobbes; Wood advances

In yesterday’s contest, there’s no doubt in my mind that Levi-Strauss would give a better and more interesting talk than Wood, whose lecture would presumably feature non-sequiturs, solecisms, continuity violations, and the like.

But the funniest comment was from Jonathan:

Ed Wood on Forecasting:

“We are all interested in the future for that is where you and I are going to spend the rest of our lives.” Plan 9 From Outer Space: The Original Uncensored And Uncut Screenplay

Ed Wood on Bayesian vs. Frequentist:

“One is always considered *mad* if one discovers something that others cannot grasp!” Bride of the Monster

These quotes are great! I still don’t see Wood getting into the Final Four, but he earned this one, dammit.

And now we have a struggle of two worthy opponents.

Hobbes got past Larry David in round 1, he destroyed Leo Tolstoy in round 2, and now he’s up against another comedian. Does the Leviathan have it in him to advance to the next round, and, from there, likely to the Final Four? It’s up to you to provide the killer arguments, one way or another.

P.S. As always, here’s the background, and here are the rules.

Another disgraced primatologist . . . this time featuring “sympathetic dentists”


Shravan Vasishth points us to this news item from Luke Harding, “History of modern man unravels as German scholar is exposed as fraud”:

Other details of the professor’s life also appeared to crumble under scrutiny. Before he disappeared from the university’s campus last year, Prof Protsch told his students he had examined Hitler’s and Eva Braun’s bones.

He also boasted of having flats in New York, Florida and California, where, he claimed, he hung out with Arnold Schwarzenegger and Steffi Graf. . . . some of the 12,000 skeletons stored in the department’s “bone cellar” were missing their heads, apparently sold to friends of the professor in the US and sympathetic dentists.

To paraphrase a great scholar:

His resignation is a serious loss for Frankfurt University, and given the nature of the attack on him, for science generally.

I’ve heard he’s going to devote himself to work with at-risk youths.

Claude Levi-Strauss (4) vs. Ed Wood (3); Cervantes wins

For yesterday we have a tough call, having to decide between two much-loved philosophical writers, as Jonathan put it in comments:

Camus on randomness: how to make a model when there is no signal — only noise.
Cervantes on making the world fit the model through self-delusion.

Two fascinating statistics lectures with the same underlying theme — modelmaking as a chimera: “a horrible or unreal creature of the imagination.”

And, as Zbicyclist writes:

Both are oddly relevant at a time when Ebola threatens and when wind power is making a comeback.

Z almost won it with this comment:

Cervantes would be chivalrous and prompt. Camus would need to take a cigarette break every 5 minutes, that or he’d set off the sprinkler system.

But we’ve already used the cigarette thing, also it’s not so clear that chivalry is a good attribute in a seminar talk.

I’ll go with this quote supplied by Matt:

“The most difficult character in comedy is that of the fool, and he must be no simpleton that plays that part.” -Miguel De Cervantes

And today we have a battle of the dark horses: the versatile anthropologist vs. the moviemaker who we laugh at, not with. I don’t see either of them making it past Chris Rock or Thomas Hobbes, but we gotta declare a winner.

P.S. As always, here’s the background, and here are the rules.

Define first, prove later

This post by John Cook features a quote from the book “Calculus on Manifolds,” by Michael Spivak, which I think was the textbook for a course I took in college where we learned how to prove Stokes’s theorem, which is something in multivariable calculus involving the divergence and that thing that you get where you turn your hand around and see which way your thumb is pointing, you know, that thing you do to figure out which way the magnetic field goes—the “curl,” maybe??

Here’s the quote from Spivak (as quoted by Cook):

. . . the proof of [Stokes’] theorem is, in the mathematician’s sense, an utter triviality — a straight-forward calculation. On the other hand, even the statement of this triviality cannot be understood without a horde of definitions . . . There are good reasons why the theorems should all be easy and the definitions hard. . . .
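For reference, the result Spivak is building toward is the generalized Stokes theorem, whose statement really is compact once the definitions (smooth manifold with boundary M, differential form ω, exterior derivative d) are in place:

```latex
\int_{\partial M} \omega = \int_{M} d\omega
```

The divergence theorem and the classical curl (Kelvin–Stokes) theorem of vector calculus are both special cases of this one identity.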

Cook places this within a thoughtful discussion of the tradeoff between putting complexity in the definition or in the proof, or, in a computing context, putting complexity in the programming language or in the program itself. To port this to statistics, we might talk about putting complexity in the statistical formalism or in the application. Bayesian statistics, for example, has a complicated formalism but is direct to apply; whereas classical statistical methods are simple—closer to “button-pushing”—but a lot of choice goes into which buttons to push.

Anyway, back to Spivak. I hated the course based on his book. Even though the prof was wonderful—he was my favorite math professor in college; I went up to him after the class was over and asked him to be my advisor—and the textbook itself was super-clear. But the course made me miserable. We started off the semester with a bunch of completely mysterious definitions, continued with weeks and weeks of lemmas that made no sense (even though I could follow each step), and concluded on the last day with the theorem, at which point I’d completely lost the thread.

It was only a bit later, after I happened to come across Proofs and Refutations, Imre Lakatos’s classic reconstruction of an episode in the history of mathematics, that I realized that the professor, and the textbook, did it backwards.

The right way to teach Stokes’s theorem (at least for me) would be to start by proving the theorem—it indeed is straightforward enough that a so-called heuristic proof could be laid out clearly in a single class period—and then step back and ask: what conditions are necessary and sufficient for the theorem to be correct? Or, to put it another way: under what conditions is the theorem false?

Step 1: The proof. (first week of class)
Step 2: The counterexamples. (second week of class)
Step 3: Going backward from there, establishing the conditions for the theorem, that is, the definition, in whatever rigor is required (the remaining 11 weeks of class).

That’s how they should’ve done it.