Of statistics class and judo class: Beyond the paradigm of sequential education

In judo class they kinda do the same thing every time: you warm up and then work on different moves. Different moves in different classes, and there are different levels, but within any level the classes don’t really have a sequence. You just start where you start, practice over and over, and gradually improve. Different students in the class are at different levels, both when it comes to specific judo expertise and also general strength, endurance, and flexibility, so it wouldn’t make sense to set up the class sequentially. Even during the semester, some people show up at the dojo once a week, others twice or three times a week.

Now compare that with how we organize our statistics classes. Each course has a sequence, everyone starts on week 1 and ends on week 15 (or less for a short course), and the assumption is that everyone starts at the same level, has the same abilities, will put in the same level of effort, and can learn at the same pace. We know this isn’t so, and we try our best to adapt to the levels of all the students in the class, but the baseline is uniformity.

The main way in which we adapt to different levels of students is to offer many different courses, so each student can jump in at his or her own level. But this still doesn’t account for different abilities, different amounts of time spent during the semester, and so forth.

Also, and relatedly, I don’t think the sequential model works so well within a course, even setting aside the differences between students. There is typically only a weak ordering of the different topics within a statistics course, and to really learn the material you have to keep going back and practicing what you’ve learned.

The sequential model works well in a textbook—it’s good to be able to find what you need, and see how it relates to the other material you’ll be learning. But in a course, I’m thinking we’d be better off moving toward the judo model, in which we have a bunch of classes with regular hours and students can drop in and practice, setting their schedules as they see fit. We could then assess progress using standardized tests instead of course grades.

P.S. I’m not an expert on judo, so please take the above description as approximate. This post is really about statistics teaching, not judo.

“A Headline That Will Make Global-Warming Activists Apoplectic”

I saw this article in the newspaper today, “2017 Was One of the Hottest Years on Record. And That Was Without El Niño,” subtitled, “The world in 2017 saw some of the highest average surface temperatures ever recorded, surprising scientists who had expected sharper retreat from recent record years,” and accompanied by the above graph, and this reminded me of something.

A few years ago there was a cottage industry among some contrarian journalists, making use of the fact that 1998 was a particularly hot year (by the standards of its period) to cast doubt on the global warming trend. Ummmm, where did I see this? . . . Here, I found it! It was a post by Stephen Dubner on the Freakonomics blog, entitled, “A Headline That Will Make Global-Warming Activists Apoplectic,” and continuing:

The BBC is responsible. The article, by the climate correspondent Paul Hudson, is called “What Happened to Global Warming?” Highlights:

For the last 11 years we have not observed any increase in global temperatures. And our climate models did not forecast it, even though man-made carbon dioxide, the gas thought to be responsible for warming our planet, has continued to rise. So what on Earth is going on?


According to research conducted by Professor Don Easterbrook from Western Washington University last November, the oceans and global temperatures are correlated. . . . Professor Easterbrook says: “The PDO cool mode has replaced the warm mode in the Pacific Ocean, virtually assuring us of about 30 years of global cooling.”

Let the shouting begin. Will Paul Hudson be drummed out of the circle of environmental journalists? Look what happened here, when Al Gore was challenged by a particularly feisty questioner at a conference of environmental journalists.

We have a chapter in SuperFreakonomics about global warming and it too will likely produce a lot of shouting, name-calling, and accusations ranging from idiocy to venality. It is curious that the global-warming arena is so rife with shrillness and ridicule. Where does this shrillness come from? . . .

No shrillness here. Professor Don Easterbrook from Western Washington University seems to have screwed up his calculations somewhere, but that happens. And Dubner did not make this claim himself; he merely featured a news article that featured this particular guy and treated him like an expert. Actually, Dubner and his co-author Levitt also wrote, “we believe that rising global temperatures are a man-made phenomenon and that global warming is an important issue to solve,” so I could never quite figure out why, on their blog, he was highlighting an obscure scientist who was claiming that we were virtually assured of 30 years of cooling.

Anyway, we all make mistakes; what’s important is to learn from them. I hope Dubner and his Freakonomics colleagues learn from this particular prediction that went awry. Remember, back in 2009 when Dubner was writing about “A Headline That Will Make Global-Warming Activists Apoplectic,” and Don Easterbrook was “virtually assuring us of about 30 years of global cooling,” the actual climate-science experts were telling us that things would be getting hotter. The experts were pointing out that oft-repeated claims such as “For the last 11 years we have not observed any increase in global temperatures . . .” were pivoting off the single data point of 1998, but Dubner and Levitt didn’t want to hear it. Fiddling while the planet burns, one might say.

It’s not that the experts are always right, but it can make sense to listen to their reasoning instead of going on about apoplectic activists, feisty questioners, and shrillness.

Where that title came from

I could not think of a good title for this post. My first try was “An institutional model for the persistence of false belief, but I don’t think it’s helpful to describe scientific paradigms as ‘true’ or ‘false.’ Also, boo on cheap laughs at the expense of academia,” and later attempts were even worse. At one point I was using this self-referential piece of crap: “This title of this post is terrible, but at least it’s short.” That’s like Paul Auster on a really really bad day, it’s Raymond Smullyan without the cleverness, it’s just horrible.

Every once in a while, I come up with a good title for a post (as you can see by scanning these and these). And some of my articles have good titles. But typically I struggle. On the positive side, I’m in good company. Updike was a poor titler too. Donald E. Westlake—that’s a guy who knew how to do it. In fact, hey! I’ll pick a title from that list of unused Westlake book titles. “The Trumpets of Lilliput” it is. Really too bad the man couldn’t’ve lived another 50 years so he could’ve written all those books for us.

The funny thing is, I have no problems coming up with good lines. And that list doesn’t even include the classic, “Survey weighting is a mess.” Titling, though, that’s another thing entirely, a challenge all its own.

Stan short course in NYC in 2.5 weeks

To all who may be interested:

Jonah Gabry, Stan developer and creator of ShinyStan, will be giving a short course downtown, from 6-8 Aug. Details here.

Jonah has taught Stan courses before, and he knows what he’s doing.

“The idea of replication is central not just to scientific practice but also to formal statistics . . . Frequentist statistics relies on the reference set of repeated experiments, and Bayesian statistics relies on the prior distribution which represents the population of effects.”

Rolf Zwaan (who we last encountered here in “From zero to Ted talk in 18 simple steps”), Alexander Etz, Richard Lucas, and M. Brent Donnellan wrote an article, “Making replication mainstream,” which begins:

Many philosophers of science and methodologists have argued that the ability to repeat studies and obtain similar results is an essential component of science. . . . To address the need for an integrative summary, we review various types of replication studies and then discuss the most commonly voiced concerns about direct replication. We provide detailed responses to these concerns and consider different statistical ways to evaluate replications. We conclude there are no theoretical or statistical obstacles to making direct replication a routine aspect of psychological science.

The article was published in Behavioral and Brain Sciences, a journal that runs articles with many discussants (see here for an example from a few years back).

I wrote a discussion, “Don’t characterize replications as successes or failures”:

No replication is truly direct, and I recommend moving away from the classification of replications as “direct” or “conceptual” to a framework in which we accept that treatment effects vary across conditions. Relatedly, we should stop labeling replications as successes or failures and instead use continuous measures to compare different studies, again using meta-analysis of raw data where possible. . . .

I also agree that various concerns about the difficulty of replication should, in fact, be interpreted as arguments in favor of replication. For example, if effects can vary by context, this provides more reason why replication is necessary for scientific progress. . . .

It may well make sense to assign lower value to replications than to original studies, when considered as intellectual products, as we can assume the replication requires less creative effort. When considered as scientific evidence, however, the results from a replication can well be better than those of the original study, in that the replication can have more control in its design, measurement, and analysis. . . .

Beyond this, I would like to add two points from a statistician’s perspective.

First, the idea of replication is central not just to scientific practice but also to formal statistics, even though this has not always been recognized. Frequentist statistics relies on the reference set of repeated experiments, and Bayesian statistics relies on the prior distribution which represents the population of effects—and in the analysis of replication studies it is important for the model to allow effects to vary across scenarios.

My second point is that in the analysis of replication studies I recommend continuous analysis and multilevel modeling (meta-analysis), in contrast to the target article, which recommends binary decision rules which I think are contrary to the spirit of inquiry that motivates replication in the first place.

Jennifer Tackett and Blake McShane wrote a discussion, “Conceptualizing and evaluating replication across domains of behavioral research,” which begins:

We discuss the authors’ conceptualization of replication, in particular the false dichotomy of direct versus conceptual replication intrinsic to it, and suggest a broader one that better generalizes to other domains of psychological research. We also discuss their approach to the evaluation of replication results and suggest moving beyond their dichotomous statistical paradigms and employing hierarchical / meta-analytic statistical models.

Also relevant is this talk on Bayes, statistics, and reproducibility from earlier this year.

If you have a measure, it will be gamed (politics edition).

They sometimes call it Campbell’s Law:

New York Governor Andrew Cuomo is not exactly known for drumming up grassroots enthusiasm and small donor contributions, so it was quite a surprise on Monday when his reelection campaign reported that more than half of his campaign contributors this year gave $250 or less.

But wait—a closer examination of those donations reveals a very odd fact: 69 of them came from just one person, Christopher Kim.

Even odder, it appears Kim lives at the same address as one of Cuomo’s aides! . . .

1) Cuomo has testily fielded questions from reporters about his donor base and that of his primary opponent, Cynthia Nixon, who loves to needle him over his cozy relationships with rich donors, and who also, in March, told the Buffalo News, “In one day of fundraising I received more small donor [contributions] than Andrew Cuomo received in seven years.” 2) All at once, Cuomo’s campaign got an influx of small donations from someone who appears to share an address with a Cuomo aide. . . .

$1 donations, huh? What the campaign should really do is set up a set of booths where you can just drop a quarter in a slot to make your campaign donation. They could put them in laundromats . . . Hey—do laundromats still take quarters? It’s been a long time since I’ve been in one! Maybe, ummm, I dunno, an arcade?

“For professional baseball players, faster hand-eye coordination linked to batting performance”

Kevin Lewis sends along this press release reporting what may be the least surprising laboratory finding since the classic “Participants reported being hungrier when they walked into the café (mean = 7.38, SD = 2.20) than when they walked out [mean = 1.53, SD = 2.70, F(1, 75) = 107.68, P < 0.001].”

Data-based ways of getting a job

Bart Turczynski writes:

I read the following blog with a lot of excitement:

Then I reread it and paid attention to the graphs and models (which don’t seem to be actual models, but rather, well, lines.) The story makes sense, but the science part is questionable (or at least unclear.)

Perhaps you’d like to have a look? This isn’t as important as paying attention to data fraud in cancer research, but January is really busy in recruitment and people are always worried about how to best approach their job hunt.

He continues:

It’s really hard to find good statistics about hiring/recruiting. The BLS provides too general a picture for individual job seekers to make sense of. Industry studies are usually opinion polls. Some experiments are pretty much ads in disguise (the neologism used to describe this escapes me.) A prime example of this is what I call the “6 second rule” [not this one — ed.]:

Apparently, an eye-tracking experiment suggests that recruiters spend just a few seconds on a resume. Sure, they have to sieve through dozens if not hundreds of resumes, but an average of 6 seconds? Not sure.

My reply: I don’t know what to think about all this. I clicked on the first link above and read the post (The Science of The Job Search, Part I: 13 Data-Backed Ways To Win, by Kushal Chakrabarti). I understand your criticism about the vague science, and I agree that the causal claims are outta control, but the post had data and graphs, and that’s a great start already. I’m sure it could be done better but it seems like a useful start. One thing, though: I am bothered by some of the advice being zero-sum or even negative-sum.

The statistical checklist: Could there be a list of guidelines to help analysts do better work?

[image of cat with a checklist]

Paul Cuffe writes:

Your idea of “researcher degrees of freedom” [actually not my idea; the phrase comes from Simmons, Nelson, and Simonsohn] really resonates with me: I’m continually surprised by how many researchers freestyle their way through a statistical analysis, using whatever tests, and presenting whatever results, strikes their fancy. Do you think there’s scope for a “standard recipe,” at least as a starting point for engaging with an arbitrary list of numbers that might pop out of an experiment?

I’ll freely admit that I am very ignorant about inferential statistics, but as an outsider it seems that a paper’s methodology often gets critiqued based on how they navigated various pratfalls, e.g. some sage shows up and says “The authors forget to check if their errors are normally distributed, so therefore their use of such-and-such a test is inappropriate.” It’s well known that humans can really only keep 7±2 ideas in their working memory at a time, and it seems that the list of potential statistical missteps goes well beyond this (perhaps your “Handy statistical lexicon” is intended to help a bit with regard to working memory?) I’d just wonder if there’s a way to codify all relevant wisdom into a standard checklist or flowchart? So that inappropriate missteps are forbidden, and the analyst is guided down the proper path without much freedom. How much of the “abstract” knowledge of pure statisticians could be baked into such a procedure for practitioners?

Atul Gawande has written persuasively on how the humble checklist can help surgeons overcome the limits of working memory to substantially improve medical outcomes. Is there any scope for the same approach in applied data analysis?

My reply:

This reminds me of the discussion we had a few years ago [ulp! actually almost 10 years ago!] on “interventions” vs. “checklists” as two paradigms for improvement.

It would be tough, though. Just to illustrate on a couple of your points above:

– I think “freestyling your way across a statistical analysis” is not such a bad thing. It’s what I do! I do agree that it’s important to share all your data, though.

– Very few things in statistics depend on the distribution of the errors, and if someone tells you that your test is inappropriate because your error terms are not normally distributed, my suggestion is to (a) ignore the criticism because, except for prediction, who cares about the error term, it’s the least important part of a regression model; and (b) stop doing hypothesis tests anyway!

But, OK, I’ll give some general advice:

1. What do practitioners need to know about regression?

2. See the advice on pages 639-640 of this article.

I hope that others can offer their checklist suggestions in the comments.

The “Carl Sagan effect”

Javier Benítez writes:

I am not in academia, but I have learned a lot about science from what’s available to the public. But I also didn’t know that public outreach is looked down upon by academia. See the Carl Sagan Effect.

Susana Martinez-Conde writes:

One scientist, who agreed to participate on the condition of anonymity—an indicator of his perceived vulnerability to the Sagan Effect—left his research institute as a junior faculty member because he felt that the institute’s director—who had chided him about communicating with the press—was blocking his advancement to associate professor after there had been extensive media coverage of his work. The same researcher, who has published in the highest-impact journals, said that he has been unable to get a grant after further recent media coverage and giving a related lecture at a TED conference. He has declined an invitation to give a second TED talk in light of the criticism, and will not do further media interviews at present. The worst for me was the grants. Since this paper [covered extensively in major international media], all my grants got rejected with terrible comments. It was suddenly completely changed. I had 25 grants rejected since the paper in [name of top tier journal].

Has Contemporary Academia Outgrown the Carl Sagan Effect? Journal of Neuroscience 17 February 2016, 36 (7) 2077-2082; DOI:

I know there are bad TED talks out there and some of it may even be pseudoscience, but how can there be an informed public about science when outreach is discouraged?

“Pseudoscience is embraced, it might be argued, in exact proportion as real science is misunderstood – except that the language breaks down here. If you’ve never heard of science (to say nothing of how it works), you can hardly be aware you’re embracing pseudoscience.” Carl Sagan – The Demon-Haunted World (1996)

My reply:

I agree that one of the duties of academic research is service, and part of this can be discharged by communication to general audiences. On the plus side, if you can communicate to the general public, then you’re reaching more people who can uncover flaws in your ideas. So one of the benefits of public exposure is that you can get some valuable critiques from the outside.

Regarding the quote at the end: 25 grant applications seems like a lot. Who applies for 25 different grants??

Mister P wins again

Chad Kiewiet De Jonge, Gary Langer, and Sofi Sinozich write:

This paper presents state-level estimates of the 2016 presidential election using data from the ABC News/Washington Post tracking poll and multilevel regression with poststratification (MRP). While previous implementations of MRP for election forecasting have relied on data from prior elections to establish poststratification targets for the composition of the electorate, in this paper we estimate both turnout and vote preference from the same preelection poll. Through Bayesian estimation we are also able to capture uncertainty in both estimated turnout and vote preferences. This approach correctly predicts 50 of 51 contests, showing greater accuracy than comparison models that rely on the 2012 Current Population Survey Voting and Registration Supplement for turnout.

Cool. Also this:

While the model does not perfectly estimate turnout as a share of the voting age population, popular vote shares, or vote margins in each state, it is more accurate than predictions published by polling aggregators or other published MRP estimators.

And more:

The paper also reports how vote preferences changed over the course of the 18-day tracking period, compares subgroup-level estimates of turnout and vote preferences with the 2016 CPS Survey and National Election Pool exit poll, and summarizes the accuracy of the approach applied to the 2000, 2004, 2008, and 2012 elections.

Here are the headings of their results section:

Estimating Turnout from Pre-Election Polls Outperforms Models Based on Historical Data

MRP Based on Pre-Election Polling Anticipated Trump Victory; 2012 Turnout-Based Models Don’t

Model Estimates Suggest an Electorate Even More Polarized by Education than the Exit Poll

Clinton Consistently Led in the Popular Vote, but not in the Electoral Vote

MRP Outperforms Polling Aggregators in Accuracy

MRP Performs Fairly Well in Past Elections
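To make the poststratification step concrete, here is a minimal sketch in Python. It is not the authors' code: the three demographic cells and all the numbers are made up for illustration, and in a real MRP analysis the cell-level turnout and preference estimates would come from a fitted multilevel model rather than being typed in by hand. The only idea shown is the weighting: each cell's preference counts in proportion to its population times its estimated turnout.

```python
# Hypothetical poststratification cells for one state.
# Each tuple: (population count, estimated turnout probability,
#              estimated candidate share among those who vote).
cells = [
    (1000, 0.50, 0.60),
    (2000, 0.25, 0.40),
    (500, 0.80, 0.50),
]


def poststratify(cells):
    """Turnout-weighted average of cell-level preferences."""
    expected_voters = sum(n * turnout for n, turnout, _ in cells)
    expected_votes = sum(n * turnout * pref for n, turnout, pref in cells)
    return expected_votes / expected_voters


def population_weighted(cells):
    """Same average, but ignoring turnout (weights are raw population)."""
    total = sum(n for n, _, _ in cells)
    return sum(n * pref for n, _, pref in cells) / total


state_share = poststratify(cells)
no_turnout_share = population_weighted(cells)
print(f"turnout-weighted share:    {state_share:.3f}")
print(f"population-weighted share: {no_turnout_share:.3f}")
```

The two summaries differ whenever turnout varies across cells, which is one reason it matters where the turnout estimates come from.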

They fit their models using Stan, as they explain in this footnote:

The course of science

Shravan Vasishth sends this along:

Yup. Not always, though. Even though the above behavior is rewarded.

What happens to your career when you have to retract a paper?

In response to our recent post on retractions, Josh Krieger sends along two papers he worked on with Pierre Azoulay, Jeff Furman, Fiona Murray, and Alessandro Bonatti. Krieger writes, “Both papers are about the spillover effects of retractions on other work. Turns out retractions are great for identification!”

Paper #1: “The career effects of scandal: Evidence from scientific retractions”

Paper #2: “Retractions”

I’ve not looked at these papers in detail but they should be of interest to some of you.

P.S. I’ve issued 4 corrections to published papers (go here and search on Correction). The errors were serious enough that 2 of these could’ve been retractions. My career’s still going ok, but to the extent it has been harmed by the corrections, that would be fair enough—after all, my career benefited from the earlier publication of these erroneous claims.

“Bayesian Meta-Analysis with Weakly Informative Prior Distributions”

Donny Williams sends along this paper, with Philippe Rast and Paul-Christian Bürkner, and writes:

This paper is similar to the Chung et al. avoiding boundary estimates papers (here and here), but we use fully Bayesian methods, and specifically the half-Cauchy prior. We show it has as good of performance as a fully informed prior based on tau values in psychology.

Further, we consider KL-divergence between estimated meta-analytic distribution and the “true” meta-analytic distribution. Here we show a striking advantage for the Bayesian models, which has never been shown in the context of meta-analysis.

Cool! I love to see our ideas making a difference.

And here’s the abstract to the Williams, Rast, and Bürkner paper:

Developing meta-analytic methods is an important goal for psychological science. When there are few studies in particular, commonly used methods have several limitations, most notably of which is underestimating between-study variability. Although Bayesian methods are often recommended for small sample situations, their performance has not been thoroughly examined in the context of meta-analysis. Here, we characterize and apply weakly-informative priors for estimating meta-analytic models and demonstrate with extensive simulations that fully Bayesian methods overcome boundary estimates of exactly zero between-study variance, better maintain error rates, and have lower frequentist risk according to Kullback-Leibler divergence. While our results show that combining evidence with few studies is non-trivial, we argue that this is an important goal that deserves further consideration in psychology. Further, we suggest that frequentist properties can provide important information for Bayesian modeling. We conclude with meta-analytic guidelines for applied researchers that can be implemented with the provided computer code.

I completely agree with this remark: “frequentist properties can provide important information for Bayesian modeling.”

Where do I learn about log_sum_exp, log1p, lccdf, and other numerical analysis tricks?

Richard McElreath inquires:

I was helping a colleague recently fix his MATLAB code by using log_sum_exp and log1m tricks. The natural question he had was, “where do you learn this stuff?”

I checked Numerical Recipes, but the statistical parts are actually pretty thin (at least in my 1994 edition).

Do you know of any books/papers that describe these techniques?

I’d love to hear this blog’s answers to these questions.

I replied that I learned numerical analysis “on the street” through HMM implementations. HMMs are also a good introduction to the kind of dynamic programming technique I used for that Poisson-binomial implementation we discussed (which we’ll build into Stan one of these days—it’ll be a fun project for someone). Then I picked up the rest through a hodge-podge of case-based learning.

“Numerical analysis” is the name of the field and the textbooks where you’ll learn log_sum_exp and log1p and complementary cdfs and learn how 0 is so very different than 1 (smallest double-precision floating point value greater than zero is around 10^-300, whereas the largest double-precision value less than 1 is about 1 – 10^-16), which is rather relevant for statistical computation. You’ll also learn about catastrophic cancellation (which makes naive variance calculations so unstable) and things like the stable Welford algorithm for calculating variance, which also has the nice property of behaving as a streaming accumulator (i.e., it’s memoryless). I don’t know which books are good, but there are lots of web sites and course materials you can try.
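To give a flavor of these tricks, here is a small sketch in plain Python using only the standard library. The function names are my own choices for illustration (Stan has a built-in log_sum_exp, but this is not its implementation), and the test inputs are made up to trigger the failure modes described above.

```python
import math


def log_sum_exp(xs):
    """Return log(sum(exp(x) for x in xs)) without overflow or underflow."""
    m = max(xs)
    if math.isinf(m):
        return m  # every term is -inf (the sum is 0), or some term is +inf
    # After subtracting the max, the largest exponent is exp(0) = 1.
    return m + math.log(sum(math.exp(x - m) for x in xs))


def welford_variance(data):
    """Sample variance in one pass, keeping only three numbers of state."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in data:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses the updated mean
    return m2 / (n - 1)


def naive_variance(data):
    """Sum-of-squares formula; subtracts two huge, nearly equal numbers."""
    n = len(data)
    s = sum(data)
    ss = sum(x * x for x in data)
    return (ss - s * s / n) / (n - 1)


# exp(-1000) underflows to 0.0, so summing on the probability scale
# and then taking the log would fail; the shifted version is fine.
stable_lse = log_sum_exp([-1000.0, -1001.0, -1002.0])

# 1 + 1e-20 rounds to exactly 1.0 in double precision.
tiny = 1e-20
naive_log = math.log(1.0 + tiny)
stable_log = math.log1p(tiny)

# Big offset, small spread: the true sample variance is 1.
shifted = [1e9 + d for d in (0.0, 1.0, 2.0)]
var_welford = welford_variance(shifted)
var_naive = naive_variance(shifted)

print(stable_lse, naive_log, stable_log, var_welford, var_naive)
```

The pattern in all three cases is the same: rearrange the arithmetic so that the intermediate quantities stay in a range where doubles have precision to spare.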

The more advanced versions of this will be about matrices and how to maintain stability of iterative algorithms. Things like pivoting LL^t decompositions and how to do stable matrix division. A lot of that’s also about how to deal with caching in memory with blocking algorithms to do this efficiently. A decent matrix multiplier will be more than an order of magnitude faster than a naive approach on large matrices.

“Algorithms and data structures” is the CS class where you learn about things like dynamic programming (e.g., how to calculate HMM likelihoods, fast Fourier transforms, and matrix multiplication ordering).
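As an example of how the two subjects meet, here is a sketch of the HMM forward algorithm done on the log scale. It is a toy in plain Python, not how Stan or any particular library implements it; the two-state model and its probabilities are invented for illustration. The dynamic programming part is the reuse of the running vector alpha, and the numerical analysis part is doing every sum with log_sum_exp.

```python
import math


def log_sum_exp(xs):
    """Stable log(sum(exp(x))) for a list of log-scale terms."""
    m = max(xs)
    if math.isinf(m):
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))


def hmm_log_likelihood(obs, log_init, log_trans, log_emit):
    """Forward algorithm in log space.

    obs:             list of observed symbol indices
    log_init[k]:     log P(first state = k)
    log_trans[j][k]: log P(next state = k | current state = j)
    log_emit[k][y]:  log P(symbol y | state k)
    """
    n_states = len(log_init)
    # alpha[k] = log P(observations so far, current state = k)
    alpha = [log_init[k] + log_emit[k][obs[0]] for k in range(n_states)]
    for y in obs[1:]:
        alpha = [
            log_sum_exp([alpha[j] + log_trans[j][k] for j in range(n_states)])
            + log_emit[k][y]
            for k in range(n_states)
        ]
    return log_sum_exp(alpha)


def logs(matrix):
    return [[math.log(p) for p in row] for row in matrix]


# Made-up two-state, two-symbol model.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
log_init = [math.log(p) for p in init]
log_trans, log_emit = logs(trans), logs(emit)

ll_short = hmm_log_likelihood([0, 1, 0], log_init, log_trans, log_emit)

# On the probability scale this likelihood would underflow to zero.
long_obs = [t % 2 for t in range(5000)]
ll_long = hmm_log_likelihood(long_obs, log_init, log_trans, log_emit)
print(ll_short, ll_long)
```

The cost is linear in the length of the sequence and quadratic in the number of states, compared to the exponential cost of summing over every possible state path.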

Algorithms class won’t typically get into the low-level caching and branch prediction stuff you need to know to build something like Stan efficiently. There, you need to start diving into the compiler literature and the generated assembly and machine code. I can highly recommend Agner Fog’s overviews on C++ optimization—they’re free and cover most of what you need to know to start thinking about writing efficient C++ (or Fortran—the game’s a bit different with statically typed functional languages like ML).

The 1D integrator in Stan (it will probably land in 2.19—there are a few kinks to work out in Ben Bales’s math lib code) uses an input that provides both the integrated value and its complement (x and closest boundary of the integration minus x). Ben Goodrich helped a lot, as usual, with these complicated numerical things. The result is an integrator with enough precision to integrate the beta distribution between 0 and 1 (the trick is the asymptote at 1).

Integration in general is another advanced numerical analysis field with tons of great references on error accumulation. Leimkuhler and Reich is the usual intro reference that’s specific to Hamiltonian systems; we use the leapfrog (Störmer-Verlet) integrator for NUTS and this book has a nice analysis. We’re looking now into some implicit algorithms to deal with “stiff” systems that cause relatively simple explicit algorithms like Runge-Kutta to require step sizes so small as to be impractical; we already offer them within Stan for dynamics modeling (the _bdf integrators). Hairer et al. is the more mathematically advanced reference for integrators. There are tons of great course notes and applied mathematics books out there for implementing Euler, implicit Euler, Runge-Kutta, Adams-Moulton, implicit midpoint, etc., all of which have different error and symplecticness properties which heavily tie into implementing efficient Hamiltonian dynamics. Yi Zhang at Metrum is now working on improving our underlying algorithms and adding partial differential equation solvers. Now I have a whole new class of algorithms to learn.
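To see why the symplecticness matters, here is a minimal sketch comparing leapfrog with explicit Euler on the simplest Hamiltonian system I can think of, a one-dimensional harmonic oscillator with U(q) = q^2/2 and unit mass. This is a toy in plain Python, not Stan's implementation, and the step size and number of steps are arbitrary choices for illustration.

```python
def grad_u(q):
    """Gradient of the potential U(q) = q**2 / 2."""
    return q


def energy(q, p):
    """Hamiltonian H(q, p) = U(q) + p**2 / 2."""
    return 0.5 * q * q + 0.5 * p * p


def leapfrog(q, p, step, n_steps):
    """Leapfrog (Stormer-Verlet): half step in p, full steps in q, half step in p."""
    p = p - 0.5 * step * grad_u(q)
    for i in range(n_steps):
        q = q + step * p
        if i < n_steps - 1:
            p = p - step * grad_u(q)
    p = p - 0.5 * step * grad_u(q)
    return q, p


def explicit_euler(q, p, step, n_steps):
    """Explicit Euler: update q and p together from the old values."""
    for _ in range(n_steps):
        q, p = q + step * p, p - step * grad_u(q)
    return q, p


q0, p0 = 1.0, 0.0
h0 = energy(q0, p0)
h_leapfrog = energy(*leapfrog(q0, p0, step=0.1, n_steps=1000))
h_euler = energy(*explicit_euler(q0, p0, step=0.1, n_steps=1000))
print(f"initial {h0:.4f}  leapfrog {h_leapfrog:.4f}  euler {h_euler:.1f}")
```

For this system each Euler step multiplies the energy by 1 + step^2, so the error compounds without bound, while the leapfrog energy just wobbles near its starting value. That bounded energy error is what keeps acceptance rates reasonable over long Hamiltonian trajectories.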

So much for my getting away from Columbia after I “learned statistics”. I should at least record the half-lecture I do on this topic for Andrew’s stats communication class (the other half of the class I do focuses on writing API specs). I figure it’s communicating with the computer and communicating with users, but at least one student per year walks out in disgust at my stretching the topic so broadly to include this computer sciency stuff.

The persistence of bad reporting and the reluctance of people to criticize it

Mark Palko pointed to a bit of puff-piece journalism on the tech entrepreneur Elon Musk that was so extreme that it read as a possible parody, and I wrote, “it could just be as simple as that [author Neil] Strauss decided that a pure puff piece would give him access to write a future Musk bio.”

I then continued:

Here’s another angle on the whole Musk hype thing. Consider all the journalists and commentators out there who are not in the pay of Musk and do not harbor ambitions to write a book about the guy. Why don’t they go mock Neil Strauss for this article?

One reason, perhaps, is a mixture of vague hope of Musk dollars, mixed with vague fear of Musk dollars. Even if you’re not directly planning to get any Musk funding, and even if you’re not directly afraid that Musk would personally retaliate against you if you criticize him, still, it might seem “better safe than sorry” to just not bother to publicize any negative views you might have, regarding the Musk phenomenon. Not that Musk would pull a Peter Thiel and try to put you out of business—but what’s the point of tempting fate?

In addition to all that, you might feel that Musk is on the side of good. If you’re politically conservative, Musk represents all the good things of self-made businessmen; if you’re politically liberal, Musk represents a socially-conscious, zero-emissions future; or you just might think Musk is cool. So, sure, there’s some hype, you might think, but why go after Musk, who is such a force for good?

Is it good for Musk to have all this barely-contested hype? It’s hard to say. It’s got to be a loss to be able to dodge serious criticism—after all, the laws of physics will not be as gentle as the NY and LA press. On the other hand, it could be that all the hype could allow Musk to stay afloat financially long enough for him to achieve whatever goals he has that are technically possible.

My point here is to go one step “meta” on the discussion, and ask, not just why is this journalist shilling for Musk, but also why are all the other journalists, bloggers, etc., not blowing this particular gaff?

A similar question could be asked about pro-Soviet hype in the 1960s, for example this notorious graph from Paul Samuelson’s famous textbook. We can ask not just, How did Samuelson get it so wrong?, but also, why did other members of the economics profession not criticize Samuelson more for this mistake? In this case, I doubt it was fear of the Russians, but it might well have been a mixture of (a) not wanting to slam Samuelson, who was, it seems, nearly universally beloved by his colleagues, (b) not wanting to reduce the credibility of economics more generally by pointing out an embarrassing flaw in the most famous textbook in the field, (c) not wanting to draw attention to leftist sympathies in academia, and (d) not wanting to draw attention to the economic failings of socialism. Items (c) and (d) are hardly secrets; still, maybe people felt no need to gratuitously remind people.

Should the points in this scatterplot be binned?

Someone writes:

Care to comment on this paper’s Figure 4?

I found it a bit misleading to do scatter plots after averaging over multiple individuals. Most scatter plots could be “improved” this way to make things look much cleaner than they are.

People are already advertising the paper using this figure.

The article, Genetic analysis of social-class mobility in five longitudinal studies, by Daniel Belsky et al., is all about socioeconomic status based on some genetics-based score, and here’s the figure in question:

These graphs, representing data from four different surveys, plotting SES vs. gene score (with separate panels for family SES during childhood), show impressive correlations, but they’re actually graphs of binned averages. I agree with my correspondent that it would be better to show one dot per person here rather than each dot representing an average of 10 or 50 people. Binned residual plots can be useful in revealing systematic departures from a fitted model, but if you’re plotting the data it’s cleaner to plot the individual points, not averages. Plotting averages makes the correlation appear visually to be larger than it is.
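To see how much binning can inflate an apparent correlation, here is a minimal simulation sketch. The data are fake: the sample size, the slope of 0.2, and the bin size of 50 are all assumptions chosen for illustration, not values from the paper.

```python
import random
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def binned_means(xs, ys, bin_size):
    """Sort by x, then average x and y within consecutive groups of bin_size."""
    pairs = sorted(zip(xs, ys))
    bx, by = [], []
    for i in range(0, len(pairs) - bin_size + 1, bin_size):
        chunk = pairs[i:i + bin_size]
        bx.append(mean(p[0] for p in chunk))
        by.append(mean(p[1] for p in chunk))
    return bx, by

rng = random.Random(0)
n = 2000
# Hypothetical predictor (stand-in for a gene score)
score = [rng.gauss(0, 1) for _ in range(n)]
# Hypothetical outcome: weak dependence on the score, lots of individual noise
ses = [0.2 * s + rng.gauss(0, 1) for s in score]

r_raw = pearson(score, ses)                      # one dot per person
bx, by = binned_means(score, ses, bin_size=50)   # one dot per 50 people
r_binned = pearson(bx, by)
print(f"raw r = {r_raw:.2f}, binned r = {r_binned:.2f}")
```

Averaging within bins removes most of the person-to-person noise but keeps the trend, so the binned dots hug the line much more tightly than the individual points do.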

My only concern is that the socioeconomic index (the y-axis on these graphs) might be discrete, in which case if you plot the raw data they will fall along horizontal bands, making the graph harder to interpret. You could then add jitter to the points, but then that’s introducing a new level of confusion.
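For what it’s worth, here is a minimal sketch of jittering, assuming a hypothetical index that takes integer values from 1 to 7. The jitter width is a display choice, not anything estimated from data; keeping it narrower than the spacing between levels means the original value can still be read off each point.

```python
import random

def jitter(values, width, rng):
    """Add uniform noise in [-width/2, width/2] to each value (for display only)."""
    return [v + rng.uniform(-width / 2, width / 2) for v in values]

rng = random.Random(1)
# Hypothetical discrete index taking integer values 1..7
ses_index = [rng.randint(1, 7) for _ in range(500)]
ses_jittered = jitter(ses_index, width=0.6, rng=rng)

# The jitter is narrower than the gap between levels,
# so rounding each jittered value recovers the original level.
recovered = [round(j) for j in ses_jittered]
print(recovered == ses_index)
```

The jittered values are for plotting only; any model should still be fit to the original discrete values.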

So no easy answer, perhaps? Binned averages misleadingly make the pattern look too clean, but raw data might be too discrete to plot.

Other than the concerns about binning, I think this graph is just great. Excellent use of small multiples, informative, clean, etc. A wonderful example of scientific communication.

BD reviews

I read BDs (bandes dessinées or, as we say in English, graphic literature or picture storybooks) to keep up with my French. Regular books are too difficult for me. When it comes to BDs, some of the classic kids strips and albums are charming, but the ones for adults, which are more like Hollywood movies, are easier for me to read because I find the stories more compelling: I want to find out what happens next.

Here are brief reviews of some albums, in the order that I read them.

WW2.2, by David Chauvel and others. The first one I ever read! I bought Tome 1 at the train station in Brussels, then bought and read the others, one at a time. When I started reading, I had the impression that it was going to be an endless series in the vein of Lucky Luke. But then it turned out it was a finite set of 7 volumes. Since then, I’ve learned that a fixed-length plan is common practice, equivalent to a TV mini-series, I guess. Anyway, the 7 volumes of WW2.2 were of uneven quality but they were all pretty good, and the scenario as a whole made sense to me. My favorite was the first volume, where you get to know all these different characters, keeping them all straight in your head—and then all but one of them dies. Which makes the point of lethality of war more effective than any number of images of dismembered bodies.

Il était une fois En France, by Fabien Nury and Sylvain Vallée. Lived up to the hype. Without a doubt the best piece of literature, of any form, about a scrap metal dealer. I can’t recommend this one enough.

Gung Ho, by Benjamin Von Eckartsberg et Thomas Von Kummant. Fun post-apocalyptic adventure. I happen to have read most of Tome 1 on the beach, which somehow fixed it all in my mind. We’re now waiting for Tome 4 to come out.

Les promeneurs du temps, by Franck Viale et Sylvain Dorange. Fun story, excellent cartoony drawing style. I really loved Tome 1, but the story got so confusing that I lost touch somewhere in Tome 3. Too bad. I guess I’m not the only one who felt that way, because Tome 4 never appeared.

Tyler Cross, by Fabien Nury and Brüno. I saw this in the bookstore and it was intriguing. A Western—almost, I guess not quite as it takes place in the mid-twentieth century. I guess they’d call it a polar. The title character is reminiscent of Donald Westlake’s Parker. An open-ended series, two volumes so far with at least one more to come.

Souvenirs de l’empire de l’atome, by Thierry Smolderen et Alexandre Clérisse. I picked up this one on the strength of its drawings alone. Actually, that’s how I usually do it. Some drawings have character, some don’t. The story to this one was ok but didn’t quite follow through. I don’t really care, though, as the art was so distinctive. A real “60’s” feel.

Le temps perdu, by Rodolphe et Vink. Beautiful drawings, but ultimately the story was just too empty and sentimental so it didn’t really work as a BD.

Où sont passés les grands jours, by Jim and Alex Tefenkgi. Affecting, well-drawn story about the lives of some young adults. “Tout roule. Ne t’inquiète pas.”

Ceux qui me restent, by Damien Marie and Laurent Bonneau. Another one along the same lines: evocative, understated drawings and a realistic story that made me cry, this time about family and memory. The design of this one makes brilliant, spare use of colors in a way that perfectly matches the themes of the story.

Rouge comme la neige, by Christian De Metter. Sad, and beautiful. I don’t know why Westerns are such a popular form of BD, but this one played it straight and was heartbreaking.

Quai d’Orsay, by Christophe Blain et Abel Lanzac. Great drawing style. The story is funny, but my language skills are weak, so it takes pretty much all my effort to detect the humor, leaving me with little energy left to actually appreciate it. Still, I’m working my way through it. The book does not insult my intelligence.

Lancaster, by Christophe Bec and Jean-Jacques Dzialowski. A fun James Bond-style confection, just delicious. I read somewhere on the internet that it didn’t sell well so they decided not to continue it after the first 2 volumes. Too bad.

L’Arabe du futur, by Riad Sattouf. Wow. The guy is brilliant: inspired drawings and a wonderful story. Amazing presentation of a kid’s perspective and of violent societies. I wonder how people from Syria feel about this book: I could imagine them loving it, or I could imagine it getting them very angry. Tome 4 is coming soon. It’s just amazing how much facial expression Sattouf can capture in just a couple of lines.

I was motivated then to read other Sattouf books, including No Sex in New York (which is actually in French despite the title) and Les cahiers d’Esther. These are good too. No Sex in New York includes a hilarious cartoon of a lecherous Isaac Asimov.

Les vieux fourneaux, by Wilfrid Lupano and Paul Cauuet. Wrinkly, still energetic soixante-huitards. Ni yeux, ni maître! 4 tomes so far. Lots of fun, takes a lot of work to follow. I think I’m catching about half the jokes.

Transperceneige, by Jacques Lob, Benjamin Legrand, and Jean-Marc Rochette. I read a few pages of this one and then paused, discouraged by a native speaker who said that this book is full of invented slang and it will be really hard for me to understand.

La mort de Staline, by Fabien Nury and Thierry Robin. Hilarious. Sad, too, but hilarious.

Mort au Tsar, by Fabien Nury and Thierry Robin. More of the same. Also high quality, but harder for me to follow as I didn’t know the story ahead of time.

L’été diabolik, by Thierry Smolderen et Alexandre Clérisse. A followup to Souvenirs de l’empire de l’atome, also with this great angular drawing style but this time with a better story, somewhat gimmicky but it worked for me.

L’homme qui ne disait jamais non, by Olivier Balez et Didier Tronchet. Lively drawing style and fun adventure. But when it was all over, I was disappointed because the plot was a bit of a cheat.

Stern, by Frédéric and Julien Maffre. The guy’s a gravedigger. This one’s more of a standard BD Western, tongue in cheek all the way through. Lots of fun, I liked it. I encountered it in the bookstore display one day. We’ve read Tomes 1 and 2; I assume more will be coming.

Junk, by Nicolas Pothier and Brüno. Le même dessinateur de Tyler Cross. What a great style. Good story, too. Another Western.

Katanga, by Fabien Nury and Sylvain Vallée: The team behind Il était une fois en France. This one’s good too, but a bit grimmer. A lot grimmer. This book has no good guys at all!

L’Imparfait du futur and La réplique inattendue, by Émile Bravo. These are the first two of a six-volume series. Science-fiction comedy; it really is funny and the sci-fi works too. This one is written for kids, but I’m including it on this list because this adult enjoys it. I’m looking forward to reading tomes 4-6.

Exercise and weight loss: long-term follow-up

This post is by Phil Price, not Andrew.

Waaaay back in 2010, I wrote a blog entry entitled “Exercise and Weight Loss.” I had added high-intensity interval training back into my exercise regime, and had lost 12 pounds in about 12 weeks; but around the same time, some highly publicized studies were released that claimed that exercise does not lead to weight loss in overweight people. I suggested that that claim was too strong: at best they had demonstrated that moderate-intensity exercise does not lead to weight loss in most overweight people. I am completely convinced that when I am slightly overweight, I lose weight when I do occasional high-intensity workouts.

Well, I’m back with another data point. After spending a month in hell earlier this year, during which I got no exercise, I had not only failed to lose my winter weight but had added a few pounds. When I was finally able to get to my usual spring activities, which include road biking — sometimes with high-intensity intervals — I quickly lost a couple of pounds. But then I crashed, nothing serious but enough to keep me off the bike and mostly sedentary for more than a month, and I put on some more weight, topping out at about 203 or 204 pounds, the heaviest I had been since I wrote that “Exercise and weight loss” blog post back in 2010. Already this experience would seem to contradict the suggestion that exercise doesn’t control weight: if I wasn’t gaining weight due to lack of exercise, why was I gaining it?

I was able to resume exercise in early May, and in the next six weeks I lost about six pounds. In the past few weeks I’ve lost a few more. Yesterday and today, I’ve weighed in at 193 pounds, ten pounds lighter than I was two months ago. Given past experience, I expect to remain somewhere in the 192- to 195-pound range until November, when I will start edging upwards.

So I’m reiterating the point of that eight-year-old blog post I mentioned at the top: maybe moderate-intensity exercise doesn’t lead to weight loss in most overweight people, but high-intensity exercise does lead to weight loss in me when I am somewhat overweight, and as long as I regularly do some high-intensity exercise I don’t tend to gain weight.

The broader point here is that I think researchers (and journalists) tend to over-generalize. If you do a test that subjects one group of people to one set of conditions, don’t assume the results will extend to a different set of people and/or a different set of conditions, even if the people and the conditions have some similarity to those used in the experiment. The differences can matter.

Reminder: this post is by Phil, not Andrew.

He wants to model a proportion given some predictors that sum to 1

Joël Gombin writes:

I’m wondering what your take would be on the following problem. I’d like to model a proportion (e.g., the share of the vote for a given party at some territorial level) as a function of some compositional data (e.g., the sociodemographic makeup of the voting population), and this, in a multilevel fashion (allowing the parameters to vary across space – or time, for that matter).

Now, I know I should use a beta distribution to model the response. What I’m less certain of is how I should deal with the compositional data that make up the predictors. Indeed, if I try to use them in a naive model, they don’t satisfy the independence hypothesis, create multicollinearity and are difficult to interpret. The marginal effects don’t make sense since whenever one variable goes up, others should go down (and at any rate can’t be considered as constant).

Until now my strategy has been to remove one of the predictors or to remove the constant. But obviously this is not satisfactory, and one can do better. Another strategy I’ve used, and I think it was inspired by you, was to center-scale the predictors. I wonder what you think about that.

However I’m sure one can do better than that. I’ve read about different strategies, for example using principal components or log ratios, but what bothers me with this kind of solution is that I find it very difficult to interpret the parameters when they are transformed this way.

So I wondered what strategy you would use in such a case.

My reply:

First off, no, there’s no reason you “should use a beta distribution to model the response.” I know that we often operate this way in practice—using the range of the data space to determine a probability model—but that’s just a convenience thing, certainly not a “should”! For modeling vote shares, you can sometimes get away with a simple additive model, if the proportions are never or rarely close to 0 or 1; another option is a linear model of logistic-transformed vote shares. In any case, if you have zeroes (candidates getting no votes or essentially no votes, or parties not running a candidate in some races), you’ll want to model that separately.
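As a rough sketch of the logistic-transform option, here is how the preprocessing might look. The vote shares below are made up, and the function names are just for illustration; the point is only that zeroes are split off for separate treatment and the remaining shares are mapped to the unbounded logit scale, where a linear model can be fit.

```python
import math

def logit(p):
    """Map a proportion in (0, 1) to the real line."""
    return math.log(p / (1 - p))

def inv_logit(z):
    """Map a real number back to a proportion in (0, 1)."""
    return 1 / (1 + math.exp(-z))

# Hypothetical vote shares, including one race where the party got no votes
shares = [0.0, 0.12, 0.35, 0.5, 0.88]

# Zeroes cannot be logit-transformed; set them aside to be modeled separately
zero_races = [p for p in shares if p == 0]
positive = [p for p in shares if p > 0]

# Transformed outcomes, suitable as the response in a linear model
z = [logit(p) for p in positive]

# Fitted values on the logit scale map back to shares with inv_logit
back = [inv_logit(v) for v in z]
print([round(v, 2) for v in back])
```

Predictions from a linear model on `z` can be pushed through `inv_logit` and will always land strictly between 0 and 1, which the simple additive model on raw shares does not guarantee.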

Now for the predictors. The best way to set up your regression model depends on how you’d like to model y given x.

It’s not necessarily wrong to simply fit a linear regression of y on the components of x. For example, suppose x is x1, x2, x3, x4, with the constraint x1 + x2 + x3 + x4 = 1. It could be that the model y = b1*x1 + b2*x2 + b3*x3 + b4*x4 + error will be reasonable. Indeed, there is no “independence hypothesis” that regression coefficients are expected to satisfy. You can just put in the 4 predictors and remove the constant term (unnecessary since they sum to 1) and go from there.
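Here is a small simulated check of that claim, written in plain Python so it runs without extra libraries. Everything in it is an assumption for illustration: the coefficients, the noise level, and the way the compositions are generated. It fits the no-constant model with all four components and, for comparison, the model with a constant term and one component dropped, and shows that the two are the same fit in different parameterizations.

```python
import random

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def least_squares(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

rng = random.Random(2)
true_b = [0.1, 0.3, 0.5, 0.7]  # hypothetical coefficients b1..b4
X, y = [], []
for _ in range(300):
    raw = [rng.random() + 0.1 for _ in range(4)]
    total = sum(raw)
    x = [r / total for r in raw]  # x1 + x2 + x3 + x4 = 1
    X.append(x)
    y.append(sum(b * xi for b, xi in zip(true_b, x)) + rng.gauss(0, 0.02))

# Parameterization 1: all four components, no constant term
b_hat = least_squares(X, y)

# Parameterization 2: constant term plus the first three components
X_alt = [[1.0] + row[:3] for row in X]
a_hat = least_squares(X_alt, y)

print("no constant:  ", [round(v, 3) for v in b_hat])
print("with constant:", [round(v, 3) for v in a_hat])
```

In the second fit the constant plays the role of b4 and each slope is b_j minus b4, that is, the expected change in y from shifting share from component 4 to component j. In the first fit, each b_j can be read as the predicted y for a unit made up entirely of component j.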

That said, your model might be more interpretable if you reparameterize. How best to do this will depend on context.