This should keep you all busy for a while.
Frontiers of Science is a course offered as part of Columbia University’s Core Curriculum. The course is controversial, with some people praising its overview of several areas of science, and others feeling that a more traditional set of introductory science courses would do the job better.
Last month, the faculty in charge of the course wrote the following public letter:
Continue reading ‘Evaluating Columbia University’s Frontiers of Science course’ »
What happened that the journal Psychological Science published a paper with no identifiable strengths?
The other day we discussed that paper on ovulation and voting (you may recall that the authors reported a scattered bunch of comparisons, significance tests, and p-values, and I suggested that they would've done better to simply report complete summaries of their data, so that readers could see the comparisons of interest in full context), and I was thinking a bit more about why I was so bothered that it was published in Psychological Science, which I'd thought of as a serious research journal.
Continue reading ‘What happened that the journal Psychological Science published a paper with no identifiable strengths?’ »
This isn’t quite right—poetry, too, can be in paragraph form (see Auden, for example, or Frost, or lots of other examples)—but Basbøll is on to something here.
I’m reminded of Nicholson Baker’s hilarious “From the Index of First Lines,” which is truly the poetic counterpart to Basbøll’s argument in prose:
Hamdan Azhar writes:
I came across this graphic of vaccine-attributed decreases in mortality and was curious if you found it as unattractive and unintuitive as I did. Hope all is well with you!
My reply: All’s well with me. And yes, that’s one horrible graph. It has all the problems with a bad infographic with none of the virtues. Compared to this monstrosity, the typical USA Today graph is a stunning, beautiful masterpiece. I don’t think I want to soil this webpage with the image. In fact, I don’t even want to link to it.
Lee Sechrest sends along this article by Brian Haig and writes that it “presents what seems to me a useful perspective on much of what scientists/statisticians do and how science works, at least in the fields in which I work.” Here’s Haig’s abstract:
Continue reading ‘Where do theories come from?’ »
I received two emails yesterday on related topics.
First, Stephen Olivier pointed me to this post by Daniel Lakens, who wrote the following open call to statisticians:
You would think that if you are passionate about statistics, then you want to help people to calculate them correctly in any way you can. . . . you’d think some statisticians would be interested in helping a poor mathematically challenged psychologist out by offering some practical advice.
I'm the right person to ask this question, since I actually have written a lot of material that helps psychologists (and others) with their data analysis. But there clearly are communication difficulties, in that my work and that of other statisticians haven't reached Lakens. Sometimes the contributions of statisticians are made indirectly. For example, I wrote Bayesian Data Analysis, and then Kruschke wrote Doing Bayesian Data Analysis. Our statistics book made it possible for Kruschke to write his excellent book for psychologists. This is a reasonable division of labor.
That said, I’d like to do even more. So I will make some specific suggestions for data analysis in psychology right here in this post, in the context of my next story:
Continue reading ‘How can statisticians help psychologists do their research better?’ »
I was asked to write an article for the Committee of Presidents of Statistical Societies (COPSS) 50th anniversary volume. Here it is (it’s labeled as “Chapter 1,” which isn’t right; that’s just what came out when I used the template that was supplied). The article begins as follows:
The field of statistics continues to be divided into competing schools of thought. In theory one might imagine choosing the uniquely best method for each problem as it arises, but in practice we choose for ourselves (and recommend to others) default principles, models, and methods to be used in a wide variety of settings. This article briefly considers the informal criteria we use to decide what methods to use and what principles to apply in statistics problems.
And then I follow up with these sections:
Statistics: the science of defaults
Ways of knowing
The pluralist’s dilemma
And here’s the concluding paragraph:
Statistics is a young science in which progress is being made in many areas. Some methods in common use are many decades or even centuries old, but recent and current developments in nonparametric modeling, regularization, and multivariate analysis are central to state-of-the-art practice in many areas of applied statistics, ranging from psychometrics to genetics to predictive modeling in business and social science. Practitioners have a wide variety of statistical approaches to choose from, and researchers have many potential directions to study. A casual and introspective review suggests that there are many different criteria we use to decide that a statistical method is worthy of routine use. Those of us who lean on particular ways of knowing (which might include: performance on benchmark problems, success in new applications, insight into toy problems, optimality as shown by simulation studies or mathematical proofs, or success in the marketplace) should remain aware of the relevance of all these dimensions in the spread of default procedures.
Regular blog readers will recognize many of these themes, but I hope this particular presentation has some added value. And this is as good a place as any to thank my many correspondents who’ve helped contribute to the development and expression of these ideas.
Several months ago, Mike Betancourt and I wrote a discussion for the article, Can quantum probability provide a new direction for cognitive modeling?, by Emmanuel Pothos and Jerome Busemeyer, in Behavioral and Brain Sciences. We didn’t say much, but it was a milestone for me because, with this article, BBS became the 100th journal I’d published in.
What surprised me, in reading the full discussion, was how supportive the commentary was. Given the topic of Pothos and Busemeyer’s article, I was expecting the discussions to range from gentle mockery to outright abuse. The discussion that Mike and I wrote was moderately encouraging, and I was expecting this to fall on the extreme positive end of the spectrum.
Actually, though, most of the discussions were positive, and only a couple were purely negative (those would be "Quantum models of cognition as Orwellian newspeak" by Michael Lee and Wolf Vanpaemel, and "Physics envy: Trying to fit a square peg into a round hole," by James Shanteau and David Weiss). We expressed some vague skepticism, but it's hard for me to be really negative about the idea, given that classical probability theory is not actually correct, and we do indeed live in a quantum world (otherwise all our tables and chairs would fall apart, for one thing). I certainly see no logical reason why our models of probability and uncertainty should be restricted to the "Boltzmannian" simplification.
David Kessler, Peter Hoff, and David Dunson write:
Marginally specified priors for nonparametric Bayesian estimation
Prior specification for nonparametric Bayesian inference involves the difficult task of quantifying prior knowledge about a parameter of high, often infinite, dimension. Realistically, a statistician is unlikely to have informed opinions about all aspects of such a parameter, but may have real information about functionals of the parameter, such as the population mean or variance. This article proposes a new framework for nonparametric Bayes inference in which the prior distribution for a possibly infinite-dimensional parameter is decomposed into two parts: an informative prior on a finite set of functionals, and a nonparametric conditional prior for the parameter given the functionals. Such priors can be easily constructed from standard nonparametric prior distributions in common use, and inherit the large support of the standard priors upon which they are based. Additionally, posterior approximations under these informative priors can generally be made via minor adjustments to existing Markov chain approximation algorithms for standard nonparametric prior distributions. We illustrate the use of such priors in the context of multivariate density estimation using Dirichlet process mixture models, and in the modeling of high-dimensional sparse contingency tables.
This seems very important to me; also, I love the idea of Hoff and Dunson on the same paper, sort of like one of those '70s supergroups.
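To fix ideas, here's the decomposition from the abstract written out in symbols (my notation, not necessarily theirs). If $\theta$ is the infinite-dimensional parameter (say, an unknown density) and $\psi = t(\theta)$ is a finite vector of functionals such as the population mean and variance, the prior factors as

$$p(\theta) = p_1(\psi)\, p_2(\theta \mid \psi), \qquad \psi = t(\theta),$$

with an informative prior $p_1$ on the functionals and a conditional prior $p_2$ derived from a standard nonparametric prior such as a Dirichlet process mixture.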
I think it’s part of my duty as a blogger to intersperse, along with the steady flow of jokes, rants, and literary criticism, some material that will actually be useful to you.
So here goes.
Jarno Vanhatalo, Jaakko Riihimäki, Jouni Hartikainen, Pasi Jylänki, Ville Tolvanen, and Aki Vehtari write:
The GPstuff toolbox is a versatile collection of Gaussian process models and computational tools required for Bayesian inference. The tools include, among others, various inference methods, sparse approximations and model assessment methods.
We can actually now fit Gaussian processes in Stan. But for big problems (or even moderately sized problems), full Bayes can be slow. GPstuff uses EP, which is faster. At some point we'd like to implement EP in Stan. (Right now we're working with Dave Blei to implement VB.)
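To make the cost concrete, here's a minimal R sketch of GP regression with the hyperparameters held fixed. This is a toy illustration of my own, not GPstuff's or Stan's implementation; full Bayes would put priors on alpha, rho, and sigma and repeat the expensive linear solves at every iteration.

# Toy GP regression with a squared-exponential kernel and fixed
# hyperparameters (alpha = marginal sd, rho = length-scale, sigma = noise sd).
sq_exp_cov <- function(x1, x2, alpha, rho) {
  outer(x1, x2, function(a, b) alpha^2 * exp(-0.5 * ((a - b) / rho)^2))
}

gp_predict <- function(x, y, x_new, alpha = 1, rho = 1, sigma = 0.1) {
  # Each solve() below is O(n^3) in the number of data points: this is
  # why full Bayes (re-solving at every MCMC iteration) gets slow, and
  # why sparse approximations and EP are attractive.
  K    <- sq_exp_cov(x, x, alpha, rho) + diag(sigma^2, length(x))
  K_s  <- sq_exp_cov(x_new, x, alpha, rho)
  K_ss <- sq_exp_cov(x_new, x_new, alpha, rho)
  list(mean = K_s %*% solve(K, y),             # posterior predictive mean
       cov  = K_ss - K_s %*% solve(K, t(K_s))) # posterior predictive covariance
}

x <- seq(0, 10, length.out = 50)
y <- sin(x) + rnorm(50, 0, 0.1)
fit <- gp_predict(x, y, x_new = seq(0, 10, length.out = 200))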
GPstuff really works. I saw Aki use it to fit a nonparametric version of the Bangladesh well-switching example in ARM. He was sitting in his office and just whipped up the model and fit it.
Guy Freeman writes:
I thought you’d all like to know that Stan was used and referenced in a peer-reviewed Rapid Communications paper on influenza. Thank you for this excellent modelling language and sampler, which made it possible to carry out this work quickly!
I haven’t actually read the paper, but I’m happy to see Stan getting around like that.
David Jinkins writes:
The objective of this paper is to measure the relative importance of conspicuous consumption to Americans and Chinese. To this end, I estimate the parameters of a utility function borrowed from recent theoretical work using American and Chinese data. The main parameter of interest governs the amount that individuals care about peer group beliefs regarding their welfare. Using survey data on the visibility of different good categories along with household budget surveys, I find that Chinese consumers care twice as much as American consumers about the beliefs of their peer group.
I came across this draft research manuscript by following the links back after Jinkins commented on our blog. The framing of the paper is a bit more foundation-y and a bit less statistic-y than I’d prefer, but I guess that’s just the way they do things in economics, compared to statistics or (some) political science. In any case, I wanted to point you to this paper, partly to let you know that I do read the blog comments and even sometimes follow the links, and also because it’s unusually well-written, not just in its first paragraph but all the way through. He’s got to work a bit on his presentation of results—I see some ugly tables there—but I think that’s much easier to learn than it is to learn how to write.
Miguel Paz writes:
Poderomedia Foundation and PinLatam are launching OpenDataLatinoamerica.org, a regional data repository to free data and use it on Hackathons and other activities by HacksHackers chapters and other organizations.
We are doing this because the road to the future of news has been littered with lost datasets. A day or so after every hackathon and meeting where a group has come together to analyze, compare, and understand a particular set of data, someone tries to remember where the files were stored. Too often, no one is certain. That's why, together with Mariano Blejman, we realized that we need a central repository where you can share data that has proved to be reliable: OpenData Latinoamerica, which we are leading as ICFJ Knight International Journalism Fellows.
If you work in Latin America or Central America your organization can take part in OpenDataLatinoamerica.org. To apply, go to the website and answer a simple form agreeing to meet the standard criteria for open data. Once the application is approved, you will receive an account to start running and managing open data, becoming part of the community.
Just as almost all science fiction is ultimately about politics, one could say that just about all crime fiction is about economics.
But if I had to pick one crime novelist with an economics focus, I’d pick George V. Higgins. In one of his novels, his character Jerry Kennedy had a riff on the difference between guys who get a salary and guys who have to work for every dollar. But, really, almost all his novels are full of economics.
Tom Salvesen asks, is this the worst info-graphic of the year?
I say, no. Nobody really cares about these numbers. It’s an amusing feature. The alternative would not be a better display of these data, the alternative would be some photo or cartoon. They’re just having fun. I wouldn’t give it any design awards but it’s fine, it is what it is.
Dave Berri posted the following at the Freakonomics blog:
The "best" picture of 2012 was Argo. At least that's the film that won the Oscar for best picture. According to the Oscars, the decision to give this award to Argo was made by the nearly 6,000 voting members of the Academy of Motion Picture Arts and Sciences. . . . In other words, this choice is made by the "experts." There is, though, another group that we could have listened to on Sunday night. That group would be the people who actually spend money to go to the movies. . . . According to that group, Marvel's The Avengers was the "best" picture in 2012. With domestic revenues in excess of $600 million, this film earned nearly $200 million more than any other picture. And when we look at world-wide revenues, this film brought in more than $1.5 billion. . . . Despite what seems like a clear endorsement by the customers of this industry, the Avengers was ignored by the Oscars. Perhaps this is just because I am an economist, but this strikes me as odd. Movies are not a product made just for the members of the academy. These ventures are primarily made for the general public. And yet, when it comes time to decide which picture is "best," the opinion of the general public seems to be ignored. Essentially the Oscars are an industry statement to their customers that says: "We don't think our customers are smart enough to tell us which of our products are good. So we created a ceremony to correct our customers."
He keeps going along those lines for a while and concludes:
One would hope the Academy would at least pay a bit more attention to the people paying the bills. Not only does it seem wrong (at least to this economist) to argue that movies many people like are simply not that good, focusing on the box office would seem to make good financial sense for the Oscars as well. A recent Slate article argued that the Oscars’ telecast tends to have higher ratings when more commercially successful films are nominated for best picture. So in the future, maybe voters for the Oscars will pay a bit more attention to their customers. These customers may not be thought of as “movie experts.” But these are the people who pay the bills, and therefore, ultimately it is their opinion that should matter to this industry.
What strikes me about this discussion is the mix of descriptive and normative that seems so characteristic of pop-microeconomics. (I should emphasize here that I’m not using “pop” in any sort of derogatory way. I’m speaking of serious economic writing that is intended for a popular audience.)
1. On one hand, you have the purely descriptive perspective: economist as person-from-Mars, looking at human society objectively, the way a scientist studies cell cultures in a test tube. Consumer sovereignty is what it’s all about, with a slightly offended tone that anyone could think otherwise. Who are you, smartypants, to think you know better than the average ticket-buyer, etc. I’m reminded of the perhaps-apocryphal story of the “some academics” who “conclude that bookmakers simply aren’t very smart.”
2. At the same time, we’re given a moral lesson. The Avengers is the best movie because it made more money. It is “the people who pay the bills” whose “opinion that should matter to this industry.”
The difficulty, of course, is that lesson 2 gets blurred if it is folded into lesson 1.
Berri’s argument is that moviemakers should not be paternalistically ignoring the attitudes of their customers in giving awards. But this argument dissolves if you take one step back and consider moviemakers as independent business operators. In that case, their business decisions (to do the Oscars however they want) should be given as much respect as that of moviegoers to choose which movies to watch.
As far as I’m concerned, the Academy can do whatever they want. What’s interesting to me here is to see how the economist’s explicitly non-normative ideology (his implication that the “best” picture must be the one with most revenue, and that any other criteria would be disrespectful of moviegoers) so quickly becomes normative (that it’s “wrong . . . to argue that movies many people like are simply not that good”). To me, it’s a strange mixture of idealism and cynicism. The man from Mars has become a scold.
In an email I sent to a colleague who’s writing about lasso and Bayesian regression for R users:
The one thing you might want to add, to fit with your pragmatic perspective, is to point out that these different methods are optimal under different assumptions about the data. However, these assumptions are never true (even in the rare cases where you have a believable prior, it won't really follow the functional form assumed by bayesglm; even in the rare cases where you have a real loss function, it won't really follow the mathematical form assumed by lasso, etc.), but these methods can still be useful and be given the interpretation of regularized estimates.
Someone might also naively think that regularization is fine but that "unbiased" is somehow the most honest approach. In practice, if you stick to "unbiased" methods such as least squares, you'll restrict the number of variables you can include in your model, so in reality you suffer from omitted-variable bias. So there is no safe home base. It's not like the user can simply do unregularized regression and then think of regularization as a frill. The practitioner who uses unregularized regression has already essentially made a compromise with the devil by restricting the number of predictors in the model to a "manageable" level (whatever that means).
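To make this concrete, here's a small R sketch (my made-up data, assuming the arm and glmnet packages are installed) comparing plain least squares, bayesglm's normal-prior shrinkage, and the lasso on the same regression. The point is that all three are just different regularization choices, with least squares being the choice of zero regularization, made tolerable only by pre-restricting the predictors.

library(arm)     # for bayesglm()
library(glmnet)  # for the lasso

set.seed(1)
n <- 100; p <- 20
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] - 0.5 * X[, 2] + rnorm(n)   # only 2 of the 20 predictors matter

fit_ols   <- lm(y ~ X)                        # "unbiased," but only because p was kept manageable
fit_bayes <- bayesglm(y ~ X, prior.df = Inf)  # normal prior on coefficients: ridge-like shrinkage
fit_lasso <- cv.glmnet(X, y)                  # L1 penalty, lambda chosen by cross-validation

round(cbind(ols   = coef(fit_ols)[-1],
            bayes = coef(fit_bayes)[-1],
            lasso = as.numeric(coef(fit_lasso, s = "lambda.min"))[-1]), 2)

None of the three sets of assumptions holds exactly, and that's the point: each column is a regularized estimate under a model that is at best a useful approximation.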
Over the years I’ve written a dozen or so journal articles that have appeared with discussions, and I’ve participated in many published discussions of others’ articles as well. I get a lot out of these article-discussion-rejoinder packages, in all three of my roles as reader, writer, and discussant.
Part 1: The story of an unsuccessful discussion
The first time I had a discussion article was the result of an unfortunate circumstance. I had a research idea that resulted in an article with Don Rubin on monitoring the mixing of Markov chain simulations. I knew the idea was great, but back then we worked pretty slowly so it was a while before we had a final version to submit to a journal. (In retrospect I wish I'd just submitted the draft version as it was.) In the meantime I presented the paper at a conference. Our idea was very well received (I had a sheet of paper so people could write their names and addresses to get preprints, and we got either 50 or 150 (I can't remember which, I guess it must have been 50) requests), but there was one person who came up later and tried to shoot down our idea. The shooter-down, Charlie Geyer, has done some great work but in this case he was confused, I think in retrospect because we did not have a clear discussion of the different inferential goals that arose in the sorts of calculations he was doing (inference for normalizing constants of distributions) and which I was doing (inference for parameters in fitted models). In any case, the result was that our new and exciting method was surrounded by an air of controversy. In some ways that was a good thing: I became well known in the field right away, perhaps more than I deserved at the time (in the sense that most of my papers up to then and for the next few years were on applied topics; it was a while before I published other major papers on statistical theory, methods, and computation). But overall I'd rather have been moderately known for an excellent piece of research than very well known for being part of a controversy. I didn't seek out controversy; it arose because someone else criticized our work without seeing the big picture, and at the time neither he nor I nor my collaborator had the correct synthesis of my work and his criticism.
(Again, the synthesis was that he was trying to get precise answers for hard problems and was in a position where he needed to have a good understanding of the complex distributions he was simulating from, whereas I was working on a method to apply routinely in relatively easy (but nontrivial!) settings. For Charlie’s problems, my method would not suffice because he wouldn’t be satisfied until he was directly convinced that the Markov chain was exploring all the space. For my problems, Charlie’s approach (to run a million simulations and work really hard to understand the computation for a particular model) wasn’t a practical solution. His approach to applied statistics was to handcraft big battleships to solve large problems, one at a time. I wanted to fit lots of small and medium-sized models (along with the occasional big one), fast.)
Anyway, this “different methods for different goals” conversation never occurred, hence I left that meeting with an unpleasant feeling that our method was controversial, not fully accepted, and not fully understood. So I got it into my head that our article should be published as a discussion, so that Geyer and others could comment and we could respond.
But we never had that discussion, not in those words. Neither Charlie nor I nor Don Rubin was aware enough of the sociological context, as it were, so we ended up talking past each other.
In retrospect, that particular discussion did not work so well.
Here’s another example from about the same time, the Ising model. Here’s one chain from the Gibbs sampler. After 2000 iterations, it looks like it’s settled down to convergence (here we’re plotting the log probability density, which is a commonly used summary for this sort of distribution).
But then look at the second plot: the first 500 iterations. If we’d only seen these, we might have been tempted to declare victory too early!
At this point, the naive take-home point might be that 500 iterations was not enough but we're safe with 2000. But no! Even though the last bit of those 2000 looks as stationary and clean as can be, if we start from a different point and run for 2000, we get something different:
This one looks stationary too! But a careful comparison with the graphs above (even clearer when I displayed these on transparency sheets and overlaid them on the projector) reveals that the two “stationary” distributions are different. The chains haven’t mixed, the process hasn’t converged. R-hat reveals this right away (without even having to look at the graphs, but you can look at the graphs if you want).
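Here's a self-contained toy version of that demonstration in R, using a two-mode target instead of the Ising model (my example, not the one from the article), together with the original, non-split version of R-hat:

set.seed(2)
run_chain <- function(start, n_iter = 2000, step = 0.5) {
  # Random-walk Metropolis on an equal mixture of N(-4, 1) and N(4, 1)
  log_p <- function(z) log(0.5 * dnorm(z, -4) + 0.5 * dnorm(z, 4))
  x <- numeric(n_iter)
  x[1] <- start
  for (t in 2:n_iter) {
    prop <- x[t - 1] + rnorm(1, 0, step)
    x[t] <- if (log(runif(1)) < log_p(prop) - log_p(x[t - 1])) prop else x[t - 1]
  }
  x
}
sims <- cbind(run_chain(-4), run_chain(4))  # one chain started at each mode

rhat <- function(sims) {
  # Original Gelman-Rubin statistic; sims is an (iterations x chains) matrix
  n <- nrow(sims)
  B <- n * var(colMeans(sims))    # between-chain variance
  W <- mean(apply(sims, 2, var))  # within-chain variance
  sqrt(((n - 1) / n * W + B / n) / W)
}
rhat(sims)  # far above 1: each trace looks "stationary," but the chains never mix

Each chain on its own passes any eyeball test; only the comparison across chains, which is all R-hat is doing, reveals the problem.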
As I wrote in our article in Bayesian Statistics 4,
This example shows that the Gibbs sampler can stay in a small subset of its space for a long time, without any evidence of this problematic behavior being provided by one simulated series of finite length. The simplest way to run into trouble is with a two-chambered space, in which the probability of switching chambers is very low, but the above graphs are especially disturbing because the probability density in the Ising model has a unimodal (in the sense that this means anything in a discrete distribution) and approximately Gaussian marginal distribution on the gross scale of interest. That is, the example is not pathological; the Gibbs sampler is just very slow. Rather than being a worst-case example, the Ising model is typical of the probability distributions for which iterative simulation methods were designed, and may be typical of many posterior distributions to which the Gibbs sampler is being applied.
So that was my perspective: start from one point and the chain looks fine; start from two points and you see the problem. But Charlie had a different attitude toward the Ising example. His take on it was: the Ising model is known to be difficult, no one but a fool would try to simulate it with 2000 iterations of a Gibbs sampler. There’s a huge literature on the Ising model already!
Charlie was interested in methods for solving large, well-understood problems one at a time. I was interested in methods that would be used for all sorts of problems by statisticians such as myself who, for applied reasons, bite off more in model than we can chew in computation and understanding. For Charlie with the Ising model, multiple sequences missed the point entirely, as he knew already that 2000 iterations of Gibbs wouldn’t do it. For me, though . . . as an applied guy I was just the kind of knucklehead who might apply Gibbs to this sort of problem (in my defense, Geman and Geman made a similar mistake in 1984, I’ve been told), so it was good to have a practical convergence check.
Again, I think that in our discussion and rejoinder, Don and I presented our method well, in the context of our applied purposes. But I think it would’ve worked better as a straight statistics article. Nothing much useful came out of the discussion because none of us cut through to the key difference in the sorts of problems we were working on.
Part 2: A successful discussion
In the years since then, I’ve realized that communication is more than being right (or, should I say, thinking that one is right). Statistical ideas (and, for that matter, mathematical and scientific ideas in general) are sometimes best understood through their limitations. It’s Lakatos’s old “proofs and refutations” story all over again.
Recently I was involved in a discussion that worked out well. It started a few years ago with a post of mine on the differences between the sorts of data visualizations that go viral on the web (using some examples that were celebrated by statistician/designer Nathan Yau), as compared to statistical graphics of the sort that we are trained to make. It seemed to me that many visualizations that are successful with general audiences feature unique or striking designs and puzzle-like configurations, whereas the most successful statistical graphics have more transparent formats that foreground data comparisons. Somewhere in between are the visualizations created by lab scientists, who generally try to follow statistical principles but usually (in my view) try too hard to display too much information on a single plot.
My posts, and various follow-ups, were disliked by many in the visualization community. They didn’t ever quite disagree with my claim that many successful visualizations involve puzzles, but they didn’t like what they perceived as my negative tone.
In an attempt to engage the fields of statistics and visualization more directly, I wrote an article (with Antony Unwin) on the different goals and different looks of these two sorts of graphics. Like many of my favorite papers, this one took a lot of effort to get into a journal. But finally it was accepted in the Journal of Computational and Graphical Statistics, with discussion.
The discussants (Stephen Few, Robert Kosara, Paul Murrell, and Hadley Wickham; links to all four discussions are here on Kosara’s blog) politely agreed with us on some points and disagreed with us on others. And then it was time for us to write our rejoinder.
In composing the rejoinder I finally came upon a good framing of the problem. Before we’d spoken of statistical graphs and information visualization as having different goals and looking different. But that didn’t work. No matter how often I said that it could be a good thing that an infovis is puzzle-like, or no matter how often I said that as a statistician I would prefer graphing the data like This but I can understand how graphing it like That could attract more viewers . . . no matter how much I said this sort of thing, it was interpreted as a value judgment (and it didn’t help when I said that something “sucked,” even if I later modified that statement).
Anyway, my new framing, that I really like, is in terms of tradeoffs. Not “two cultures,” not “different goals, different looks,” but tradeoffs. So it’s not stat versus infographics; instead it’s any of us trying to construct a graph (or, better still, a grid of graphs) and recognizing that it’s not generally possible to satisfy all goals at once, so we have to think about what goals are most important in any given situation:
In the internet age, we should not have to choose between attractive graphs and informational graphs: it should be possible to display both, via interactive displays. But to follow this suggestion, one must first accept that not every beautiful graph is informative, and not every informative graph is beautiful.
Yes, it can sometimes be possible for a graph to be both beautiful and informative, as in Minard’s famous Napoleon-in-Russia map, or more recently the Baby Name Wizard, which we featured in our article. But such synergy is not always possible, and we believe that an approach to data graphics that focuses on celebrating such wonderful examples can mislead people by obscuring the tradeoffs between the goals of visual appeal to outsiders and statistical communication to experts.
So it’s not Us versus Them, it’s each of us choosing a different point along the efficient frontier for each problem we care about.
And I think the framing worked well. At least, it helped us communicate with Robert Kosara, one of our discussants. Here’s what Kosara wrote, after seeing our article, the discussions (including his), and our rejoinder:
There are many, many statements in that article [by Gelman and Unwin] that just ask to be debunked . . . I [Kosara] ended up writing a short response that mostly points to the big picture of what InfoVis really is, and that gives some examples of the many things they missed.
While the original article is rather infuriating, the rejoinder is a great example of why this kind of conversation is so valuable. Gelman and Unwin respond very thoughtfully to the comments, seem to have a much more accurate view of information visualization than they used to, and make some good points in response.
Great! A discussion that worked! This is how it’s supposed to go: not a point-scoring debate, not people talking past each other, but an honest and open discussion.
Perhaps my extremely, extremely frustrating experience early in my career (detailed in Part 1 above) motivated me to think seriously about the Lakatosian attitude toward understanding and explaining ideas. If you compare Bayesian Data Analysis to other statistics books of that era, for example, I think we did a pretty good job (although maybe not good enough) of understanding the methods through their limitations. But even with all my experience and all my efforts, this can be difficult, as revealed by the years it took for us to finally process our ideas on graphics and visualization to the extent that we could communicate with experts in these fields.
Gary Marcus writes,
An algorithm that is good at chess won’t help parsing sentences, and one that parses sentences likely won’t be much help playing chess.
That is soooo true. I’m excellent at parsing sentences but I’m not so great at chess. And, worse than that, my chess ability seems to be declining from year to year.
Which reminds me: I recently read Frank Brady’s much lauded Endgame, a biography of Bobby Fischer. The first few chapters were great, not just the Cinderella story of his steps to the world championship, but also the background on his childhood and the stories of the games and tournaments that he lost along the way.
But after Fischer beats Spassky in 1972, the book just dies. Brady has chapter after chapter on Fischer's life, his paranoia, his girlfriends, his travels. But, really, after the chess is over, it's just sad and kind of boring. I'd much rather have had twice as much detail on the first part of the life and then had the post-1972 era compressed into a single chapter. I mean, sure, I respect that Brady wanted to tell the full life story, and I'm not telling him how he should've written his book, I'm just giving my reactions.
Also, I would've liked more information on the games: what was the amazing set of moves that Fischer played in the so-called Game of the Century, what happened in some of the games he lost, and so on. In an afterword, Brady writes that he decided not to include any games so as to make the book more accessible. What I wonder is, how many readers are there like me, who enjoy chess and could understand a diagram and some discussion of what these amazing plays were, even if we couldn't follow an entire game written on the page or wouldn't have the patience to play one out on the board? I wouldn't have gotten much out of transcripts of chess games, but a few diagrams and discussions of key moments would've made the book a lot more interesting to me.
P.S. After Kasparov beat Karpov in the final game of their match—the game where both players knew that Kasparov had to win, that a draw wouldn't be enough—I clipped the game out of the newspaper and later played it out with my dad. That was a game. To my ignorant eyes, there was no single point where I could spot a mistake by Karpov. Kasparov just gradually and imperceptibly got to a winning position. Amazing.
Mark Palko writes:
Salmon is dismissive of the claim that there are fifty million over-the-air television viewers:
The 50 million number, by the way, should not be considered particularly reliable: it’s Aereo’s guess as to the number of people who ever watch free-to-air TV, even if they mainly watch cable or satellite. (Maybe they have a hut somewhere with an old rabbit-ear TV in it.)
And he strongly suggests the number is not only smaller but shrinking. By comparison, here’s a story from the broadcasting news site TV News Check from June of last year (if anyone has more recent numbers please let me know):
According to new research by GfK Media, the number of Americans now relying solely on over-the-air (OTA) television reception increased to almost 54 million, up from 46 million just a year ago. The recently completed survey also found that the demographics of broadcast-only households skew towards younger adults, minorities and lower-income families.
As Palko says, Salmon is usually a pretty careful reporter. And this one should be right up his alley. Here’s Palko again:
We’ve talked about how well over-the-air television compares to cable (for some people), how new and apparently successful businesses are springing up around OTA, and how the number of viewers getting their television through antennas appears to have been growing substantially since the introduction of digital. What we haven’t covered so far is the potential social impact of killing broadcast television.
It is almost axiomatic that, if you have a resource that is used in one way by people at the top of the economic ladder and in another way by people on the bottom and you “let the market decide” what to do with the resource, it will go with the people who have the money. . . .
This becomes particularly troubling when we’re talking about a publicly held resource. . . . What groups rely heavily on broadcast television? What groups would have the most difficulty finding alternatives?
People in the bottom one or two deciles are going to be in trouble. Even the lowest tier of cable would represent a significant monthly expense. People with limited residential security will be even worse off. People with limited income security will face a difficult choice: sign up for exorbitant no-contract plans or commit to a financial obligation they may not be able to fulfill. People with poor credit histories will have to come up with large deposits every time they move. . . .
OTA [over-the-air television] is a promising technology supporting an innovative and growing industry, serving important economic and social roles.
The technology is doing fine in the marketplace. It’s lobbyists who are likely to kill it.
I wonder what Salmon’s take is on this. Is Palko missing something, or does he just happen to be sharing a perspective that is different from that of NYC-based financial journalists?
P.S. Let me emphasize that this post is not some sort of trolling of Felix Salmon. I’m a big fan of his quantitatively sophisticated reporting, which is why it’s interesting if he’s getting something wrong.
P.P.S. There’s some dispute about that 54 million number. Salmon points to this news article by Michael Grotticelli:
Free, over-the-air broadcast TV signals are now watched by only 9 percent of the U.S. population — down from 16 percent in 2003, according to Nielsen, the major TV and radio rating service. . . .
The Nielsen numbers are certain to cause a dispute with the NAB, which has insisted the amount of over-the-air viewing is increasing in an era of cord-cutting. Last summer, the NAB produced a survey by Knowledge Networks citing about 18 percent as “broadcast exclusive” households. That total was 54 million Americans — up from 46 million in 2011.
So, one claim is that 9% watch any over-the-air TV, the other is that 18% only watch over-the-air TV. That’s a big gap.
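For what it's worth, the raw counts are at least internally consistent with their own percentages: the Nielsen figure is a share of people, while the Knowledge Networks figure is a share of households, and assuming roughly 120 million U.S. households averaging about 2.5 people each,

$$0.18 \times 120\ \text{million households} \times 2.5\ \tfrac{\text{people}}{\text{household}} \approx 54\ \text{million people}.$$

That arithmetic doesn't close the substantive gap between "any over-the-air viewing" and "over-the-air only," which is the real puzzle here.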
Social science research has been getting pretty bad press recently, what with the Excel buccaneers who didn’t know how to handle data with different numbers of observations per country, and the psychologist who published dozens of papers based on fabricated data, and the Evilicious guy who wouldn’t let people review his data tapes, etc etc. And that’s not even considering Dr. Anil Potti.
On the other hand, the revelation of all these problems can be taken as evidence that things are getting better. Psychology researcher Gary Marcus writes:
There is something positive that has come out of the crisis of replicability—something vitally important for all experimental sciences. For years, it was extremely difficult to publish a direct replication, or a failure to replicate an experiment, in a good journal. . . . Now, happily, the scientific culture has changed. . . . The Reproducibility Project, from the Center for Open Science is now underway . . .
And sociologist Fabio Rojas writes:
People may sneer at the social sciences, but they hold up as well. Recently, a well known study in economics was found to be in error. People may laugh because it was an Excel error, but there’s a deeper point. There was data, it could be obtained, and it could be replicated. Fixing errors and looking for mistakes is the hallmark of science. . . .
I agree with Marcus and Rojas that attention to problems of replication is a good thing. It's bad that people are running incompetent analyses or faking data all over the place, but it's good that they're getting caught. And, to the extent that scientific practices are improving to help detect error and fraud, and to reduce the incentives for publishing erroneous and fraudulent results in the first place, that's good too.
But I worry about a sense of complacency. I think we should be careful not to overstate the importance of our first steps. We may be going in the right direction but we have a lot further to go. Here are some examples:
1. Marcus writes of the new culture of publishing replications. I assume he'd support the ready publication of corrections, too. But we're not there yet, as this story indicates:
Recently I sent a letter to the editor of a major social science journal pointing out a problem in an article they'd published. They refused to publish my letter, not because of any argument that I was incorrect, but because they judged my letter not to be in the top 10% of submissions to the journal. I'm sure my letter was indeed not in the top 10% of submissions, but the journal's attitude presents a serious problem if the bar to publication of a correction is so high. That's a disincentive for the journal to publish corrections, a disincentive for outsiders such as myself to write corrections, and a disincentive for researchers to be careful in the first place. Just to be clear: I'm not complaining about how I was treated here; rather, I'm griping about the system in which a known error can stand uncorrected in a top journal, just because nobody managed to send in a correction that's in the top 10% of journal submissions.
2. Rojas writes of the notorious Reinhart and Rogoff study that, "There was data, it could be obtained, and it could be replicated." Not so fast:
It was over two years before those economists shared the data that allowed people to find the problems in their study. If the system really worked, people wouldn’t have had to struggle for years to try to replicate an unreplicable analysis.
And, remember, the problem with that paper was not just a silly computer error. Reinhart and Rogoff also made serious mistakes handling their time-series cross-sectional data.
3. Marcus writes in a confident tone about progress in methodology: “just last week, Uri Simonsohn [and Leif Nelson and Joseph Simmons] released a paper on coping with the famous file-drawer problem, in which failed studies have historically been underreported.” I think Uri Simonsohn is great, but I agree with the recent paper by Christopher Ferguson and Moritz Heene that the so-called file-drawer problem is not a little technical issue that can be easily cleaned up; rather, it’s fundamental to our current practice of statistically-based science.
And there’s pushback. Biostatisticians Leah Jager and Jeffrey Leek wrote a paper, which I strongly disagree with, called “Empirical estimates suggest most published medical research is true.” I won’t go into the details here—my take on their work is that they’re applying a method that can make sense in the context of a single large study but which won’t generally work with meta-analysis—my point is that there remains a constituency for arguments that science is basically OK already.
I respect the view of Marcus, Rojas, Jager, Leek, and others that the current environment of criticism has in some ways gone too far. All those people do serious, respected research, and those of us who do serious research know how difficult it can be to publish in good journals, how hard we work—out of necessity—to consider all possible alternative explanations for any results we find, how carefully we document the steps of our data collection and analysis, and so forth. But many problems still remain.
Thomas Basbøll analogizes the difficulties of publishing scientific criticism to problems with the subprime mortgage market before the crash. He quotes Michael Lewis:
To sell a stock or bond short you need to borrow it, and [the bonds they were interested in] were tiny and impossible to find. You could buy them or not buy them but you couldn't bet explicitly against them; the market for subprime mortgages simply had no place for people in it who took a dim view of them. You might know with certainty that the entire mortgage bond market was doomed, but you could do nothing about it.
And now here’s Basbøll:
I had a shock of recognition when I read that. I’ve been trying to “bet against” a number of stories that have been told in the organization studies literature for years now, and the thing I’m learning is that there’s no place in the literature for people who take a dim view of them. There isn’t really a genre (in the area of management studies) of papers that only points out errors in other people’s work. You have to make a “contribution” too. In a sense, you can buy the stories people are telling you or not buy them but you can’t criticize them.
This got me thinking about the difference between faith and knowledge. Knowledge, it seems to me, is a belief held in a critical environment. Faith, we might say, is a belief held in an “evangelical” environment. The mortgage bond market was an evangelical environment in which to hold beliefs about housing prices, default rates, and credit ratings on CDOs. There was no simple way to critique the “good news” . . .
Eventually, as Lewis reports, people were able to bet against the subprime mortgage market, but it wasn’t easy. And the fact that some investors, with great difficulty, were able to do it, doesn’t mean the financial system is A-OK.
Basbøll’s analogy may be going too far, but I agree with his general point that the existence of a few cases of exposure should not make us complacent. Marcus’s suggestions on cleaning up science are good ones, and we have a ways to go before they are generally implemented.
Continue reading ‘Against optimism about social science’ »
David Hogg pointed me to this post by Gary Marcus, reviewing this skeptics' all-star issue of Perspectives on Psychological Science that features replication culture heroes Jelte Wicherts, Hal Pashler, Arina Bones, E. J. Wagenmakers, Gregory Francis, John Ioannidis, and Uri Simonsohn. I agree with pretty much everything Marcus has to say. In addition to Marcus's suggestions, which might be called cultural or psychological, I also have various statistical ideas that might help move the field forward. Most notably, I think we need to go beyond uniform priors and null-hypothesis testing to a more realistic set of models for effects and variation. I'll discuss this more at some other time, but in the meantime I thought I'd share these links.
P.S. Marcus updates with a glass-is-half-full take.
This was a good idea: take a bunch of old (and some recent) news articles on developments in mathematics and related areas from the past hundred years. Fun for the math content and the historical/nostalgia value. Relive the four-color theorem, Fermat, fractals, and early computing.
I have too much of a technical bent to be the ideal reader for this sort of book, but it seems like an excellent gift for a non-technical reader who nonetheless enjoys math. (I assume that such people are out there, just as there are people like me who can’t read music but still enjoy reading about the subject.)
The book is organized by topic. My own preference would have been chronological and with more old stuff. I particularly enjoyed the material from many decades ago, such as the news report on one of the early computers. This must have been a fun book to compile.