Against optimism about social science

Social science research has been getting pretty bad press recently, what with the Excel buccaneers who didn’t know how to handle data with different numbers of observations per country, and the psychologist who published dozens of papers based on fabricated data, and the Evilicious guy who wouldn’t let people review his data tapes, etc etc. And that’s not even considering Dr. Anil Potti.

On the other hand, the revelation of all these problems can be taken as evidence that things are getting better. Psychology researcher Gary Marcus writes:

There is something positive that has come out of the crisis of replicability—something vitally important for all experimental sciences. For years, it was extremely difficult to publish a direct replication, or a failure to replicate an experiment, in a good journal. . . . Now, happily, the scientific culture has changed. . . . The Reproducibility Project, from the Center for Open Science is now underway . . .

And sociologist Fabio Rojas writes:

People may sneer at the social sciences, but they hold up as well. Recently, a well known study in economics was found to be in error. People may laugh because it was an Excel error, but there’s a deeper point. There was data, it could be obtained, and it could be replicated. Fixing errors and looking for mistakes is the hallmark of science. . . .

I agree with Marcus and Rojas that attention to problems of replication is a good thing. It’s bad that people are running incompetent analyses or faking data all over the place, but it’s good that they’re getting caught. And, to the extent that scientific practices are improving to help detect error and fraud, and to reduce the incentives for publishing erroneous and fraudulent results in the first place, that’s good too.

But I worry about a sense of complacency. I think we should be careful not to overstate the importance of our first steps. We may be going in the right direction but we have a lot further to go. Here are some examples:

1. Marcus writes of the new culture of publishing replications. I assume he’d support the ready publication of corrections, too. But we’re not there yet, as this story indicates:

Recently I sent a letter to the editor of a major social science journal pointing out a problem in an article they’d published. They refused to publish my letter, not because of any argument that I was incorrect, but because they judged my letter to not be in the top 10% of submissions to the journal. I’m sure my letter was indeed not in the top 10% of submissions, but the journal’s attitude presents a serious problem, if the bar to publication of a correction is so high. That’s a disincentive for the journal to publish corrections, a disincentive for outsiders such as myself to write corrections, and a disincentive for researchers to be careful in the first place. Just to be clear: I’m not complaining about how I was treated here; rather, I’m griping about the system in which a known error can stand uncorrected in a top journal, just because nobody managed to send in a correction that’s in the top 10% of journal submissions.

2. Rojas writes of the notorious Reinhart and Rogoff study that, “There was data, it could be obtained, and it could be replicated.” Not so fast:

It was over two years before those economists shared the data that allowed people to find the problems in their study. If the system really worked, people wouldn’t have had to struggle for years to try to replicate an unreplicable analysis.

And, remember, the problem with that paper was not just a silly computer error. Reinhart and Rogoff also made serious mistakes handling their time-series cross-sectional data.

3. Marcus writes in a confident tone about progress in methodology: “just last week, Uri Simonsohn [and Leif Nelson and Joseph Simmons] released a paper on coping with the famous file-drawer problem, in which failed studies have historically been underreported.” I think Uri Simonsohn is great, but I agree with the recent paper by Christopher Ferguson and Moritz Heene that the so-called file-drawer problem is not a little technical issue that can be easily cleaned up; rather, it’s fundamental to our current practice of statistically-based science.
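To see why the file-drawer problem in point 3 is more than a technicality, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the true effect of 0.1, the 20 observations per study, the known unit standard deviation, and the rule that only estimates with z > 1.96 get "published." It is not the method of any of the papers mentioned above.

```python
import random
import statistics


def simulate_file_drawer(n_studies=2000, n_per_study=20, true_effect=0.1, seed=1):
    """Simulate many small studies and 'publish' only the significant ones."""
    rng = random.Random(seed)
    se = 1 / n_per_study ** 0.5  # standard error of a mean, assuming sd = 1
    all_estimates = []
    published = []
    for _ in range(n_studies):
        sample = [rng.gauss(true_effect, 1.0) for _ in range(n_per_study)]
        estimate = statistics.fmean(sample)
        all_estimates.append(estimate)
        if estimate / se > 1.96:  # hypothetical publication filter
            published.append(estimate)
    return all_estimates, published


all_estimates, published = simulate_file_drawer()
print(f"mean of all estimates:       {statistics.fmean(all_estimates):.2f}")
print(f"mean of published estimates: {statistics.fmean(published):.2f}")
print(f"share of studies published:  {len(published) / len(all_estimates):.1%}")
```

Under these assumptions, any study that clears the filter must report an estimate of at least 1.96 standard errors, which here is more than four times the true effect. The published record overstates the effect by construction, no matter how carefully each individual study was run.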

And there’s pushback. Biostatisticians Leah Jager and Jeffrey Leek wrote a paper, which I strongly disagree with, called “Empirical estimates suggest most published medical research is true.” I won’t go into the details here—my take on their work is that they’re applying a method that can make sense in the context of a single large study but which won’t generally work with meta-analysis—my point is that there remains a constituency for arguments that science is basically OK already.

I respect the view of Marcus, Rojas, Jager, Leek, and others that the current environment of criticism has in some ways gone too far. All those people do serious, respected research, and those of us who do serious research know how difficult it can be to publish in good journals, how hard we work—out of necessity—to consider all possible alternative explanations for any results we find, how carefully we document the steps of our data collection and analysis, and so forth. But many problems still remain.

Thomas Basbøll analogizes the difficulties of publishing scientific criticism to problems with the subprime mortgage market before the crash. He quotes Michael Lewis:

To sell a stock or bond short you need to borrow it, and [the bonds they were interested in] were tiny and impossible to find. You could buy them or not buy them but you couldn’t bet explicitly against them; the market for subprime mortgages simply had no place for people in it who took a dim view of them. You might know with certainty that the entire mortgage bond market was doomed, but you could do nothing about it.

And now here’s Basbøll:

I had a shock of recognition when I read that. I’ve been trying to “bet against” a number of stories that have been told in the organization studies literature for years now, and the thing I’m learning is that there’s no place in the literature for people who take a dim view of them. There isn’t really a genre (in the area of management studies) of papers that only points out errors in other people’s work. You have to make a “contribution” too. In a sense, you can buy the stories people are telling you or not buy them but you can’t criticize them.

This got me thinking about the difference between faith and knowledge. Knowledge, it seems to me, is a belief held in a critical environment. Faith, we might say, is a belief held in an “evangelical” environment. The mortgage bond market was an evangelical environment in which to hold beliefs about housing prices, default rates, and credit ratings on CDOs. There was no simple way to critique the “good news” . . .

Eventually, as Lewis reports, people were able to bet against the subprime mortgage market, but it wasn’t easy. And the fact that some investors, with great difficulty, were able to do it, doesn’t mean the financial system is A-OK.

Basbøll’s analogy may be going too far, but I agree with his general point that the existence of a few cases of exposure should not make us complacent. Marcus’s suggestions on cleaning up science are good ones, and we have a ways to go before they are generally implemented.

Cleaning up science

David Hogg pointed me to this post by Gary Marcus, reviewing this skeptics’ all-star issue of Perspectives on Psychological Science that features replication culture heroes Jelte Wicherts, Hal Pashler, Arina Bones, E. J. Wagenmakers, Gregory Francis, John Ioannidis, and Uri Simonsohn. I agree with pretty much everything Marcus has to say. In addition to Marcus’s suggestions, which might be called cultural or psychological, I also have various statistical ideas that might help move the field forward. Most notably I think we need to go beyond uniform priors and null-hypothesis testing to a more realistic set of models for effects and variation. I’ll discuss more at some other time, but in the meantime I thought I’d share these links.

P.S. Marcus updates with a glass-is-half-full take.

The New York Times Book of Mathematics

This was a good idea: take a bunch of old (and some recent) news articles on developments in mathematics and related areas from the past hundred years. Fun for the math content and historical/nostalgia value. Relive the four-color theorem, Fermat, fractals, and early computing.

I have too much of a technical bent to be the ideal reader for this sort of book, but it seems like an excellent gift for a non-technical reader who nonetheless enjoys math. (I assume that such people are out there, just as there are people like me who can’t read music but still enjoy reading about the subject.)

The book is organized by topic. My own preference would have been chronological and with more old stuff. I particularly enjoyed the material from many decades ago, such as the news report on one of the early computers. This must have been a fun book to compile.

One more thought on Hoover historian Niall Ferguson’s thing about Keynes being gay and marrying a ballerina and talking about poetry

We had some interesting comments on our recent reflections on Niall Ferguson’s ill-chosen remarks in which he attributed Keynes’s economic views (I don’t actually know exactly what Keynesianism is, but I think a key part is for the government to run surpluses during economic booms and deficits during recessions) to Keynes being gay and marrying a ballerina and talking about poetry. The general idea, I think, is that people without kids don’t care so much about the future, and this motivated Keynes’s party-all-the-time attitude, which might have worked just fine for Eddie Murphy’s girlfriend in the 1980s and in San Francisco bathhouses of the 1970s but, according to Ferguson, is not the ticket for preserving today’s American empire.

Some of the more robust defenders of Ferguson may have been disappointed by his followup remarks: “I should not have suggested . . . that Keynes was indifferent to the long run because he had no children, nor that he had no children because he was gay. This was doubly stupid. . . . My disagreements with Keynes’s economic philosophy have never had anything to do with his sexual orientation. It is simply false to suggest, as I did, that his approach to economic policy was inspired by any aspect of his personal life.” It’s tough to try to defend a statement that was disowned by the person saying it.

But the question then arises: What’s so horrible about what Ferguson said? After all, it’s not unreasonable to think that someone’s personal circumstances will affect their political attitudes and their views on economic policy. And certainly no one doubts that Keynes’s upper-class British background was relevant for understanding his views.

So what was up?
Continue reading ‘One more thought on Hoover historian Niall Ferguson’s thing about Keynes being gay and marrying a ballerina and talking about poetry’ »

The Folk Theorem of Statistical Computing

From an email I received the other day:

Things are going much better now — it’s interesting, it feels like with both of my models, parameters are slow to converge or get “stuck” and have trouble mixing when the model is somehow misspecified.

See here for a statement of the folk theorem.

Jesus historian Niall Ferguson and the improving standards of public discourse

History professor (or, as the news reports call him, “Harvard historian”) Niall Ferguson got in trouble when speaking at a conference of financial advisors. Tom Kostigen reports:
Continue reading ‘Jesus historian Niall Ferguson and the improving standards of public discourse’ »

NYC Data Skeptics Meetup

Rachel Schutt writes:

The hype surrounding Big Data and Data Science is at a fever pitch with promises to solve the world’s business and social problems, large and small. How accurate or misleading is this message? How is it helping or damaging people, and which people? What opportunities exist for data nerds and entrepreneurs that examine the larger issues with a skeptical view?

This Meetup focuses on mathematical, ethical, and business aspects of data from a skeptical perspective. Guest speakers will discuss the misuse of and best practices with data, common mistakes people make with data and ways to avoid them, how to deal with intentional gaming and politics surrounding mathematical modeling, and taking into account the feedback loops and wider consequences of modeling. We will take deep dives into models in the fields of Data Science, statistics, financial engineering, and economics.

This is an independent forum and open to anyone sharing an interest in the larger use of data. Technical aspects will be discussed, but attendees do not need to have a technical background.

Setting aside the politics, the debate over the new health-care study reveals that we’re moving to a new high standard of statistical journalism

Pointing to this news article by Megan McArdle discussing a recent study of Medicaid recipients, Jonathan Falk writes:

Forget the interpretation for a moment, and the political spin, but haven’t we reached an interesting point when a journalist says things like:

When you do an RCT with more than 12,000 people in it, and your defense of your hypothesis is that maybe the study just didn’t have enough power, what you’re actually saying is “the beneficial effects are probably pretty small”.

and

A good Bayesian—and aren’t most of us supposed to be good Bayesians these days?—should be updating in light of this new information. Given this result, what is the likelihood that Obamacare will have a positive impact on the average health of Americans? Every one of us, for or against, should be revising that probability downwards. I’m not saying that you have to revise it to zero; I certainly haven’t. But however high it was yesterday, it should be somewhat lower today.

This is indeed an excellent news article. Also this sensible understanding of statistical significance and effect sizes:

But that doesn’t mean Medicaid has no effect on health. It means that Medicaid had no statistically significant effect on three major health markers during a two-year study. Those are related, but not the same. And in fact, all three markers moved in the right direction. They just weren’t big enough to rule out the possibility that this was just random noise in the underlying data. I’d say this suggests that it’s more likely than not that there is some effect–but also, more likely than not that this effect is small.
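The point about power and effect size in these quotes can be made concrete with a back-of-the-envelope calculation. This is a minimal sketch under stated assumptions: a simple two-arm comparison of means, an outcome standardized to have standard deviation 1, a two-sided 5% test, and 80% power. The function name and the sample sizes are hypothetical, and this idealized setup is not the design of the Medicaid study under discussion.

```python
from statistics import NormalDist


def minimum_detectable_effect(n_per_arm, sd=1.0, alpha=0.05, power=0.80):
    """Smallest true difference in means detectable with the given power.

    Assumes two equal-sized arms and a normal approximation.
    """
    z = NormalDist()
    se_diff = sd * (2.0 / n_per_arm) ** 0.5  # standard error of the difference
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * se_diff


# Hypothetical per-arm sample sizes; 6000 stands in for an even split of 12,000.
for n in (100, 1000, 6000):
    print(n, round(minimum_detectable_effect(n), 3))
```

In this idealized setting the detectable effect shrinks like one over the square root of the sample size, so a study with thousands of people per arm can pick up quite small standardized differences. That is the logic behind reading "not enough power" in a large trial as "the effect is probably small," and it is also why a non-significant result is not the same as evidence of zero effect.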

Culture clash

I had no idea this sort of thing even existed:

[Screenshot]

I’m reminded of our discussion of Charles Murray’s recent book on social divisions among Americans. Murray talked about differences between upper and lower class, but I thought he was really talking more about differences between liberals and conservatives among the elite. (More discussion here.)

In this particular case, Murray’s story about irresponsible elites seems to fit pretty well. At the elite level, you have well-connected D.C. gun lobbyists opposing any restrictions on personal weapons. As Murray might put it, the elites (Phil Spector aside) may be able to handle their guns, but some lower-class Americans cannot—they do things like give real rifles to 5-year-olds (!). As Murray writes, it’s a combination of cultural ignorance and a permissive ideology: I assume the senators who voted against the recent gun control bill wouldn’t give live weapons to their kids (or live in neighborhoods in which kids have access to guns at home), but they don’t feel right about restricting the rights of others to do so.

P.S. After reading some comments, I thought it might help to clarify two points.
Continue reading ‘Culture clash’ »

7 ways to separate errors from statistics

Betsey Stevenson and Justin Wolfers have been inspired by the recent Reinhart and Rogoff debacle to list “six ways to separate lies from statistics” in economics research:

1. “Focus on how robust a finding is, meaning that different ways of looking at the evidence point to the same conclusion.”

2. Don’t confuse statistical with practical significance.

3. “Be wary of scholars using high-powered statistical techniques as a bludgeon to silence critics who are not specialists.”

4. “Don’t fall into the trap of thinking about an empirical finding as ‘right’ or ‘wrong.’ At best, data provide an imperfect guide.”

5. “Don’t mistake correlation for causation.”

6. “Always ask ‘so what?’”

I like all these points, especially #4, which I think doesn’t get said enough. As I wrote a few months ago, high-profile social science research aims for proof, not for understanding—and that’s a problem.

My addition to the list

If you compare my title above to that of Stevenson and Wolfers, you’ll find two differences. First, I changed “lies” to “errors.” I have no idea who’s lying, and I’m much more comfortable talking about errors. Second, I think they missed an even better, more general way to find mistakes:

7. Make your data and analysis public.

This is the best approach, because now you can have lots of strangers checking your work for free! This advice is also particularly appropriate for Reinhart and Rogoff because, according to various reports (see here and here), it was years before they made their data available to outsiders. Nearly three years ago (!), Dean Baker wrote a column entitled, “It Would Be Helpful if Rogoff and Reinhart Made Their Data Available.”

Perhaps “the risk of forced disclosure” (as Keith O’Rourke puts it) will motivate researchers to be more careful in the future.

Your additions?

I told Wolfers I was going to link to his list and add my own #7. He replied that we’re probably missing #8, 9, and 10. In the comments, feel free to add your favorite ways to separate errors from statistics. Phil already gave some here.

A graph at war with its caption. Also, how to visualize the same numbers without giving the display a misleading causal feel?

Kaiser Fung discusses the following graph that is captioned, “A study of 54 nations–ranked below–found that those with more progressive tax rates had happier citizens, on average.”

[Graph: 54 nations, with tax progressivity on the x-axis and average happiness on the y-axis]

As Kaiser writes, “from a purely graphical perspective, the chart is well executed . . . they have 54 points, and the chart still doesn’t look too crammed . . .” But he also points out that the graph’s implicit claims (that tax rates can explain happiness or cause more happiness) are not supported.

Kaiser and I are not being picky-picky-picky here. Taken literally, the graph title says nothing about causation, but I think the phrasing implies it. Also, from a purely descriptive perspective, the graph is somewhat at war with its caption. The caption announces a relationship, but in the graph, the x and y variables have only a very weak correlation. The caption says that happiness and progressive tax rates go together, but the graph uses the U.S. as a baseline, and when you move from the U.S. point on the graph to the right-hand side (more progressive taxes), you see a lot more points below the line than above the line. Thus the visual impression of the graph is that more progressive taxes will lead to lower happiness—the opposite of the message from the caption.

What can be done here?

I don’t exactly think the graph is “bad data,” and, although the graph says little directly about causation, the data have some relevance to our understanding of policy debates over taxes. If nothing else, we learn that tax progressivity and average happiness show some variation among countries. I think a start would be to reframe and put happiness on the x-axis and the tax system on the y-axis, which would allow us to see that, at any happiness level, there is a range of tax systems, with none of the very happiest countries having flat taxes.

Better still might be to make a line plot with three columns: First, a list of country names, in decreasing order from richest to poorest (using, for example, per-capita GDP (yes, I know, such data aren’t perfect!)), then a column showing tax progressivity (if that’s the measure they want to use), then a column showing average happiness.

The advantage of this pair of dotplots is that you get to see the spread in each of these variables with respect to a natural measure (how rich the country is), and there’s no implicit causal story getting in the way.
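Here’s a rough sketch of the layout I have in mind, done as plain text so it needs no graphics library. The function names and the numbers are made up for illustration; they are not the data from the graph.

```python
def dot_row(value, lo, hi, width=21):
    """Return a text row with a single 'o' placed at the scaled position of value."""
    if hi == lo:
        pos = 0  # all values equal: put the dot at the left edge
    else:
        pos = round((value - lo) / (hi - lo) * (width - 1))
    return "." * pos + "o" + "." * (width - 1 - pos)


def paired_dotplot(rows, width=21):
    """rows holds (name, gdp_per_capita, progressivity, happiness) tuples."""
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)  # richest first
    prog = [r[2] for r in ordered]
    happy = [r[3] for r in ordered]
    lines = []
    for name, _gdp, p, h in ordered:
        left = dot_row(p, min(prog), max(prog), width)
        right = dot_row(h, min(happy), max(happy), width)
        lines.append(f"{name:<10} {left}  {right}")
    return lines


# Made-up numbers, for illustration only (not the data in the graph).
example = [
    ("Country A", 50000, 0.30, 7.1),
    ("Country B", 20000, 0.10, 6.4),
    ("Country C", 35000, 0.45, 6.8),
]
for line in paired_dotplot(example):
    print(line)
```

Each country gets one row, the rows run from richest to poorest, and the two dot columns share that ordering, so you can scan down either column without one variable being plotted against the other.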

“Tragedy of the science-communication commons”

I’ve earlier written that science is science communication—that is, the act of communicating scientific ideas and findings to ourselves and others is itself a central part of science. My point was to push against a conventional separation between the act of science and the act of communication, the idea that science is done by scientists and communication is done by communicators. It’s a rare bit of science that does not include communication as part of it. As a scientist and science communicator myself, I’m particularly sensitive to devaluing of communication. (For example, Bayesian Data Analysis is full of original research that was done in order to communicate; or, to put it another way, we often think we understand a scientific idea, but once we try to communicate it, we recognize gaps in our understanding that motivate further research.)

I once saw the following on one of those inspirational-sayings-for-every-day desk calendars: “To have ideas is to gather flowers. To think is to weave them into garlands.” Similarly, writing—more generally, communication to oneself or others—forces logic and structure, which are central to science.

Dan Kahan saw what I wrote and responded by flipping it around: He pointed out that there is a science of science communication. As scientists, we should move beyond the naive view of communication as the direct imparting of facts and ideas. We should think more systematically about how communications are produced and how they are understood by their immediate and secondary recipients.

The science of science communication is still in its early stages, and I’m glad that people such as Kahan are working on it. Here’s something he wrote recently explicating his theory of cultural cognition:

The motivation behind this research has been to understand the science communication problem. The “science communication problem” (as I use this phrase) refers to the failure of valid, compelling, widely available science to quiet public controversy over risk and other policy relevant facts to which it directly speaks. The climate change debate is a conspicuous example, but there are many others, including (historically) the conflict over nuclear power safety, the continuing debate over the risks of HPV vaccine, and the never-ending dispute over the efficacy of gun control. . . . The research I will describe reflects the premise that making sense of these peculiar packages of types of people and sets of factual beliefs is the key to understanding—and solving—the science communication problem. The cultural cognition thesis posits that people’s group commitments are integral to the mental processes through which they apprehend risk. . . .

I think of Kahan as part of a loose network of constructive skeptics, along with various people including Thomas Basbøll, John Ioannidis, the guys at Retraction Watch, pissed-off scholars such as Stan Liebowitz, bloggers such as Felix Salmon, and a whole bunch of psychology researchers such as Wicherts, Wagenmakers, Simonsohn, Nosek, etc. This is not meant as a complete list but rather is intended to give a sense of the different aspects of this movement-without-a-name. 10 or 20 or 30 years ago, I don’t think such a movement existed. There were concerns about individual studies or research programs but not such a sense of a statistics-centered crisis in science as a whole.

Giving credit where due

Gregg Easterbrook may not always be on the ball, but I 100% endorse the last section of his recent column (scroll down to “Absurd Specificity Watch”).

Earlier in the column, Easterbrook has a plug for Tim Tebow. I’d forgotten about Tim Tebow.

The Great Race

This post is by Phil.

Last summer my wife and I took a 3.5-month vacation that included a wide range of activities. When I got back, people would ask “what were the highlights of your trip?”, and I was somewhat at a loss: we had done so many things that were so different, many of which seemed really great…how could I pick? Someone said, wisely, that in six months or a year I’d be able to answer the question because some memories would be more vivid than others. They were right, and I was recently thinking back on our vacation and putting together a list of highlights: enjoyable in itself, but also worth doing to help plan future vacations.

One of the things we did was go to four evenings of track and field events at the London Olympics. After we got back, people would ask what we had seen at the Olympics. I would say “We saw Usain Bolt run the 200m, we saw the women’s 4×100m relay and the men’s 4×400m, we saw the last events of the decathlon…lots of great stuff. But my favorite was the men’s 800m.”

Trying to figure out why that was one of my favorite events to watch, I looked up some facts and statistics about the race. Perhaps unexpectedly, I think that some of the things that made it great, as both an athletic contest and a spectacle, are reflected in the stats.

The blogroll

I encourage you to check out our linked blogs. Here’s what they’re all about:

Cognitive and Behavioral Science

BPS Research Digest: I haven’t been following this one recently, but it has lots of good links, I should probably check it more often. There are a couple things that bother me, though. The blog is sponsored by the British Psychological Society, so this sounds pretty serious. But then they run things like advertising promotions sponsored by a textbook company and highlight iffy experimental claims. For example, in 2010 they ran a wholly uncritical post on the notorious Daryl Bem study that purported to find ESP. After being called on it in the comments, the blogger (Christian Jarrett) responded with, “The stats appear sound. . . . it’s a great study. Rigorously conducted” and even defended “the discussion of quantum physics in the paper.” To be fair, though, and as he points out in comments, Jarrett wrote of Bem’s study: “this isn’t proof of psi, far from it. Needs to be replicated. I like how Bem has used standard psychological tasks as a way to explore psi. Makes it easier for other labs to try to replicate.” Jarrett writes that he tries to “strike a balance between promotion and skepticism of new findings.” Fair enough.

Decision Science News: A mix of conference announcements and reports of new research. Here’s a typical example. I love this stuff; others might find it a bit technical. Also, this blog runs ads. I wonder how much the advertisers pay? I can’t imagine anyone would pay enough to a niche blogger to make the ads worth it. I mean, sure, if an advertiser offered me enough money for me to hire a postdoc, I’d do it, but I can only imagine we’re talking really small amounts of money. A topic of discussion for Decision Science News, perhaps?

Language Log: Not much needs to be said here. This one’s a classic blog with lots of statistical content, and it remains strong after all these years.

Seth Roberts: I disagree with him on climate change denial, Holocaust denial, etc. Still, he’s a pioneer of self-experimentation. I hope that the next generation of psychology or medical research involves an integration of informal experimentation with statistical controls.

The Hardest Science: Mostly revolves around reproducible research. It’s where I heard the story of the lamest, grudgingest, non-retraction retraction ever.

Cultural

Light Reading: She’s like me, she likes to write and has a lot of energy. I’m still wondering what she will think of Debutante Hill (I’ll lend her my copy).

Lists and Letters of Note: Great stuff but not much new material lately; he says he’s busy working on a book.

Love the Liberry: Amazingly enough, they keep coming up with good material.

Paperpools: Not much material lately. As it should be. We want Helen DeWitt to be writing novels, not blogging!

Research as a Second Language: Anti-charismatic self-help advice. The alternative to those omnipresent shouting, obnoxious internet gurus.

Streetsblog: Good stuff. Ideally this would all be in your daily newspaper. I don’t read it too often; if I did, I’d be too angry to think about anything else all day.

Sister Blogs

The Monkey Cage: Sometimes I simul-post, other times I’ll rant there and then link from here. (for example)

The Statistics Forum: I recently formulated the plan to fill it up with 365 stories. So far, though, I’ve only received a few. So maybe just a story a week? I’m not sure what to do with this blog. An official American Statistical Association blog seems like a good thing but I don’t really know what to do with it.

Social and Political Science

Chris Blattman: International development, politics, economics, and policy.

Fivethirtyeight: Nate does a good job. I like how he can focus on whatever question he’s answering without getting overwhelmed. Here’s a good recent example.

Lane Kenworthy is a completely serious and reasonable person, just as his name would suggest.

Marginal Revolution: You’ve heard of these guys.

Monthly Labor Review Precis: Direct links to research on things that matter. Good stuff.

Overcoming Bias: He recently wrote, “most people we know talk as if they hate, revile, and despise ads. They say ads are an evil destructive manipulative force that exists only because big bad firms run the world, and use ads to control us all.” I was surprised to hear that most people Robin Hanson knows talk that way, and this gives me a new perspective on why he writes the way he does. It’s gotta be frustrating, hanging around people who talk about big bad firms and evil destructive manipulative forces.

Rajiv Sethi: He only blogs a couple times a month, but he always has something interesting to say. (The opposite of this blog, I suppose.)

The Baby Name Wizard: The one and only, by the people who, among other things, debunked the myth that there’s something special about the word “orange.” But you can just skip directly to the Name Voyager.

U.S. Census Blog: Not the funnest thing out there to read, but it’s good that the people at the Census are doing this for us. When you need good data, the Census is there for you.

Statistics and Machine Learning

Bob Carpenter: He wrote Stan.

Chance News: The original statistics blog.

Christian Robert: People who used to do theoretical statistics, now do computational statistics. This is a good thing.

Cosma Shalizi: He has an odd retro style and enough combination of common sense and knowledge of philosophy that I asked him to collaborate on my paper that became this. His set of interests and frustrations seems to overlap a lot with mine, except that he doesn’t really ride a bike and I’m sure there are some big parts of his life that don’t match to anything in mine.

Deborah Mayo: I learned about her through Shalizi. Mayo believes in learning through model checking, just like Jaynes (and me). Her blog features long comment threads and contributions from the likes of Stephen Senn.

John Cook: Like Tyler Cowen, a guy who does a lot of things but is best known for his blogging. He throws in some applied math and numerical analysis along with the statistics.

Kaiser Fung: Fun to read and utterly sensible. Among many other things, he offered a good probabilistic summary of the Lance Armstrong story, well before it finally broke.

Larry Wasserman: His perspective on statistics is different from mine (for example, he defines p(a|b) = p(a,b)/p(b), whereas I define p(a,b)=p(a|b)p(b)), but it’s good that he can get his views out there. Research proceeds in many different ways, and if everyone agreed with me (or with any single perspective), the field of statistics would make a lot less progress.

Messy Matters: This one reads a bit more like a draft of a pop-science book than like a blog. The trouble is, there are already so many pop-science books about economics and data. They’ll have to come up with their own unique twist.

Nuit Blanche: Compressive sensing: that’s cool stuff! I’m impressed by these CS guys who can effortlessly throw around terabytes of data.

Observational Epidemiology: These guys are thoughtful and I admire the effort they put into their blogging. If they’d started blogging in 2003, they would’ve been on everyone’s blogroll.

Stats Blogs: A convenient compendium, with links back to the originals.

The Numbers Guy: Carl Bialik is one of the original data journalists. He, Felix Salmon, and Nate Silver have very similar profiles (as Bill James might say).

Visualization

Chartsnthings: This is the ultimate graphics blog. The New York Times graphics team presents some great data visualizations along with the stories behind them. I love this sort of insider’s perspective.

Eager Eyes: Graphics research.

Information Aesthetics: Seriously pretty.

Junk Charts: The nitty gritty. What to read if you want to make your own graphs better.

Plain old everyday Bayesianism!

Sam Behseta writes:

There is a report by Martin Tingley and Peter Huybers in Nature on the unprecedented high temperatures at northern latitudes (Russia, Greenland, etc). What is more interesting is the authors have used a straightforward hierarchical Bayes model, and for the first time (as far as I can remember) the results are reported with a probability attached to them (P>0.99), as opposed to the usual p-value<0.01 business. This might be a sign that editors of big time science journals are welcoming Bayesian approaches.

I agree. This is a good sign for statistical communication. Here are the key sentences from the abstract:

Here, using a hierarchical Bayesian analysis of instrumental, tree-ring, ice-core and lake-sediment records, we show that the magnitude and frequency of recent warm temperature extremes at high northern latitudes are unprecedented in the past 600 years. The summers of 2005, 2007, 2010 and 2011 were warmer than those of all prior years back to 1400 (probability P > 0.95), in terms of the spatial average. The summer of 2010 was the warmest in the previous 600 years in western Russia (P > 0.99) and probably the warmest in western Greenland and the Canadian Arctic as well (P > 0.90). These and other recent extremes greatly exceed those expected from a stationary climate, but can be understood as resulting from constant space–time variability about an increased mean temperature.

As with classical p-values, these probability statements depend on an assumed model, but I agree with Sam that the expression of direct probabilities is a huge step forward from traditional practice.
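To make the idea of a direct probability statement concrete, here is a minimal sketch of how a claim like "P > 0.99" can be read off posterior simulations: count the fraction of draws in which a given year comes out warmest. The function name `prob_warmest` and the toy draws are made up for illustration; this is not Tingley and Huybers's model or code, just the general recipe under the assumption that you already have posterior draws.

```python
def prob_warmest(draws, year_index):
    """Fraction of posterior draws in which the chosen year is strictly warmest.

    draws is a list of simulations; each simulation is a list holding one
    temperature value per year (hypothetical layout, for illustration only).
    """
    hits = 0
    for draw in draws:
        target = draw[year_index]
        others = [v for j, v in enumerate(draw) if j != year_index]
        if all(target > v for v in others):
            hits += 1
    return hits / len(draws)


# Four made-up posterior draws for three years.
toy_draws = [
    [0.1, 0.5, 0.9],
    [0.2, 0.8, 0.7],
    [0.0, 0.3, 1.1],
    [0.4, 0.2, 0.6],
]
print(prob_warmest(toy_draws, 2))  # share of draws in which the last year wins
```

The resulting number is a statement about the quantity of interest given the model and data, which is why it still depends on the assumed model.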

Time-Sharing Experiments for the Social Sciences

Continued fractions!!

Upon reading this note by John Cook on continued fractions, I wrote:

If you like continued fractions, I recommend you read the relevant parts of the classic Numerical Methods That Work. The details are probably obsolete but it’s fun reading (at least, if you think that sort of thing is fun to read).

I then looked up Acton in Wikipedia and was surprised to see he’s still alive. And he wrote a second book (published at the age of 77!). I wonder if it’s any good. It’s sobering to read Numerical Methods That Work: it’s so wonderful and so readable, yet in this modern era there’s really not much reason to read it. Perhaps William Goldman (hey, I checked: he’s still alive too!) or some equivalent could prepare a 50-page “good parts” version that could still be useful as a basic textbook.

“A Vast Graveyard of Undead Theories: Publication Bias and Psychological Science’s Aversion to the Null”

Erin Jonaitis points us to this article by Christopher Ferguson and Moritz Heene, who write:

Publication bias remains a controversial issue in psychological science. . . . that the field often constructs arguments to block the publication and interpretation of null results and that null results may be further extinguished through questionable researcher practices. Given that science is dependent on the process of falsification, we argue that these problems reduce psychological science’s capability to have a proper mechanism for theory falsification, thus resulting in the promulgation of numerous “undead” theories that are ideologically popular but have little basis in fact.

They mention the infamous Daryl Bem article. It is pretty much only because Bem’s claims are (presumably) false that they got published in a major research journal. Had the claims been true—that is, had Bem run identical experiments, analyzed his data more carefully and objectively, and reported that the results were consistent with the null hypothesis—then the result would be entirely unpublishable. After all, you can’t publish an article in a top journal demonstrating that a study is consistent with there being no ESP. Everybody knows that ESP, to the extent it exists, has such small effects as to be essentially undetectable in any direct study. So here you have the extreme case of a field in which errors are the only thing that gets published.

It’s science as Slate magazine is reputed to be: if it’s true, it’s obvious so no need to publish. If it’s counterintuitive, go for it. (Just to be clear, I’m not saying the actual Slate magazine is like that; this is just its reputation.)

This is indeed disturbing and I applaud yet another publication on the topic. The authors go beyond previous research by Gregory Francis and Uri Simonsohn by focusing specifically on difficulties with meta-analyses that unsuccessfully try to overcome problems of publication bias.

There’s something called the fail-safe number (FSN) of Rosenthal (1979) and Rosenthal and Rubin (1978), “an early and still widely used attempt to estimate the number of unpublished studies, averaging null results, that are required to bring the meta-analytic mean Z value of effect sizes down to an insignificant level,” but,

The FSN treats the file drawer of unpublished studies as unbiased by assuming that their average Z value is zero. This wrong assumption appears mostly not to be recognized by researchers who use the FSN to demonstrate the stability of their results. . . . Without making this computational error, the FSN turns out to be a gross overestimate of the number of unpublished studies required to bring the mean Z value of published studies to an insignificant level. The FSN thus gives the meta-analytic researcher a false sense of security.

The false sense of security persists:

Although this fundamental flaw had been spotted early, the number of applications of the FSN has grown exponentially since its publication. Ironically, getting critiques of the FSN published was far from an easy task . . .
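A small sketch may help show why the zero-mean assumption matters. The first function is Rosenthal's fail-safe number in its usual form, (sum of Z)^2 / z_crit^2 minus the number of published studies. The rest is an assumption made for illustration, not something taken from Ferguson and Heene: it supposes a true effect of zero and a file drawer made up of exactly those studies with Z below the one-tailed 5% cutoff, so that the unpublished studies average below zero. The function names and the ten published Z values are hypothetical.

```python
import math

Z_CRIT = 1.645  # one-tailed 5% cutoff


def fail_safe_n(z_values, z_crit=Z_CRIT):
    """Rosenthal's fail-safe number, which assumes unpublished studies average Z = 0."""
    return (sum(z_values) / z_crit) ** 2 - len(z_values)


def file_drawer_mean(z_crit=Z_CRIT):
    """Mean of a standard normal Z truncated to Z < z_crit (true effect zero)."""
    pdf = math.exp(-0.5 * z_crit ** 2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z_crit / math.sqrt(2.0)))
    return -pdf / cdf


def studies_needed(z_values, drawer_mean, z_crit=Z_CRIT, limit=1_000_000):
    """Smallest number of unpublished studies, each contributing drawer_mean on
    average, that pulls the combined (Stouffer) Z down to z_crit or below."""
    k = len(z_values)
    total = sum(z_values)
    for n in range(limit):
        if (total + n * drawer_mean) / math.sqrt(k + n) <= z_crit:
            return n
    return None


published = [2.0] * 10  # ten made-up published studies, each with Z = 2
print(fail_safe_n(published))                         # Rosenthal's formula
print(studies_needed(published, 0.0))                 # same idea, integer search
print(studies_needed(published, file_drawer_mean()))  # biased file drawer
```

Under these assumptions the biased file drawer needs far fewer hidden studies to wipe out the result than the fail-safe number suggests, which is the sense in which the FSN overestimates.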

Problems with meta-analysis

Ferguson and Heene continue:

Meta-analyses should be more objective arbiters of review for a field than are narrative reviews, but we argue that this is not the case in practice. . . . The selection and interpretation of effect sizes from individual studies requires decisions that may be susceptible to researcher biases.

It is thus not surprising that we have seldom seen a meta-analysis resolve a controversial debate in a field. Typically, the antagonists simply decry the meta-analysis as fundamentally flawed or produce a competing meta-analysis of their own . . . meta-analyses may be used in such debates to essentially confound the process of replication and falsification.

Thus:

The average effect size may be largely meaningless and spurious due to the avoidance of null findings in the published literature. This aversion to the null is arguably one of the most pernicious and unscientific aspects of modern social science.

Let me interject here that, although I am in general agreement with Ferguson and Heene on these issues, I have a bit of “aversion to the null” myself. I think it’s important to separate the statistical from the scientific null hypothesis.

- The statistical null hypothesis is typically that a particular comparison is exactly zero in the population.

- The scientific null hypothesis is typically that a certain effect is nonexistent or, more generally, that the effect depends so much on situation as to be unreplicable in general.

I might well believe in the scientific null but not in the statistical null.

Virtually unkillable

Ferguson and Heene continue:

The aversion to the null and the persistence of publication bias and denial of the same, renders a situation in which psychological theories are virtually unkillable. Instead of rigid adherence to an objective process of replication and falsification, debates within psychology too easily degenerate into ideological snowball fights, the end result of which is to allow poor quality theories to survive indefinitely. Proponents of a theory may, in effect, reverse the burden of proof, insisting that their theory is true unless skeptics can prove it false (a fruitless invitation, as any falsifying data would certainly be rejected as flawed were it even able to pass through the null-aversive peer review process described above).

Indeed. We see this reversal of the burden of proof all the time. For example, after a data alignment error was uncovered in their research, Neil Anderson and Deniz Ones notoriously wrote: “When any call is made for the retraction of two peer-reviewed and published articles, the onus of proof is on the claimant and the duty of scientific care and caution is manifestly high. . . . Goldberg et al. do not and cannot provide irrefutable proof of the alleged clerical errors. . . . We continue to stand by the findings and conclusions reported in our previous publications.” Ugh! This bothered me so much when I saw it, it made me want to barf. At the time, I wrote that it’s unscientific behavior not to admit error. Unfortunately, for reasons discussed by Ferguson and Heene, much of the scientific enterprise seems to be set up to avoid admission of error. These are serious issues, and it’s interesting to me that we as a field haven’t been thinking much about them until recently.

Fascinating graphs from facebook data

Yair points us to this page full of wonderful graphs from the Stephen Wolfram blog. Here are a few:

wolfram1

wolfram2

wolfram3

And some words:

People talk less about video games as they get older, and more about politics and the weather. Men typically talk more about sports and technology than women—and, somewhat surprisingly to me, they also talk more about movies, television and music. Women talk more about pets+animals, family+friends, relationships—and, at least after they reach child-bearing years, health. . . . Some of this is rather depressingly stereotypical. And most of it isn’t terribly surprising to anyone who’s known a reasonable diversity of people of different ages. But what to me is remarkable is how we can see everything laid out in such quantitative detail in the pictures above—kind of a signature of people’s thinking as they go through life.

Of course, the pictures above are all based on aggregate data, carefully anonymized. But if we start looking at individuals, we’ll see all sorts of other interesting things. . . .

Good stuff, and I like the flexible, open attitude. And great graphs. That’s why I’m posting this, in order to spread the word, to inspire others to do this sort of statistical exploration. Follow the link for lots more.

By the way . . .

I wonder who did the analysis, who made the graphs, and who wrote the text. No authors are listed. It’s posted on the Stephen Wolfram Blog, but Wolfram is known for contracting out his research. It’s certainly possible that he did all the statistical analysis, computing, graphics, and writing himself, I just have no idea. It’s funny: in academia, allocation of credit and attribution of authorship is huge. In industry, not so much. As an academic, I’d like to give credit to whoever made these pretty graphs, but perhaps from Wolfram’s perspective, whoever made the graphs is just doing a job, just like whoever sweeps the floors in the lab or whoever cleans the erasers in the classroom. In any case, I give Wolfram credit, no joke. Even if he didn’t do any of the work on this, it takes skill to hire the right people to do the job.

It’s binless! A program for computing normalizing functions

binless

Zhiqiang Tan writes:

I have created an R package to implement the full likelihood method in Kong et al. (2003). The method can be seen as a binless extension of so-called Weighted Histogram Analysis Method (WHAM) widely used in physics and chemistry. The method has also been introduced to the physics literature and called the Multivariate Bennet Acceptance Ratio (MBAR) method. But a key point of my implementation is to compute the free energy estimates by minimizing a convex function, instead of solving nonlinear equations by the self-consistency or the Newton-Raphson algorithm.
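For readers who want to see what "minimizing a convex function" can look like here, below is a rough pure-Python sketch. It is not Tan's R package and makes no claim about its internals. It uses one common way of writing the objective, a log-sum-exp term minus a linear term, whose stationary point satisfies the usual self-consistent equations. The function names, the plain gradient descent, and the two Gaussian test states are all assumptions chosen so that the right answer is known in advance.

```python
import math
import random


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def objective_and_gradient(f, u, counts):
    """Convex objective and gradient for the free energies f.

    u[n][k] is the reduced potential (negative log unnormalized density) of
    state k at pooled sample n; counts[k] is the number of samples from state k.
    """
    n_total = len(u)
    log_counts = [math.log(c) for c in counts]
    value = -sum(c * fk for c, fk in zip(counts, f)) / n_total
    grad = [-c / n_total for c in counts]
    for row in u:
        terms = [log_counts[k] + f[k] - row[k] for k in range(len(f))]
        lse = logsumexp(terms)
        value += lse / n_total
        for k, term in enumerate(terms):
            grad[k] += math.exp(term - lse) / n_total
    return value, grad


def estimate_free_energies(u, counts, steps=150, rate=1.0):
    """Plain gradient descent, with state 0 pinned at f = 0 as the reference."""
    f = [0.0] * len(counts)
    for _ in range(steps):
        _, grad = objective_and_gradient(f, u, counts)
        for k in range(1, len(f)):
            f[k] -= rate * grad[k]
    return f


def u0(x):
    """Unnormalized N(0, 1); normalizing constant sqrt(2*pi)."""
    return 0.5 * x * x


def u1(x):
    """Unnormalized N(1, 2^2); normalizing constant 2*sqrt(2*pi)."""
    return 0.5 * ((x - 1.0) / 2.0) ** 2


random.seed(1)
samples = [random.gauss(0.0, 1.0) for _ in range(1000)]
samples += [random.gauss(1.0, 2.0) for _ in range(1000)]
u = [[u0(x), u1(x)] for x in samples]
f = estimate_free_energies(u, counts=[1000, 1000])
print(f[1], -math.log(2.0))  # estimate of -log(Z1/Z0) next to the exact value
```

No histogram bins appear anywhere: every pooled sample enters the objective directly. A real implementation would use a proper optimizer rather than fixed-step gradient descent.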

Samurai sword-wielding Mormon bishop pharmaceutical statistician stops mugger

042413_edge_sword_640

Brett Keller points us to this feel-good story of the day:

A Samurai sword-wielding Mormon bishop helped a neighbor woman escape a Tuesday morning attack by a man who had been stalking her.

Kent Hendrix woke up Tuesday to his teenage son pounding on his bedroom door and telling him somebody was being mugged in front of their house. The 47-year-old father of six rushed out the door and grabbed the weapon closest to him — a 29-inch high carbon steel Samurai sword. . . .

Hendrix, a pharmaceutical statistician, was one of several neighbors who came to the woman’s aid after she began yelling for help . . .

Too bad the whole “statistician” thing got buried in the middle of the article. Fair enough, though: I don’t know what it takes to become a Mormon bishop, but I assume it’s more effort than what it takes to learn statistics.

My talk midtown this Friday noon (and at Columbia Monday afternoon)

At the City University of New York Graduate Center, 365 Fifth Avenue (between 34th and 35th street), room 6002. The topic: causality and statistical learning. Announcement is here (scroll down). It says that if you would like to attend any event, please respond by emailing datamining@gc.cuny.edu

I’m also giving a shorter talk on the same topic in the Sustainable Development Seminar Series 4pm Monday 29 Apr in room 407 International Affairs Bldg (at 118th St. and Amsterdam Ave.).

It’s just a coincidence that I’m giving the same talk twice. I was asked at different times to speak for these groups. When someone asks me to speak, I let them pick from recent talks on the list.

The Tweets-Votes Curve

Fabio Rojas points me to this excellently-titled working paper by Joseph DiGrazia, Karissa McKelvey, Johan Bollen, and himself:

Is social media a valid indicator of political behavior? We answer this question using a random sample of 537,231,508 tweets from August 1 to November 1, 2010 and data from 406 competitive U.S. congressional elections provided by the Federal Election Commission. Our results show that the percentage of Republican-candidate name mentions correlates with the Republican vote margin in the subsequent election. This finding persists even when controlling for incumbency, district partisanship, media coverage of the race, time, and demographic variables such as the district’s racial and gender composition. With over 500 million active users in 2012, Twitter now represents a new frontier for the study of human behavior. This research provides a framework for incorporating this emerging medium into the computational social science toolkit.

One charming thing about this paper—and I know this is going to sound patronizing but I don’t mean it to be—is that the authors (or, at least, whatever subset of the authors who did the statistical work) are amateurs. They analyze the outcome in terms of total votes rather than vote proportion, even while coding the predictor as a proportion. They present regression coefficients to 7 significant figures. They report that they have data from two different election cycles but present only one in the paper (but they do have the other in their blog post).

But that’s all ok. They pulled out some interesting data. And, as I often say, the most important aspect of a statistical analysis is not what you do with the data, it’s what data you use.

Tweets and votes

As to the result itself, I’m not quite sure what to do with it. Here’s the key graph:

tweet1

More tweets, more votes, indeed.
Of course most congressional elections are predictable. But the elections that are between 40-60 and 60-40, maybe not so much. So let’s look at the data there . . . Not such a strong pattern (and for the 2012 data in the 40-60% range it looks even worse; any correlation is swamped by the noise). That’s fine, and it’s not unexpected, it’s not a criticism of the paper but it indicates that the real gain in this analysis is not for predicting votes.

I’m not so convinced that tweets will be so useful in predicting votes—most congressional elections are predictable, but perhaps the prediction tool could be more relevant in low-information or multicandidate elections where prediction is not so easy.

Instead, it might make sense to flip it around and predict twitter mentions given candidate popularity. That is, rotate the graph 90 degrees, and see how much variation there is in tweet shares for elections of different degrees of closeness. Also, while you’re at it, re-express vote share as vote proportion. And scale the size of each dot to the total number of tweets for the two candidates in the election.
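The flipped analysis suggested above can be sketched in a few lines. Everything here is hypothetical: the field names, the helper functions `to_proportions` and `least_squares`, and the four made-up districts are for illustration only and have nothing to do with the DiGrazia et al. dataset. The sketch converts raw counts to proportions, keeps total tweets around as a dot size for plotting, and regresses tweet share on vote proportion rather than the other way around.

```python
def to_proportions(districts):
    """Return (vote_proportion, tweet_share, total_tweets) for each district."""
    rows = []
    for d in districts:
        vote_prop = d["rep_votes"] / (d["rep_votes"] + d["dem_votes"])
        total_tweets = d["rep_tweets"] + d["dem_tweets"]
        tweet_share = d["rep_tweets"] / total_tweets
        rows.append((vote_prop, tweet_share, total_tweets))
    return rows


def least_squares(x, y):
    """Intercept and slope of the simple regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up districts, purely for illustration.
districts = [
    {"rep_votes": 60_000, "dem_votes": 140_000, "rep_tweets": 150, "dem_tweets": 450},
    {"rep_votes": 95_000, "dem_votes": 105_000, "rep_tweets": 900, "dem_tweets": 1100},
    {"rep_votes": 110_000, "dem_votes": 90_000, "rep_tweets": 700, "dem_tweets": 500},
    {"rep_votes": 150_000, "dem_votes": 50_000, "rep_tweets": 80, "dem_tweets": 40},
]
rows = to_proportions(districts)
vote_prop = [r[0] for r in rows]    # x-axis after the flip
tweet_share = [r[1] for r in rows]  # y-axis: the thing being explained
dot_size = [r[2] for r in rows]     # total tweets, for scaling the dots
intercept, slope = least_squares(vote_prop, tweet_share)
print(intercept, slope)
```

With both variables on the proportion scale, the scatter around the fitted line shows how much tweet shares vary among elections of similar closeness.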

Move away from trying to predict votes and move toward trying to understand tweets. DiGrazia et al. write, “the models show that social media matters . . .” No, not at all. They find a correlation between candidate popularity and social media mentions. No-name and fringe candidates get fewer mentions (on average) than competitive and dominant candidates. That’s fine, you can go with that.

Again, I fear the above sounds patronizing, but I don’t mean it to be. You gotta start somewhere, and you’re nowhere without the data. As someone who was (originally) an outsider to the field of political science, I do think that researchers coming from other fields can offer valuable perspectives.

Sharing the data

What I want to know is, is this dataset publicly available? What would really make this go viral is if DiGrazia et al. post the data online. Then everyone will take a hack at it, and each of those people will cite them.

There’s been a lot of talk about reproducible research lately. In this case, they have a perfect incentive to make their data public: it will help them out, it will help out the research project, and it will be a great inspiration for others to follow in their footsteps. Releasing data as a publicity intensifier: that’s my new idea.

P.S. In the first version of this post I included a graph showing votes given tweet shares between 40% and 60%. I intended this to illustrate the difficulty of predicting close elections, but my graph really missed the point, because the x-axis represented close elections in tweet shares, not in votes. So I crossed that part out. If nothing else, I’ve demonstrated the difficulty of thinking about this sort of analysis!

Foundation for Open Access Statistics

Now here’s a foundation I (Bob) can get behind:

Foundation for Open Access Statistics (FOAS)

Their mission is to “promote free software, open access publishing, and reproducible research in statistics.” To me, that’s like supporting motherhood and apple pie!

FOAS spun out of and is partially designed to support the Journal of Statistical Software (aka JSS, aka JStatSoft). I adore JSS because it (a) is open access, (b) publishes systems papers on statistical software, (c) has fast reviewing turnaround times, and (d) is free for authors and readers. One of the next items on my to-do list is to write up the Stan modeling language and submit it to JSS.

As a not-for-profit with no visible source of income, they are quite sensibly asking for donations (don’t complain — it beats $3K author fees or not being able to read papers).