(scheduled to appear in a few months, of course).
I think you’ll like it. Or hate it. Depending on who you are.
Thomas Leeper points me to Diederik Stapel’s memoir, “Faking Science: A True Story of Academic Fraud,” translated by Nick Brown and available online for free download.
Under the heading, “Results too good to be true,” Lee Sechrest points me to this discussion by “Neuroskeptic” of a discussion by psychology researcher Greg Francis of a published (and publicized) claim by biologists Brian Dias and Kerry Ressler that “Parental olfactory experience [in mice] influences behavior and neural structure in subsequent generations.” That’s a pretty big and surprising claim, and Dias and Ressler support it with some data: p=0.043, p=0.003, p=0.020, p=0.005, etc.
Francis’s key ground for suspicion is that Dias and Ressler in their paper present 10 successful (statistically significant) results in a row, and, given the effect sizes they estimated, it would be unlikely to see such an unbroken string of successes.
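To get a feel for this argument, here is a minimal sketch of the arithmetic, assuming the experiments are independent and share a common power. The power values are hypothetical, chosen only for illustration; they are not taken from Dias and Ressler’s paper or from Francis’s analysis.

```python
def prob_all_significant(powers):
    """Probability that every one of a set of independent studies is significant.

    `powers` holds each study's power, i.e. its chance of reaching
    statistical significance if the estimated effect is real.
    """
    prob = 1.0
    for power in powers:
        prob *= power
    return prob


# Hypothetical per-study power values (assumptions, not estimates from the paper).
for power in (0.5, 0.8, 0.9):
    chance = prob_all_significant([power] * 10)
    print(f"power {power:.1f}: P(10 out of 10 significant) = {chance:.3f}")
```

Under these assumptions, even a set of studies with 80% power each would all come out significant only about 11% of the time.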
Dias and Ressler replied that they did actually report negative results:
While we wish that all our behavioral, neuroanatomical, and epigenetic data were successful and statistically significant, one only need look at the Supporting Information in the article to see that data generated for all four figures in the Supporting Information did not yield significant results. We do not believe that nonsignificant data support our theoretical claims as suggested.
Francis followed up:
The non-significant effects reported by Dias & Ressler were not characterised by them as being “unsuccessful” but were either integrated into their theoretical ideas or were deemed irrelevant (some were controls that helped them make other arguments). Of course scientists have to change theories to match data, but if the data are noisy then this practice means the theory chases noise (and the findings show excess success relative to the theory).
I would also like to say that it’s probably not a good idea for Dias and Ressler to wish that all their data are “successful and statistically significant.” With small samples and small effects, this just isn’t gonna happen—indeed, it shouldn’t happen. Variation implies that not every small experiment will be statistically significant (or even in the desired direction), and I think it’s a mistake to define “success” in this way.
Do a large preregistered replication
In any case, the solution here seems pretty clear to me. Do a large preregistered replication. This is obvious but it’s not clear that it’s really being done. For example, in a news article from 2013, Virginia Hughes describes the research in question as “tantalizing” and writes that “other researchers seem convinced . . . neuroscientists, too, are enthusiastic about what these results might mean for understanding the brain,” and she talks about further research (“A good next step in resolving these pesky mechanistic questions would be to use chromatography to see whether odorant molecules like acetophenone actually get into the animals’ bloodstream . . . First, though, Dias and Ressler are working on another behavioral experiment. . . . Scientists, I have to assume, will be furiously working on what that something is for many decades to come . . .”) but I see no mention of any plan for a preregistered replication.
I’d like to see a clean, pure, large, preregistered replication such as Nosek, Spies, and Motyl did in their “50 shades of gray” paper. I recognize that this costs time, effort, and money. Still, replication in a biological study of mice seems so much easier than replication in political science or economics, and it would resolve a lot of statistical issues.
Aki Vehtari, Pasi Jylänki, Christian Robert, Nicolas Chopin, John Cunningham, and I write:
We revisit expectation propagation (EP) as a prototype for scalable algorithms that partition big datasets into many parts and analyze each part in parallel to perform inference of shared parameters. The algorithm should be particularly efficient for hierarchical models, for which the EP algorithm works on the shared parameters (hyperparameters) of the model.
The central idea of EP is to work at each step with a “tilted distribution” that combines the likelihood for a part of the data with the “cavity distribution,” which is the approximate model for the prior and all other parts of the data. EP iteratively approximates the moments of the tilted distributions and incorporates those approximations into a global posterior approximation. As such, EP can be used to divide the computation for large models into manageable sizes. The computation for each partition can be made parallel with occasional exchanging of information between processes through the global posterior approximation. Moments of multivariate tilted distributions can be approximated in various ways, including, MCMC, Laplace approximations, and importance sampling.
I love love love love love this. The idea is to forget about the usual derivation of EP (the Kullback-Leibler discrepancy, etc.) and to instead start at the other end, with Bayesian data-splitting algorithms, with the idea of taking a big problem and dividing it into K little pieces, performing inference on each of the K pieces, and then putting them together to get an approximate posterior inference.
The difficulty with such algorithms, as usually constructed, is that each of the K pieces has only partial information; as a result, for any of these pieces, you’re wasting a lot of computation in places that are contradicted by the other K-1 pieces.
This sketch (with K=5) shows the story:
We’d like to do our computation in the region of overlap.
And that’s how the EP-like algorithm works! When performing the inference for each piece, we use, as a prior, the cavity distribution based on the approximation to the other K-1 pieces.
Here’s a quick picture of how the cavity distribution works. This picture shows how the EP-like approximation is not the same as simply approximating each likelihood separately. The cavity distribution serves to focus the approximation in the zone of inference of parameter space:
But the real killer app of this approach is hierarchical models, because then we’re partitioning the parameters at the same time as we’re partitioning the data, so we get real savings in complexity and computation time:
EP. It’s a way of life. And a new way of thinking about data-partitioning algorithms.
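The data-splitting scheme described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the algorithm from the paper: a single parameter theta, a Bernoulli-logit likelihood, a normal prior with standard deviation 2, tilted moments computed by brute-force quadrature on a fixed grid, and sites updated one at a time rather than in parallel. All function names and settings here are made up for the example.

```python
import math
import random

# Fixed grid for numerical integration; assumes the posterior mass lies well inside [-10, 10].
GRID = [-10 + 0.01 * i for i in range(2001)]


def moments(q, r, successes, trials):
    """Mean and variance of Gaussian(q, r) * Bernoulli-logit likelihood, on the grid.

    The Gaussian is in natural parameters: q = mean / variance, r = 1 / variance.
    """
    log_dens = [
        q * t - 0.5 * r * t * t                               # Gaussian factor
        + successes * t - trials * math.log1p(math.exp(t))    # likelihood of one piece
        for t in GRID
    ]
    top = max(log_dens)
    weights = [math.exp(v - top) for v in log_dens]
    total = sum(weights)
    mean = sum(w * t for w, t in zip(weights, GRID)) / total
    var = sum(w * (t - mean) ** 2 for w, t in zip(weights, GRID)) / total
    return mean, var


def ep_split(pieces, prior_sd=2.0, sweeps=5):
    """EP-like inference over K pieces, each given as (successes, trials)."""
    q, r = 0.0, 1.0 / prior_sd ** 2           # global approximation starts at the prior
    sites = [(0.0, 0.0) for _ in pieces]      # one Gaussian site per piece
    for _ in range(sweeps):
        for k, (s, n) in enumerate(pieces):
            # Cavity: the global approximation with piece k's site removed.
            cav_q, cav_r = q - sites[k][0], r - sites[k][1]
            # Tilted distribution: cavity times the likelihood for piece k.
            mean, var = moments(cav_q, cav_r, s, n)
            # Moment-match the global approximation, then recover the new site.
            q, r = mean / var, 1.0 / var
            sites[k] = (q - cav_q, r - cav_r)
    return q / r, math.sqrt(1.0 / r)


def exact_posterior(pieces, prior_sd=2.0):
    """Posterior mean and sd from all the data at once, on the same grid."""
    s = sum(p[0] for p in pieces)
    n = sum(p[1] for p in pieces)
    mean, var = moments(0.0, 1.0 / prior_sd ** 2, s, n)
    return mean, math.sqrt(var)


def make_pieces(theta, k, n_per_piece, rng):
    """Simulate k pieces of binary data with Pr(y=1) = logistic(theta)."""
    p = 1.0 / (1.0 + math.exp(-theta))
    return [(sum(rng.random() < p for _ in range(n_per_piece)), n_per_piece)
            for _ in range(k)]


pieces = make_pieces(theta=0.5, k=5, n_per_piece=40, rng=random.Random(1))
print("EP-like:", ep_split(pieces))
print("exact:  ", exact_posterior(pieces))
```

The point to notice is in the inner loop: each piece is analyzed with the cavity distribution as its prior, so the computation for that piece stays in the region supported by the other K-1 pieces.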
I hate when that happens. Demography is tricky.
Oh well, as they say in astronomy, who cares, it was less than an order of magnitude!
Palko tells a good story:
One of the accepted truths of the Netflix narrative is that CEO Reed Hastings is obsessed with data and everything the company does is data driven . . .
Of course, all 21st century corporations are relatively data-driven. The fact that Netflix has large data sets on customer behavior does not set it apart, nor does the fact that it has occasionally made use of that data. Furthermore, we have extensive evidence that the company often makes less use of certain data than do most other competitors. . . .
I can’t vouch for the details here but the general point, about what it means to be “data-driven,” is important.
Mon: “Now the company appears to have screwed up badly, and they’ve done it in pretty much exactly the way you would expect a company to screw up when it doesn’t drill down into the data.”
Tues: Expectation propagation as a way of life
Wed: I’d like to see a preregistered replication on this one
Thurs: A key part of statistical thinking is to use additive rather than Boolean models
Fri: Defense by escalation
Sat: Sokal: “science is not merely a bag of clever tricks . . . Rather, the natural sciences are nothing more or less than one particular application — albeit an unusually successful one — of a more general rationalist worldview”
Sun: It’s Too Hard to Publish Criticisms and Obtain Data for Replication
In a unit about the law of large numbers, sample size, and margins of error, I used the notorious beauty, sex, and power example:
A researcher, working with a sample of size 3000, found that the children of beautiful parents were more likely to be girls, compared to the children of less-attractive parents.
Can such a claim really be supported by the data at hand?
One way to get a sense of this is to consider possible effect sizes. It’s hard to envision a large effect; based on everything I’ve seen about sex ratios, I’d say .005 (i.e., one-half of one percentage point) is an upper bound on any possible difference in Pr(girl) comparing attractive and unattractive parents.
How big a sample size do you need to measure a proportion with that sort of accuracy? Since we’re doing a comparison, you’d need to measure the proportion of girls within each group (attractive or unattractive parents) to within about a quarter of a percentage point, or .0025. The standard error of an estimated proportion near one-half is approximately .5/sqrt(n), so we need to have roughly .5/sqrt(n)=.0025, or n=(.5/.0025)^2=40,000 in each group.
So, to have any chance of discovering this hypothetical difference in sex ratios, we’d need at least 40,000 attractive parents and 40,000 unattractive—a sample of 80,000 at an absolute minimum.
What the researcher actually had was a sample of 3000. Hopeless.
You might as well try to weld steel with a cigarette lighter.
Or do embroidery with a knitting needle.
OK, they’re not the best analogies ever. But I avoided sports!
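The sample-size arithmetic in the sex-ratio example can be checked in a few lines. This is a sketch under stated assumptions: the function names are made up for illustration, and the even split of the 3000 into two groups of 1500 is a hypothetical simplification, not a detail of the actual study.

```python
import math


def n_per_group(target_se, p=0.5):
    """Sample size needed for the standard error of a proportion to equal target_se."""
    return p * (1 - p) / target_se ** 2


def se_of_proportion(n, p=0.5):
    """Standard error of an estimated proportion from a sample of size n."""
    return math.sqrt(p * (1 - p) / n)


# Requirement from the text: standard error of .0025 within each group.
needed = n_per_group(0.0025)
print(f"needed per group: {needed:,.0f}")

# What a sample of 3000 gives, assuming (hypothetically) 1500 in each group.
se_group = se_of_proportion(1500)
se_diff = math.sqrt(2) * se_group   # standard error of the difference of two proportions
print(f"se per group: {se_group:.4f}, se of difference: {se_diff:.4f}")
```

With 1500 per group, the standard error of the difference is about .018, more than three times the largest plausible effect of .005.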
Jonathan Falk asks what I think of this animated slideshow by Matthew Klein on “How Americans Die”:
Please click on the above to see the actual slideshow, as this static image does not do it justice.
What do I think? Here was my reaction:
It is good, but I was thrown off by the very first page because it says that it looks like progress stopped in the mid-1990s, but on the actual graphs, the mortality rate continued to drop after the mid-1990s. Also, the x-axis labeling was confusing to me: it took a while for me to figure out that the numbers for the years are not written at the corresponding places on the axes, and I wasn’t clear on what the units are on the y-axis.
I guess what I’m saying is: I like the clever way they tell the story. It’s a straightforward series of graphs but the reader has to figure out where to click and what to do, which makes the experience feel more like a voyage of discovery. The only thing I didn’t like was some of the execution, in that it’s not always clear what the graphs are exactly saying. It’s a good idea and I could see it as a template for future graphical presentations.
It’s also an interesting example because it’s not just displaying data, it’s also giving a little statistics lesson.
Hype can be irritating but sometimes it’s necessary to get people’s attention (as in the example pictured above). So I think it’s important to keep these two things separate: (a) reactions (positive or negative) to the hype, and (b) attitudes about the subject of the hype.
Overall, I like the idea of “data science” and I think it represents a useful change of focus. I’m on record as saying that statistics is the least important part of data science, and I’m happy if the phrase “data science” can open people up to new ideas and new approaches.
Data science, like just about any new idea you’ve heard of, gets hyped. Indeed, if it weren’t for the hype, you might not have heard of it!
So let me emphasize that in my criticism of some recent hype, I’m not dissing data science, I’m just trying to help people out a bit by pointing out which of their directions might be more fruitful than others.
Yes, it’s hype, but I don’t mind
Phillip Middleton writes:
I don’t want to rehash the Data Science / Stats debate yet again. However, I find the following post quite interesting from Vincent Granville, a blogger and heavy promoter of Data Science.
I’m not quite sure if what he’s saying makes Data Science a ‘new paradigm’ or not. Perhaps it is reflective of something new apart from classical statistics, but then I would also say so of Bayesian analysis as paradigmatic (or at least a still budding movement) itself. But what he alleges – i.e., that ‘Big Data’ by its very existence necessarily implies that the cause of a response/event/observation can be ascertained, and seemingly w/o any measure of uncertainty, seems rather ‘over-promising’ and hypish.
I am a bit concerned with what I’m thinking he implies regarding ‘black box’ methods – that is the blind reliance upon them by those who are technically non-proficient. I feel the notion that one should always trust ‘the black box’ is not in alignment with reality.
He does appear to discuss dispensing with p-values. In a few cases, like SHT, I’m not totally inclined to disagree (for reasons you speak about frequently), but I don’t think we can be quite so universal about it. That would pretty much throw out most every frequentist test wrt comparison, goodness-of-fit, what have you.
Overall I get the feeling that he’s implying the ‘new’ era as one of solving problems w/ certainty, which seems more the ideal than the reality.
What do you think?
OK, so I took a look at Granville’s post, where he characterizes data science as a new paradigm “very different, if not the opposite of old techniques that were designed to be implemented on abacus, rather than computers.”
I think he’s joking about the abacus but I agree with this general point. Let me rephrase it from a statistical perspective.
It’s been said that the most important thing in statistics is not what you do with the data, but, rather, what data you use. What makes new statistical methods great is that they open the door to the use of more data. Just for example:
- Lasso and other regularization approaches allow you to routinely throw in hundreds or thousands of predictors, whereas classical regression models blow up at that. Now, just to push this point a bit, back before there was lasso etc., statisticians could still handle large numbers of predictors, they’d just use other tools such as factor analysis for dimension reduction. But lasso, support vector machines, etc., were good because they allowed people to more easily and more automatically include lots of predictors.
- Multiple imputation allows you to routinely work with datasets with missingness, which in turn allows you to work with more variables at once. Before multiple imputation existed, statisticians could still handle missing data but they’d need to develop a customized approach for each problem, which is enough of a pain that it would often be easier to simply work with smaller, cleaner datasets.
- Multilevel modeling allows us to use more data without having that agonizing decision of whether to combine two datasets or keep them separate. Partial pooling allows this to be done smoothly and (relatively) automatically. This can be done in other ways but the point is that we want to be able to use more data without being tied up in the strong assumptions required to believe in a complete-pooling estimate.
And so on.
Similarly, the point of data science (as I see it) is to be able to grab the damn data. All the fancy statistics in the world won’t tell you where the data are. To move forward, you have to find the data, you need to know how to scrape and grab and move data from one format into another.
On the other hand, he’s wrong in all the details
But I have to admit that I’m disturbed by how much Granville gets wrong. His buzzwords include “Model-free confidence intervals” (huh?), “non-periodic high-quality random number generators” (??), “identify causes rather than correlations” (yeah, right), and “perform 20,000 A/B tests without having tons of false positives.” OK, sure, whatever you say, as I gradually back away from the door. At this point we’ve moved beyond hype into marketing.
Can we put aside the cynicism, please?
Why some people don’t see the unfolding data revolution?
They might see it coming but are afraid: it means automating data analyses at a fraction of the current cost, replacing employees by robots, yet producing better insights based on approximate solutions. It is a threat to would-be data scientists.
Ugh. I hate that sort of thing, the idea that people who disagree with you do so for corrupt reasons. So tacky. Wake up, man! People who disagree with you aren’t “afraid of the truth,” they just have different experiences than yours, they have different perspectives. Your perspective may be closer to the truth (as noted above, I agree with much of what Granville writes), but you’re a fool if you so naively dismiss the perspectives of others.