Javier Benítez points to this article by epidemiologist Geoff Norman, who writes:
The nature of science was summarized beautifully by a Stanford professor of science education, Mary Budd Rowe, who said that:
Science is a special kind of story-telling with no right or wrong answers. Just better and better stories.
Benítez writes that he doesn’t buy this.
Neither do I, but I read the rest of Norman’s article and I really liked it.
Here’s a great bit:
I [Norman] turn to a wonderful paper by Cook et al. (2008) that described three fundamentally different kinds of research question:
1. Description
“I have a new (curriculum, questionnaire, simulation, OSCE method, course) and here’s how I developed it”
That’s not even poor research. It’s not research at all.
2. Justification
“I have a new (curriculum, module, course, software) and it works. Students really like it” OR “students self-reported knowledge was higher” OR “students did better on the final exam than a control group” OR even “students had lower mortality rates after the course”
OK, it’s research. But is it science? After all, what do we know about how the instruction actually works? Do we have to take the whole thing on board lock, stock and barrel, to get the effects? What’s the active ingredient? In short, WHY is it better? And that brings us to
3. Clarification
“I have a new (curriculum, module, course, software). It contains a number of potentially active ingredients including careful sequencing of concepts, imbedding of concepts in a problem, interleaved practice, and distributed practice. I have conducted a program of research where these factors have been systematically investigated and the effectiveness of each was demonstrated”.
That’s more like it. We’re not asking if it works, we’re asking why it works. And the results truly add to our knowledge about effective strategies in education. So one essential characteristic is that the findings are not limited to the particular gizmo under scrutiny in the study. The study adds to our general understanding of the nature of teaching and learning.
I don’t want to get caught up in a debate on what’s “science” or what’s “research” or whatever; the key point is that science and statistics are not, and should not be, just about “what works” but rather “How does it work?” This relates to our earlier post about the problem with the usual formulation of clinical trials as purely evaluative, which leaves you in the lurch if someone doesn’t happen to provide you with a new and super-effective treatment to try out.
To put it another way, the “take a pill” or “black box” approach to statistical evaluation would work ok if we were regularly testing wonder-pills. But in the real world, effects are typically highly variable, and we won’t get far without looking into the damn box.
Norman also writes:
The most critical aspect of a theory is that, instead of addressing a simple “Does it work?,” to which the answer is “Yup” or “Nope”, it permits a critical examination of the effects and interactions of any number of variables that may potentially influence a phenomenon. So, one question leads to another question, and before you know it, we have a research program. Programmatic strategies inevitably lead to much more sophisticated understanding. And along the way, if it’s really going well, each study leads to further insights that in turn lead to the next study question. And each answer leads to further insight and explanation.
One interesting thing here is that this Lakatosian description might seem at first glance to be a good description, not just of healthy scientific research, but also of degenerate research programmes associated with elderly-words-and-slow-walking, or ovulation and clothing, or beauty and sex ratio, or power pose, or various other problematic research agendas that we’ve criticized in this space during the past decade. All these subfields, which have turned out to be noise-spinning dead ends, feature a series of studies and publications, with each new result leading to new questions. But these studies are uncontrolled. Part of the problem there is a lack of substantive theories—no, vague pointing to just-so evolutionary stories is not enough. But another problem is statistical, the p-values and all that.
Now let’s put all these ideas together:
– If you want to engage in a scientifically successful research programme, your theories should be “thick” and full of interactions, not mere estimates of average treatment effects. That’s fine: all the areas we’ve been discussing, including the theories we don’t respect, are complex. Daryl Bem’s ESP hypotheses, for example: they were full of interactions.
– But now the next step is to model those interactions, to consider them together. If, for example, you decide that outdoor temperature is an important variable (as in that ovulation-and-clothing paper), you go back and include it in analysis of earlier as well as later studies. And if you’re part of a literature that includes other factors such as age, marital status, political orientation, etc., then, again, you include all these too.
– Including all these factors and then analyzing in a reasonable way (I’d prefer a multilevel model but, if you’re careful, even some classical multiple comparisons approach could do the trick) would reveal not much other than noise in those problematic research areas. In contrast, in a field where real progress is being made, a full analysis should reveal persistent patterns. For example, political scientists keep studying polarization in different ways. I think it’s real.
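To make that last point concrete, here is a toy simulation (my own sketch, not taken from Norman’s article or from any of the papers discussed): one field where the true effect is exactly zero, and one with a small but real effect. In both, individual studies can look “significant” and spawn follow-up questions; only in the real field do the findings survive independent replication.

```python
# Toy simulation: a field chasing pure noise vs. a field studying a real,
# persistent effect. Both produce "significant" results that can fuel a
# research programme; only the real effect replicates.
import numpy as np

rng = np.random.default_rng(0)

def run_study(true_effect, n=50):
    """One two-arm study; returns the estimated effect and its z-score."""
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(true_effect, 1.0, n)
    est = treated.mean() - control.mean()
    se = np.sqrt(control.var(ddof=1) / n + treated.var(ddof=1) / n)
    return est, est / se

def replication_rate(true_effect, n_studies=2000):
    """Among studies reaching |z| > 1.96, the fraction whose effect is
    confirmed (significant, same sign) by an independent replication."""
    significant = confirmed = 0
    for _ in range(n_studies):
        est1, z1 = run_study(true_effect)
        if abs(z1) > 1.96:
            significant += 1
            est2, z2 = run_study(true_effect)
            if abs(z2) > 1.96 and np.sign(est2) == np.sign(est1):
                confirmed += 1
    return confirmed / max(significant, 1)

print(f"noise field (true effect = 0):   {replication_rate(0.0):.2f}")
print(f"real field  (true effect = 0.5): {replication_rate(0.5):.2f}")
```

With these (made-up) settings the noise field replicates only a few percent of the time, roughly the chance of an independent false positive landing on the same sign, while the real field replicates most of the time. A multilevel analysis of the pooled studies would show the same contrast: nothing persistent in the first case, a stable pattern in the second.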
The point of my above discussion is to elaborate on Norman’s article by emphasizing the interlocking roles of substantive theory, data collection, and statistical analysis. I think that in Norman’s discussion, he’s kinda taking for granted that the statistical analysis will respect the substantive theory, but we see real problems when this doesn’t happen, in papers that consider isolated hypotheses without considering the interactions pointed to even within their own literatures.
P.S. Regarding the title of this post, here’s what Basbøll and I wrote a few years ago about science and stories. Although here we were talking not about storytelling but about scientists’ use of and understanding of stories.
“Papers that consider isolated hypotheses” also run the risk of “rigor distortis” errors: being narrowly, rigorously correct but, in almost all real cases, broadly wrong or irrelevant. Beware imprudent use of the “all else equal” move. In many real situations you can’t keep all else equal.
See Garrett Hardin’s First Law of Human Ecology: “We can never do merely one thing.”
Any seeming pill-like intervention has numerous effects and shifts many other factors.
A field that suffers especially badly from rigor distortis tendencies is economics.
see
Why There Are No “Unfailed” Markets In Reality
http://bigthink.com/errors-we-live-by/savvy-consumers-of-economic-ideas-know-how-to-spot-what-wolf-and-rigor-distortis-errors
If I may respond to your essay, Jag: for *this* economist, answering the questions “what is the implication of rational behavior” or “does the particular market under discussion look like the workings of rational behavior” and even “the behavior of this market might seem irrational, but given the incentives of the market participants, I’m probably thinking about it incorrectly… let’s see what I got wrong (other than the fact that people aren’t perfectly rational maximizers)” are both normatively and positively interesting questions. As to spectacularly wrong predictions, figuring out why those turned out differently may indeed be caused by some factor that one assumed would remain constant which did not. That’s no more something to be embarrassed about (assuming your original predictive claim was carefully couched) than one should be embarrassed to say: “Reagan will lose if the generalization holds true that divorced men are unelectable.” (Note saying: “Reagan will lose. No divorced man has ever been elected President” while qualitatively similar, ought to embarrass the predictor. Your critique seems to me to properly chastise the *confidence* of economists, but not their methodology. Social science is largely a series of uncontrolled experiments. It’s unsurprising that a lot of bad inferences are drawn. It’s human nature that some of those bad inferences are stated more confidently than the data and the theory warrant.
Jonathan:
It’s bugging me that you didn’t close your parenthesis so I’ll just do it here.
)
https://xkcd.com/859/
Another reason to implement an edit function…. or to get more anal commenters.
“Part of the problem there is a lack of substantive theories—no, vague pointing to just-so evolutionary stories is not enough.”
Fun fact. This volume:
http://www.cambridge.org/gb/academic/subjects/psychology/cognition/mapping-mind-domain-specificity-cognition-and-culture
played a seminal role in the rise of evolutionary psychology (see, e.g., ch. 4). The co-editor, Susan Gelman, is (I believe) Andrew’s sister.
What I think is missing from the perspective taken is recognition that description, justification, and clarification feed each other and evolve over fairly long periods of time. Working from rich theories to formulate and test new ideas is actually a late-stage condition that results from many iterations of a description-justification-clarification cycle.
Take electromagnetism as an example. The phenomena of magnetism and static electricity were known at least as far back as the ancient Greeks. Only in the past couple of centuries did scientists begin to catalog and systematize those observations into a somewhat coherent compendium of known phenomena, phenomena that could be used to speculate on theories, and to develop instrumentation that could be used to test theories experimentally and generate new knowledge of less obvious facts and improve theories.
Even then, it took nearly a century to go from the early work of, say, Faraday and Coulomb, to James Clerk Maxwell’s synthesis of electricity, magnetism, and light. And there were many false steps along the way. One could even say that Maxwell’s equations, although they neatly implied all the earlier known phenomena of electricity and magnetism, and predicted electromagnetic radiation, were a somewhat superficial “explanation.” They were still more phenomenological than explanatory. It took several more decades before the work of Einstein, Dirac, and others established that Maxwell’s equations synthesizing electricity and magnetism were the consequences of a much deeper theory of Lorentz invariance. And still more decades to develop modern quantum electrodynamics and quantum field theory. None of these great intellectual achievements would have been possible without the earlier “what happens if I do this?” kind of simple experimentation. And the development of instrumentation to test the limits of the principles of quantum field theory would not have been possible without the phenomenologic implications of earlier, less profound theories.
There is also some irony in that while modern physics has developed really deep insights into the underlying “why and how” of what we see in the world, the theories are so computationally intensive that, as a practical matter, they can be directly applied only to relatively simple and limited systems. The study of, for example, living organisms, though in principle reducible to the calculations of quantum field theory (and even just quantum electrodynamics), could never be carried out in that way, and would probably miss important insights at larger levels of organization of matter and energy if it could.
I think that the attempt to systematically study human behavior with a scientific approach didn’t really get underway until after World War II. I think decades or centuries of just identifying and cataloging the phenomena that need to be explained by any good theory will be needed before non-trivial theories can begin to be formulated. Baby steps first.
This has been happening, and actually seems to have been much more common before WWII. I found one of my earlier posts responding to this common trope w/ some examples:
http://statmodeling.stat.columbia.edu/2017/07/20/nobel-prize-winning-economist-become-victim-bog-standard-selection-bias/#comment-530272
I really encourage people to expose themselves to the pre-NHST literature. It is a great guide for how to actually approach a problem scientifically (rather than through the lens of generations of NHST-guided thinking).
Yes, it’s all telling stories. Let me put it in terms of The Prestige.
The pledge
Newton formulates laws of gravity and now we can predict the positions of the planets and tides and build better cannons.
The turn
Observations violating Newton’s theory are found, showing it isn’t true, it’s only an approximation.
The prestige
It’s approximations all the way down, making science no more “truthful” than literature.
Cue ensuing postmodern ennui.
Cue Keith O’Rourke to explain the pragmatist escape hatch where the scientist is hiding to help mitigate that ennui.
I don’t think it’s possible even in principle to discover that it’s approximations all the way down. Something something something finite resources something something something algorithmic complexity theory.
Not really an escape hatch but rather a regulatory assumption to prevent the prestige – assuming without justification that appropriately clawing down will eventually get deep enough so no one would ever feel a need to dig further. This keeps at least some focused on getting the next good question of why?.
And why not?
You don’t need an exact theory. Just one accurate down to differences you can detect.
Using such a model in place of the exact theory will not make a difference to anything, because if it did that would be a way to detect a difference between the two.
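As a back-of-the-envelope illustration of this point (my own numbers, not from the thread): Newtonian kinetic energy is only an approximation to the relativistic formula, yet at everyday speeds the two differ by roughly fourteen orders of magnitude less than the quantity itself, far below anything an instrument could detect.

```python
# Toy illustration: an approximate theory that agrees with the "exact" one
# down to undetectable differences is operationally just as good.
# Newtonian vs. relativistic kinetic energy for a car at highway speed.
import math

c = 299_792_458.0   # speed of light, m/s
m = 1500.0          # mass of a car, kg (illustrative value)
v = 30.0            # about 108 km/h, in m/s

newtonian = 0.5 * m * v**2

# Compute gamma - 1 in a numerically stable way to avoid cancellation:
# gamma - 1 = b2 / (sqrt(1 - b2) * (1 + sqrt(1 - b2))), with b2 = (v/c)^2.
b2 = (v / c) ** 2
root = math.sqrt(1.0 - b2)
gamma_minus_1 = b2 / (root * (1.0 + root))
relativistic = gamma_minus_1 * m * c**2

rel_diff = (relativistic - newtonian) / relativistic
print(f"Newtonian KE:        {newtonian:.6f} J")
print(f"Relativistic KE:     {relativistic:.6f} J")
print(f"relative difference: {rel_diff:.1e}")  # about 7.5e-15: far below
                                               # any real instrument
```

So at these speeds, substituting the approximate theory for the exact one “will not make a difference to anything,” which is exactly the argument above.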
Probably a bit cliche to make this argument but – I reckon more ‘bad science’ is probably done in the name of developing a ‘why’ explanation before a ‘what’ is actually solidly established than vice-versa.
I know I’ve definitely wasted time trying to ‘explain’ something that seems cool before taking the proper time to properly establish that it is in fact a real thing. You only realise there is no ‘what’ there when you actually return from story land.
It seems like many scientific phenomena are in fact weird ‘shut up and calculate’ things that we later develop convenient stories about so that we can more easily remember them. Story agnosticism can be a good thing!
I am intrigued by this: http://www.wired.co.uk/article/nasa-validates-impossible-space-drive
I haven’t heard any updates on it… I wonder what in fact is the “what” here?
Daniel:
Hmm, from 2014, and check this out: “Either the results are completely wrong, or Nasa has confirmed a major breakthrough in space propulsion.”
A reporter should follow up on this one; it could be a fun story. I’m guessing it didn’t work out, as if it’s been 4 years since a major breakthrough in space propulsion, I’d think I’d’ve heard more about it. But who knows?
I think it’s one of those things where the anomaly is smallish and so it’s hard to measure and be sure that you haven’t got some systematic problem causing the measurement.
I did a google news search and there’s been some activity still, so I don’t think this is totally debunked or confirmed yet:
https://www.google.com/search?q=emdrive
Thanks Daniel. It’s been months since you sent me down a google rabbit hole that destroyed an evening’s worth of potential productivity. And an hour this morning.
Damnit Daniel, on this blog we obey the laws of thermodynamics!
https://www.youtube.com/watch?v=Dc-m9dumEaw
I’m not sure how much journalism is going into it, but at the very least this topic is being actively discussed probably daily on sites like Quora.
Interestingly, I was on an active email discussion headed by https://en.wikipedia.org/wiki/David_Sackett in the 1990s that hotly debated the “Is” versus “Why” ordering, with David and most siding with “Is” before “Why”.
A minority of us, though, argued for “Why” before “Is”. That was because it was being discussed in terms of epidemiological studies (not randomized), where most “Is”s are apparent rather than real (due to confounding or selection, etc.). Even a non-“Is”, where confounding cancels a real “Is”, is often apparent rather than real. Given the McMaster and epidemiology connection, Geoff Norman might have been in that email group.
Put in a wider context of economy of research, expediting organised experience to hasten getting less going (going down through levels of turtles) http://statmodeling.stat.columbia.edu/2017/11/29/expediting-organised-experience-statistics/ – a randomized study to assess an “Is” has its place. Are you far enough down levels of turtles to be connected enough with that reality beyond direct access to actually make something, anything, repeatedly happen?
Of course this is science coaching rather than science doing – https://www.goodreads.com/quotes/18027-some-people-see-things-that-are-and-ask-why-some
> expediting organised experience to hasten getting less going
getting less wrong :-(