
No guru, no method, no teacher, Just you and I and nature . . . in the garden. Of forking paths.

Here’s a quote:

Instead of focusing on theory, the focus is on asking and answering practical research questions.

It sounds eminently reasonable, yet in context I think it’s completely wrong.

I will explain. But first some background.

Junk science and statistics

They say that hard cases make bad law. But bad research can make good statistics. Or, to be more precise, discussion of bad research can lead to good statistical insights. During the past decade, examples of bad science such as beauty-and-sex-ratio, ESP, ovulation-and-clothing, etc., have made us more aware of the importance of type M and type S errors in understanding statistical claims, the importance of the garden of forking paths in understanding where statistically significant results are coming from, and the role of prior information in data analysis. (Yes, I’d written a whole book on Bayesian data analysis but I’d not realized the useful role that direct prior information can play in practical inference.) Or, to consider another theme of this blog: years of discussion of bad graphs made us aware of the different goals of statistical communication.

The general idea is that, when we see problems in statistics and communication, we use the disconnect between observed practice and our ideals to gain insight into research goals. Theoretical statistics is the theory of applied statistics, and we can make progress by observing how statistics is actually applied.

We had an example recently, with two long discussions of the work of Brian Wansink, a Cornell University business school professor and self-described “world-renowned eating behavior expert for over 25 years.”

It started with an experiment done by Wansink that he himself characterized as a “failed study which had null results”—but which he then published four different papers on, with each paper presenting the experiment not as a failure but as a success. Perhaps no one outside the world of food science would’ve heard about these papers had Wansink not boldly written about them on his blog, in a post where he openly advertises his p-hacking:

I [Wansink] had three ideas for potential Plan B, C, & D directions (since Plan A had failed). . . . Every day she [Wansink’s colleague] came back with puzzling new results, and every day we would scratch our heads, ask “Why,” and come up with another way to reanalyze the data with yet another set of plausible hypotheses. Eventually we started discovering solutions that held up regardless of how we pressure-tested them.

I was curious and looked up the papers in question, and, indeed, they sliced and diced their data in different ways to come up with statistical significance. The data were all from the same experiment but different analyses used different data-exclusion rules and controlled for different variables.

Following up, Tim van der Zee, Jordan Anaya, and Nicholas Brown looked into those four papers in even more detail and found over 100 errors in there. Basically, just about none of the numbers made sense. Also, a blog commenter pointed out that Wansink had written two contradictory things about how his collaborator got involved in the project (see P.P.S. at my above-linked post).

So far, so typical. Low-quality research, noise mining, sloppiness, it’s Psych Science minus the psychology, or PPNAS minus the himmicanes. Run-of-the-mill, everyday, bread-and-butter junk science. PhD’s on the hamster wheel, going in circles, releasing publications and press releases and going on NPR and Ted, 9 to 5, Monday through Friday, until retirement. With all the errors and contradictory stories, this is maybe a bit worse than normal and it raises the question of whether there is any deliberate dishonesty going on, but the overall picture is of data being put into a meat grinder and being published as mass-produced hamburgers. Nothing interesting to report.

So far, the only really notable thing is Wansink’s openness about all of this. In the psychology department they know enough to realize that you’re not supposed to p-hack, that there is such a thing as research protocol, and that churning out papers is not supposed to be a goal in itself. Wansink’s overt description of his research process indicates that this understanding has not yet made it all the way to Cornell business school.

It’s a paradox. On one hand, Wansink would’ve been better off keeping his head down and not telling the world about his workflow; on the other hand, publicity is one of his legitimate goals. After all, if you’re doing food research and you think your research is high quality—if you think you actually are making discoveries—then you do want to publicize your findings, as they can make a difference in the world.

As the saying goes: You may not be interested in bad research, but bad research is interested in you.

No theory

Three statistical issues came out in our blog discussions. The first was that Wansink and his colleagues engage in what are known as “questionable research practices” which invalidate the statistical conclusions which are the basis of those articles getting published in peer-reviewed journals. The second was that they were, at best, extremely sloppy in publishing work with so many errors and contradictions. The third was that Wansink explicitly works with no substantive theory.

The first two problems are nothing new; they’re part of the standard playbook of Psychological Science or PPNAS-type research: hyped claims based on noisy data, messy data manipulation, and a general attitude that once a paper is published, it should be immune from criticism. The usual Ted-talk attitude.

The third item is interesting, though. Let’s again pass the mic over to Wansink:

Instead of focusing on theory, the focus is on asking and answering practical research questions.

That sounds good—who among us does not prefer empirics to theory?—but is missing a key step. You don’t just want to ask and answer questions, you also want those answers to be correct, to give insight, ultimately to give good predictions.

If you have no theory and the ability to produce noisy data, you can ask all the questions you want (ok, actually I have some doubts about the quality of the questions that will get asked in the absence of theory), and if you’re willing to sift through your data enough you can get “p less than .05” answers, but there’s no reason to expect these answers will be any more useful than what you’d get just by flipping coins.
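To put a rough number on "sift through your data enough," here is a minimal simulation sketch. It assumes a two-group experiment with no true effect at all and a set of analyses that are independent of each other, which is a simplification: real reanalyses of one data set are correlated, and forking paths usually involve data-dependent choices rather than an explicit list of tests. The function names and the sample sizes are made up for illustration.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def any_significant(n_per_group, n_analyses, rng):
    """Run one null experiment; return True if any analysis gives p < .05."""
    se = (2.0 / n_per_group) ** 0.5  # both groups have known unit variance
    for _ in range(n_analyses):
        treated = sum(rng.gauss(0.0, 1.0) for _ in range(n_per_group)) / n_per_group
        control = sum(rng.gauss(0.0, 1.0) for _ in range(n_per_group)) / n_per_group
        z = (treated - control) / se
        p_value = 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))
        if p_value < 0.05:
            return True
    return False

def forking_paths_rate(n_analyses, n_experiments=1000, n_per_group=20, seed=1):
    """Share of null experiments that yield at least one 'significant' result."""
    rng = random.Random(seed)
    hits = sum(any_significant(n_per_group, n_analyses, rng)
               for _ in range(n_experiments))
    return hits / n_experiments

if __name__ == "__main__":
    for k in (1, 5, 20):
        # Under independence the expected rate is 1 - 0.95**k.
        print(k, forking_paths_rate(k), round(1 - 0.95 ** k, 3))
```

With one pre-specified analysis the rate stays near the nominal 5%; with twenty tries at the same null data, most experiments produce something publishable.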

With noisy data, in the absence of theory, effect sizes will be low, and anything statistically significant is likely to be a huge overestimate of any effect and also likely to be in the wrong direction (that’s type M and type S errors).
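Type M and type S errors are easy to see in a simulation. The sketch below assumes the estimate is normally distributed around the true effect with a known standard error, and it looks only at the estimates that clear the usual 1.96-standard-error bar. The function name and the example values (a true effect of 0.1 with standard error 1) are hypothetical, chosen to mimic a small effect measured noisily.

```python
import random

def type_m_s(true_effect, se, n_sims=200_000, seed=1):
    """Simulate estimates ~ N(true_effect, se) and summarize the significant ones.

    Returns (power, exaggeration_ratio, wrong_sign_rate).
    true_effect must be nonzero for the ratio and the sign check to make sense.
    """
    rng = random.Random(seed)
    significant = [
        est
        for est in (rng.gauss(true_effect, se) for _ in range(n_sims))
        if abs(est) > 1.96 * se
    ]
    power = len(significant) / n_sims
    # Type M: how much a significant estimate overstates the true magnitude.
    exaggeration = (sum(abs(e) for e in significant) / len(significant)
                    / abs(true_effect))
    # Type S: how often a significant estimate has the wrong sign.
    wrong_sign = (sum(1 for e in significant if e * true_effect < 0)
                  / len(significant))
    return power, exaggeration, wrong_sign

if __name__ == "__main__":
    print("small effect, noisy:", type_m_s(0.1, 1.0))
    print("large effect, same noise:", type_m_s(3.0, 1.0))
```

In the small-effect case the significant estimates overstate the truth many times over and a large share of them point the wrong way; in the large-effect case both problems mostly disappear.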

Why am I so sure that effect sizes will be low in the absence of theory? Because there are just too many things to look at. Without theory (or effective intuition or heuristics, which are just informal versions of theory), you’re basically picking potential effects at random, and most potential effects are small.

Kurt Lewin wasn’t kidding when he said, “There’s nothing so practical as a good theory.”

OK, maybe he was kidding. I have no idea. I know nothing about Kurt Lewin. Perhaps I could ask Karl Weick to tell me some stories about Lewin, next time I’m in Ann Arbor.

Our bad

As Emilio Estevez never said, I blame society. More specifically, I blame the statistics profession for contributing to the mistaken attitudes of people such as Wansink. For decades we’ve been telling people that statistics can reject the null hypothesis in the absence of substantive theory. So it makes sense that these dudes will believe us!

I remember in grad school our professor patiently explaining to us the magic of random assignment, that you can demonstrate the existence of a treatment effect, and accurately estimate its magnitude, without any substantive theory at all. What he didn’t tell us was that these methods fall apart when effect sizes are small and noise is high.

And, hey, what happens if you have no theory?

1. Effect sizes tend to be small. With no theory, the plan is to stumble onto effects, not to search them out.

2. Noise tends to be high. With no theory of measurement, it can be a challenge to measure well.

It’s worse than that, actually: in the presence of uncontrolled researcher degrees of freedom, where “p less than .05” is so easy to attain that a research team can produce four published papers from a single failed experiment, there’s not really any motivation to measure anything accurately. Indeed, in many ways, noisy measurements are a plus for an ambitious researcher: When standard errors are high, statistically significant results will be automatically large, thus more dramatic, better headlines, more impressive graphs for your PPNAS papers and Ted talks, and so on.
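The arithmetic behind "automatically large" is short. Assuming the usual normal approximation and a two-sided test, with estimate $\hat\theta$ and standard error $s$ (symbols introduced here for illustration):

```latex
\[
p < 0.05
\;\iff\;
\frac{|\hat\theta|}{s} > 1.96
\;\iff\;
|\hat\theta| > 1.96\, s .
\]
```

So every estimate that gets reported as significant is at least 1.96 standard errors from zero, whatever the true effect is. Double the standard error and you double the smallest effect that can appear in print.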

I’m not saying that anyone’s making their measurements extra-noisy on purpose; it’s just that the incentives favor noisy measurements.

As a wise economist once said, people don’t always respond to incentives, but responding to incentives is usually a lot easier than not responding to incentives.

In the garden

In comments, Thomas Basbøll discussed Wansink’s anti-theoretical attitude in the context of a beautiful Van Morrison song. (As an aside, Basbøll’s invocation of Morrison was a great move, because now when I write on this topic, I have that song running pleasantly in the background in my head.) Basbøll writes:

In my adaptation of Morrison’s slogan, I’ve only replaced the “guru” with “theory”. It reminds me of Bertrand Russell’s observation that sometimes a system of logical notation can bring insights as good as a live teacher. . . .

Academic knowledge is the sort of thing we can learn from others. That’s what makes an education something quite different than a spiritual journey. We’re not just supposed to find the answers within ourselves (though we may find many of them there while attending a university); we’re supposed to be brought up to speed about what the culture already knows.

A “scientific” discovery, likewise, is one we can teach to others, it is a “contribution” to others, especially other researchers. That’s why theory is so important. It’s what you are contributing a particular result to. In science, you can’t really claim to answer “important questions” instead of extending or testing a theory. It’s the theory that gives the question its importance.

For Wansink to present an “experimental” approach to economic behavior with no theory is as odd as if he proposed to conduct his experiments with no method.

Before concluding this post, I should emphasize that theory isn’t perfect. Some theories or frameworks are flat-out wrong; others are useless; others were once useful but are now played out. To get a sense of how theory can lead one astray, look at the career of sociologist Satoshi Kanazawa, famous for his indefatigable attempts to squeeze statistical blood out of the dry stone which is N=3000 sex-ratio data. His (and others’) misunderstanding of statistics led him to publish claims which were essentially pure noise, but his attachment to a particular theory has, I fear, kept him going, nourishing his confidence in settings where a better response would’ve been to quit.

So, sure, I’m aware that theory can only go so far, and we need to be open to the unexpected, to learning new things from data.

But, remember, we can best learn from the unexpected when we carefully specify what is the “expected” that the world deviates from.

In some way, this comes down to technical issues in statistical modeling. Here’s Wansink again:

With field studies, hypotheses usually don’t “come out” on the first data run. But instead of dropping the study, a person contributes more to science by figuring out when the hypo worked and when it didn’t. This is Plan B. Perhaps your hypo worked during lunches but not dinners, or with small groups but not large groups. . . .

I don’t actually object to any of this. But the way to study these interactions is not to sift through looking for “p less than .05”: that’s a procedure with poor frequency properties, as we say in statistics: it’s a way to produce overconfident overestimates (high type M and type S errors). Instead, I recommend multilevel modeling, partially pooling interactions toward zero. When effects are small and measurement error is high, as in Wansink’s experiment, just about everything will be pooled toward zero.
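To show what partial pooling toward zero does to a set of subgroup estimates, here is a deliberately simplified sketch. It is an empirical-Bayes stand-in, not a full multilevel model: it assumes each true interaction is drawn from a normal distribution centered at zero with variance tau^2, estimates tau^2 by a crude method of moments, and ignores the uncertainty in that estimate. The function name and the example numbers are hypothetical.

```python
def partial_pool_toward_zero(estimates, std_errors):
    """Shrink raw subgroup estimates toward zero.

    Model (an assumption of this sketch): theta_j ~ N(0, tau^2) and
    y_j ~ N(theta_j, s_j^2), with all s_j > 0. The posterior mean of
    theta_j is then y_j * tau^2 / (tau^2 + s_j^2).
    """
    n = len(estimates)
    # Under the model E[y_j^2] = tau^2 + s_j^2, so estimate tau^2 by
    # moments and truncate at zero.
    tau2 = max(0.0,
               sum(y * y for y in estimates) / n
               - sum(s * s for s in std_errors) / n)
    pooled = [y * tau2 / (tau2 + s * s)
              for y, s in zip(estimates, std_errors)]
    return pooled, tau2

if __name__ == "__main__":
    # Noisy case: the spread of the estimates is no larger than their noise.
    print(partial_pool_toward_zero([0.8, -0.5, 1.1, -0.9, 0.3, -0.2], [0.9] * 6))
    # Precise case: the estimates vary far more than their standard errors.
    print(partial_pool_toward_zero([2.0, -1.0, 1.5], [0.5] * 3))
```

In the noisy case the estimated between-subgroup variance is zero and every interaction is pooled all the way to zero. In the precise case the estimates shrink only a little.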

But that’s ok. At least, it’s ok if your goal is to learn about the world. It’s not so great if your goal is to produce a stream of publications claiming statistically significant discoveries.

P.S. Just to be clear: I’m not saying that all bad research points us to statistical insights. I don’t think we got anything useful at all from discussing that himmicanes paper, for example.

P.P.S. In my criticism of Wansink’s research, I’m not saying that he’s doing more harm than good from this work. I can see strong arguments in both directions, and this will be the subject of tomorrow’s post.

34 Comments

  1. D Kane says:

    This, and your related blog posts, are just genius. Any thoughts to putting them together in a 10-20 page paper, for which the target audience is undergraduate students in their first statistics course?

    • Rahul says:

      Undergrads in a 101 course read 20-page articles?

      • If written in the right way, yes. Typical articles are written with the purpose of buffaloing everyone else and making the author seem ultra intelligent and Nobel Worthy. Undergrads don’t read those. Of course, to be honest, neither does anyone else.

        • Keith O'Rourke says:

          > neither does anyone else.
          Except those on faculty review/promotion committees as well as University grant facilitators and public relations staff.

        • Rahul says:

          For my work I switch between reading articles in academic journals vs trade magazines. I’m struck by the stark contrast in clarity of communication.

          It is almost as if the entire academic article format itself is flawed from the start. The key goal is to assign credit and claim terrain and less to communicate a result.

          The fonts, the walls of unbroken text, the undigestibly long formats. The breaks in reading continuity by citation alphabet-soup. The practice of putting figures / tables at the end in review drafts. Referring to content very remote from the page being read etc.

          I do understand that some of these quirks did have legitimate legacy reasons but a housecleaning is long overdue.

          • Keith O'Rourke says:

            As CS Peirce wrote a long time ago about Phd theses – “they are not written to clearly communicate important ideas but rather to impress upon the reader that the writer deserves to be granted the degree applied for.”

            Maybe it’s become a career-long requirement for many “to impress upon the reader that the writer deserves retention, promotion, more grants, publicity in University communications, etc. etc.”

          • Chris J says:

            I agree the problem is not so much science as it is science communications. I have previously commented to Andrew on this blog about the need for standards. Accountants only established standards after the Great Crash of 1929. Actuaries (my professional and corporate experience) started to establish standards in 1985. Andrew was not keen on what he called the institutional solution, but my argument was intended to address standards of communication of results, not standards of research practice (not my field).
            The first rule of actuarial disclosure communication is “Uncertainty and Risk – the actuary should consider what cautions regarding possible uncertainty or risk in any results should be included in the actuarial report.” [ASOP No. 41 3.4.1]
            The seventh rule is “Responsibility to Other Users—An actuarial document may be used in a way that may influence persons who are not intended users. The actuary should recognize the risks of misquotation, misinterpretation, or other misuse of such a document and should take reasonable steps to ensure that the actuarial document is clear and presented fairly. To help prevent misuse, the actuary may include language in the actuarial document that limits its distribution to other users (for example, by stating that it may only be provided to such parties in its entirety or only with the actuary’s consent).”
            Item 7 would need to be tailored to the needs of the scientific community, but you can see that actuaries as a profession have a standard that says you can not just ignore the fact that the popular press will pick up your study and run with it as is.

            I am amazed that a scientific research report is considered complete when a startling conclusion is presented without detailed description of different ways the conclusions could be in error or could be misinterpreted. “More research is needed..” often comes across as an expectation of further supporting studies, but skeptical research is by nature more valuable.

            Good standards for science research communications would not limit what scientists currently do, but would expand the formal communications to detail about what could be erroneous or missing, such as mechanisms that were not evaluated due to data limitations.

            Such standards would help address researchers who “know the conclusion”, but just need to crunch the data to back it up or who miss identifying mechanisms that others may find more compelling.

            Finally, if research papers included such risks of results and limitations of interpretations for other users as a matter of course, the quality of the paper would be judged, in part, on the quality of the cautions and the usefulness of those communications to inform further scientific research and not just on those startling conclusions. In other words, the best scientific papers would identify to other scientific researchers what problems they are most likely to find in the paper. Instead of defensively batting away skeptics, the researcher would have been the one who first identified the potential problem.

            • Andrew says:

              Chris:

              You write, “I am amazed that a scientific research report is considered complete when a startling conclusion is presented without detailed description of different ways the conclusions could be in error or could be misinterpreted.”

              But it’s worse than that! Consider the following sentence which concluded the abstract of perhaps the most notorious paper ever published in the notorious journal Psychological Science:

              That a person can, by assuming two simple 1-min poses, embody power and instantly become more powerful has real-world, actionable implications.

              The paper in question had no measures of “becoming more powerful.” None. Nada. Zero. Zip. I’m not saying that they didn’t qualify their claims enough, or that they had a noisy measure of power, or that they measured it but I don’t believe it’s really statistically significant, or that I dispute their theoretical framework, or that I disagree with their data exclusion rules, or that they miscalculated a t-statistic. I’m saying that in their paper they never even claimed to measure the power (or powerfulness, or whatever you want to call it) of people in their study.

              This is pretty much junk science in its purest form. You say you discovered something that you never even studied.

              And, remember, this paper was published in what is arguably the leading journal of psychology research by faculty at leading universities, and of course this work was hyped by many of the leading institutions of our scholarly and media culture, including Harvard University, Ted, NPR, and just about every leading news organization.

              • Chris J says:

                Andrew,
                My suggestion is that a good starting point is a formal body that proposes a detailed formal standard of communication of scientific results. If things are as bad as you say, then some starting point is needed beyond exhortations. Not to beat the drum too much, but actuarial standards require disclosure where the actuary deviates from the standards. The rules are so light that you do not have to follow them, but if you do not, then to be a “good actuary” you need to disclose where you did not follow the standard and explain why (4.4). As to Ted and NPR, that is my point above – the original research paper would caution that results can not accurately be portrayed without including certain cautions as specified in the “good” research paper. Harvard University representatives would presumably participate. I did not mention technical credentials, but based on my past reading of your posts, a special “technical” peer review for statistical integrity may be necessary.
                Are you saying things are so bad that something like this could not be structured in a way that would succeed?
                I say divide and conquer. Fix the academic side first. Then go for the journalists who may need a few new standards of reporting themselves. None of this would be regulatory, but if the right stakeholders can get together and reach agreement, and there is opportunity for comment from users, responsible researchers would need to at least think twice before deviating from the formal standards. Below is the actuarial standard of practice on communications (which I hesitate to include because there are more differences than similarities and the analogy could get lost). Sec. 3.4.1, Sec. 3.7, and Sec. 4.4 (Deviation from Guidance) are the relevant sections among others. You could almost mark this up and have an initial draft. Am I too optimistic?

        • Andrew says:

          Dan:

          Buffalo buffalo buffalo buffalo.

    • Martha (Smith) says:

      The ideas need to be integrated into the textbook and the process of teaching the course. (And then there is the question of promoting such a course, which would be considered “nonstandard” by many who teach such courses.)

      • Keith O'Rourke says:

        Yup and then there is the other faculty in the university to deal with.

        I remember running into this at Duke – the students were noticing that what I was arguing was poor research practices resembled much of what their faculty in their primary departments were doing in their published research papers…

        Of course, that is exactly why it would be worth doing.

        • Nick says:

          In the various Facebook and other forums dedicated to open science, one regularly sees grad students who come along with “Hi, I love all this stuff and I want to write an honest, open paper. But my supervisor tells me to HARK, p-hack, and then when I’ve finished doing all the data collection, run the analyses, and written the draft, he’ll put his name on it, as well as the names of a couple of his buddies. What should I do?”. And the consensus answer is always, “Do what you have to do to survive”, as if someone who survives this way is suddenly going to cut their publishing productivity by 80% by being honest any time this side of tenure.

          It’s not clear to me how the open science movement is going to get to critical mass in the face of (to borrow a phrase from the former UK Education Minister, Michael Gove, describing the resistance that he encountered to his school reforms from teachers and their unions) “The Blob”.

          • Anoneuoid says:

            “Do what you have to do to survive”

            I really don’t understand why this response is so popular, although I agree that is the de facto answer. Imo, it is definitely better to quit than waste your life producing misleading research reports.

            How could someone who (presumably) wanted to do science, not be constantly miserable as a cog in a machine they admit is working towards the opposite goal? The pay isn’t very good either. I don’t get it.

  2. Anoneuoid says:

    Instead of focusing on theory, the focus is on asking and answering practical research questions.

    1) There is no reason to think your results will extrapolate beyond your sample (into the future, slightly changed conditions).
    2) If using a series of statistical significance determinations to “answer practical research questions” was at all reasonable, we would see machine learning algorithms do this. I don’t think that exists (or that it would work… at all).

    Basically, you should either be searching for “universal” laws (eg the learning curve for this task is always sigmoid, except under conditions A, B, C), or using predictive skill on train/validation/test sets to choose some kind of “pattern-recognition” model. The NHST (+ p-hacking) approach described here has no place in a productive research community. It can be used to publish an endless stream of misleading reports, that is about it.

  3. Tom Passin says:

    I have found that running simulations – a lot of them – is very helpful for giving me a feel for when and how data analyses can go wrong. Simple simulations are mostly what are called for. For example, set up a simple linear least squares fit with two variables: y = ax + bz. create some values for x and z. Compute y, then add noise.

    Now try to fit the y data to recover the a and b parameters. Preferably, plot the fitted and actual lines of y vs x, y vs z with the other parameter held constant. Add some noise to the *x* values, simulating unknown or unexpected uncertainties. And watch as that noise gets reflected onto the z axis and adds just as much extra uncertainty onto the b parameter. Next try to understand how that could happen. Keep doing this.

    It’s very enlightening. You learn when to be humble about the results.

    • Neil says:

      Do you have sample R code for this?

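      • # Simulate two predictors and a response with known slopes 2 and 3 (no noise on y).
        x = rnorm(100,0,1);
        z = runif(100,0,1);
        y = 2*x + 3*z;

        # Fit with the true x: recovers the slopes exactly.
        lm(y ~ x+z,data=data.frame(x=x,y=y,z=z))

        # Add measurement noise to x only, then refit with the noisy version.
        xm = x + rnorm(100,0,.2);

        lm(y ~ x+z,data=data.frame(x=xm,y=y,z=z))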

        ….

      • Tom Passin says:

        Neil asks:
        Do you have sample R code for this?

        I used Python. I could make it available if you like. You need python 2.7, and the python library matplotlib for plotting. It’s more complex than necessary because I wanted to get some of the intermediate sums, so I compute the solutions in the program instead of using a library like numpy to do so.

        To get the most out of the program, you have to change the noise setup (e.g., noise on y only, noise on x only, etc), which is only a simple edit to the file. It’s a bit crude but makes nice plots that show you clearly what happens when you fuss with the noise or the range of the data. For example, if you bunch up all the z-axis points into a short range, then the same amount of noise translates to a much larger uncertainty about the z-only slope.
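Tom's program is not shown in the thread, so what follows is only a minimal Python sketch of the kind of experiment he describes, not his code. Like his version, it solves the least-squares problem directly instead of calling a library. Two things here are assumptions of the sketch: the true slopes (2 and 3, borrowed from the R snippet above) and the choice to make z correlated with x, which is what lets measurement error in x leak into the estimate of b. All function names are made up.

```python
import random

def fit_two_slopes(x, z, y):
    """Least-squares fit of y = a*x + b*z (no intercept) via the 2x2 normal equations."""
    sxx = sum(v * v for v in x)
    szz = sum(v * v for v in z)
    sxz = sum(u * v for u, v in zip(x, z))
    sxy = sum(u * v for u, v in zip(x, y))
    szy = sum(u * v for u, v in zip(z, y))
    det = sxx * szz - sxz * sxz
    a = (sxy * szz - szy * sxz) / det
    b = (szy * sxx - sxy * sxz) / det
    return a, b

def simulate_fit(x_noise_sd, n=2000, seed=1):
    """Simulate y = 2x + 3z + noise, then fit using a noisy measurement of x."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Assumption: z is correlated with x, so error in x can spill over into b.
    z = [0.7 * xi + rng.gauss(0.0, 0.7) for xi in x]
    y = [2.0 * xi + 3.0 * zi + rng.gauss(0.0, 0.5) for xi, zi in zip(x, z)]
    # Only the measured x is noisy; the data-generating x is unchanged.
    x_measured = [xi + rng.gauss(0.0, x_noise_sd) for xi in x]
    return fit_two_slopes(x_measured, z, y)

if __name__ == "__main__":
    print("x measured exactly:", simulate_fit(0.0))
    print("x measured with noise:", simulate_fit(0.5))
```

With x measured exactly, the fit lands close to the true slopes. With noise on x, the estimate of a is pulled toward zero and the estimate of b drifts away from 3, because z absorbs part of the signal that the noisy x can no longer carry.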

    • Tim Morris says:

      Good to read this Tom – I do this sort of thing all the time.
      Especially when there’s other stuff to do.

  4. Andrew: “I don’t think we got anything useful at all from discussing that himmicanes paper, for example.”

    I personally found it useful, as a Civil Engineering PhD my focus was on what should we actually expect for natural disasters? How could we predict such things? Some minor amount of thinking on the subject and I produced what I thought was a clever dimensionless ratio on which to base such analyses. I didn’t spend much time on it and I suspect my Bayesian followup statistical model has bugs, but the dimensionless ratio stuff made good sense.

    http://models.street-artists.org/2014/06/19/hurricanes-vs-himmicanes-a-little-modeling-goes-a-long-way/

  5. Jonathan says:

    Sorry. This became TL/DR.

    Isn’t one of the questions you really ask in this blog whether some areas, some “disciplines” are resistant to rigorous analysis? You often discuss different versions of this: the complete absurdities of himmicanes and ESP to small studies that might if done well maybe might show something other than noise. In general, the “studies” of complete absurdities show that the theory which says “none of this stuff is true” is correct.

    But my quibble with this post is that it’s entirely correct except that it kind of assumes “spherical cows”, with those being a form of rationality within context. I’m not talking about irrationality, whatever that exactly means, but that the universe of things which can be analyzed with rigor is small compared to all the things that we need to analyze to proceed. We’ve all been involved in a huge number of decisions which are guesses. You can say to yourself that it’s important to “be true” to principle or to “follow the gut”, whatever that means, and you can add in what looks like fact, like meaningful data, but which typically is just a slice of something bigger that you can’t see in the present context. (Of course, that’s easier to see in the kind of limited universe of studies that have some observable metric – though those observations and the metric itself may be problematic – and it’s more insidious when you’re analyzing a larger space and you believe you’re accurately describing it (within some error) and you’re wrong.) Rationality within context is generally the best we can do. I suppose that’s a theory: given limited inputs, given priors that shape both interpretation and response, this is what you see and that always tends toward 0 except in terms of decision it always tends toward 1 because life requires decisions.

    I’m not sure I’m getting out what I’m trying to say and I so enjoy your posts that I’m going to keep going in hopes I can say it better. I’ve told my children that any advice I give is essentially of random value to them: I’m taking my anecdotal, highly noisy and low-powered in scope set of experiences as they have been shaped by my priors and as they’ve fed into my current model of behavior and advice giving and I’m applying that to their similarly anecdotal, noisy, etc. Not only is there simple noise on both ends – my experiences, their experiences are small, individualized, etc. data sets – but there are process errors radiating from me and my choice of words (and occasion – as one of my children hung up on me yesterday because I inadvertently hit the wrong note) and interpretation errors on their ends, as well as the basic issue that the model I’m specifying may not work anymore, that it no longer fits the general contextual model or no longer fits my children’s individual and joint models. I could keep going but it should be clear that advice sucks unless it doesn’t suck. I’ve told them advice is a lot like the envelope scam: send out 10k letters predicting x or y, so if x comes true then you send out letters to those predicting a new x or y, etc. until you’ve proven to the small remainder that you’re remarkably prescient! (I hope they remember what worked and credit me more than they credit me with what was useless or actually negatively signed.) But we have to issue advice. It’s part of the relationship. It’s part of life that we have to exchange information, make decisions, offer opinions, even though much of what we share may have negative value or no value whether now or in the long run. I would have said like birds twittering but I think Twitter not only uses that correctly but demonstrates the point about the relative uselessness and negative sign of so much that’s shared.

    Let’s take himmicanes. People study hurricanes and examine damage levels based on where storms originate, how they develop, etc. Someone notices that maybe to that person it looks like male storms are stronger. That leads to a thought that maybe there’s something in that: maybe people assign male names to storms that develop in these spots in the intuitive knowledge that these are more likely to become bigger or maybe they’re just fishing in the dark. But it’s just a thought and then it gets offered out and rightly ridiculed. To contrast, take a case where intuition seems to matter. I read a neat piece years ago about stealing 3rd base which found the very old “book” on it matched the statistical analysis or the “new book”. (This was pre the generalized use of shifting and some other factors.) That’s where theory would say intuition works: a history of trials conveys a message that is internalized as a set of basic rules about when you steal 3rd. A bunch of trials witnessed by enough people, done by enough people over a span of years so the bits of data harden into an idea that can be shared as a set of rules. That’s a theory too and it exposes why himmicanes is silly: since the names are from an alphabetical list put out before the season, then they’re speculating about non-human knowledge, as though the non-human universe knows that Hurricane Jim is called Jim and not Francine and that human label somehow matters to the non-human universe so it adjusts what would happen on a grand scale. It’s mind-bogglingly not science.

    So you can develop theories about what can be determined. Baseball is so great for that because the play is so constrained: stay within lines, in order of bases reached, so a number of trials can have specific outcomes. Most of life though is more spherical cow: the trials aren’t constrained and the confounding issues can be extremely complex and the outcomes are typically confounded too. As one of my kids would tell me, the relatively small studies of some cancer drugs could turn in significance based on how a fairly small number of cases were recorded for symptoms, for cause of death, etc. This obviously can feed into p-hacking and garden paths but it can also occur below view. I mean, to be clear, that at some level we have to take data as they are and we have to model that data according to processes that we choose at some level so at some level in both point and process you have to assume the messiness away.

    At this point, I throw up my hands because I think better tools, which you’re developing so impressively well, will continue to run into the limits of the real world’s need to resolve questions to an apparent 1, to a decision, even though the data (to the extent it even exists or is measurable) points toward 0. And I totally and absolutely believe that focusing on improving the taught and learned conception of what is appropriate for real analytical consideration is beneficial. We will always bleed past reason but of course we absolutely need to classify problems as ripe for this degree of consideration versus that degree and hope over time to spread the net of ripeness further.

    BTW, with regard to Wansink, he says that having a fruit bowl associates with 6 fewer pounds. Maybe. But I can say that having a fruit bowl clearly associates with fruit flies.

  6. AnonAnon says:

    As a fan of Lewin’s, I find that quote to be one of the more striking examples of selective quotation by academics. The fuller quote, which can be found in his collection of Field theory papers, reads:

    “Many psychologists working today in an applied field are keenly aware of the need for close cooperation between theoretical and applied psychology. This can be accomplished… if the theorist does not look toward applied problems with highbrow aversion or with a fear of social problems, and if the applied psychologist realizes that there is nothing so practical as a good theory.” Lewin, K., & Gold, M. E. (1999). The complete social scientist: A Kurt Lewin reader. American Psychological Association.

    Sometimes the lack of theoretical sophistication in applied work stems from a lack of communication between theoreticians and practitioners. (Not saying that problem applies here.)

  7. Jonathan (another one) says:

    Just to further distinguish myself from the other Jonathan, great post.

  8. Michael Lew says:

    This long post focussed on an important issue that does indeed deserve wider understanding. However, I’m disturbed that you didn’t mention the possibility that the noise mining might turn up real effects that are of interest. We need researchers to not only recognise the role of sound theory, but also to recognise the need for corroboration of new results prior to publication. It is easy to ‘find’ ‘interesting’ ‘significant’ things using Wansink’s approach, but the probability that any of them would be misleadingly corroborated by new data would be very low.

    The absence of theory is not the main problem in my opinion. We should be encouraging researchers to learn as much as they can from data as long as they test the new ideas with new experiments prior to treating them as tested. Wansink’s crime is to accept and publish every new thing that he was able to wring from the data without testing anything.
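
    The corroboration point can be illustrated with a small simulation. This is a rough sketch and covers only the pure-noise case: it assumes there are no real effects at all, that each comparison yields a standard-normal z-score, and that a result counts as corroborated when the new z-score is also past the threshold with the same sign. The function name `noise_mining` and all parameter values are invented for the example.

```python
import random


def noise_mining(n_comparisons: int, threshold: float = 1.96, seed: int = 0) -> tuple[int, int]:
    """Return (found, corroborated) counts under a pure-noise assumption.

    Each comparison's z-score is drawn from N(0, 1), i.e. no true effect.
    """
    rng = random.Random(seed)
    found = 0
    corroborated = 0
    for _ in range(n_comparisons):
        z_first = rng.gauss(0.0, 1.0)
        if abs(z_first) <= threshold:
            continue  # nothing "significant" to report for this comparison
        found += 1
        # Fresh data: an independent draw from the same null distribution.
        z_new = rng.gauss(0.0, 1.0)
        if abs(z_new) > threshold and (z_new > 0) == (z_first > 0):
            corroborated += 1
    return found, corroborated


if __name__ == "__main__":
    found, corroborated = noise_mining(20_000)
    print(f"found: {found}, corroborated by new data: {corroborated}")
```

    Under these assumptions, roughly 5% of comparisons are "found" and only about 2.5% of those survive the second look, which is the sense in which misleading corroboration would be rare.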

  9. Eric says:

    Andrew,

    It looks like you have in mind two kinds of theory: (1) a theory that says X causes Y in some way, and (2) a “measurement theory”, without which noise is likely to be high.

    If this is what you have in mind, could you explain the difference between these? I’ve never heard of the second kind, and I don’t know why the statement would be true or false.

  10. Terry says:

    Speaking of polls, it looks like the Presidential Job Approval polls have the same patterns as the pre-election polls: a huge dispersion, with Rasmussen the most pro-Trump, and Quinnipiac the most anti-Trump.

    Presumably, each poll carried over its pre-election methods and is getting the same biases.

    http://www.realclearpolitics.com/epolls/other/president_trump_job_approval-6179.html
