We can break up any statistical problem into three steps:

1. Design and data collection.

2. Data analysis.

3. Decision making.

It’s well known that step 1 typically requires some thought about steps 2 and 3: It is only when you have a sense of what you will do with your data that you can make decisions about where, when, and how accurately to take your measurements. In a survey, the plans for future data analysis influence which background variables to measure in the sample, and whether to stratify or cluster; in an experiment, what pre-treatment measurements to take, and whether to use blocking or multilevel treatment assignment; and so on.

The relevance of step 3 to step 2 is perhaps not so well understood. It came up in a recent thread following a comment by Nick Menzies. In many statistics textbooks (including my own), the steps of data analysis and decision making are kept separate: we first discuss how to analyze the data, with the general goal being the production of some (probabilistic) inferences that can be piped into any decision analysis.

But your decision plans may very well influence your analysis. Here are two ways this can happen:

– Precision. If you know ahead of time you only need to estimate a parameter to within an uncertainty of 0.1 (on some scale), say, and you have a simple analysis method that will give you this precision, you can just go simple and stop. This sort of thing occurs all the time.
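This kind of precision check can be sketched in a few lines. Everything here is illustrative: the target of 0.1 and the simulated measurements are stand-ins, not data from any real study.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=500)  # stand-in for real measurements

target_precision = 0.1  # required uncertainty, on the parameter's scale

# The simple analysis: sample mean and its standard error.
estimate = data.mean()
se = data.std(ddof=1) / np.sqrt(len(data))

# If the simple method already meets the precision target, stop there.
if se <= target_precision:
    print(f"simple estimate {estimate:.2f} (se {se:.3f}) is precise enough")
else:
    print("the simple method is too noisy; consider a richer model or more data")
```

The point of the sketch is the stopping logic, not the particular estimator: knowing the decision's precision requirement up front tells you whether the simple analysis suffices.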

– Relevance. If you know that a particular variable is relevant to your decision making, you should not sweep it aside, even if it is not statistically significant (or, to put it Bayesianly, even if you cannot express much certainty in the sign of its coefficient). For example, the problem that motivated our meta-analysis of the effects of survey incentives was the decision of whether to give incentives to respondents in a survey we were conducting, the dollar value of any such incentive, and whether to give the incentive before or after the survey interview. It was important to keep all these variables in the model, even if their coefficients were not statistically significant, because the whole purpose of our study was to estimate these parameters. This is not to say that one should use simple least squares: another impact of the anticipated decision analysis is to suggest parts of the analysis where regularization and prior information will be particularly crucial.
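As a rough illustration of that last point, ridge regression (equivalent to a normal prior on the coefficients) keeps the decision-relevant predictors in the model while shrinking noisy estimates toward zero. The incentive variables and effect sizes below are simulated for the sketch, not taken from the actual survey study:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Hypothetical survey-incentive data: dollar value and timing (before=1, after=0).
value = rng.uniform(0, 10, n)
timing = rng.integers(0, 2, n).astype(float)
X = np.column_stack([np.ones(n), value, timing])

# Simulated response-rate outcome with small, noisy effects.
y = 0.5 + 0.01 * value + 0.02 * timing + rng.normal(0, 0.2, n)

# Ridge regression: equivalent to a normal prior on the coefficients.
# The intercept is left unpenalized; lam controls the prior's strength.
lam = 5.0
penalty = lam * np.diag([0.0, 1.0, 1.0])
beta = np.linalg.solve(X.T @ X + penalty, X.T @ y)

# Both decision-relevant coefficients stay in the model, shrunk toward zero,
# even though neither would survive a significance filter here.
print(dict(zip(["intercept", "value", "timing"], beta.round(3))))
```

The design choice is that relevance to the decision, not statistical significance, determines what stays in the model; the prior (here, the ridge penalty) handles the instability that significance filtering would otherwise be used to paper over.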

Conversely, a variable that is not relevant to decisions could be excluded from the analysis (possibly for reasons of cost, convenience, or stability), in which case you’d interpret inferences as implicitly averaging over some distribution of that variable.

I think there’s been too much focus on #2 (Data Analysis) and too little on #3 (Decision making).

In fact, I’d almost go as far as saying that most undergrad curricula totally ignore #3.

Even in books that do devote attention to both topics, the amount of coverage devoted to data analysis totally swamps the attention given to Decision Making.

“In fact, I’d almost go as far as saying that most undergrad curricula totally ignore #3.”

I’d say that they don’t totally ignore it — they just (unscientifically) reduce it to looking at whether or not a p-value is “statistically significant” — in other words, take the thinking out of it.

Agreed, Rahul and Martha! I just submitted a journal article proposal a few minutes ago, in which I wrote that:

“Statistics are no substitute for scientific thought; advanced statistical analyses of a poorly-designed experiment will still give you little useful information about the subject. What is true of computer programs is true of any mathematical procedure: garbage in, garbage out.”

Instructors need to make this a bigger point of emphasis! It’s not only on textbook writers; it’s also on instructors to emphasize important points that a textbook fails to point out. Arguably, an instructor’s words and practices have a bigger impact on student learning than a paragraph (or even a whole chapter!) in a textbook.

Z:

You write, “Arguably, an instructor’s words and practices have a bigger impact on student learning than a paragraph (or even a whole chapter!) in a textbook.” I’ve found the opposite: when I teach from a textbook not written by me, sometimes it seems that students just follow the book and it doesn’t matter what I say!

” I’ve found the opposite: when I teach from a textbook not written by me, sometimes it seems that students just follow the book and it doesn’t matter what I say!”

Whether or not this happens is likely to be a function of the assignments you give and how you grade them. If all your assignments come from the book, … well …

Martha:

No, it wasn’t that. The assignments did not come from the book. It’s just that the book is all they had. Now I’m more careful with their readings.

>>>If you know that a particular variable is relevant to your decision making, you should not sweep it aside, even if it is not statistically significant (or, to put it Bayesianly, even if you cannot express much certainty in the sign of its coefficient).<<<

Doesn't that beg the question: How do you *know* it is relevant in the first place?

I mean, oftentimes, isn't that the very goal of the analysis? i.e. To find out what variable is relevant and what's not?

I believe Andrew means relevant to the decision (regardless of effect size), rather than relevant to change in outcomes.

For example, in a clinical trial, the treatment effect is relevant to the decision to bring the drug to market regardless of effect size. In fact, if the effect size is very near zero, that’s extremely relevant to the decision to bring it to market!

Rahul, I think what you are discussing is something like EDA; we want to know what the important factors are. I think historically, statisticians have failed to give EDA the full attention it deserves, and the ML community has come and scooped this up with aggression.

I read recently, it may have even been in a comment here, that there is a big difference between making a decision and understanding. The decision is about initiating a behavior. Understanding may form part of the basis for a decision, but it’s not itself a decision.

So maybe #3 is too restrictive?

Why not use Royall’s three questions:

1. What do the data say?

2. What should I believe now that I have the data?

3. What should I do or decide now that I have the data?

Those questions provide a very nice scaffold for statistical concepts.

Scaffolds can be used well or poorly; the devil is in the details. One concern I have with your (Royall’s) list is that the questions all sound too certain. I’d prefer something like,

1. What do the data say? What do they not say? What do they still leave as unclear?

2. How do the data reasonably change or support my prior beliefs?

3. How do the data reasonably influence what I do or decide?

Martha, sure. I’m not wedded to the exact wording and, I admit, it is not exactly Royall’s wording.

The important thing about the questions is that they are three important _types_ of question. Many of the misconceptions in statistics come from misguided attempts to answer a question type with tools that are appropriate only for another type. For example, many of the problems with P-values come from an assumption that P-values are answers to all three questions. In fact, P-values are an answer only to the first.

Your additional parts to question 1 are interesting and important, but they cannot be answered by a statistical model and so they are distinct from what I intended by “What do the data say?”

Here’s something about this from Harrell’s blog that I didn’t quite understand:

“A frequent argument from data users, e.g., physicians, is that ultimately they need to make a binary decision, so binary classification is needed. This is simply not true… At any rate, if binary classification is needed, it must be done at the point of care when all utilities are known, not in a data analysis.”

Is he, in a way, saying that a study must restrict itself to #2 (data analysis) and leave #3 (decision making) to the practitioner in the field?

I think he’s saying don’t start with binary classification — if something can be modeled as a continuous variable, do so. Then, after data analysis, if a binary decision is needed, take all available information into account to decide on rational cut-offs to make the binary decision.

One advantage of this might be that what initially seems like it might require a binary decision (e.g., use this drug or not) may turn out to have more than two possibilities (e.g., “treat with the drug under these circumstances and not under those,” or “use no treatment in circumstances A, the drug in circumstances B, an alternate treatment in circumstances C, and both the drug and the alternate treatment in circumstances D”). In other words, using a continuous variable and making cutoffs for treatment after analysis leaves open more options than starting with a binary variable derived from the continuous variable.
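A minimal sketch of that workflow: the analysis reports a continuous risk, and the binary decision is made only at the point of care, when the patient-specific utilities are known. All numbers below are invented for illustration.

```python
def treat_decision(risk, benefit_if_sick, harm_if_treated):
    """Treat when expected benefit exceeds expected harm.

    Expected utility of treating is risk * benefit_if_sick - harm_if_treated,
    so the implied risk cutoff is harm / benefit, which varies by patient.
    """
    return risk * benefit_if_sick > harm_if_treated

# The same modeled risk yields different decisions for different patients,
# because their utilities (side-effect tolerance, preferences) differ.
risk = 0.15  # continuous output of the data analysis
print(treat_decision(risk, benefit_if_sick=10, harm_if_treated=1))  # True
print(treat_decision(risk, benefit_if_sick=10, harm_if_treated=2))  # False
```

Nothing binary is baked into the analysis itself; the cutoff emerges from the utilities, which is why Harrell insists it be applied at the point of care rather than inside the data analysis.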

Don’t start with binary is fine. No arguments.

But the part that’s interesting is that he seems to be advocating pushing the actual decision problem to the field and the practitioner. He says: “must be done at the point of care”.

Shouldn’t the researcher provide a model that takes attributes as inputs and spits out a binary decision?

As I pointed out before, the results may suggest a non-binary decision. Also, there is a tendency in medicine to put some reliance on “physician’s best judgment and patient preferences”. Sometimes this is not a good idea; sometimes it is.

“Shouldn’t the researcher provide a model that takes attributes as inputs and spits out a binary decision?” NO! Never. Remember that the context for this discussion is the decision process of a physician. There are too many real-world considerations that cannot be part of the statistical model to contemplate allowing statistics to substitute for thoughtful consideration by an intelligent agent.

How do you ensure that the reference set implied by your statistical model is the most relevant to the individual patient? How, for example, would you incorporate the patient’s objectives and desires into the statistical model that is run far from the point of care?

But from the looks of it, the physicians *themselves* are the biggest advocates for demanding a model with a binary output?

I doubt that all physicians fall in the same camp – some would prefer a binary output, some would not.

I don’t think it is a researcher’s role to be making point-of-care decisions, nor is it the point of all research to produce such models (just as Harrell pointed out that classification and prediction are different things). Not everyone does applied research, as an obvious point. But tons of clinical trials are just about safety or efficacy, not about dosage. Yes, statistical judgement is generally better than clinical judgement, but it might also not factor in everything relevant about a patient, including the ethical obligation to take the patient’s views on risks and benefits into account and the requirement to explain those accurately and understandably.

+1

I get this all the time (even from statisticians) as an argument in favour of significance testing, so that may be the motivation for Frank Harrell’s comment (I haven’t read the blog post yet). The argument is that because a binary decision (treat/don’t treat) eventually needs to be made, it is appropriate for the results of a clinical trial to be dichotomised into positive/negative (by using a significance test). It’s such a bad argument — for reasons so obvious that I don’t think those who say it actually believe it.

So I guess what Frank is saying is that data analysis shouldn’t pre-empt decision making, or restrict potential decisions by crappy analysis methods.

I think critiques often confuse the method with the goal.

Significance testing may be crappy, but that’s just the approach that’s flawed. That doesn’t change the fact that at the far end of the chain there’s still a tricky binary decision to be made somewhere.

You can push the decision down the modelling workflow but you still must establish the model that makes a dichotomous decision at the end.

Let’s replace NHST by a Bayesian model, sure, but at the very end the model must still spit out a Yes or No (at least for a certain class of problems).

Yes, you need to make a binary decision eventually, but you don’t do yourself any favours by trying to make the decision on only some of the relevant information (especially if you use a crap method to generate the binary decision).

Just a note that one of the best books on statistical decision-making remains Keeney and Raiffa’s 1993, “Decisions with Multiple Objectives: Preferences and Value Tradeoffs.”

Thomas:

I actually found that book to be irritating: it was full of fake examples, and the whole attitude seemed kinda smug to me. I can’t remember exactly why I had that feeling, as it was about 20 years ago that I read the book (it was when preparing my course on decision analysis).

Hi Andrew, which book do you recommend on statistical decision making? I have a degree in statistics and, embarrassingly, know nothing about decision theory.

Rob:

In all seriousness, I’d start with the decision analysis chapter in BDA.