Tell me what you don’t know

We’ll ask an expert, or even a student, to “tell me what you know” about some topic. But now I’m thinking it makes more sense to ask people to tell us what they don’t know.

Why? Consider your understanding of a particular topic to be divided into three parts:
1. What you know.
2. What you don’t know.
3. What you don’t know you don’t know.

If you ask someone about 1, you get some sense of the boundary between 1 and 2.

But if you ask someone about 2, you implicitly get a lot of 1, you get a sense of the boundary between 1 and 2, and you get a sense of the boundary between 2 and 3.

As my very rational friend Ginger says: More information is good.

Postdoc opportunity here, with us (Jennifer Hill, Marc Scott, and me)! On quantitative education research!!

Hop the Q-TRAIN: that is, the Quantitative Training Program, a postdoctoral research program supervised by Jennifer Hill, Marc Scott, and me, and funded by the Institute of Education Sciences.

As many of you are aware, education research is both important and challenging. And, on the technical level, we’re working on problems in Bayesian inference, multilevel modeling, survey research, and causal inference.

There are various ways you could contribute as a postdoc: perhaps you have a PhD in psychometrics or education research and this is your chance to go in depth with statistical inference and computation, or maybe you can do all sorts of Bayesian computation and you’d like to move into education research. We’re looking for top people to join our team.

If you’re interested, send me an email with a letter describing your qualifications and reason for applying, a C.V., and at least one article you’ve written, and have three letters of recommendation sent to me. All three of us (Jennifer, Marc, and I) will evaluate the applications.

We have openings for two 2-year postdocs. As per federal government regulations, candidates must be United States citizens or permanent residents.

“What then should we teach about hypothesis testing?”

Someone who wishes to remain anonymous writes in:

Last week, I was looking forward to a blog post titled “Why continue to teach and use hypothesis testing?” I presume that this scheduled post merely became preempted by more timely posts. But I am still interested in reading the exchange that will follow.

My feeling is that we might have strong reservations about the utility of NHST [null hypothesis significance testing], but realize that they aren’t going away anytime soon. So it is important for students to understand what information other folks are trying to convey when they report their p-values, even if we would like to encourage them to use other frameworks (e.g. a fully Bayesian decision theoretic approach) in their own decision making.

So I guess the next question is, what then should we teach about hypothesis testing? What proportion of the time in a one semester upper level course in Mathematical Statistics should be spent on the theory and how much should be spent on the nuance and warnings about misapplication of the theory? These are questions I’d be interested to hear opinions about from you and your thoughtful readership.

A related question I have is on the “garden of forking paths” or “researcher degrees of freedom”. In applied research, do you think that “tainted” p-values are the norm, and that editors, referees, and readers basically assume some level of impurity of reported p-values?

I wonder, because it seems, if applied statistics textbooks are any guide, that the first recommendation in a data analysis is often: plot your data. And I suspect that many folks might do this *before* settling on the model they are going to fit. E.g., if they see nonlinearity, they will then consider a transformation that they wouldn’t have considered before. So whether or not they actually make the transformation, the fact that they might have affects the interpretability of p-values and whatnot. Perhaps I am being an extremist. Pre-registration, replication studies, or simply splitting a data set into training and testing sets may solve this problem, of course.

So to tie these two questions together, shouldn’t our textbooks do a better job in this regard, perhaps in making clear a distinction between two types of statistical analysis: a data analysis, which is intended to elicit the questions and perhaps build a model, and a confirmatory analysis which is the “pure” estimation and prediction from a pre-registered model, from which a p-value might retain some of its true meaning?
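
To make the proposed exploratory/confirmatory split concrete, here is a minimal sketch in Python; the data, column names, and model are made-up placeholders, not from any real study:

```python
# Minimal sketch of an exploratory/confirmatory split (hypothetical data and model).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2015)

# Fake data standing in for a real study.
n = 200
df = pd.DataFrame({
    "x": rng.normal(size=n),
    "y": rng.normal(size=n),
})

# Randomly split: one half for exploration (plotting, trying transformations),
# the other half reserved for a single pre-specified confirmatory fit.
explore = df.sample(frac=0.5, random_state=1)
confirm = df.drop(explore.index)

# Exploration: look at the data, decide on a model (e.g., whether to transform x),
# using only `explore`.

# Confirmation: fit the one model chosen in advance to the held-out half.
fit = smf.ols("y ~ x", data=confirm).fit()
print(fit.summary())
```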

My reply: I’ve been thinking about this a lot recently because Eric Loken, Ben Goodrich, and I have been designing an introductory statistics course, and we have to address these issues. One way I’ve been thinking about it is that statistical significance is more of a negative than a positive property:

Traditionally we say: If we find statistical significance, we’ve learned something, but if a comparison is not statistically significant, we can’t say much. (We can “reject” but not “accept” a hypothesis.)

But I’d like to flip it around and say: If we see something statistically significant (in a non-preregistered study), we can’t say much, because garden of forking paths. But if a comparison is not statistically significant, we’ve learned that the noise is too large to distinguish any signal, and that can be important.
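
To make that flip concrete, here is a small simulation sketch (my own illustration, not from any course materials): when there is no true effect but the analyst can choose among several comparisons after seeing the data, the chance of finding some statistically significant result is far above 5 percent.

```python
# Simulation: even with no true effect, letting the analyst pick the "best"
# of several post-hoc comparisons inflates the rate of significant findings.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n, n_outcomes = 10_000, 50, 5
hits = 0

for _ in range(n_sims):
    # Two groups, several outcomes, no true differences anywhere.
    a = rng.normal(size=(n, n_outcomes))
    b = rng.normal(size=(n, n_outcomes))
    pvals = stats.ttest_ind(a, b).pvalue  # one p-value per outcome
    # The "forking path": report whichever comparison happens to look best.
    if pvals.min() < 0.05:
        hits += 1

print(f"Proportion of simulated studies with at least one p < 0.05: {hits / n_sims:.2f}")
# With 5 independent outcomes this is roughly 1 - 0.95**5, about 0.23, not 0.05.
```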

What’s the point of the margin of error?

So . . . the scheduled debate on using margin of error with non-probability panels never happened. We got it started, but there was some problem with the webinar software and none of the participants could hear anything.

The 5 minutes of conversation we did have was pretty good, though. I was impressed. The webinar was billed as a “debate” which didn’t make me happy—I wasn’t looking forward to hearing a bunch of pious nonsense about probability sampling and statistical theory—but the actual discussion was very reasonable.

The first thing that came up was, Are everyday practitioners in market research concerned about the margins of error for non-probability samples? The consensus among the market researchers on the panel was: No, users pretty much just take samples and margins of error as they are, without worrying about where the sample came from or how it was collected.

I pointed out that if you’re concerned about non-probability samples and if you don’t trust the margin of error for non-probability samples, then you shouldn’t trust the margin of error for any real sample from a human population, given the well-known problems of nonavailability and nonresponse. When the nonresponse rate is 91%, any sample is a convenience sample.

Sampling and adjustment

The larger point is that just about any survey requires two steps:
1. Sampling.
2. Adjustment.

There are extreme settings where either 1 or 2 alone is enough.

If you have a true probability sample from a perfect sampling frame, with 100% availability and 100% response, and if your sampling probabilities don’t vary much, and if your data are dense relative to the questions you’re asking, then you can get everything you need—your estimate and your margin of error—from the sample, with no adjustment needed.
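
As a minimal sketch of that first extreme, assuming a simple random sample and a single yes/no question with made-up numbers, the estimate and the margin of error (taken here as two standard errors) come straight from the sample:

```python
# Simple random sample, one yes/no question: estimate and margin of error
# come straight from the sample, no adjustment needed (made-up numbers).
import math

n = 1000          # respondents
yes = 520         # "yes" answers
p_hat = yes / n   # estimated proportion

se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the proportion
moe = 2 * se                             # "margin of error" taken as 2 standard errors

print(f"estimate: {p_hat:.3f}, standard error: {se:.3f}, margin of error: +/-{moe:.3f}")
# Roughly 0.520 +/- 0.032 under simple-random-sampling assumptions.
```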

From the other direction, if you have a model for the underlying data that you really believe, and if you have a sample with no selection problems, or if you have a selection model that you really believe (which I assume can happen in some physical settings, maybe something like sampling fish from a lake), then you can take your data and adjust, with no concerns about random sampling. Indeed, this is standard in non-sampling areas of statistics, where people just take data and run regressions and that’s it.
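
And a toy sketch of the other extreme, with entirely hypothetical numbers: if we trust a model in which responses depend only on group membership, we can reweight the sample cells to known population shares and not worry about how the sample was drawn:

```python
# Toy poststratification: adjust a lopsided sample to known population shares
# (all numbers are hypothetical).
# Cell means estimated from the sample, by group:
sample_means = {"young": 0.60, "old": 0.40}
# Sample composition (over-represents the young):
sample_share = {"young": 0.80, "old": 0.20}
# Known population composition from, say, a census:
pop_share = {"young": 0.45, "old": 0.55}

raw_estimate = sum(sample_means[g] * sample_share[g] for g in sample_means)
adjusted_estimate = sum(sample_means[g] * pop_share[g] for g in sample_means)

print(f"unadjusted: {raw_estimate:.3f}, poststratified: {adjusted_estimate:.3f}")
# 0.560 vs 0.490: the adjustment is doing all the work here.
```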

In general, though, it makes sense to be serious about both sampling and adjustment, to sample as close to randomly as you can, and to adjust as well as you can.

Remember: just about no sample of humans is really a probability sample or even close to a probability sample, and just about no regression model applied to humans is correct or even close to correct. So we have to worry about sampling, and we have to worry about adjustment. Sorry, Michael Link, but that’s just the way things are. No “grounding in theory” is going to save you.

What’s the point of the margin of error?

Where, then, does the margin of error come in? (Note to outsiders: to the best of my knowledge, “margin of error” is not a precisely-defined term, but I think it is usually taken to be 2 standard errors.)

What I said, during our abbreviated 5-minute panel discussion, is that, in practice, we often don’t need the margin of error at all. Anything worth doing is worth doing multiple times, and once you have multiple estimates from different samples, you can look at the variation between them to get an external measure of variation that is more relevant than an internal margin of error, in any case.

The margin of error is an approximate lower bound on the expected error of an estimate from a sample. Such a lower bound can be useful, but in most cases I’d get more out of the between-survey variation (which includes sampling error as well as variation over time, variation between sampling methods, and variation in nonsampling error).
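
Here is a minimal sketch of that internal vs. external comparison, with made-up estimates: the spread across repeated surveys of the same quantity is typically larger than any one survey's internal margin of error would suggest.

```python
# Compare an internal margin of error with between-survey variation
# (made-up estimates from five surveys of the same quantity).
import math
import statistics

estimates = [0.52, 0.49, 0.55, 0.47, 0.53]  # five surveys, same question
n_per_survey = 1000

# Internal: margin of error computed from a single survey of size n.
p = estimates[0]
internal_moe = 2 * math.sqrt(p * (1 - p) / n_per_survey)

# External: spread of the estimates across surveys, which also picks up
# changes over time, differences in sampling method, and nonsampling error.
between_sd = statistics.stdev(estimates)

print(f"internal margin of error: +/-{internal_moe:.3f}")
print(f"between-survey standard deviation: {between_sd:.3f}")
# Here the between-survey sd (about 0.032) is roughly twice the internal
# standard error (about 0.016).
```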

Where the margin of error often is useful is in design: in deciding how large a sample you need in order to estimate a quantity of interest to some desired precision.
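
A rough version of that design calculation, assuming a simple random sample and a proportion near 50 percent:

```python
# Rough sample-size calculation: how many respondents for a desired margin of error?
# Assumes a simple random sample and margin of error of about 2 * sqrt(p(1-p)/n).
import math

def sample_size_for_moe(target_moe: float, p: float = 0.5) -> int:
    """Smallest n giving a margin of error no larger than target_moe."""
    return math.ceil(4 * p * (1 - p) / target_moe ** 2)

print(sample_size_for_moe(0.03))  # +/-3 points -> about 1112 respondents
print(sample_size_for_moe(0.01))  # +/-1 point  -> about 10000 respondents
```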

In an email discussion afterward, John Bremer pointed out that in tracking studies you are interested particularly in measuring change, and in that case it might not be so easy to get an external measure of variance. Indeed, if you only measure something at time 1 and time 2, then the margin of error is indeed relevant to assessing the evidence. To get an external measure of uncertainty and variation you need a longer time series. I just wanted to emphasize the point that the margin of error is a lower bound and, as such, can be useful if it is interpreted in that way. Even if sampling is perfect probability sampling and there is 100% response, the margin of error is still an underestimate because the sample is only giving a snapshot, and attitudes change over time.
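
For the two-wave case, the relevant internal calculation is the standard error of the estimated change, which for independent samples combines the two waves' standard errors; a minimal sketch with made-up numbers:

```python
# Standard error of a change between two survey waves (independent samples,
# made-up numbers): se_diff = sqrt(se1**2 + se2**2).
import math

n1, p1 = 1000, 0.48   # wave 1
n2, p2 = 1000, 0.52   # wave 2

se1 = math.sqrt(p1 * (1 - p1) / n1)
se2 = math.sqrt(p2 * (1 - p2) / n2)
se_diff = math.sqrt(se1**2 + se2**2)

change = p2 - p1
print(f"estimated change: {change:+.3f}, margin of error: +/-{2 * se_diff:.3f}")
# A 4-point swing with a +/-4.5-point margin of error: not clearly
# distinguishable from noise, and this still ignores variation over time
# and nonsampling error.
```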

Patience and research

I’m going to follow up on a recent post of Thomas Basbøll and argue that patience is an important, and I think under-appreciated, practice in research.

This is an odd post for me to write because I’m usually not a patient person. In some ways, though, and surprising as it may sound, blogging is a good way for me to exercise my patience. I’m writing this on 2 Sept (at 3:55 in the afternoon, I’m still coming down from the high of teaching two classes so no way I can do real work, and it’s not 4pm so I can’t yet read my email) but I’ve scheduled it to appear on 18 Oct, at the current spot at the end of the queue. I love the feedback from commenters (in the old days, I also loved when I’d get reactions from other blogs, but we don’t see so much of that anymore), but I’ll patiently wait a month and a half for that.

Anyway, I don’t have anything deep to say here, just the commonplace notion that we typically have to try lots of things until we get some success. And apparent success is often illusory. (Obligatory link here to that 50 shades of gray paper.) The fractal nature of discovery.

Debate on using margin of error with non-probability panels

Tomorrow (Thurs 22 Jan) at 2pm, I’m participating (along with Jane Tang, John Bremer, Nancy Brigham, and Steve Mossup) in an online discussion, moderated by Annie Pettit, on the above topic.

Here’s the description:

Most marketing researchers know that using Margin of Error with convenience samples, non-probability samples, and online research panels is inappropriate. However, some researchers continue to report MOE as there does not seem to be a simple or any alternative.

Join Ipsos and a panel of experts for a webinar discussion about:

Why is it appropriate or inappropriate to use MOE with online research panels?

Is it appropriate to use MOE with other types of research, e.g., telephone surveys / RDD?

Are there any appropriate alternatives that give similar guidance?

If there are no appropriate alternatives, what should researchers do to guide people interpreting their data?

How can researchers/pollsters who do not use MOE compete with pollsters who do use MOE, particularly when research users demand it?

How are research users supposed to know good from bad without seeing MOE or alternatives?

I can’t tell you how much I hate that first sentence in the above blurb. “Most marketing researchers know” should be replaced by “Many marketing researchers believe.”

[Image: Rose-ringed Parakeet eating leaves]

I don’t really know what I’ll have to say, beyond yapping out “91% nonresponse! 91% nonresponse!” like a demented version of Long John Silver’s parrot. Any of you who want all my content without hearing the discussion can read this post or this article.

Anyway, the panel will be 30-45 minutes long, and it seems that you can sign up here. Too bad they didn’t get Michael Link, president of AAPOR, to participate; then I could’ve asked him why he didn’t respond to my request for clarification.

P.S. Due to technical difficulties this event never happened. It got rescheduled to a time next week that I can’t make, but you can go hear the others, I suppose. I’ll post something tomorrow on what we did say during our brief panel discussion. And here it is.

High risk, low return

This one is just too good not to share. I came across it via a link from Retraction Watch.

Director of Paris journalism school suspended for plagiarism:
Executive director of journalism school at Sciences-Po university suspended while the university investigates accusations she was plagiarising other people’s articles for columns in the Huffington Post . . .

The website Arret Sur Images said it fed around 20 of her columns into an online plagiarism checker and found that in half of them at least one sentence, but more often two or three, had been lifted from other articles and presented unchanged and without attribution.

Hey, I taught at Sciences-Po! (But I didn’t know this person.)

What’s funny about this story, though, was that she plagiarized, and risked losing her career, to publish at . . . the Huffington Post!? That’s what I call high risk, low return.

This is as ridiculous as if a prominent statistician had destroyed his reputation by plagiarizing review articles in some obscure journal on, umm, I dunno, “Interdisciplinary Reviews”? Nah, that could never happen.

Plans for reboot of Statistical Communication class

At the end of my course on Statistical Communication and Graphics last semester, I enlisted some of the students to help plan the new version of the course (which starts next week). I took a bunch of notes on the blackboard and then a student took pictures for me. I had the idea that I’d use these notes to plan the revised course. The discussion was helpful, and it was probably even helpful to write all this stuff on the board, but I didn’t really know what to do with the notes themselves . . . so I’m sharing them below, just in case they amuse you. Perhaps it’s a good message to send to all of you: I don’t know what I’m doing either:

[Photos of the blackboard notes: photo 1, photo 2, photo 3]

Github cheat sheet

Mike Betancourt pointed us to this page. Maybe it will be useful to you too.

Another benefit of bloglag

In the classic Philip K. Dick novel The World Jones Made, the main character has the ability to see the future; in particular, he knows what will happen a year from now, with this window moving forward relative to the present time. Sounds cool, huh? But that’s not the character’s perception; instead:

It’s not so much like I [Jones] can see the future; it’s more that I’ve got one foot stuck in the past. I can’t shake it loose. I’m retarded; I’m reliving one year of my life forever.

But this post is more upbeat; it’s a return to the discussion of my practice of posting blog entries a month ahead of time. One thing that can be frustrating about lagged posting is that I have some great idea (for example, The paradoxical nature of anecdotal evidence, which I just wrote and which will appear a month and a half from now, i.e. “yesterday” to you) but I don’t get the discussion for a month and a half.

But the plus side—and I think it outweighs the minus—is that I’m so overwhelmed that if I posted every idea right when it came to me and got the feedback right away, I might easily forget the whole incident. By spreading things out over two months, I get another chance to think about the subject, to fit the piece into the larger puzzle.