I’m teaching two classes this semester:

– Design and Analysis of Sample Surveys (in the political science department, but the course has lots of statistics content);

– Statistical Communication and Graphics (in the statistics department, but last time I taught it, many of the students were from other fields).

I’ve taught both classes before. I taught Statistical Communication last semester. It went well and I’m rearranging it a bit for the spring. It should go well.

I’ve taught Design and Analysis of Sample Surveys twice before, and each time the students have wanted a bit more statistics and a bit less social science. Most of the students in the class are studying political science but they can get that from the other profs in their program; when they take my course they’re looking for the hard statistics stuff they can’t get anywhere else. Their favorite part of the course was when I taught them about practical regression modeling.

These exam questions should give you an idea of what was in my surveys class before. It’s ok but this time I’m going to go lighter on the traditional sampling topics (ratio and regression estimation, stratified cluster sampling bla bla bla) and instead have them do Mister P for real in R and Stan, just like the grownups do. These are Columbia grad students, for chrissake—I don’t know what I was thinking before. If they don’t learn serious survey analysis now, when will they?

Don’t get me wrong here. I won’t teach *only* MRP. But it will flow naturally from (a) regression modeling, and (b) the goal of using a sample to make inferences for the population. From this perspective, it would be perverse to teach regression and sample surveys and *not* show them how to do MRP. And, once they’re fitting multilevel models, it makes sense to do it in Stan, since that’s what everybody’s gonna be using soon anyway.
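To make the "sample to population" step concrete, here is a minimal sketch of the poststratification half of MRP. Everything in it is an assumption for illustration: the cells, the cell estimates, the census counts, and the sample sizes are made up, and the cell estimates are plain numbers standing in for what a fitted multilevel regression would supply.

```python
# Poststratification sketch. All numbers are hypothetical.

def poststratify(cell_estimates, cell_counts):
    """Average the cell estimates, weighting each cell by its population count."""
    total = sum(cell_counts[cell] for cell in cell_estimates)
    weighted = sum(cell_estimates[cell] * cell_counts[cell] for cell in cell_estimates)
    return weighted / total

# Estimated proportion answering "yes" in each (age, education) cell.
# In real MRP these would be predictions from a multilevel model.
cell_estimates = {
    ("18-44", "no college"): 0.40,
    ("18-44", "college"): 0.60,
    ("45+", "no college"): 0.30,
    ("45+", "college"): 0.50,
}

# Hypothetical census counts for the same cells.
census_counts = {
    ("18-44", "no college"): 300,
    ("18-44", "college"): 200,
    ("45+", "no college"): 400,
    ("45+", "college"): 100,
}

# Hypothetical sample sizes: college graduates are overrepresented.
sample_sizes = {
    ("18-44", "no college"): 50,
    ("18-44", "college"): 150,
    ("45+", "no college"): 50,
    ("45+", "college"): 150,
}

# Weighting by sample size reproduces the unadjusted sample mean.
raw = poststratify(cell_estimates, sample_sizes)
# Weighting by census counts gives the population estimate.
adjusted = poststratify(cell_estimates, census_counts)

print(f"raw sample mean:         {raw:.2f}")       # 0.50
print(f"poststratified estimate: {adjusted:.2f}")  # 0.41
```

The gap between the two numbers comes entirely from the sample overrepresenting college graduates. The multilevel regression matters when some cells have few or no respondents, which this toy example sidesteps.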

OK, so here’s the deal. In revamping my Design and Analysis of Sample Surveys, I need to fix two things:

1. The course material. Less of the boring classical stuff that I used to force myself to teach and force the students to remember (for example, the expression for the standard error of the ratio estimate) and more of the good stuff. To get more specific, I need to write some R and Stan code to do MRP in some simple examples, I need to get the relevant census data together, etc. And of course I need to put this in the context of 14 weeks of class.

2. The classroom experience. Me standing up and talking in front of a class of 25 students? What a joke. Anything important I can say, I can write instead, and the students can read (remember, they’re Columbia grad students: if they can read AJPS papers, they can read whatever tutorial material I write). Classroom time is mostly wasted unless it involves active student learning. I know this in the context of my other course, now it’s time to walk the walk and do it for all my other classes. Starting with this one.

**What to do during those 28 sessions, each 75 minutes long?**

But . . . what should I actually do in class? I’m not sure. The first week of class I can lecture and have discussion, that’s no problem, the students need to get a sense of what’s coming and why it’s important. I guess I should prepare a few work-in-pairs problems, though. Then, after that first week, their homework assignments will start to come in, and we can spend time on that.

I’ll require that students bring their laptops to every class so that, whenever we want, we can break them out and start working. More efficient to get their R and Stan issues resolved in 15 minutes during class than during tearful overnight sessions at home.

I still think I need a specific plan, though.

It goes like this: Each week we have topics, readings, homeworks, and the skills and concepts I want the students to learn. This all drives the class period. I’ll prepare some slides to spark discussion.

No fear of dead time. That’s important. The students have tons that they have to figure out, that they ultimately have to work out for themselves. Two 75-minute periods a week are not a lot of time, it’s precious time for me to help them out.

So, I still need to make a plan for how to spend each class, starting in week 2.

In the meantime, here’s my current schedule of topics for the 14 weeks of class. Any comments are appreciated.

Introduction (week 1):

1a: Overview of the course

1b: Examples of surveys in the news

Statistics review (weeks 2–4):

2a: Basic statistics

2b: Statistical inference in the context of large variation

3a: Linear regression

3b: Logistic regression

4a: Statistical graphics

4b: Causal inference

Classical design and analysis of surveys (weeks 5–7):

5a: Survey interviewing

5b: Survey measurement

6a: Simple and stratified random sampling

6b: Weighting and poststratification

7a: Cluster sampling

7b: Analysis of data from cluster sampling

Social and political science (weeks 8–10):

8a: Surveys in the United States

8b: Surveys in other countries

9a: Voting and political participation

9b: Public opinion

10a: Network sampling

10b: Survey experiments

Advanced analysis of survey data (weeks 11–14):

11a: Bayesian regression

11b: Multilevel modeling

12a: Item-response and ideal-point modeling

12b: Multilevel regression and poststratification

13a: Constructing survey weights

13b: Missing-data imputation

14a: Open problems in analysis of survey data

14b: Summary of the course

Maybe we should do some role-playing activities? Maybe the students should design and conduct a survey together? I don’t know.

If I can repeat the wisdom of others:

Start with the assessment. What should each student be able to do at the end of the class? Work backwards from there.

Practice. If they can only try once, they will likely fail and that is that. As a result, if the final assessment is something ‘big’ – it needs to be broken down and each piece needs to be something that can be attempted repeatedly. These little pieces can be turned into problems that have ‘parameters’ – so that they can try and fail, try and succeed, repeatedly – until they reliably succeed.

Anyway, this is my recitation of the lesson I was taught. It seems like a Class about Surveys, rather than a Survey Class, is amenable to this kind of teaching and learning.

Ben:

Yes, I agree, and that’s a helpful formulation; thanks.

What I want the students to be able to do (in no particular order) is:

(a) Conduct a survey (at least in theory)

(b) Analyze data from a survey they design themselves

(c) Find and grab data from existing social surveys

(d) Analyze data from existing social surveys.

I then need to supply readings, hand-holding, and homeworks that will show them how to do these things and give them practice doing it.

And I need to figure out how to structure those 2100 valuable minutes of classroom time.

Andrew, I think in this case, you want to do exercises where you have the students prepare something in advance (i.e., a piece of analysis from an existing survey, or some questions for a potential survey) and present it to the class. In my grad school field classes, tasks like that were the most helpful because 1) I had enough time to prep something of reasonable quality; but 2) I couldn’t just phone it in because the whole class would be watching.

I think there’s a temptation to do in-class activities, but for anything that this class involves, I think you want students to be able to think about it at their own pace. By the way, I think this is one place where group work is pretty effective, especially since you might have 20 students who can’t all present in one or even two classes.

What you have right there, in your list of things you want them to be able to do, is your assessment, homework, path to hand-holding, etc. (e.g., for (a), have them conduct a theoretical survey: simulate and describe realistic administration, limitations, and such). And the thing is, very likely that list is the same as their list. And they’ll be sitting there wondering, why isn’t that what we’re doing? Let them generate data. Let them analyze messy data that seems really far outside the simple class example. The simple class example is a disservice to everyone. Give them a mess and let them clean it up.

One thing I haven’t tried but want to is bringing the assignment directly into class, but I think I would need to make it much harder. It shouldn’t be too hard to separate what questions I will and won’t cover during class time. But I do bring allegories of the assignment into class. Those sessions are where I handle little pieces of the assignment in a slightly sideways way, so they can go home and try to relate that problem to what they’re working on.

That way I transform the assignment from merely an assessment to a learning exercise. They have several opportunities to submit (it used to be unlimited but that hurt some of them more than it helped). Each submission is graded pass or ‘not yet’. This makes the process of assessment and the process of learning much more strongly intertwined. It forces them to communicate with me and gives them the opportunity to try things and experiment and know they’ll get feedback and advice on what to do to solve the problem.

Stepping back from that it might be easier to see what needs to happen in class. If you generate the assignment first then you can see the building blocks that need to be established and perform those in class.

John:

I appreciate the advice but there still seems to be a missing step. You write, “Let them generate data. Let them analyze messy data that seems really far outside the simple class example.” But the point is that they don’t yet know how to do this—this is what they want to learn in the class!

But maybe that’s what you’re suggesting? That I prepare a class period with an assignment, then they come to class having started it, and they work in pairs, with my help, to get it done?

They learn so much more if you have them do stuff in the room and you are there to help them troubleshoot. Then at key points that you want to stress, you get everyone to stop and pay attention to whatever problem they are having, not in a way that is embarrassing … but like “this is really an interesting but typical problem ..” and that’s where you give your 5-minute discussions on whatever it is.

That’s close. I was suggesting something a little different but it will work similarly sometimes.

I kind of have a parallel thing going. They get a messy assignment that’s relatively large that they work on for assessment at home and in class they get smaller tasks that parallel issues they’ll run into in the larger take home assignment. So they’re working on similar problems for assessment as they’re covering in class (at least they are if they follow my advice).

I kind of like your idea though, get them to start it at home and complete it in class. I suppose sometimes I do something a little similar. I give homework due before the next class and give them enough information that the ambitious could solve the whole thing (but many likely won’t). This gives me a bunch of submissions that I can go through and find the more common errors. In class we do either the same or a very similar task again. I was surprised to find that the ones who did solve it before weren’t bored. Perhaps they like the affirmation of their brilliance or maybe the additional insights they probably missed doing it on their own.

I think that it helps if students try to solve a specific problem, especially if the students try to solve the problem first on their own and they realize that they need to learn more, or if you point out errors or potential issues with their solution and then they realize that they need to learn more.

In terms of what to spend class time on, I think that valuable use of class time includes guided practice and teacher-student interaction, because those things have a larger loss of quality when moved outside the classroom, compared to, say, lecture, unguided practice, and independent group work.

—

Maybe have students download data and the questionnaire from a TESS study, and have them present the results as best they can. Then you can correct errors and offer thoughts about how best to analyze the data and present the results. If students don’t know much data analysis at the start of the course, you can always give them the relevant data in Excel and have them calculate means and standard deviations and graph the data to see patterns. Then show them with different data how to calculate summary statistics in a more powerful program, with the assignment to calculate the same summary statistics for their data. Then do the same thing for graphing, inferential statistics, data cleaning, weighting, and missing data imputation.

If students have their own problem to solve, a lot of different questions might come up about specific issues that you might not have thought about teaching. Students would need to use some of the general methods you are teaching but might also need to figure out some things on their own for their particular problem.

—

Maybe you can use one of the TESS studies as a general example. Here is a TESS study on racial bias that lends itself to multiple research designs: http://www.tessexperiments.org/data/eberhardt033.html.

Data analysis for that particular study involves a lot of decisions that matter in terms of inferences, which illustrates that some of learning how to do surveys is of the “how do I do this?” variety and some of learning how to do surveys is of the “should I do this?” variety.

Some questions raised in the data for that TESS study include: Should all respondents be analyzed together, or should white respondents be analyzed separately? Are there enough black respondents to make inferences about blacks, either in the sample or in the general population? If so, can the data for black respondents be weighted, or are there too few responses from black respondents to even try to make inferences based on weighted data? Which cases should be excluded because, say, the respondents took too little time or too much time to complete the survey? Should all three manipulation checks be used to exclude respondents? If a lot of responses are discarded due to incorrect responses to the manipulation checks, should the weights be recalculated? Results were not statistically significant for some of the dependent variable items: should results for those items be reported? Should all of the dependent variable items be used to make a single scale, or should the four items be analyzed separately? If so, does there need to be a correction for multiple comparisons? There is a lot of previous research on racial bias in jurors and in mock jurors: should that data be incorporated in some way?

Here’s the actual report for the study: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0036680.

—

You can also teach good habits. That TESS study, with its myriad possibilities, is a great example of why preregistration is important, to eliminate (much of) the need to worry about all those forking paths. If your students design a survey and then implement the survey, maybe assign them to take the intermediate step of preregistering the research plan, at least in the sense that students write out an introduction, a bit of theory, and a detailed description of the planned research method and analysis. Maybe have students conduct a peer review of other students’ preregistered research plans, to suggest improvements and comment on anything missing from the design, such as a rule for when cases should be excluded; then you can grade the quality of the peer reviews and address disagreements about a particular research design or analysis choice.

—

The schedule seems logical, but it appears that weeks 10 to 13 concern the content that is most difficult and most amenable to practice, so placing that content at the end of the course limits practice time and the time students have to learn the most difficult content. Placing weeks 7 and 8 at the end of the course but before week 14 would mess up the logic of the schedule, but would provide students more practice and learning time for some of the more challenging content.

Would love to see your syllabus once you’ve finalized everything. Do you ever post them?

A couple of suggestions:

1. Regarding readings: Accompany reading assignments with some questions and/or pointers about the reading. This serves several purposes: a) To do this well, you need to think about why you have chosen the reading. b) It helps students avoid getting sidetracked into less important parts of the reading. c) It helps give a little structure to the class, since you can start class by asking the questions for students to discuss.

I’m not thinking about “factual” questions, but things like, “Does the body of the article substantiate what is stated in the abstract?”, “What sections could be clearer?”, “Why do you think I assigned this reading?” “What do you think I liked most about this reading? What do you think I liked least?” “What was most interesting (made you think/most important/…) to you in this reading?” — I’m sure you can add to the list.

2. Since this is a course on surveys, be sure to spend some class time having students critique survey questions — thinking about how clear they are, how they might be misinterpreted, whether they really address the purpose of the survey, etc. Maybe start by critiquing existing surveys, then have students propose questions and critique them. Maybe precede with a reading on bloopers in survey questions.

From that list of what you want them to be able to do, I would break up the class into sections where you do it.

Now, I teach undergraduates who are pretty stats phobic, and I can tell you what I have done to make the whole thing more active. You can obviously do so much more … after some background on surveys and doing some reading, I have them write some questions at home (actually 10 questions plus one existing measure). Then, they bring them in and we spend a class pretesting the questions on each other and iterating on revisions. With grad students you could make it a lot more rich, maybe try to create and validate a scale or something.

Now also, I use the SDA site at Berkeley a lot, mainly with the GSS. I wouldn’t do this with my students but I would with grad students: make them pick some GSS variables and read the technical reports on them and report to the group. I usually give groups of students a GSS variable to work with over about 4-5 weeks and we do a lot of different things with it (like look at whether race or gender of interviewer makes a difference; there are other cool variables too, like whether an incentive was paid, interviewer-reported cooperation and comprehension, etc. Even the vocab test would be cool to look at, though I wouldn’t do it with my students).

The other thing with the GSS is to understand how the cluster sampling works and how that leads to interesting consequences in some waves, like when they get a cluster in Utah.

Now, again with my students, I use SAMP to look at different sampling methods, we collect a lot of data for each of the 4 types in the software and then look at costs and benefits. With grad students I might do something more complex if I had time, like use some PUMS data for a population and then sample with different strategies and different response models.

PS I keep thinking I’m going to make a shiny app to replace SAMP with real data, but I always run out of time before the semester starts.
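The "sample a known population with different strategies" exercise suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: the population is synthetic (three made-up strata standing in for something like PUMS records), only two designs are compared (simple random sampling and proportionally allocated stratified sampling), and all function names are hypothetical.

```python
import random
import statistics

def make_population(rng):
    """Synthetic population: three hypothetical strata with different means."""
    strata = {"A": (10.0, 3000), "B": (50.0, 5000), "C": (90.0, 2000)}
    return {
        name: [rng.gauss(mean, 5.0) for _ in range(size)]
        for name, (mean, size) in strata.items()
    }

def srs_estimate(everyone, n, rng):
    """Sample mean from a simple random sample of size n."""
    return statistics.fmean(rng.sample(everyone, n))

def stratified_estimate(population, n, rng):
    """Stratified estimate with sample sizes proportional to stratum sizes."""
    total = sum(len(values) for values in population.values())
    estimate = 0.0
    for values in population.values():
        share = len(values) / total
        n_h = max(1, round(n * share))
        estimate += share * statistics.fmean(rng.sample(values, n_h))
    return estimate

def rmse(estimates, truth):
    """Root mean squared error of repeated estimates around the known truth."""
    return (sum((e - truth) ** 2 for e in estimates) / len(estimates)) ** 0.5

def compare_designs(n=100, reps=500, seed=1):
    """Repeat both designs many times and compare their errors."""
    rng = random.Random(seed)
    population = make_population(rng)
    everyone = [y for values in population.values() for y in values]
    truth = statistics.fmean(everyone)
    srs = [srs_estimate(everyone, n, rng) for _ in range(reps)]
    strat = [stratified_estimate(population, n, rng) for _ in range(reps)]
    return truth, rmse(srs, truth), rmse(strat, truth)

if __name__ == "__main__":
    truth, srs_rmse, strat_rmse = compare_designs()
    print(f"true mean:       {truth:.2f}")
    print(f"SRS RMSE:        {srs_rmse:.2f}")
    print(f"stratified RMSE: {strat_rmse:.2f}")
```

Because the strata here differ a lot in their means, the stratified design should show a much smaller error. Students could add a cost per interview or a response model to each design to get at the costs-and-benefits discussion.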

What I always felt missing was a gold standard. Something to compare the results of the survey against that’d tell me how good or bad I did.

Like in a Physics class you could estimate the weight of a tuning fork or something indirectly but then be able to know how right or wrong you were. Or estimate an unknown concentration in a chemistry titration or identify an unknown salt.

That sort of stuff seems harder for a survey but I’d still like something of that nature if possible. Otherwise it becomes too much of a debate & about the instructor’s personal idiosyncrasies.

Maybe one could actually assign survey tasks for which one would later know the “true” population value, or at least a much better powered estimate than the in-class assignment.

Rahul:

1. We do compare pre-election polls to election outcomes, so there’s that. Also there are tuning-fork-like exercises where we sample from a known population (for example, taking an existing large survey and considering it as the population, then sampling with various simulated versions of nonresponse and trying to estimate the true “population” values, which can then be checked). Maybe I’ll try one of these exercises in the course.

2. It’s not about the instructor’s personal idiosyncrasies. These are pretty much accepted methods (at least in political science); the challenge is that they’re pretty technical and students need to learn some statistics and some programming in order to use them effectively.
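A minimal sketch of the tuning-fork-like exercise from point 1, under loud assumptions: the "population" is simulated rather than taken from a real large survey, there is a single made-up covariate (college or not) that drives both the outcome and the chance of responding, and the adjustment is simple poststratification on that covariate. All rates and names are hypothetical.

```python
import random

def run_exercise(seed=2, pop_size=50000, n=10000):
    rng = random.Random(seed)

    # Simulated "population": (college, says_yes) for each person.
    population = []
    for _ in range(pop_size):
        college = rng.random() < 0.3
        p_yes = 0.7 if college else 0.4
        population.append((college, rng.random() < p_yes))

    # Known "true" values that the estimates can be checked against.
    truth = sum(yes for _, yes in population) / pop_size
    college_share = sum(college for college, _ in population) / pop_size

    # Draw a simple random sample, then simulate nonresponse:
    # college graduates respond far more often than everyone else.
    contacted = rng.sample(population, n)
    respondents = [
        (college, yes)
        for college, yes in contacted
        if rng.random() < (0.8 if college else 0.3)
    ]

    # Naive estimate: the raw mean among respondents.
    naive = sum(yes for _, yes in respondents) / len(respondents)

    # Adjusted estimate: poststratify on college using the known population share.
    yes_college = [yes for college, yes in respondents if college]
    yes_other = [yes for college, yes in respondents if not college]
    mean_college = sum(yes_college) / len(yes_college)
    mean_other = sum(yes_other) / len(yes_other)
    adjusted = college_share * mean_college + (1 - college_share) * mean_other

    return truth, naive, adjusted

if __name__ == "__main__":
    truth, naive, adjusted = run_exercise()
    print(f"truth:    {truth:.3f}")
    print(f"naive:    {naive:.3f}")
    print(f"adjusted: {adjusted:.3f}")
```

The adjustment works here because nonresponse depends only on the variable being adjusted for. Letting response also depend on the outcome itself would show students where this kind of correction stops helping.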

Random thought: What about choosing some local election or referendum (school board, municipal, sheriff, student union etc.) scheduled to be held approximately at the end of the Semester & having students actually run a survey to estimate those outcomes?

That way they get to hone their methods etc. but they get validation about how well their survey / analysis / adjustments worked at end of Semester.

A very interesting post as I am currently planning my 2015 Sampling Methods course and thinking about how to get more student involvement. I would be interested in your thoughts on what worked, what needs work, and what didn’t, after the course is completed.

To explain where some of the rest of my comment comes from, I am not an academic, but work in a Non-US National Statistical office. I am contracted by our local university to teach this course. It is in the maths and stats department, so I am expected to have a reasonable number of formulae, though only about 2/3 of students are stats majors. Not that I am against formulae because, as I explain to the students, a formula compresses a lot of information into a small space if you are comfortable with them. I have 24 50-minute lectures and 12 computer labs. So it isn’t quite the same as what you have.

Since surveying is mainly a hands-on activity it would seem fairly straightforward to get something student-focussed. My ideal would be to approach some non-profit that would like a survey to answer some questions and build it in during the semester. Time practicalities are the main hassle. Students are doing several other courses and already complain to me about the total work load. My colleagues at Bureau of the Census and ABS used to – and may still do – train their new graduate employees in survey design by giving them a real one-off survey to plan, design, run and produce outputs from. However it was a considerable workload, several person-weeks per student I seem to recall. Also I found I would have to deal with the ethics committee at uni, which is a very complex and involved process and takes time to get approval, though it crossed my mind that having students deal with that would be a good learning experience in bureaucracy with a purpose 101.

Instead I get the students to collect non-personal data in teams of 2-3, which I accept doesn’t cover many of the problems of getting data from people. However I don’t tell them what data to collect; instead I ask them to start by supplying me with a question they want answered (e.g. do girls get served faster at a bar, are SUV drivers more likely to run red lights). When I approve the question – this is mainly to check we aren’t going to require ethics approval – I get them to work out how to collect some data that might indicate the answer to their question. That creates a lot of discussion among themselves and with me. I have to approve the generalities of the data collection (again, to protect them from ethics problems), but also I get them to think about how good their inferences about the population from their sample are (actually it is usually the other way around). Sometimes they have to get creative (e.g. here in NZ being seen to be collecting prices data in a supermarket can get you escorted out; how can you inexpensively observe for an hour in a cafe). An unexpected lesson for many groups is learning how to work as a group. Also many haven’t written a report since high school.

In class I do find that if you can ask the right question you can get some heated and informative discussions amongst the students, though this generally happens only from week 3 onward. Perhaps that’s just NZ students?

A slightly off-topic question. I have been asked a couple of times recently by students why they would need this when there is big data. I have talked about the data collection aspects of the course and how you need to understand the theory behind sampling along with non-sample errors to protect yourself against some of the big data problems, but would be interested in any other thoughts from people.

In case you wonder how we do it internally in StatsNZ, it’s more of an apprenticeship approach. That is, assign them to surveys and explain things as we go along. It works but I doubt it is efficient.