Syllabus for my course on Communicating Data and Statistics

Posted on October 2, 2015 9:30 AM by Andrew

Actually the course is called Statistical Communication and Graphics, but I was griping about how few students were taking the class, and someone suggested the title Communicating Data and Statistics as being a bit more appealing. So I’ll go with that for now.

I love love love this class and everything that’s come from it (including statistics diaries and ShinyStan).
Here’s the syllabus [updated]. It’s full of fun reading and great activities, in and outside of class. The only things missing are the jitts, but I like to keep them as a surprise. So if you want to teach this class (and I think you should; indeed, I think this course should be taught everywhere and should be a standard part of the statistics and quantitative social science curriculum), you’ll just have to write your own jitts. Otherwise the course pretty much teaches itself. And remember, with your guest visitors, keep the conversations short and focused. Long rambling discussions are fun, and they’re easy on the instructor, but ultimately you want to spend lots of class time directly on feedback on student work.

Now for the next 90 seconds I’d like you to talk with your neighbor and come up with a question to ask me.

OK, start yapping!

41 thoughts on “Syllabus for my course on Communicating Data and Statistics”

Sean on October 2, 2015 11:05 AM at 11:05 am said:

Can you comment on how or how often you grade the diary entries?

Reply ↓
Vimal on October 2, 2015 11:19 AM at 11:19 am said:

I was just checking some links on the syllabus and this seems broken: http://statmodeling.stat.columbia.edu/2014/12/01/read-quantitative-social-science-implication-write. I think the date should read 2014/12/02 in the syllabus pdf.

Reply ↓
- Andrew on October 2, 2015 9:58 PM at 9:58 pm said:
  
  Thanks!
  
  Reply ↓
jimmy on October 2, 2015 11:45 AM at 11:45 am said:

“but I was griping about how few students were taking the class.” i do not see any difference in the two above names for attracting students. you do not have enough sexy words in the class name and description. you need to include words such as bayesian, data science, machine learning, BIG DATA, and such. and for the graphics part, maybe call it infovis. something such as bayesian information visualization for the data sciences.

Reply ↓
- Daniel Lakeland on October 2, 2015 12:04 PM at 12:04 pm said:
  
  Big Bayesian Data Freako-InfoVisomics
  
  Reply ↓
  - Elin on October 2, 2015 12:19 PM at 12:19 pm said:
    
    +1
    
    Reply ↓
Bruce McCullough on October 2, 2015 11:47 AM at 11:47 am said:

Cleveland is a superb book, and a landmark in statistical graphics. But it’s more about using graphics to discover than to communicate. I highly recommend “Show Me The Numbers, 2e” by Stephen Few. It’s all about how to structure tables and graphs to communicate something that you’ve already discovered. The subtitle for the book is “Designing Tables and Graphs to Enlighten”. It’s also very reasonably priced, at $27.50 for a hardcopy with lots of color printing. http://www.amazon.com/Show-Me-Numbers-Designing-Enlighten/dp/0970601972/ref=dp_ob_title_bk

Reply ↓
- Steen on October 2, 2015 1:25 PM at 1:25 pm said:
  
  I quite like the Stephen Few book as well, though I do not like his new book ‘Signal’ quite as much. I also like Doumont’s ‘Trees, maps, and theorems’—thanks to Dr. Betancourt for the recommendation.
  
  Cleveland’s other graphics book is a bit more general.
  
  I find Cleveland’s final conclusion about the barley data a bit odd.
  – Interaction plot better than dotplot: http://www.theusrus.de/blog/the-good-the-bad-52005/
  – http://www.tandfonline.com/doi/full/10.1080/00031305.2013.801783
  
  I hope you encourage the students to use colorblind-friendly colors in the ‘graphics’ section.
  
  Reply ↓
Tom on October 2, 2015 2:37 PM at 2:37 pm said:

Have you thought about running this sort of thing as a summer school for business / industry? I would bet you’d get a lot of takers (run over a week for example) and it is something that I think would be really useful and important for a lot of people in these areas.

Reply ↓
Mayo on October 2, 2015 9:54 PM at 9:54 pm said:

Andrew: I suppose this might be covered under some of the subheadings, but in terms of advertising a class on communicating statistics, I would have thought to include communicating concepts (significance levels, power, confidence levels, Bayesian concepts and distributions, correlation, replication, resampling, etc.). But maybe this is a different kind of class.

Reply ↓
- Andrew on October 2, 2015 9:56 PM at 9:56 pm said:
  
  Mayo:
  
  Yes, that’s covered in weeks 5 and 6 of the course.
  
  Reply ↓
Lefteris Anastasopoulos on October 2, 2015 11:21 PM at 11:21 pm said:

Good stuff. Probably one of THE most important courses for people who use and analyze data, and one that is rarely/never taught. Would love to see some visuals from the final projects. Our course at the UCB Information School is called Data Visualization and Communication, which has a nice ring to it: https://datascience.berkeley.edu/academics/curriculum/data-visualization/.

“Statistical Communication” actually sounds more like “a statistical theory of communication” rather than “a way to communicate with statistics.” Definitely prefer the new name.

Reply ↓
Rahul on October 2, 2015 11:44 PM at 11:44 pm said:

Here’s a rather general question about effective communication of data & statistics: In all these books and courses, what they sell as the “effective” way or the recommended strategy seems guided by what the author personally thinks is the best way.

But are there any books based on *empirical* measurements of what is actually effective in communicating statistics? I always found it odd, that for a field that relies on data the niche of statistical communication seemed almost entirely based on ideological grounds rather than any evidence of what actually works best.

Is there any movement on putting effective statistical communication on an evidence based framework?

Reply ↓
- Andrew on October 3, 2015 12:58 AM at 12:58 am said:
  
  Rahul:
  
  There have been some attempts to measure statistical communication (and we discuss this a bit in our class) but it’s difficult. Regarding your other point, I would say we make many of our decisions and recommendations based on our experiences and introspection, not so much based on ideology (Rep. Chaffetz aside).
  
  Reply ↓
  - Rahul on October 3, 2015 1:34 AM at 1:34 am said:
    
    Andrew:
    
    I didn’t mean political ideology. Methodological or academic ideology.
    
    When a doctor or public policy maker takes decisions based on introspection alone, we try to nudge him towards RCTs & meta analyses. Evidence-based medicine & all that. No harm in injecting a bit of that empiricism in Statistical Communication?
    
    My personal opinion is that a lot of what passes as accepted wisdom in stat. comm. will prove to be just plain prejudice when put to rigorous test.
    
    Reply ↓
    - Andrew on October 3, 2015 1:38 AM at 1:38 am said:
      
      Rahul:
      
      Introspection isn’t perfect but it is a source of data. I do not think it is the same as ideology, academic or otherwise. I think the analogy goes like this: introspection is a source of detailed but uncontrolled data. Ideology is a crude sort of model of the world. We need both data and models, and we should try to get the best possible data and the most reasonable models. I’m doing the best I can in both of these and I welcome the work of others. Communication is harder to evaluate than medicine because the outcomes are not so clear. I’ve heard that newspapers evaluate articles based on hit counts but that’s not really what we’re looking for here.
    - Rahul on October 3, 2015 2:18 AM at 2:18 am said:
      
      Andrew:
      
      I disagree that communication is fundamentally harder to evaluate than medicine. People just don’t try as much. Maybe there isn’t enough money in it?
      
      The term “communication” is too broad to measure. I’m sure smaller parts of it can be usefully defined and measured.
      
      When researchers routinely try to measure such vague things like “happiness” and “satisfaction”, communication can hardly be an exception?
    - Andrew on October 3, 2015 2:37 AM at 2:37 am said:
      
      Rahul:
      
      Happiness and satisfaction are hard to measure too, and the work in that area is controversial, as it should be. I think statistical communication is indeed fundamentally harder to evaluate than life or death, T-cell counts, heart rate, time on the treadmill, bone density, etc.
    - Mayo on October 3, 2015 11:56 AM at 11:56 am said:
      
      Andrew: I think the deeper problem with evaluating successful communication in statistics turns on the fact that controversy surrounds some/many of the concepts and methods. The field wouldn’t be having that ASA pow wow of statisticians––just as one example–– in order to give guidance about the nature and value of statistical significance tests and related methods, if this were not the case. Even if some officially sanctioned definitions or recommendations emerge, I can well imagine different statisticians scoring students differently, if we were to envision a test of effectiveness of communication. I don’t think this should be so, nor do I think it must be so, but it seems likely in the current climate.
    - Tova Perlmutter on October 3, 2015 11:27 AM at 11:27 am said:
      
      And I can’t be the only person who has found that thorough, rigorous introspection has led me to question, modify or even reject a model I had been quite attached to.
    - Martha on October 3, 2015 5:40 PM at 5:40 pm said:
      
      +1
Dale on October 3, 2015 8:22 AM at 8:22 am said:

The only area where I know of attempts to measure something like the effectiveness of displays is in the educational area where effectiveness may be measured by some type of test performance. The reason why effectiveness is rarely measured (if at all) elsewhere is that effectiveness means different things to different people. I suppose a case can be made that Rep. Chaffetz may have very effective displays – it is just not what I think “should” be effective. Unfortunately, the most effective displays are often the most misleading ones – unless you have some type of more objective measure, such as ability to discern what the data is really saying. The money is often better in deception than telling the truth.

Reply ↓
- Rahul on October 3, 2015 8:30 AM at 8:30 am said:
  
  That’s a great point.
  
  If I had to design an experiment it would probably be a SAT / GRE style test where we show a “good” and “bad” version of a graph to a randomized cohort and then try and measure their performance on a subsequent set of graph related questions. You could add a test time constraint or a lag period between showing them the graphs and the testing.
  
  Reply ↓
  - Andrew on October 3, 2015 9:36 AM at 9:36 am said:
    
    Rahul:
    
    This sort of experiment has been done, but the challenge is in coming up with (a) reasonable comparison graphs and (b) reasonable test questions. It’s not that it can’t be done, but it’s not so easy to do well. See here, for example, for a discussion from five years ago of one such study that won an award, as I recall, but happened to be pretty much useless, in my judgment.
    
    Reply ↓
    - Lefteris Anastasopoulos on October 3, 2015 11:36 AM at 11:36 am said:
      
      Maybe a survey experiment on MTurk (if it hasn’t been done already) would be a good way to tackle this question. The same data presented in different ways can be treatments and you can create questions which tap into “effectiveness” such as comprehensibility as measured by the amount of time it takes respondents to answer a question about the data presented, aesthetics etc. Collect a couple thousand responses and you’ll at least have a better sense about what does and doesn’t work.
    - Andrew on October 3, 2015 12:18 PM at 12:18 pm said:
      
      Lefteris:
      
      Yes, there are lots of possibilities. I think these evaluations are difficult and I haven’t been impressed with some of what I’ve seen, but I definitely think it’s worth working on.
    - Jerzy on October 3, 2015 9:33 PM at 9:33 pm said:
      
      Indeed, people are doing work like this already. See for example Heike Hofmann and Dianne Cook’s studies of plot designs using Mechanical Turk: http://www.cs.tufts.edu/comp/250VIS/papers/Hofmann-graphicaltest-infovis2012.pdf
      
      There’s a long tradition of using experiments to evaluate graphs empirically. It dates back at least to this classic from 1984 by Cleveland and McGill: https://www.cs.ubc.ca/~tmm/courses/cpsc533c-04-spr/readings/cleveland.pdf
      
      There are also ethnographic studies: how are statistics and graphics used in practice, “in the wild”? Here’s a talk by Amy Griffin, reporting such work in progress: http://www.ncrn.info/event/ncrn-virtual-seminar-feb-4-2015
    - Rahul on October 3, 2015 10:43 PM at 10:43 pm said:
      
      Very interesting.
      
      It’d be interesting to take someone like Tufte’s book & systematically evaluate the key recommendations, one by one.
      
      Perhaps too ambitious. But I think there’s hardly any money in this area. Lots of low hanging fruit.
    - Jerzy on October 4, 2015 11:03 PM at 11:03 pm said:
      
      If you’re curious to read research-backed recommendations (or the research they cite), they are the core of Cleveland’s books Elements of Graphing Data and Visualizing Data; Kosslyn’s Graph Design for the Eye and Mind; and Ware’s Information Visualization.
      
      With the bibliographies in those four books, I bet you could compile a list of evaluations for (nearly) every one of Tufte’s recommendations.
    - Jerzy on October 4, 2015 11:10 PM at 11:10 pm said:
      
      (Not that those researchers set out to evaluate Tufte as such. But his advice is common enough, and not all original to him, that other people have studied the same ideas empirically.)
      
      I’d be curious to see such a point-by-point summary of the research on Tufte’s principles. As far as I can tell, he argues from authority or common sense, not from experimental research.
    - Andrew on October 4, 2015 11:29 PM at 11:29 pm said:
      
      Jerzy:
      
      To loop back to the subject of this post, Cleveland’s “Elements of Graphing Data” is one of the required books for this class.
      
      Kosslyn’s book I was less thrilled with, as I thought some of the actual graphs in his book demonstrate bad practice. I pretty much agree with Kosslyn’s perspective but I couldn’t bring myself to assign a book on graphics that had this sort of problem.
      
      Not all of Cleveland’s graphs are beautiful but they’re all pretty clean, which I like.
    - Rahul on October 5, 2015 1:00 AM at 1:00 am said:
      
      @Jerzy
      
      Thanks for the tips about the books. Yes, I sure would like to read some research backed recommendations.
      
      Somehow, most exhortations I come across about graphs have very little actual basis in empiricism.
    - Jerzy on October 5, 2015 6:19 PM at 6:19 pm said:
      
      Andrew: glad to hear they’ll be reading Cleveland. Looks like a great syllabus.
      Just curious — did you ever get a chance to read Kosslyn yourself, and not just my review? Apart from that one section you hated (and his view of error bars), there’s plenty of good stuff too. But I agree Cleveland’s better for your students.
      
      Rahul: agreed!
    - Rahul on October 3, 2015 12:41 PM at 12:41 pm said:
      
      Andrew:
      
      Which raises the question: Isn’t measurement a necessary step before coming up with a solution to a problem?
      
      i.e. If we cannot even measure the communication problem in any meaningful way, how well can we do at fixing it?
Louis on October 5, 2015 9:48 AM at 9:48 am said:

Hi Andrew,

It looks really nice. But I am wondering what class size you had before and what class size you are aiming for. The approach seems feasible only for relatively small classes, or am I wrong?

Louis

Reply ↓
david condon on October 6, 2015 8:42 PM at 8:42 pm said:

Alternative titles:

Visually Communicating Data
The Graphic Design of Scientific Analysis
Visual Research Communication
Visual Design in Data Science
Job Skills for High-paying Consulting Firms

Reply ↓
- Andrew on October 6, 2015 8:56 PM at 8:56 pm said:
  
  David:
  
  But about half the class has nothing to do with visualization!
  
  Reply ↓
Steve on October 7, 2015 11:48 AM at 11:48 am said:

This looks fantastic!

But it’s time to do away with Strunk and White! Linguist Geoffrey Pullum explains it best:

“50 Years of Stupid Grammar Advice”
http://chronicle.com/article/50-Years-of-Stupid-Grammar/25497

Reply ↓
- Andrew on October 7, 2015 12:54 PM at 12:54 pm said:
  
  Steve:
  
  Please take a look at the readings for class 10b.
  
  Reply ↓
just us chickens on January 27, 2016 5:36 PM at 5:36 pm said:

Dead link on p.22#7. From archive.org:
Dean Baker, Influencing the Debate from Outside the Mainstream: Keep it Simple

Reply ↓
- Andrew on January 27, 2016 5:55 PM at 5:55 pm said:
  
  Thanks!
  
  Reply ↓

Statistical Modeling, Causal Inference, and Social Science
