Inspiring story from a chemistry classroom

From former chemistry teacher HildaRuth Beaumont:

I was reminded of my days as a newly qualified teacher at a Leicestershire comprehensive school in the 1970s, when I was given a group of reluctant pupils with the instruction to ‘keep them occupied’. After a couple of false starts we agreed that they might enjoy making simple glass ornaments. I knew a little about glass blowing so I was able to teach them how to combine coloured and transparent glass to make animal figures and Christmas tree decorations. Then one of them made a small bottle complete with stopper. Her classmate said she should buy some perfume, pour some of it into the bottle and give it to her mum as a Mother’s Day gift. ‘We could actually make the perfume too,’ I said. With some dried lavender, rose petals, and orange and lemon peel, we applied solvent extraction and steam distillation to good effect and everyone was able to produce small bottles of perfume for their mothers.

What a wonderful story. We didn’t do anything like this in our high school chemistry classes! Chemistry 1 was taught by an idiot who couldn’t understand the book he was teaching out of. Chemistry 2 was taught with a single-minded goal of teaching us how to solve the problems on the Advanced Placement exam. We did well on the exam and learned essentially zero chemistry. On the plus side, this allowed me to place out of the chemistry requirement in college. On the minus side . . . maybe it would’ve been good for me to learn some chemistry in college. I don’t remember doing any labs in Chemistry 2 at all!

“On the uses and abuses of regression models: a call for reform of statistical practice and teaching”: We’d appreciate your comments . . .

John Carlin writes:

I wanted to draw your attention to a paper that I’ve just published as a preprint: On the uses and abuses of regression models: a call for reform of statistical practice and teaching (pending publication I hope in a biostat journal). You and I have discussed how to teach regression on a few occasions over the years, but I think with the help of my brilliant colleague Margarita Moreno-Betancur I have finally figured out where the main problems lie – and why a radical rethink is needed. Here is the abstract:

When students and users of statistical methods first learn about regression analysis there is an emphasis on the technical details of models and estimation methods that invariably runs ahead of the purposes for which these models might be used. More broadly, statistics is widely understood to provide a body of techniques for “modelling data”, underpinned by what we describe as the “true model myth”, according to which the task of the statistician/data analyst is to build a model that closely approximates the true data generating process. By way of our own historical examples and a brief review of mainstream clinical research journals, we describe how this perspective leads to a range of problems in the application of regression methods, including misguided “adjustment” for covariates, misinterpretation of regression coefficients and the widespread fitting of regression models without a clear purpose. We then outline an alternative approach to the teaching and application of regression methods, which begins by focussing on clear definition of the substantive research question within one of three distinct types: descriptive, predictive, or causal. The simple univariable regression model may be introduced as a tool for description, while the development and application of multivariable regression models should proceed differently according to the type of question. Regression methods will no doubt remain central to statistical practice as they provide a powerful tool for representing variation in a response or outcome variable as a function of “input” variables, but their conceptualisation and usage should follow from the purpose at hand.

The paper is aimed at the biostat community, but I think the same issues apply very broadly at least across the non-physical sciences.

Interesting. I think this advice is roughly consistent with what Aki, Jennifer, and I say and do in our books Regression and Other Stories and Active Statistics.

More specifically, my take on teaching regression is similar to what Carlin and Moreno say, with the main difference being that I find that students have a lot of difficulty understanding plain old mathematical models. I spend a lot of time teaching the meaning of y = a + bx, how to graph it, etc. I feel that most regression textbooks focus too much on the error term and not enough on the deterministic part of the model. Also, I like what we say on the first page of Regression and Other Stories, about the three tasks of statistics being generalizing from sample to population, generalizing from control to treatment group, and generalizing from observed data to underlying constructs of interest. I think models are necessary for all three of these steps, so I do think that understanding models is important, and I’m not happy with minimalist treatments of regression that describe it as a way of estimating conditional expectations.
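To be concrete about what I mean by emphasizing the deterministic part: here is the kind of minimal R picture I have in mind (all numbers made up, just for illustration), showing the line y = a + bx along with data simulated around it.

# Simulate data around the line y = a + bx and plot both the data and the line
a <- 0.2      # intercept (made-up value)
b <- 0.3      # slope (made-up value)
x <- runif(100, 0, 10)
y <- a + b*x + rnorm(100, mean = 0, sd = 0.5)   # deterministic part plus error

plot(x, y, main = "Data and the line y = a + bx")
abline(a = a, b = b, lwd = 2)     # the deterministic part of the model
abline(lm(y ~ x), lty = 2)        # least-squares estimate, for comparison

The point of the picture is that students can see the model as a line plus scatter before we ever get into the distribution of the errors.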

The first of these tasks is sampling inference, the second is causal inference, and the third refers to measurement. Statistics books (including my own) spend lots of time on sampling and causal inference, not so much on measurement. But measurement is important! For an example, see here.

If any of you have reactions to Carlin and Moreno’s paper, or if you have reactions to my reactions, please share them in comments, as I’m sure they’d appreciate it.

Our new book, Active Statistics, is now available!

Coauthored with Aki Vehtari, this new book is lots of fun, perhaps the funnest I’ve ever been involved in writing. And it’s stuffed full of statistical insights. The webpage for the book is here, and the link to buy it is here or directly from the publisher here.

With hundreds of stories, activities, and discussion problems on applied statistics and causal inference, this book is a perfect teaching aid, a perfect adjunct to a self-study program, and an enjoyable bedside read if you already know some statistics.

Here’s the quick summary:

This book provides statistics instructors and students with complete classroom material for a one- or two-semester course on applied regression and causal inference. It is built around 52 stories, 52 class-participation activities, 52 hands-on computer demonstrations, and 52 discussion problems that allow instructors and students to explore the real-world complexity of the subject. The book fosters an engaging “flipped classroom” environment with a focus on visualization and understanding. The book provides instructors with frameworks for self-study or for structuring the course, along with tips for maintaining student engagement at all levels, and practice exam questions to help guide learning. Designed to accompany the authors’ previous textbook Regression and Other Stories, its modular nature and wealth of material allow this book to be adapted to different courses and texts or be used by learners as a hands-on workbook.

It’s got 52 of everything because it’s structured around a two-semester class, with 13 weeks per semester and 2 classes per week. It’s really just bursting with material, including some classic stories and lots of completely new material. Right off the bat we present a statistical mystery that arose with a Wikipedia experiment, and we have a retelling of the famous Literary Digest survey story but with a new and unexpected twist (courtesy of Sharon Lohr and J. Michael Brick). And the activities have so much going on! One of my favorites is a version of the two truths and a lie game that demonstrates several different statistical ideas.

People have asked how this differs from my book with Deborah Nolan, Teaching Statistics: A Bag of Tricks. My quick answer is that Active Statistics has a lot more of everything, it’s structured to cover an entire two-semester course in order, and it’s focused on applied statistics. Including a bunch of stories, activities, demonstrations, and problems on causal inference, a topic that is not always so well integrated into the statistics curriculum. You’re gonna love this book.

You can buy it here or here. It’s only 25 bucks, which is an excellent deal considering how stuffed it is with useful content. Enjoy.

The four principles of Barnard College: Respect, empathy, kindness . . . and censorship?

A few months ago we had Uh oh Barnard . . .

And now there’s more:

Barnard is mandating that students remove any items affixed to room or suite doors by Feb. 28, after which point the college will begin removing any remaining items, Barnard College Dean Leslie Grinage announced in a Friday email to the Barnard community. . . .

“We know that you have been hearing often lately about our community rules and policies. And we know it may feel like a lot,” Grinage wrote. “The goal is to be as clear as possible about the guardrails, and, meeting the current moment, do what we can to support and foster the respect, empathy and kindness that must guide all of our behavior on campus.”

According to the student newspaper, here’s the full email from the Barnard dean:

Dear Residential Students,

The residential experience is an integral part of the Barnard education. Our small campus is a home away from home for most of you, and we rely on each other to help foster an environment where everyone feels welcome and safe. This is especially important in our residential spaces. We encourage debate and discussion and the free exchange of ideas, while upholding our commitment to treating one another with respect, consideration and kindness. In that spirit, I’m writing to remind you of the guardrails that guide our residential community — our Residential Life and Housing Student Guide.

While many decorations and fixtures on doors serve as a means of helpful communication amongst peers, we are also aware that some may have the unintended effect of isolating those who have different views and beliefs. So, we are asking everyone to remove any items affixed to your room and/or suite doors (e.g. dry-erase boards, decorations, messaging) by Wednesday, February 28 at noon; the College will remove any remaining items starting Thursday, February 29. The only permissible items on doors are official items placed by the College (e.g. resident name tags). (Those requesting an exemption for religious or other reasons should contact Residential Life and Housing by emailing [email protected].)

We know that you have been hearing often lately about our community rules and policies. And we know it may feel like a lot. The goal is to be as clear as possible about the guardrails, and, meeting the current moment, do what we can to support and foster the respect, empathy and kindness that must guide all of our behavior on campus.

The Residential Life and Housing team is always here to support you, and you should feel free to reach out to them with any questions you may have.

Please take care of yourselves and of each other. Together we can build an even stronger Barnard community.

Sincerely,

Leslie Grinage

Vice President for Campus Life and Student Experience and Dean of the College

The dean’s letter links to this Residential Life and Housing Student Guide, which I took a look at. It’s pretty reasonable, actually. All I saw regarding doors was this mild restriction:

While students are encouraged to personalize their living space, they may not alter the physical space of the room, drill or nail holes into any surface, or affix tapestries and similar decorations to the ceiling, light fixtures, or doorways. Painting any part of the living space or college-supplied furniture is also prohibited.

The only thing in the entire document that seemed objectionable was the no-sleeping-in-the-lounges policy, but I can’t imagine they would enforce that rule unless someone was really abusing the privilege. They’re not gonna send the campus police to wake up a napper.

So, yeah, they had a perfectly reasonable rulebook and then decided to mess it all up by not letting the students decorate their doors. So much for New York, center of free expression.

I assume what’s going on here is that Barnard wants to avoid the bad publicity that comes from clashes between groups of students with opposing political views. And now they’re getting bad publicity because they’re censoring students’ political expression.

The endgame seems to be to turn the college to some sort of centrally-controlled corporate office park. But that wouldn’t be fair. In a corporate office, they let you decorate your own cubicle, right?

Hand-drawn Statistical Workflow at Nelson Mandela

In September 2023 I taught a week-long course on statistical workflow at the Nelson Mandela African Institution of Science and Technology (NM-AIST), a public postgraduate research university in Arusha, Tanzania established in 2009.

NM-AIST – CENIT@EA

The course was hosted by Dean Professor Ernest Rashid Mbega and the Africa Centre for Research, Agricultural Advancement, Teaching Excellence and Sustainability (CREATES) through the Leader Professor Hulda Swai and Manager Rose Mosha.

Our case study was an experiment on the NM-AIST campus designed and implemented by Dr Arjun Potter and Charles Luchagula to study the effects of drought, fire, and herbivory on growth of various acacia tree species. The focus was pre-data workflow steps, i.e. experimental design. The goal for the week was to learn some shared statistical language so that scientists can work with statisticians on their research.

Together with Arjun and Charles, with input from Drs Emmanuel Mpolya, Anna Treydte, Andrew Gelman, Michael Betancourt, Avi Feller, Daphna Harel, and Joe Blitzstein, I created course materials full of activities. We asked participants to hand-draw the experimental design and their priors, working together with their teammates. We also did some pencil-and-paper math and some coding in R.

Course participants were students and staff from across NM-AIST. Over the five days, between 15 and 25 participants attended on a given day.

Using the participants’ ecological expertise, we built a model to tell a mathematical story of how acacia tree height could vary by drought, fire, herbivory, species, and plot location. We simulated parameters and data from this model, e.g. beta_fire = rnorm(n = 1, mean = -2, sd = 1), then simulated_data = rnorm(n, beta_0 + beta_fire*Fire + … + beta_block[Block], sd_tree). We then fit the model to the simulated data.

Due to difficulty in manipulating fire, fire was assigned at the block level, whereas drought and herbivory were assigned at the sub-block level. We saw how this reduced precision in estimating the effect of fire.
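For readers who want to see the shape of that simulation, here is a minimal R sketch of a single fake-data draw and fit. The variable names and numerical values are hypothetical stand-ins, not the actual course code, but the structure (fire varying only between blocks, drought and herbivory within blocks) is the point.

# Minimal sketch of a single fake-data simulation; all names and numbers are made up
set.seed(123)
n_block <- 6    # fire assigned at the block level
n_sub   <- 4    # sub-blocks per block: drought and herbivory assigned here
n_tree  <- 10   # trees per sub-block

design <- expand.grid(tree = 1:n_tree, sub = 1:n_sub, block = 1:n_block)
design$Fire      <- ifelse(design$block %% 2 == 0, 1, 0)    # block-level treatment
design$Drought   <- ifelse(design$sub %in% c(1, 2), 1, 0)   # sub-block-level treatments
design$Herbivory <- ifelse(design$sub %in% c(1, 3), 1, 0)

# Simulate parameters, then data
beta_0         <- 3
beta_fire      <- rnorm(1, mean = -2, sd = 1)
beta_drought   <- rnorm(1, mean = -1, sd = 1)
beta_herbivory <- rnorm(1, mean = -0.5, sd = 1)
beta_block     <- rnorm(n_block, mean = 0, sd = 1)   # block-to-block differences
sd_tree        <- 0.5

design$height <- rnorm(
  nrow(design),
  mean = beta_0 + beta_fire*design$Fire + beta_drought*design$Drought +
         beta_herbivory*design$Herbivory + beta_block[design$block],
  sd   = sd_tree
)

# Fit the model to the simulated data
library(lme4)
fit <- lmer(height ~ Fire + Drought + Herbivory + (1 | block), data = design)
summary(fit)

Because fire is constant within each block, its effect has to be estimated from between-block comparisons, which is why the fitted standard error for Fire comes out so much larger than for Drought or Herbivory.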

We redid the simulation assuming a smaller block effect and saw improved precision. This confirmed the researchers’ intuitions that they need to work hard to reduce the block-to-block differences.

To keep the focus on concepts, not code, we simulated from the model only once. A full design analysis would include many simulations from the model. In Section 16.6 of ROS they fix one value for the parameters and simulate multiple datasets. In Gelman and Carlin (2014) they consider a range of plausible parameters using prior information. Betancourt’s workflow simulates parameters from the prior.
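Here is a sketch of what “many simulations” could look like: wrap the simulate-and-fit step in a function, repeat it, and compare the spread of the fire-effect estimates under a big versus small block effect. Again, everything here is a made-up illustration rather than the course’s actual design analysis.

set.seed(456)
simulate_fire_estimate <- function(n_block = 6, trees_per_block = 40,
                                   beta_fire = -2, sd_block = 1, sd_tree = 0.5) {
  block  <- rep(1:n_block, each = trees_per_block)
  Fire   <- ifelse(block %% 2 == 0, 1, 0)                 # block-level assignment
  height <- 3 + beta_fire*Fire + rnorm(n_block, 0, sd_block)[block] +
            rnorm(length(block), 0, sd_tree)
  coef(lm(height ~ Fire))["Fire"]                         # estimated fire effect
}

est_big_block_effect   <- replicate(1000, simulate_fire_estimate(sd_block = 1))
est_small_block_effect <- replicate(1000, simulate_fire_estimate(sd_block = 0.2))
c(sd(est_big_block_effect), sd(est_small_block_effect))   # simulation-based standard errors

The drop in the simulation-based standard error when the block effect shrinks is the “improved precision” referred to above.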

Our course evaluation survey was completed by 14 participants. When asked “which parts of the class were most helpful to you to understand the concepts?”, respondents chose instructor explanations, drawings, and activities as more helpful than the R code. However, participants also expressed eagerness to learn R and to analyze the real data in our next course.

The hand-drawn course materials and activities were inspired by Brendan Leonard’s illustrations in Bears Don’t Care About Your Problems and I Hate Running and You Can Too. Brendan wrote me,

I kind of think hand-drawing stuff makes it more fun and also maybe less intimidating?

I agree.

More recently, I have been reading Introduction to Modern Causal Inference by Alejandro Schuler and Mark van der Laan, who say

It’s easy to feel like you don’t belong or aren’t good enough to participate…

yup.

To deal with that problem, the voice we use throughout this book is informal and decidedly nonacademic…Figures are hand-drawn and cartoonish.

I’m excited to return to NM-AIST to continue the workflow steps with the data that Dr Arjun Potter and Charles Luchagula have been collecting. With the real data, we can ask: is our model realistic enough to achieve our scientific goals?

It’s bezzle time: The Dean of Engineering at the University of Nevada gets paid $372,127 a year and wrote a paper that’s so bad, you can’t believe it.

“As we look to sleep and neuroscience for answers we can study flies specifically the Drosophila melanogaster we highlight in our research.”

1. The story

Someone writes:

I recently read a paper of yours in the Chronicle about how academic fraudsters get away with it. I came across a strange case that I thought you would at least have some interest in: a faculty member owns an open-access journal that charges to publish and then publishes a large number of papers in the journal. The most recent issue is all from the same authors (a family affair).

It is from an administrator at the University of Nevada, Reno. This concern is related to publications within a journal that may not be reputable. The Dean of Engineering has a number of publications in the International Supply Chain Technology Journal that are in question (see Google Scholar). Normally, I would contact the editor or publisher, but in this case there are complexities.

This may not be an issue, but many of the articles are short, being 1 or 2 pages. In addition, some have a peer review process of 3 days or less. Another concern is that many of the papers do not even discuss what is in the title. Take the following paper: it presents nothing related to its title. Many of the papers read as if AI was used.

While the quality of these papers may not be of concern, the representation of these as publications could be. The person publishing them should have ethical standards that exceed those that are under his leadership. He is also the highest ranking official of the college of engineering and is expected to lead by example and be a good model to those under him.

If that is not enough, looking into the journal in more detail alludes to more ethical questions. The journal is published by PWD Group out of Texas. Lookup of PWD Group out of Texas yields that Erick Jones is the Director and President.  Erick Jones was also the Editor of the journal.  In addition to the journal articles, even books authored by Erick Jones are published by PWD.

Looking further into the journal publications, you will see that there are a large number with Erick Jones Sr. and Erick Jones Jr. There are also a large number with Felicia Jefferson. Felicia is also a faculty member at UNR and the spouse of Dean Jones. A few of the papers raise concerns related to deer supply chains. The following has a very fast peer review process of a few days, and the photo captioned as a white-tailed deer is a reindeer. Another paper is even shorter, with a very fast peer review, and captions yet a different deer, which is still not a white-tailed deer. It is unlikely these papers went through a robust peer review.

While these papers’ affiliations predate the move to UNR, the incoherence, conflict of interest, and incorrect data do not look good for UNR, and they were published either when Dr. Jefferson was applying to UNR or early upon her arrival. There are similar issues with the timing of this article. Also, in the print version of the journal, Dr. Jefferson handles submissions (p. 3).

Maybe this information is nothing to be concerned about.  At the very least, it sheds a poor light on the scientific process, especially when a Dean is the potential abuser.  It is not clear how he can encourage high quality manuscripts from other faculty when he has been able to climb the ladder using his own publishing house. I’ll leave you with a paper with a relevant title on minimizing train accidents through minimizing sleep deprivation. It seems like a really important study.  The short read should convince you otherwise and make you question the understanding of the scientific process by these authors.

Of specific concern is whether these publications led to him, or his spouse, being hired at UNR. If these are considered legitimate papers, the entire hiring and tenure process at UNR is compromised. Similar arguments exist if these papers are used in the annual evaluation process. It also raises a conflict of interest if he pays to publish and then receives proceeds on the back end.

I have no comment on the hiring, tenure, and evaluation process at UNR, or on any conflicts of interest. I know nothing about what is going on at UNR. It’s a horrifying story, though.

2. The published paper

OK, here it is, in its entirety (except for references). You absolutely have to see it to believe it:

Compared to this, the Why We Sleep guy is a goddamn titan of science.

3. The Dean of Engineering

From the webpage of the Dean of Engineering at the University of Reno:

Dr. Erick C. Jones is a former senior science advisor in the Office of the Chief Economist at the U.S. State Department. He is a former professor and Associate Dean for Graduate Studies at the College of Engineering at The University of Texas at Arlington.

From the press release announcing his appointment, dated July 01, 2022:

Jones is an internationally recognized researcher in industrial manufacturing and systems engineering. . . . “In Erick Jones, our University has a dynamic leader who understands how to seize moments of opportunity in order to further an agenda of excellence,” University President Brian Sandoval said. . . . Jones was on a three-year rotating detail at National Science Foundation where he was a Program Director in the Engineering Directorate for Engineering Research Centers Program. . . .

Jones is internationally recognized for his pioneering work with Radio Frequency Identification (RFID) technologies, Lean Six Sigma Quality Management (the understanding of whether a process is well controlled), and autonomous inventory control. He has published more than 243 manuscripts . . .

According to this source, his salary in 2022 was $372,127.

According to wikipedia, UNR is the state’s flagship public university.

I was curious to see what else Jones had published so I searched him on Google scholar and took a look at his three most-cited publications. The second of these appeared to be a textbook, and the third was basically 8 straight pages of empty jargon—ironic that a journal called Total Quality Management would publish something that has no positive qualities! The most-cited paper on the list was pretty bad too, an empty bit of make-work, the scientific equivalent of the reports that white-collar workers need to fill out and give to their bosses who can then pass these along to their bosses to demonstrate how productive they are. In short, this guy seems to be a well-connected time server in the Ed Wegman mode, minus the plagiarism.

He was a Program Director at the National Science Foundation! Your tax dollars at work.

Can you imagine what it would feel like to be a student in the engineering school at the flagship university of the state of Nevada, and it turns out the school is being run by the author of this:

Our recent study has the premise that both humans and flies sleep during the night and are awake during the day, and both species require a significant amount of sleep each day when their neural systems are developing in specific activities. This trait is shared by both species. An investigation was segmented into three subfields, which were titled “Life span,” “Time-to-death,” and “Chronological age.” In D. melanogaster, there was a positive correlation between life span, the intensity of young male medflies, and the persistence of movement. Time-to-death analysis revealed that the male flies passed away two weeks after exhibiting the supine behavior. Chronological age, activity in D. melanogaster was adversely correlated with age; however, there was no correlation between chronological age and time-to-death. It is probable that the incorporation the findings of age-related health factors and increased sleep may lead toless train accidents. of these age factors when considering these options supply chain procedure for maintaining will be beneficial.

I can’t even.

P.S. The thing I still can’t figure out is, why did Jones publish this paper at all? He’d already landed the juicy Dean of Engineering job, months before submitting it to his own journal. To then put his name on something so ludicrously bad . . . it can’t help his career at all, could only hurt. And obviously it’s not going to do anything to reduce train accidents. What was he possibly thinking?

P.P.S. I guess this happens all the time; it’s what Galbraith called the “bezzle.” We’re just more likely to hear about when it happens at some big-name place like Stanford, Harvard, Ohio State, or Cornell. It still makes me mad, though. I’m sure there are lots of engineers who are doing good work and could be wonderful teachers, and instead UNR spends $372,127 on this guy.

I’ll leave the last word to another UNR employee, from the above-linked press release:

“What is exciting about having Jones as our new dean for the College of Engineering is how he clearly understands the current landscape for what it means to be a Carnegie R1 ‘Very High Research’ institution,” Provost Jeff Thompson said. “He very clearly understands how we can amplify every aspect of our College of Engineering, so that we can continue to build transcendent programs for engineering education and research.”

They’re transcending something, that’s for sure.

My challenge for Jeff Thompson: Show up at an engineering class at your institution, read aloud the entire contents (i.e., the two paragraphs) of “Using Science to Minimize Sleep Deprivation that may reduce Train Accidents,” then engage the students in a discussion of what this says about “the current landscape for what it means to be a Carnegie R1 ‘Very High Research’ institution.”

Should be fun, no? Just remember, the best way to keep the students’ attention is to remind them that, yes, this will be covered on the final exam.

P.P.P.S. More here from Retraction Watch.

P.P.P.P.S. Still more here.

P.P.P.P.P.S. Retraction Watch found more plagiarism, this time on a report for the National Science Foundation.

Fun with Dååta: Reference librarian edition

Rasmus Bååth reports the following fun story in a blog post, The source of the cake dataset (it’s a hierarchical modeling example included with the R package lme4).

Rasmus writes,

While looking for a dataset to illustrate a simple hierarchical model I stumbled upon another one: The cake dataset in the lme4 package, which is described as containing “data on the breakage angle of chocolate cakes made with three different recipes and baked at six different temperatures [as] presented in Cook (1938).”

The search is on.

… after a fair bit of flustered searching, I realized that this scholarly work, despite its obvious relevance to society, was nowhere to be found online.

The plot thickens like cake batter until Megan N. O’Donnell, a reference librarian (officially, Research Data Services Lead!) at Iowa State, the source of the original, gets involved. She replies to Rasmus’s query,

Sorry for the delay — I got caught up in a deadline. The scan came out fairly well, but page 16 is partially cut off. I’ll put in a request to have it professionally scanned, but that will take some time. Hopefully this will do for now.

Rasmus concludes,

She (the busy Research Data Services Lead with a looming deadline) is apologizing to me (the random Swede with an eccentric cake thesis digitization request) that it took a few days to get me everything I asked for!?

Reference librarians are amazing! Read the whole story and download the actual manuscript from Rasmus’s original blog post. The details of the experimental design are quite interesting, including the device used to measure cake breakage angle, a photo of which is included in the post.
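If you want to poke at the data yourself, the dataset ships with lme4. Here is a minimal sketch along the lines of the example in the lme4 documentation (I’m assuming the usual column names in the shipped dataset):

# The cake data from Cook (1938), as shipped with lme4
library(lme4)
data(cake, package = "lme4")
str(cake)    # breakage angle, recipe, baking temperature, replicate

# Breakage angle as a function of recipe and temperature, with a random
# intercept for each replicate within recipe (the hierarchical part)
fit <- lmer(angle ~ recipe * temperature + (1 | recipe:replicate), data = cake)
summary(fit)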

I think it’d be fun to organize a class around generating new, small scale and amusing data sets like this one. Maybe it sounds like more fun than it would actually be—data collection is grueling. Andrew says he’s getting tired of teaching data communication, and he’s been talking a lot more about the importance of data collection on the blog, so maybe next year…

P.S. In a related note, there’s something called a baath cake that’s popular in Goa and confused my web search.

Resources for teaching and learning survey sampling, from Scott Keeter at Pew Research

Art Owen informed me that he’ll be teaching sampling again at Stanford, and he was wondering about ideas for students gathering their own data.

I replied that I like the idea of sampling from databases, biological sampling, etc. You can point out to students that a “blood sample” is indeed a sample!

Art replied:

Your blood example reminds me that there is a whole field (now very old) on bulk sampling. People sample from production runs, from cotton samples, from coal samples and so on. Widgets might get sampled from the beginning, middle and end of the run. David Cox wrote some papers on sampling to find the quality of cotton as measured by fiber length. The process is to draw a blue line across the sample and see the length of fibers that intersect the line. This gives you a length-biased sample that you can nicely de-bias. There’s also an interesting example out there about tree sampling, literally on a tree, where branches get sampled at random and fruit is counted. I’m not sure if it’s practical.

Last time I found an interesting example where people would sample ocean tracts to see if there was a whale. If they saw one, they would then sample more intensely in the neighboring tracts. Then the trick was to correct for the bias that brings. It’s in the Sampling book by S. K. Thompson. There are also good mark-recapture examples for wildlife.

I hesitate to put a lot of regression in a sampling class; it is all too easy for every class to start looking like a regression/prediction/machine learning class. We need room for the ideas about where and how data arise, and it’s too easy to crowd those out by dwelling on the modeling ideas.

I’ll probably toss in some space-filling sampling plans and other ways to downsize data sets as well.

The old Cochran style was: get an estimator, show it is unbiased, find an expression for its variance, find an estimate of that variance, show this estimate is unbiased and maybe even find and compare variances of several competing variance estimates. I get why he did it but it can get dry. I include some of that but I don’t let it dominate the course. Choices you can make and their costs are more interesting.
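Art’s cotton-fiber example, by the way, is easy to simulate and would make a nice class demo. Here is a minimal sketch of length-biased sampling and one way to de-bias it with inverse-length weighting; the numbers are made up and this is not Cox’s actual procedure:

# Fibers crossing the line are sampled with probability proportional to length,
# so long fibers are over-represented in the sample.
set.seed(1)
fiber_length <- rgamma(1e5, shape = 2, rate = 1)   # hypothetical population of fiber lengths
sampled <- sample(fiber_length, size = 2000, replace = TRUE, prob = fiber_length)

mean(fiber_length)        # true mean length (about 2)
mean(sampled)             # naive estimate from the length-biased sample: too high (about 3)
1 / mean(1 / sampled)     # inverse-length (harmonic mean) estimator: de-biased

The harmonic-mean correction works because the sampling probability is proportional to length, so weighting each sampled fiber by 1/length undoes the bias.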

I connected Art to Scott Keeter at Pew Research, who wrote:

Fortunately, we are pretty diligent about keeping track of what we do and writing it up. The examples below have lengthy methodology sections and often there is companion material (such as blog posts or videos) about the methodological issues.

We do not have a single overview methodological piece about this kind of work but the next best thing is a great lecture that Courtney Kennedy gave at the University of Michigan last year, walking through several of our studies and the considerations that went into each one:

Here are some links to good examples, with links to the methods sections or extra features:

Our recent study of Jewish Americans, the second one we’ve done. We switched modes for this study (thus different sampling strategy), and the report materials include an analysis of mode differences https://www.pewresearch.org/religion/2021/05/11/jewish-americans-in-2020/

Appendix A: Survey methodology

Jewish Americans in 2020: Answers to frequently asked questions

Our most recent survey of the US Muslim population:

U.S. Muslims Concerned About Their Place in Society, but Continue to Believe in the American Dream


A video on the methods:
https://www.pewresearch.org/fact-tank/2017/08/16/muslim-americans-methods/

This is one of the most ambitious international studies we’ve done:

Religion in India: Tolerance and Segregation


Here’s a short video on the sampling and methodology:
https://www.youtube.com/watch?v=wz_RJXA7RZM

We then had a quick email exchange:

Me: Thanks. Post should appear in Aug.

Scott: Thanks. We’ll probably be using sampling by spaceship and data collection with telepathy by then.

Me: And I’ll be charging the expenses to my NFT.

In a more serious vein, Art looked into Scott’s suggestions and followed up:

I [Art] looked at a few things at the Pew web-site. The quality of presentation is amazingly good. I like the discussions of how you identify who to reach out to. Also the discussion of how to pose the gender identity question is something that I think would interest students. I saw some of the forms and some of the data on response rates. I also found Courtney Kennedy’s video on non-probability polls. I might avoid religious questions for in-depth followup in class. Or at least, I would have to be careful in doing it, so nobody feels singled out.

Where could I find some technical documents about the American Trends Panel? I would be interested to teach about sample reweighting, e.g., raking and related methods, as it is done for real.

I’m wondering about getting survey data for a class. I might not be able to require them to get a Pew account and then agree to terms and conditions. Would it be reasonable to share a downsampled version of a Pew data set with a class? Something about attitudes to science would be interesting for students.

To which Scott replied:

Here is an overview I wrote about how the American Trends Panel operates and how it has changed over time in response to various challenges:

Growing and Improving Pew Research Center’s American Trends Panel

This relatively short piece provides some good detail about how the panel works:
https://www.pewresearch.org/fact-tank/2021/09/07/how-do-people-in-the-u-s-take-pew-research-center-surveys-anyway/

We use the panel to conduct lots of surveys, but most of them are one-off efforts. We do make an effort to track trends over time, but that’s usually the way we used to do it when we conducted independent sample phone surveys. However, we sometimes use the panel as a panel – tracking individual-level change over time. This piece explains one application of that approach:
https://www.pewresearch.org/fact-tank/2021/01/20/how-we-know-the-drop-in-trumps-approval-rating-in-january-reflected-a-real-shift-in-public-opinion/

When we moved from mostly phone surveys to mostly online surveys, we wanted to assess the impact of the change in mode of interview on many of our standard public opinion measures. This study was a randomized controlled experiment to try to isolate the impact of mode of interview:

From Telephone to the Web: The Challenge of Mode of Interview Effects in Public Opinion Polls

Survey panels have some real benefits but they come with a risk – that panelists change as a result of their participation in the panel and no longer fully resemble the naïve population. We tried to assess whether that is happening to our panelists:

Measuring the Risks of Panel Conditioning in Survey Research

We know that all survey samples have biases, so we weight to try to correct those biases. This particular methodology statement is more detailed than is typical and gives you some extra insight into how our weighting operates. Unfortunately, we do not have a public document that breaks down every step in the weighting process:

Methodology

Most of our weighting parameters come from U.S. government surveys such as the American Community Survey and the Current Population Survey. But some parameters are not available on government surveys (e.g., religious affiliation) so we created our own higher quality survey to collect some of these for weighting:

How Pew Research Center Uses Its National Public Opinion Reference Survey (NPORS)

This one is not easy to find on our website but it’s a good place to find wonky methodological content, not just about surveys but about our big data projects as well:

Home


We used to publish these through Medium but decided to move them in-house.

By the way, my colleagues in the survey methods group have developed an R package for the weighting and analysis of survey data. This link is to the explainer for weighting data but that piece includes links to explainers about the basic analysis package:
https://www.pewresearch.org/decoded/2020/03/26/weighting-survey-data-with-the-pewmethods-r-package/

Lots here to look at!
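One more thing on Art’s raking question: the basic algorithm is simple enough to sketch by hand. Here is a toy version of iterative proportional fitting in base R, matching weighted sample margins to known population margins for two variables. It’s just for intuition; the targets and variable names are invented, and this is separate from whatever the pewmethods package does internally.

set.seed(7)
n <- 1000
svy <- data.frame(
  sex = sample(c("F", "M"), n, replace = TRUE, prob = c(0.6, 0.4)),     # over-samples women
  age = sample(c("18-39", "40+"), n, replace = TRUE, prob = c(0.3, 0.7))
)
pop_margins <- list(
  sex = c("F" = 0.52, "M" = 0.48),            # invented population targets
  age = c("18-39" = 0.40, "40+" = 0.60)
)

w <- rep(1, n)
for (iter in 1:25) {                          # raking: cycle through the margins
  for (v in names(pop_margins)) {
    current <- tapply(w, svy[[v]], sum) / sum(w)               # weighted sample margin
    ratio   <- pop_margins[[v]][names(current)] / current      # adjustment factor per category
    w       <- w * ratio[svy[[v]]]                             # rescale each unit's weight
  }
}

round(tapply(w, svy$sex, sum) / sum(w), 3)    # now matches the sex target
round(tapply(w, svy$age, sum) / sum(w), 3)    # and the age target

Real survey weighting involves more margins, weight trimming, and base weights from the sample design, but this is the core move.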

It’s been a while since I’ve taught a course on survey sampling. I used to teach such a course—it was called Design and Analysis of Sample Surveys—and I enjoyed it. But . . . in the class I’d always have to spend some time discussing basic statistics and regression modeling, and this always was the part of the class that students found the most interesting! So I eventually just started teaching statistics and regression modeling, which led to my Regression and Other Stories book. The course I’m now teaching out of that book is called Applied Regression and Causal Inference. I still think survey sampling is important; it was just hard to find an audience for the course.

Here’s how to subscribe to our new weekly newsletter:

Just a reminder: we have a new weekly newsletter. We posted on it a couple weeks ago; I’m just giving a reminder here because the goal of the newsletter is to reach people who wouldn’t otherwise go online to read the blog.

Subscribing is free, and then in your inbox each Monday morning you’ll get a list of our scheduled posts for the forthcoming week, along with links to the past week’s posts. Enjoy.

P.S. To subscribe, click on the link and follow the instructions from there.

Learning from mistakes (my online talk for the American Statistical Association, 2:30pm Tues 30 Jan 2024)

Here’s the link:

Learning from mistakes

Andrew Gelman, Department of Statistics and Department of Political Science, Columbia University

We learn so much from mistakes! How can we structure our workflow so that we can learn from mistakes more effectively? I will discuss a bunch of examples where I have learned from mistakes, including data problems, coding mishaps, errors in mathematics, and conceptual errors in theory and applications. I will also discuss situations where researchers have avoided good learning opportunities. We can then try to use all these cases to develop some general understanding of how and when we learn from errors in the context of the fractal nature of scientific revolutions.

The video is here.

It’s sooooo frustrating when people get things wrong, the mistake is explained to them, and they still don’t make the correction or take the opportunity to learn from their mistakes.

To put it another way . . . when you find out you made a mistake, you learn three things:

1. Now: Your original statement was wrong.

2. Implications for the future: Beliefs and actions that flow from that original statement may be wrong. You should investigate your reasoning going forward and adjust to account for your error.

3. Implications for the past: Something in your existing workflow led to your error. You should trace your workflow, see how that happened, and alter your workflow accordingly.

In poker, they say to evaluate the strategy, not the play. In quality control, they say to evaluate the process, not the individual outcome. Similarly with workflow.

As we’ve discussed many many times in this space (for example, here), it makes me want to screeeeeeeeeeam when people forego this opportunity to learn. Why do people, sometimes very accomplished people, give up this opportunity? I’m speaking here of people who are trying their best, not hacks and self-promoters.

The simple answer for why even honest people will avoid admitting clear mistakes is that it’s embarrassing for them to admit error, they don’t want to lose face.

The longer answer, I’m afraid, is that at some level they recognize issues 1, 2, and 3 above, and they go to some effort to avoid confronting item 1 because they really really don’t want to face item 2 (their beliefs and actions might be affected, and they don’t want to hear that!) and item 3 (they might be going about everything all wrong, and they don’t want to hear that either!).

So, paradoxically, the very benefits of learning from error are scary enough to some people that they’ll deny or bury their own mistakes. Again, I’m speaking here of otherwise-sincere people, not of people who are willing to lie to protect their investment or make some political point or whatever.

In my talk, I’ll focus on my own mistakes, not those of others. My goal is for you in the audience to learn how to improve your own workflow so you can catch errors faster and learn more from them, in all three senses listed above.

P.S. Planning a talk can be good for my research workflow. I’ll get invited to speak somewhere, then I’ll write a title and abstract that seems like it should work for that audience, then the existence of this structure gives me a chance to think about what to say. For example, I’d never quite thought of the three ways of learning from error until writing this post, which in turn was motivated by the talk coming up. I like this framework. I’m not claiming it’s new—I guess it’s in Pólya somewhere—just that it will help my workflow. Here’s another recent example of how the act of preparing an abstract helped me think about a topic of continuing interest to me.

Progress in 2023, Charles edition

Following the examples of Andrew, Aki, and Jessica, and at Andrew’s request:

Published:

Unpublished:

This year, I also served on the Stan Governing Body, where my primary role was to help bring back the in-person StanCon. StanCon 2023 took place at Washington University in St. Louis, MO, and we got the ball rolling for the 2024 edition, which will be held at Oxford University in the UK.

It was also my privilege to be invited as an instructor at the Summer School on Advanced Bayesian Methods at KU Leuven, Belgium, where I taught a 3-day course on Stan and Torsten, as well as to teach workshops at StanCon 2023 and at the University of Buffalo.

Postdoc at Washington State University on law-enforcement statistics

This looks potentially important:

The Center for Interdisciplinary Statistical Education and Research (CISER) at Washington State University (WSU) is excited to announce that it has an opening for a Post-Doctoral Research Associate (statistical scientist) supporting a new state-wide public data project focused on law enforcement. The successful candidate will be part of a team of researchers whose mission is to modernize public safety data collection through standardization, automation, and evaluation. The project will actively involve law enforcement agencies, state and local policymakers, researchers, and the public in data exploration and discovery. This effort will be accomplished in part by offering education and training opportunities fostering community-focused policing and collaborative learning sessions. The statistical scientist in this role will develop comprehensive educational materials, workshops, online courses, and training manuals designed to equip and empower law enforcement agencies, state and local policymakers, researchers, and the public with data and statistical literacy skills that will enable them to maximize the utility of the data project.

Data, education, and policy. Interesting.

Bayesian BATS to advance Bayesian Thinking in STEM

Mine Dogucu writes:

We are recruiting our new cohort of STEM instructors who are interested in incorporating Bayesian thinking and methods in their teaching in US universities and colleges.

Please help us spread the word.

stat.uci.edu/bayes-bats/

Our goal is to advance Bayesian Thinking in STEM, hence the name BATS.

BATS is a three-tiered program:

  • The first tier of the program consists of a week-long instructor training bootcamp (on the west coast at University of California Irvine in Summer 2023, and on the east coast at Vassar College in Summer 2024), to build a diverse community of Bayesian educators across different STEM fields.
  • In the second tier of the project, selected instructors will develop Bayesian teaching and learning materials specifically using scientific data from their fields with the support of the PIs, during the fall semester after their summer boot camp training participation.
  • In the third tier of the project, selected instructors will disseminate the teaching and learning materials through conferences and publications with the support of the PIs.

The BATS Project Objectives are as follows:

  • Increase the number of undergraduate students who are exposed to Bayesian methods;
  • Enhance the capacity of STEM instructors in Bayesian methods through training and community building;
  • Develop and enrich teaching and learning materials that showcase use of Bayesian methods in STEM fields

Our new Substack newsletter: The Future of Statistical Modeling!

Some people told me that it would be easier to follow the blog if it were available in newsletter form. So we set up a Substack newsletter called The Future of Statistical Modeling.

Each week it will give a list of the posts scheduled for the upcoming week on the blog, along with links to the previous week’s posts.

To make the newsletter more appealing, I’ll also sometimes post other stuff there, such as entire future posts, so then if you subscribe to the newsletter you get access to some fun stuff that you otherwise might not see for months.

The Substack newsletter is free, just a convenience for you, and a way for us to broaden our reach.

Ben Shneiderman’s Golden Rules of Interface Design

The legendary computer science and graphics researcher writes:

1. Strive for consistency.

Consistent sequences of actions should be required in similar situations; identical terminology should be used in prompts, menus, and help screens; and consistent color, layout, capitalization, fonts, and so on, should be employed throughout. Exceptions, such as required confirmation of the delete command or no echoing of passwords, should be comprehensible and limited in number.

2. Seek universal usability.

Recognize the needs of diverse users and design for plasticity, facilitating transformation of content. Novice to expert differences, age ranges, disabilities, international variations, and technological diversity each enrich the spectrum of requirements that guides design. Adding features for novices, such as explanations, and features for experts, such as shortcuts and faster pacing, enriches the interface design and improves perceived quality.

3. Offer informative feedback.

For every user action, there should be an interface feedback. For frequent and minor actions, the response can be modest, whereas for infrequent and major actions, the response should be more substantial. Visual presentation of the objects of interest provides a convenient environment for showing changes explicitly.

4. Design dialogs to yield closure.

Sequences of actions should be organized into groups with a beginning, middle, and end. Informative feedback at the completion of a group of actions gives users the satisfaction of accomplishment, a sense of relief, a signal to drop contingency plans from their minds, and an indicator to prepare for the next group of actions. For example, e-commerce websites move users from selecting products to the checkout, ending with a clear confirmation page that completes the transaction.

5. Prevent errors.

As much as possible, design the interface so that users cannot make serious errors; for example, gray out menu items that are not appropriate and do not allow alphabetic characters in numeric entry fields. If users make an error, the interface should offer simple, constructive, and specific instructions for recovery. For example, users should not have to retype an entire name-address form if they enter an invalid zip code but rather should be guided to repair only the faulty part. Erroneous actions should leave the interface state unchanged, or the interface should give instructions about restoring the state.

6. Permit easy reversal of actions.

As much as possible, actions should be reversible. This feature relieves anxiety, since users know that errors can be undone, and encourages exploration of unfamiliar options. The units of reversibility may be a single action, a data-entry task, or a complete group of actions, such as entry of a name-address block.

7. Keep users in control.

Experienced users strongly desire the sense that they are in charge of the interface and that the interface responds to their actions. They don’t want surprises or changes in familiar behavior, and they are annoyed by tedious data-entry sequences, difficulty in obtaining necessary information, and inability to produce their desired result.

8. Reduce short-term memory load.

Humans’ limited capacity for information processing in short-term memory (the rule of thumb is that people can remember “seven plus or minus two chunks” of information) requires that designers avoid interfaces in which users must remember information from one display and then use that information on another display. It means that cellphones should not require reentry of phone numbers, website locations should remain visible, and lengthy forms should be compacted to fit a single display.

Wonderful, wonderful stuff. When coming across this, I saw that Shneiderman taught at the University of Maryland . . . checking his CV, it turns out that he taught there back when I was a student. I could’ve taken his course!

It would be interesting to come up with similar sets of principles for statistical software, statistical graphics, etc. We do have 10 quick tips to improve your regression modeling, so that’s a start.

Progress in 2023

Published:

Unpublished:

Enjoy.

“It’s About Time” (my talk for the upcoming NY R conference)

I speak at Jared’s NYR conference every year (see here for some past talks). It’s always fun. Here’s the title/abstract for the talk I’ll be giving this year.

It’s About Time

Statistical processes occur in time, but this is often not accounted for in the methods we use and the models we fit. Examples include imbalance in causal inference, generalization from A/B tests even when there is balance, sequential analysis, adjustment for pre-treatment measurements, poll aggregation, spatial and network models, chess ratings, sports analytics, and the replication crisis in science. The point of this talk is to motivate you to include time as a factor in your statistical analyses. This may change how you think about many applied problems!

In judo, before you learn the cool moves, you first have to learn how to fall. Maybe we should be training researchers the same way: first learn how things can go wrong, and only when you get that lesson down do you learn the fancy stuff.

I want to follow up on a suggestion from a few years ago:

In judo, before you learn the cool moves, you first have to learn how to fall. Maybe we should be training researchers, journalists, and public relations professionals the same way. First learn about Judith Miller and Thomas Friedman, and only when you get that lesson down do you get to learn about Woodward and Bernstein.

Martha in comments modified my idea:

Yes! But I’m not convinced that “First learn about Judith Miller and Thomas Friedman, and only when you get that lesson down do you get to learn about Woodward and Bernstein” or otherwise learning about people is the way to go. What is needed is teaching that involves lots of critiquing (especially by other students), with the teacher providing guidance (e.g., criticize the work or the action, not the person; no name calling; etc.) so students learn to give and accept criticism as a normal part of learning and working.

I responded:

Yes, learning in school involves lots of failure, getting stuck on homeworks, getting the wrong answer on tests, or (in grad school) having your advisor gently tone down some of your wild research ideas. Or, in journalism school, I assume that students get lots of practice in calling people and getting hung up on.

So, yes, students get the experience of failure over and over. But the message we send, I think, is that once you’re a professional it’s just a series of successes.

Another commenter pointed to this inspiring story from psychology researchers Brian Nosek, Jeffrey Spies, and Matt Motyl, who ran an experiment, thought they had an exciting result, but, just to be sure, they tried a replication and found no effect. This is a great example of how to work and explore as a scientist.

Background

Scientific research is all about discovery of the unexpected: to do research, you need to be open to new possibilities, to design experiments to force anomalies, and to learn from them. The sweet spot for any researcher is at Cantor’s corner.

Buuuut . . . researchers are also notorious for being stubborn. In particular, here’s a pattern we see a lot:
– Research team publishes surprising result A based on some “p less than .05” empirical results.
– This publication gets positive attention and the researchers and others in their subfield follow up with open-ended “conceptual replications”: related studies that also attain the “p less than .05” threshold.
– Given the surprising nature of result A, it’s unsurprising that other researchers are skeptical of A. The more theoretically-minded skeptics, or agnostics, demonstrate statistical reasons why these seemingly statistically-significant results can’t be trusted. The more empirically-minded skeptics, or agnostics, run preregistered replications studies, which fail to replicate the original claim.
– At this point, the original researchers do not apply the time-reversal heuristic and conclude that their original study was flawed (forking paths and all that). Instead they double down, insist their original findings are correct, and they come up with lots of little explanations for why the replications aren’t relevant to evaluating their original claims. And they typically just ignore or brush aside the statistical reasons why their original study was too noisy to ever show what they thought they were finding.

I’ve conjectured that one reason scientists often handle criticism in such scientifically-unproductive ways is . . . the peer-review process, which goes like this:

As scientists, we put a lot of effort into writing articles, typically with collaborators: we work hard on each article, try to get everything right, then we submit to a journal.

What happens next? Sometimes the article is rejected outright, but, if not, we’ll get back some review reports which can have some sharp criticisms: What about X? Have you considered Y? Could Z be biasing your results? Did you consider papers U, V, and W?

The next step is to respond to the review reports, and typically this takes the form of, We considered X, and the result remained significant. Or, We added Y to the model, and the result was in the same direction, marginally significant, so the claim still holds. Or, We adjusted for Z and everything changed . . . hmmmm . . . we then also thought about factors P, Q, and R. After including these, as well as Z, our finding still holds. And so on.

The point is: each of the remarks from the reviewers is potentially a sign that our paper is completely wrong, that everything we thought we found is just an artifact of the analysis, that maybe the effect even goes in the opposite direction! But that’s typically not how we take these remarks. Instead, almost invariably, we think of the reviewers’ comments as a set of hoops to jump through: We need to address all the criticisms in order to get the paper published. We think of the reviewers as our opponents, not our allies (except in the case of those reports that only make mild suggestions that don’t threaten our hypotheses).

When I think of the hundreds of papers I’ve published and the, I dunno, thousand or so review reports I’ve had to address in writing revisions, how often have I read a report and said, Hey, I was all wrong? Not very often. Never, maybe?

Where we’re at now

As scientists, we see serious criticism on a regular basis, and we’re trained to deal with it in a certain way: to respond while making minimal, ideally zero, changes to our scientific claims.

That’s what we do for a living; that’s what we’re trained to do. We think of every critical review report as a pain in the ass that we have to deal with, not as a potential sign that we screwed up.

So, given that training, it’s perhaps little surprise that when our work is scrutinized in post-publication review, we have the same attitude: the expectation that the critic is nitpicking, that we don’t have to change our fundamental claims at all, that if necessary we can do a few supplemental analyses and demonstrate the robustness of our findings to those carping critics.

How to get to a better place?

How can this situation be improved? I’m not sure. In some ways, things are getting better: the replication crisis has happened, and students and practitioners are generally aware that high-profile, well-accepted findings often do not replicate. In other ways, though, I fear we’re headed in the wrong direction: students are now expected to publish peer-reviewed papers throughout grad school, so right away they’re getting on the minimal-responses-to-criticism treadmill.

It’s not clear to me how to best teach people how to fall before they learn fancy judo moves in science.

Statistical Practice as Scientific Exploration (my talk on 4 Mar 2024 at the Royal Society conference on the promises and pitfalls of preregistration)

Here’s the conference announcement:

Discussion meeting organised by Dr Tom Hardwicke, Professor Marcus Munafò, Dr Sophia Crüwell, Professor Dorothy Bishop FRS FMedSci, Professor Eric-Jan Wagenmakers.

Serious concerns about research quality have provoked debate across scientific disciplines about the merits of preregistration — publicly declaring study plans before collecting or analysing data. This meeting will initiate an interdisciplinary dialogue exploring the epistemological and pragmatic dimensions of preregistration, identifying potential limits of application, and developing a practical agenda to guide future research and optimise implementation.

And here’s the title and abstract of my talk, which is scheduled for 14h10 on Mon 4 Mar 2024:

Statistical Practice as Scientific Exploration

Much has been written on the philosophy of statistics: How can noisy data, mediated by probabilistic models, inform our understanding of the world? Researchers when using and developing statistical methods can be seen to be acting as scientists, forming, evaluating, and elaborating provisional theories about the data and processes they are modelling. This perspective has the conceptual value of pointing toward ways that statistical theory can be expanded to incorporate aspects of workflow that were formerly tacit or informal aspects of good practice, and the practical value of motivating tools for improved statistical workflow.

I won’t really be talking about preregistration, in part because I’ve already said so much on that topic here on this blog; see for example here and various links at that post. Instead I’ll be talking about the statistical workflow, which is typically presented as a set of procedures applied to data but which I think is more like a process of scientific exploration and discovery. I addressed some of these ideas in this talk from a couple years ago. But, don’t worry, I’m sure I’ll have lots of new material. Not to mention all the other speakers at the conference.