7th graders trained to avoid Pizzagate-style data exploration—but is the training too rigid?

[cat picture]

Laura Kapitula writes:

I wanted to share a cute story that gave me a bit of hope.

My daughter who is in 7th grade was doing her science project. She had designed an experiment comparing lemon batteries to potato batteries, a 2×4 design with lemons or potatoes as one factor and number of fruits/vegetables as the other factor (1, 2, 3 or 4). She had to “preregister” her experiment with her teacher and had basically designed her experiment herself and done her analysis plan without any help from her statistician mother. Typical scientist not consulting the statistician until she was already collecting data.

She was running the experiment, and after she had done all her batteries and replicates she hooked up 12 lemons and then excitedly told me how high her voltage went with 12 lemons. I told her, “Oh, that is really interesting; you could do an exploratory analysis and plot the voltage for 1, 2, 3, 4, all the way up to 12 lemons.” She said, “Mom, that is not in my experiment plan. I can’t do that.” I said, “Just try it and write down the data so you can use it to guide future research, and make sure you label it as exploratory.” She was against the idea and said it was not in the plan she had submitted to her teacher, so she would not put it on her poster for the science fair.

I thought you and the readers of your blog might find it interesting (and encouraging) that a 7th grader with the help of her teacher is more ethical with her research than some practicing scientists at big name schools and that kids her age were preregistering their designs at school. I also might write up her project idea for STEW (Statistics Education Web) because I thought it could be fun for other middle school, high school and intro stat teachers to do with their students and because I could include the story about preregistration and maybe get people thinking of this idea and how it can be included in an Intro Stat class. Also I love the idea of being a coauthor with my 7th grader.

I have mixed feelings on this. On one hand I like the idea that 7th graders are doing experiments at a higher quality than that of decorated Cornell University professors. On the other hand, I’m a bit disturbed that the child scientist was discouraged from plotting her data!

Overall the story is interesting as well as being cute as it illustrates the tension between different recommendations of good scientific practice. When it comes to what to do, I agree with Kapitula that it’s best to graph the data and label this step as exploratory; the challenge is how to teach this in a way that avoids training a whole fleet of Wansinks. Exploratory data analysis is a great idea; Wansink’s problem was that he and his colleagues were not exploring their data; they were just making up tables in order to get publications.

44 thoughts on “7th graders trained to avoid Pizzagate-style data exploration—but is the training too rigid?”

  1. a 2×4 design with lemons or potatoes as one factor and number of fruits/vegetables as the other factor (1, 2, 3 or 4).

    after she had done all her batteries and replicates she hooked up 12 lemons and then she excitedly told me how high her voltage went up to with 12 lemons.

    The second experiment is far more useful from a scientific standpoint. It is amazing how people are naturally good scientists until told all their intuitions are wrong because good science is incompatible with NHST. I wonder what the curve of number of lemons vs. voltage looks like?

    • There are exploratory and confirmatory analyses.

      Certainly, for learning new things, exploratory analyses are very important. In that way, it may be quite logical to conclude that exploratory analyses are more valuable than confirmatory studies (at least in certain stages of the scientific process). And it’s been my experience that non-statistical researchers are unnecessarily afraid of exploratory analyses and need to be coaxed into them.

      But Wansink shows us that completely ignoring the fine points of confirmatory analyses isn’t a recipe for success either!

      I think one of the issues is that exploratory analyses are considered “too easy” to get enough attention in traditional stats classes. I would be (pleasantly) surprised if a 10-week Stats 101 course spent more than 2 weeks on exploratory analyses. Students then read not much attention = not much importance.

      I also think every confirmatory study should come with an exploratory study slapped on the back of it. Although open access to the data would also answer that.

      • That makes sense, my understanding is if you put batteries in series the total voltage is just arrived at by summing them up, ie for n batteries: V_Total = sum(V[1:n]) = n*V_avg

        • as long as there’s no significant current flowing. When there is current, the internal resistance reduces the voltage across the battery. Basically the Zn gives up electrons which flow through the circuit to the copper. Positively charged Zn ions then flow through the potato to join up with the copper, and electroplate it.

          As for the slope, this is probably due to the lemons having an acid that facilitates the chemical reaction between the metals whereas potatoes have a less acidic environment. The upper bound on voltage should be related to the type of the two metals through the tabulated electrode potential

          https://en.wikipedia.org/wiki/Standard_electrode_potential_%28data_page%29

          So the next thing to check is did the lemon clock and the potato clock use the same metals?

        • So you also add up the resistance, using Ohm’s law: V_Total = n*(V_avg – I*R_avg)

          Then it would still be linear and you could use this to estimate the resistance of each lemon/potato. I feel like you can’t just keep stringing together more produce without introducing some other kind of inefficiency though. It’s been a while since I messed with electronics so maybe it would work…
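The linear model in the comments above, V_Total = n*(V_avg – I*R_avg), can be turned into a rough estimate of the per-cell internal resistance. The sketch below is only an illustration under stated assumptions: the same current I flows for every chain length n (in a real circuit the current depends on the load and on n), the open-circuit voltages are measured separately, and all numbers are made up rather than taken from the science project. The function names are hypothetical.

```python
def slope_through_origin(ns, volts):
    """Least-squares slope b for the no-intercept model V = b * n."""
    return sum(n * v for n, v in zip(ns, volts)) / sum(n * n for n in ns)


def estimate_internal_resistance(ns, v_open, v_loaded, current):
    """Estimate the average per-cell internal resistance R_avg.

    Open circuit:  V_total = n * V_avg              (slope = V_avg)
    Under load:    V_total = n * (V_avg - I*R_avg)  (slope = V_avg - I*R_avg)
    so R_avg = (open slope - loaded slope) / I.
    Assumes the same current I for every n, which is a simplification.
    """
    per_cell_open = slope_through_origin(ns, v_open)
    per_cell_loaded = slope_through_origin(ns, v_loaded)
    return (per_cell_open - per_cell_loaded) / current


# Made-up illustrative values, not measurements from the project.
ns = [1, 2, 3, 4]                      # number of lemons in series
v_avg, r_avg, current = 0.9, 1000.0, 0.0002   # volts, ohms, amps
v_open = [n * v_avg for n in ns]
v_loaded = [n * (v_avg - current * r_avg) for n in ns]

r_hat = estimate_internal_resistance(ns, v_open, v_loaded, current)
print(f"estimated internal resistance per cell: {r_hat:.1f} ohms")
```

With real measurements the points will not fall exactly on a line, and any curvature as n grows would be a sign of the "other kind of inefficiency" the commenter suspects.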

  2. Thank goodness Darwin didn’t preregister exactly what observations he’d make before voyaging on the Beagle. As your correspondent would likely agree, one of course doesn’t want to draw conclusions from unconstrained or over-fit observations (forking paths, etc.), but one also doesn’t want to blind oneself to observation and learning unexpected things about the world. What is it about how we teach science, especially biology, that turns it into a mindless and formulaic exercise? Perhaps the extreme of total pre-registration is better than the extreme of p-hacking and overfitting, but both are worse than actually thinking about noise, repeating experiments following observations, and generally being critical. This isn’t hard, even for kids, and doesn’t even require any math. My 7 year old’s science project involved repeated observations of balls rolling along a Hot Wheels track to see if they could go around a loop, and he seemed to grasp the idea of variability pretty well by making a graph. (It was fun, too.)

  3. Don’t help your 7th grader!

    My daughter won her 7th grade science regional, then made it to state. I thought, “let’s add a line with an equation to this scatterplot”. But she got interrogated about the line and gave a Dadsplaining explanation, and got downgraded for it.

    This would have departed into a long-forgotten memory if my daughter wasn’t now a middle school science teacher :)

  4. So dumb analysis which you tell everybody about ahead of time is “good”. Smart analysis you do without telling everyone first is “bad”. Sounds like you super geniuses got this whole “scientific method” thing figured out.

        • I would still argue that using the word “dumb” is fairly juvenile and the original comment is rather snarky as opposed to helpful and very much misses the mark. No one here has ever said “dumb” analysis should be done, to me a nice benefit of preregistration is that it helps weed out bad ideas without major expense of data collection.

          I also think the idea of projects where you present your results regardless of what they are is a good one.

        • “…a nice benefit of preregistration is that it helps weed out bad ideas without major expense of data collection.”

          GS: Who decides, I wonder, whether or not something is a “bad idea”? Also…are you sure that science never advances by seat-of-the-pants changes in procedure contingent on the data coming in? I guess this doesn’t come up in many fields (mainstream psychology, for example), because the opportunity to change course in an experiment, or to make scientific progress by figuring out how to get experimental control of some phenomenon, rarely arises: many fields don’t really exert precise control over their subject matter. I was sympathetic, at first, but now the whole preregistration kind of thing just seems based on a common, but simplistic vision of science. Seems to be quite a bit of that going around and this blog is certainly no exception.

        • I think you can preregister and still have changes along the way.. I am thinking of adaptive clinical trials for example.

          I also think that secondary data analysis can be useful but one should call it that when one is doing it and make conclusions cautiously.

      • Laura, for what it’s worth, there appears to be one “anonymous” poster here who enjoys being arrogant and rude. He (assuming it’s a he) rarely manages to say anything substantial.

  5. Yes, the training is too rigid. What a shame that 7th graders are learning that “ethical practice” means having no interesting ideas or thoughts once you start collecting data. Surely a way to chase people out of science.

    • I think you are reading this the wrong way. The whole thing is her original thoughts. She just had not originally planned to look at a string of 12. After she did her planned experiment she did additional work and did look at a string of 12 because she is curious and wanted to know what would happen. What she did not want to do is deviate from her experiment as she had originally designed it in her write up of that experiment.

  6. She did plot her data. Line plots of the means. She just stopped at 4 as she had originally planned. She kept the other data too. It just did not make the final poster for school. I told her she could make a plot with all 12 hooked up and have it to show people at her poster who come and ask what might happen as you add more lemons or potatoes.

    She did it all herself. She did not even ask for my help in how to make the graph. My kids like to do things their own way.

  7. “Cats Love Potatoes More Than You Ever Will”

    Apparently the writer has never encountered a Finnish person. But I’d slightly (kind of) agree with mad_kalak; I’ve actually run into some problems when discussing the “pizzagate” since people obviously think of the political… umm… “thing” that happened. Although luckily not that many people seem to know about it (the political scandal), so if the term gets more popular in this context, mayhaps this becomes the default connotation…

      • Well, the kitties I’ve encountered have had a thing for potato peels. But on the other hand, I don’t think I’ve ever had potato pancakes… it’s all about carrots or spinach or blood (around here we are all pale from the lack of iron so we have to put blood in our food).

  8. The junior-year lab course is an important rite of passage for physics majors at MIT. When I took it (decades ago), one of the biggest lessons was to plot the data as you go along, in the standard graph-paper lab notebook. This process avoids taking a bunch of finely-spaced points where nothing much is happening, for example, so you can focus on more informative data. In addition, it often uncovers problems such as misbehaving equipment or human error when a particular point disagrees with the trend. If you are still taking data you can repeat measurements several times in that vicinity and confirm that the point is an outlier. If the discrepancy is very large, by repeating the measurements you can then be justified in ignoring the offending point, even if you can’t figure out what went wrong with it. Obviously this has to be done with care and scrupulous honesty, but it is an efficient way to find out what the data are really trying to say.

    I suppose the right approach is to treat that whole process as exploratory, then register the protocol and do it all over again. But although I understand the goal, I find it kind of sad that young students are being taught not to adapt to the evolving data at all, depriving them of the joy of discovery.

    • You have exactly described the natural science of behavior – now called “behavior analysis.” In such experiments, one measures behavior repeatedly, often across several months or longer (usually daily experimental sessions), and plots these measures each day. If you walk into any behavior analytic lab, you can say “let me see the daily data for this experiment” and you will be handed a notebook (well – in the old days before computers) that contains the entire history of the experiment (often with the effects clearly visible in changes in the stable state to the right of the vertical line demarcating the phase change) and the return of the measures to original levels when the original conditions are reinstated. It is generally critical, as Don points out, to make procedural adjustments, or just to detect equipment malfunctions. That kind of stuff is the hallmark of experimental science.

      One quick story that is not all that uncommon: Someone in the lab was looking at lever-pressing in monkeys maintained under certain schedules of food delivery. The goal was, I think, to look at drug effects on behavior whose frequency had been decreased by imposing a delay between the lever-press that produced food and the delivery of the food. Even after substantial delays were imposed, response rates didn’t change. Now, this was unexpected since delays on the order of a few seconds result in profound disruption of responding in other species. I mean, this is a well-studied effect for the precise reason that the temporal relation between responding and some consequent event is a fundamental aspect of reinforcement (for you AI-following types, it is critical for the “credit-assignment problem”).

      Anyway…we looked at the distributions of inter-response times (IRTs) for the monkeys and these were very sharply peaked – the monkeys responded mostly at a constant rhythm. In rats and pigeons, however, the distributions of IRTs under similar 0-delay conditions are multi-modal, containing, importantly, a reasonable percentage of longish IRTs. Given this, when delays are arranged in these subjects, the temporal distance between a response and the delivery of food can be considerable once the delay is introduced. But, again, the monkeys responded at a reasonably high rate, emitting the same, rather short, class of IRTs – but there were a few long IRTs. So…we “added” long IRTs to the IRT distributions of the monkeys by differentially reinforcing responses that concluded a long IRT. Reinforcement was still 0-delay. When the long-IRT contingencies were withdrawn, there remained a higher probability of long IRTs than originally prevailed. When the delay condition was reinstated, the variable exerted effects on the monkeys’ response rates consonant with those seen in other species.

      So…before we even started the experiment proper, we had acquired data that speak to the generality of the delay-of-reinforcement effect. It is very general across species, though it can depend on certain characteristics of the pre-delay IRT distributions. That’s experimental control of one’s subject matter. That is why natural sciences steadily advance. Often, even before you put the actual experimental question to Nature, you improve your control over Nature on the way.

    • The joy of discovery still happened. She linked all 12 together and saw what happened. There is a lot of fun in just seeing what happens regardless of what you decided ahead of time to write up. That is the cool thing about kids: they do science just to do it. It is also the fun thing about doing data analysis. We can look at all sorts of stuff, but how we write things up and the conclusions we make do require a strong moral code, and if one preregisters, the written plan can in a sense keep us honest. Then if we change things we can clearly say why.

      Also in seventh grade this is probably the first full experiment most of these kids have done, every kid in the class has to pick their own thing to study, design an experiment, do it, and write it up and do a simple data analysis. So for some kids especially sticking to their preapproved plan is rather important or they will get off track and not finish.

    • Don:

      I took Junior Lab and got nothing out of it. The experiments were just too damn complicated. It was all I could do to follow the instructions and get whatever results we were supposed to get. There was no joy of discovery, just a lot of pressure to get the experiments to work.

      2.70, though, that was different: there I learned a lot.

      • My grade 9 science teacher would always distract us so that the results were not as expected – even at the time I appreciated the lesson.

        That one needs to live up to or fall in line with expectations is a nasty side effect of education generally?
        (When I went through Bayes via Galton’s two stage quincunx with my 15 year old son, he said “I see what’s happening and why you would do that – but shouldn’t there be a formula that does a better job?”)

        Also, I think in research you only get to see how good a researcher is when something goes wrong.
        (As Jim Till [discovered? stem cells https://ipscell.com/2012/04/who-really-discovered-stem-cells-the-history-you-need-to-know/] once explained – we did not know what the anomaly was but we did know not to ignore it.)

        • “I see whats happening and why you would do that – but shouldn’t there be a formula that does a better job?”

          GS: Assure him that the quincunx IS a computer!

      • Andrew, you remind me that some of the Junior Lab experiments were simply too ambitious, even with a dozen or so lab hours, so following the complex recipe was the only option. And of course there was always a right answer, which teaches the wrong lesson. But I did experience the joy of revelation as well, for example in seeing the relativistic increase in electron mass.

        I vicariously experienced the excitement and frustration of 2.70 through my roommate, so I share your enthusiasm for that process (which spawned the successful FIRST robotics program).
