## How to get a sense of type M and type S errors in neonatology, where trials are often very small? Try fake-data simulation!

Tim Disher read my paper with John Carlin, “Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors,” and followed up with a question:

I am a doctoral student conducting research within the field of neonatology, where trials are often very small, and I have long suspected that many intervention effects are potentially inflated.

I am curious as to whether you have any thoughts as to how the methods you describe could be applied within the context of a meta-analysis. My initial thought was to do one of:

1. Approach the issue in a similar way to how minimum information size has been adapted for meta-analysis e.g. assess the risk of type S and M errors on the overall effect estimate as if it came from a single large trial.

2. Calculate the type S and M errors for each trial individually and use a simulation approach where trials are drawn from a mean effect adjusted for inflation % chance of opposite sign.

I think this is the first time we’ve fielded a question from a nurse, so let’s go for it. My quick comment is that you should get the best understanding of what your statistical procedure is doing by simulating fake data. Start with a model of the world, a generative model with hyperparameters set to reasonable values; then simulate fake data; then apply whatever statistical procedure you think might be used, including any selection for statistical significance that might occur; then compare the estimates to the assumed true values. Then repeat that simulation a bunch of times; this should give you a sense of type M and type S errors and all sorts of things.
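The recipe above can be sketched in a few lines. This is a minimal illustration, not a recommendation for any particular trial: the assumed true effect, standard error, and number of simulations are invented values chosen to mimic a very small study.

```python
# Fake-data simulation of type S and type M errors: assume a true effect,
# simulate many small trials, keep only the "significant" results, and
# compare the surviving estimates to the truth. All parameter values
# (true_effect, se, n_sims) are illustrative, not from any real trial.
import numpy as np

rng = np.random.default_rng(1)
true_effect = 0.1   # assumed small true effect
se = 0.5            # standard error typical of a very small trial
n_sims = 100_000

est = rng.normal(true_effect, se, n_sims)   # one estimate per simulated trial
signif = np.abs(est) > 1.96 * se            # selection for p < 0.05

type_s = np.mean(np.sign(est[signif]) != np.sign(true_effect))  # wrong-sign rate
type_m = np.mean(np.abs(est[signif])) / true_effect             # exaggeration ratio

print(f"Power ~ {signif.mean():.3f}")
print(f"Type S error rate among significant results: {type_s:.3f}")
print(f"Type M (exaggeration) ratio: {type_m:.1f}")
```

With these numbers the power is tiny, so the statistically significant estimates are badly exaggerated and a nontrivial fraction have the wrong sign, which is exactly the point of the exercise.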

1. Anoneuoid says:

Start with a model of the world, a generative model with hyperparameters set to reasonable values; then simulate fake data; then apply whatever statistical procedure you think might be used

Likely they have no training in anything close to what you are asking of them. So this will mean learning on their own as they continue with whatever other responsibilities they have (at least that was my experience). The NHST cancer runs so deep… you waste so much time memorizing questionable stuff arrived at via flawed practices (along with a quick survey on NHST so you don’t think too hard about it) that there is none left to learn actual useful skills like running simulations.

• Andrew says:

Anon:

Likely they have no training in fake-data simulation because it’s not in the usual curriculum. But we’re putting it into Regression and Other Stories, so thousands of future students will have the chance to learn it in their classes!

• jrc says:

I’m sympathetic to that argument, and I remember the first time someone said to me “maybe you could simulate that?” and my first thought was “no way that sounds crazy hard.”

But then it wasn’t so hard. If you can do something once, it is pretty easy to do it 10,000 times and store the results. And I’d think that basic simulation techniques are something that a collaborator could bring to the table without needing too much understanding of the actual biology/physiology… the nurse can feed them the information on the parameter values from medicine, the statistician can estimate the remaining parameters and then can code it up in a DGP. Then press Go 10,000 times.

I really do love this idea. I think one of the hardest parts will be developing logical rules that simulate the appropriate kind of p-hacking. Is that gonna be part of the examples in the book? Specification searching, definitions of potential outcome variables, and GoFP-type reasoning related to series of significant/non-significant results? The first two seem easy enough (sorta), but quantifying the “we’d be able to fit this string of significant/non-significant results into our under-specified theoretical model” would be hard to do purely based on simulation, I’d think…

• Martha (Smith) says:

A little anecdotal information: I once had a student taking graduate (non-intro) stat courses who was a Ph.D. student in biology (as was his wife), but who decided to pursue a graduate degree in nursing, thinking it would be easier for him and his wife to both find jobs in the same locality if they were in different fields. The university offered an “alternate entry” route to a graduate nursing degree, which he applied to. I don’t know what eventually happened, but the point is that it is possible for someone with a reasonably strong background in statistics to pursue a graduate nursing degree.

• Anoneuoid says:

a Ph.D. student in biology…a reasonably strong background in statistics

Well, my background is not in biology per se but rather biomed (molecular bio, cell bio, neuroscience, etc). I would not expect a PhD in those areas to have a reasonable background in statistics — quite the opposite, actually. So my comment was not pointed at nursing at all, rather medical training in general. Stats was a total afterthought in my training, and from reading many thousands of papers this seems to be the norm worldwide.

• Martha (Smith) says:

I didn’t mean to imply that all Ph.D. students in biology had strong statistics backgrounds; I am well aware that many such students have very poor statistics backgrounds — I was just saying that this particular one did; your quote would have been a more accurate summary of what I said if it had said,

“a student who was taking graduate (non-intro) stat courses who was … a Ph.D. student in biology, but decided to pursue a graduate degree in nursing … the point is that it is possible for someone with a reasonably strong background in statistics to pursue a graduate nursing degree.”

2. Susan Henly says:

3. Keith O’Rourke says:

Two things – nurses in research and meta-analysis.

I once had a gig to provide about 12 hours of statistical training for a group of infection control nurses in Toronto in the mid 1990s. For some reason they all were using SAS, so I decided to lean heavily on simulation, which they picked up on fairly quickly (even in SAS). So they were able to gain a good grasp of type one errors, power and confidence interval coverage by seeing it happen by doing a bunch of simulations. Also they were a rather smart group, as they had been selected into the research area. Now the disaster – “my boss has asked me to do the sample size section in his grant and I am not sure he will be OK with the simulations I can do for that.” Unfortunately he was right; at the time simulations were just not acceptable to the grant reviewers. Because of that issue, some thought it was a waste of time. Susan points out the field of nursing research has evolved.
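The kind of classroom simulation described here is short enough to write out. Here is a sketch in Python rather than SAS, checking the empirical coverage of a nominal 95% confidence interval for a mean; the sample size and distribution parameters are arbitrary illustrations.

```python
# Simulate many samples, build the nominal 95% t-interval each time,
# and count how often it covers the true mean. Seeing the empirical
# coverage land near 0.95 is the point of the exercise.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, n_sims, mu, sigma = 20, 10_000, 5.0, 2.0
tcrit = stats.t.ppf(0.975, n - 1)   # t critical value for a 95% CI

covered = 0
for _ in range(n_sims):
    x = rng.normal(mu, sigma, n)
    half = tcrit * x.std(ddof=1) / np.sqrt(n)
    covered += (x.mean() - half <= mu <= x.mean() + half)

print(f"Empirical coverage: {covered / n_sims:.3f}")
```

The same loop, with a significance test inside it, demonstrates type one error rates and power the same way.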

Meta-analysis has been used within the field of perinatology since the early 1980s. The big problems to deal with are selective reporting of studies and mis-reporting of what actually happened in the studies, including lack of access to the raw data. If one solves that – has all the studies and all the raw data – and assuming that overall there is high power, I don’t believe type M and S errors will be a big problem. And if one is interested in individual study effects, they should be partially pooled.

4. Tim Disher says:

Thanks for answering my question, Andrew. For those interested, I can confirm that simulation is not something I have ever been taught formally. Just like jrc mentioned, however, I have been finding myself gradually warming up to it and building the confidence to use it as a tool for a couple of papers I have in my pipeline. I did have to identify someone to provide statistical support to verify I’ve gone from concept to code correctly, though.

In response to Keith: I think you’re spot on with your assessment for the most part; I just don’t have a lot of faith that you’re going to be able to get your hands on all of the primary data for most content areas. Even with all the data, I think the assumption that overall there is high power will vary considerably by question and outcome, so it seems hard to believe that type M errors at least aren’t an issue. Some of the new milk fortification trials come to mind.

• Keith O’Rourke says:

> faith that you’re going to be able to get your hands on all of the primary data for most content areas
That almost never happens, the main point being that these garbage-in problems make the consideration of other statistical issues rather academic – band-aids for machine-gun wounds.

By the way, here is a good example of simulating type S and M errors http://andrewgelman.com/2016/08/22/bayesian-inference-completely-solves-the-multiple-comparisons-problem/

Now there is a selection step involved in defining and simulating type S and M errors – where do you see that happening in meta-analysis: their conduct, their reporting, or a field’s selection of which meta-analyses to act on?

• Tim Disher says:

Great question, and one I’m not sure how to answer. I came at this originally while working on a meta-analysis of treatments for pain but ended up leaving the M and S error stuff out until I felt more confident in how they work in this context. With the GRADE approach, we evaluate precision partially based on whether the whole analysis has at least the number of participants of an adequately powered trial. Since this already involves a design analysis, I thought it would make sense to use the type M and S function in the Gelman and Carlin paper to extend this to say something about the degree to which the estimate might be inflated in comparison to some reasonable expected value. In my quick and dirty solution, I did this based off the standard error of the pooled result.
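The calculation Tim describes can be sketched as a rough Python translation of the retrodesign-style design analysis in the Gelman and Carlin paper, fed with a plausible true effect A and the standard error s of the pooled meta-analytic estimate. The example values of A and s below are invented for illustration.

```python
# Design analysis in the style of Gelman & Carlin: given an assumed true
# effect A and a standard error s (here, of the pooled estimate), compute
# power, the type S error rate, and the type M exaggeration ratio.
import numpy as np
from scipy import stats

def retrodesign(A, s, alpha=0.05, n_sims=100_000, seed=0):
    """Return (power, type_S_rate, exaggeration_ratio) for true effect A, s.e. s."""
    z = stats.norm.ppf(1 - alpha / 2)
    p_hi = 1 - stats.norm.cdf(z - A / s)   # P(significantly positive estimate)
    p_lo = stats.norm.cdf(-z - A / s)      # P(significantly negative estimate)
    power = p_hi + p_lo
    type_s = p_lo / power
    # Type M ratio by simulation: mean |estimate| among significant results
    est = A + s * np.random.default_rng(seed).standard_normal(n_sims)
    sig = np.abs(est) > s * z
    exaggeration = np.mean(np.abs(est[sig])) / A
    return power, type_s, exaggeration

power, type_s, exag = retrodesign(A=0.1, s=0.5)  # illustrative values only
print(f"power {power:.3f}, type S {type_s:.3f}, exaggeration {exag:.1f}")
```

Using the pooled standard error this way treats the meta-analysis as if it were one large trial, which is Tim's option 1 above.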

• Keith O'Rourke says:

> whole analysis has at least the number of participants as an adequately powered trial
Simulating type one error and power and/or type M and S errors here is sensible. You will need to have a sensible/realistic model for the whole data generation and reporting process – individual studies, reported studies and meta-analyzed studies.

Now the Cochrane Collaboration, as well as other groups, does adopt and use less-than-ideal statistical methods and thinking, but given all the limitations of the actual data generation and reporting process (varying quality studies, selective reporting, mis-reporting, etc.), you are unlikely to beat them in a meaningful way (except occasionally?).

For instance, with regard to methods of analysis “From personal experience, there have been no practical differences in actual meta-analysis when both Classical direct likelihood [or Bayes with weak priors] and summary (two-stage) techniques were both used” O’Rourke and Altman 2005 http://onlinelibrary.wiley.com/doi/10.1002/sim.2115/epdf