GS: Just out of curiosity, are these biologists asking questions that are relevant to groups or to individuals?

]]>so if there is some natural variability in the process and they didn’t stop earlier, then the running average wasn’t yet into their high probability region of their prior… so we can infer that their high probability region of their prior might not include the region of the running average for the first several samples but it does include the region for the last several samples.

or similarly, maybe they have some knowledge of the measurement instrument, and they stop when the running average variability from the previous sample to the final sample is less than what they know the measurement bias might be… so the average has converged to about as good as it’s going to get given their knowledge of the possible sizes of the bias… so I can infer what they think about the measurement bias.

Or similarly, the cost tradeoff between the value of improving the estimate of the population average and the cost of running more experiments became in favor of stopping… so I can infer some information about how much they are willing to spend on collecting data given how much they think they might get out of having the data… (this is very relevant to lawsuits)

Hey, that’s great, that’s exactly it!

]]>You’re right, there’s something confusing me and I didn’t know what it was. So I thought about it in the shower this morning, which is always a good idea. And here’s what I came up with (and again, thanks for pushing me because otherwise I wouldn’t have figured this out).

In all of this, there is a tendency on my part to try to map the “real” concern, which is messy, to a simple description so you and I can discuss the simple description, but that’s imperfect, and in this case I didn’t even realize what my more real concern was. It seems to me that the real issue is that in real world cases *the stopping model is unknown*.

The real concerns are things like the biologist saying “it seemed like we were getting the right stuff so we stopped” or “some lawyers negotiated that they’d be willing to pay for 20 samples” or “the census bureau used the SuperEnsemble method (details totally hidden)” or “the preliminary study said X” or “we were getting basically the same things that other labs were getting” or whatever. Again, real world stopping rules are often messy.

Even if you tell me that N is a completely deterministic function of the data once the model was specified… I might want to know *what the heck was your model* and I might want to know this for various reasons:

1) it’s hard to elicit numerical priors from people like biologists or lawyers or engineers (populations I work with ;-)). So, when they didn’t actually use a numerical calculated model (like some Stan code) but rather something that we can think of as similar to a model (say a vast wealth of personal experience in this area) the fact that N is a pseudo-deterministic function of the data given the model may be irrelevant for our likelihood in model M2 that we make, but it still means that we can infer something about the model inside the head of the experimenter. Again mapping things imprecisely, we could infer something about what kind of prior the experimenter has and therefore what we should really be using in our formal M2 programmed into Stan.

2) Some of the projects I’ve worked on are legal settlement negotiations. Here, you’d like to know what the opposing party might be willing to settle for. The fact that they chose a particular N could give you information, such as “they’re really cheap, they don’t want to spend any money” or “they have deep pockets they’re willing to do plenty of investigation”. This might inform a prior on a parameter involving your estimation of what they will settle for.

3) When the stopping rule is unknown (The SuperEnsemble method) the stopping rule could be a deterministic function of the data, or it could be a random function of the data, you just don’t know. But you might have some sketchy details about the kinds of things that go into the stopping rule. That could then allow you to infer some stuff about the stopping rule. So for example you could do Bayesian model selection in your model, one where N is essentially deterministic, and one where N is a random function and dependent on some aspect of additional data/information related to your parameters.

Deep inside all of this has been a desire to help me figure out how to handle these sketchy cases, that are nevertheless pretty common when you’re working with small N variable outcomes, negotiations between parties, or experimenters with tons of background experience but no statistical modeling knowledge. Inventing parameters that describe the not totally known stopping rule and using them to help you figure out stuff in your own model is more like the real concern.

So, summary:

1) Data generating processes in the presence of stopping rules are different from when N is fixed because they handle the possibility that there might be various sized datasets etc… Nevertheless once you plug in N you will get the same likelihood in certain very common cases (called uninformative stopping rules).

2) If the stopping rule was a known deterministic function of the data N, or a nondeterministic function of the data N but not related to your parameters, then it is called uninformative and you can safely ignore it in your analysis.

3) even if the stopping rule, to a person who knows its details, is a deterministic function of the data N, if you don’t know what the stopping rule is precisely, to you it’s a random function and then if you want to know something about it, you can infer information *about it* from the N, what you infer may then inform your own model, and this is particularly useful in cases where the stopping rule is the kind of vague stuff like “it seemed ok” or “we negotiated for 20 samples” or “we used the SuperEnsemble method that our partners developed”. Note that in this case, you’re typically inventing parameters which describe the stopping rule, and so *the stopping rule is a random function related to your parameters*.
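Point 1 of the summary can be checked numerically in the classic textbook case. This is only a minimal sketch, assuming a Bernoulli coin, a flat prior on a grid, and made-up data (3 heads in 12 flips); the function names are illustrative and not from the discussion. It compares "N fixed at 12" against "flip until the 3rd head, which happened to arrive on flip 12".

```python
from math import comb

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def posterior_fixed_n(heads, n, grid):
    # Binomial likelihood: N was fixed in advance at n.
    return normalize([comb(n, heads) * p**heads * (1 - p)**(n - heads) for p in grid])

def posterior_stop_at_kth_head(heads, n, grid):
    # Negative binomial likelihood: flip until the heads-th head, which arrived on flip n.
    return normalize([comb(n - 1, heads - 1) * p**heads * (1 - p)**(n - heads) for p in grid])

grid = [i / 100 for i in range(1, 100)]  # flat prior on a grid (an assumption)
fixed = posterior_fixed_n(3, 12, grid)
stopped = posterior_stop_at_kth_head(3, 12, grid)

# The two data generating processes differ only by a constant that does not
# involve p, so the normalized posteriors agree up to floating point error.
max_diff = max(abs(a - b) for a, b in zip(fixed, stopped))
print(max_diff)
```

The two models describe different processes (one can only ever produce 12 flips, the other can produce any length), yet once the observed N is plugged in they give the same posterior. That is the sense of "uninformative" in points 1 and 2; it says nothing about the vague rules in point 3.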

Hey! I think I do finally get it.

]]>> In the hot hand example, obviously conditional on the data M1 always gives the same answer. So it’s deterministic even within M2, the hot hand model, that is, within M2 we know that after seeing data D M1 will tell you to stop.

We agree, the stopping rule is a deterministic function of the data (the sequence of success/failure events [ x1 x2 x3 … ]).

> But…. the *fact that M1 told you to stop after N* is itself *data* within M2 which is relevant to your inference about the position and size of the jump…

The fact that we stopped at N (using a stopping rule which depends only on the data and obviously does not depend on the parameters of M2 given the data) is completely irrelevant for inference about the position and size of the jump.

And I wouldn’t say that it takes a great amount of thinking to realise that the stopping rule is non-informative in this case.

> this may have to enter into M2 via a choice of prior for the change point which is dependent on N for example, an idea we have both agreed is useful somewhere above.

If you include the information about N in your prior, your likelihood will have to compensate for that somehow.

If in the end you get a posterior using your modified prior and your full analysis of the generating process that is not exactly the same as the posterior that you would get analysing the sequence with the “N fixed” assumption and the original prior YOU ARE DOING IT WRONG.

I don’t really see the advantage in working more to arrive at a wrong conclusion (or in the best case the same conclusion).

]]>Seeing that there is this rule tells you something about what the experimenters were thinking, or how the experiment might be different if some unknown took on different values, and then you *choose to add a parameter to your model*.

This additional parameter, by itself, obviously alters the likelihood as now it’s a function over one (or several) extra parameters.

In the preliminary census study example, you might want to model the bias that the preliminary study had. So you put in a parameter for this bias.

Again, it seems like a mistake to ignore the modeling based on a “this rule is uninformative” type scenario, in part, you might need to add a parameter to *make* it informative.

]]>The bigger issue comes in when N is small, and that’s really common in stuff I work on. Biologists want to look at 3 or 5 or 25 animals, not 3500, and they first look at 3 or 4 of them to “try out their technique” without really recording those because they weren’t “really part of the study since we hadn’t really determined what surgery/drug/measurement technique we were really going to use”… and so those first 3 inform them that “stuff seems to be working” and then they collect 5 more with “a consistent technique” and then they want to know how “doing x” affects “outcome y” and then when you tell them that the posterior distribution isn’t all that concentrated, they say “well the original student graduated, but if I have my new grad student do 5 more will that be enough?”

And I guess my intuition is just “model all of it” and be ready to put some portion of the model, whether you classify it as likelihood or prior, that is potentially N dependent, or to add parameters that alter your likelihood function which express aspects of what caused you to choose the N or the experimental technique, or whatever.

]]>In the case “things seemed pretty consistent so we stopped”, depending on my knowledge about what is going on, I may say “that actually lets me know that X might be true and X is correlated with the thing of interest”. The likelihood or prior then gets altered to account for X and its effect on the experimenter “deciding that things seemed pretty consistent”.

Take even the Bayesian test after each data point using model M1. This is deterministic given the data within M1, but it could be informative for model M2. Say model M1 is that basketball shots are Bernoulli random variables with constant p, and the test is that the posterior probability p(success < 0.1 | M1) < 0.1. If my actual analysis model for this dataset is that p takes on one value initially and then there’s a change point where it jumps up when the player gets “hot”, then the fact that they stopped the experiment is in some sense evidence that maybe the change point occurred already, thereby biasing the inference for p under model M1 higher, and that’s obviously correlated with the parameters of interest, namely the change point and the size of the jump.

so, no, I don’t think you get away with saying “it’s uninformative so I avoid bothering to model stuff” because its “uninformativeness” is only with respect to whatever model you eventually decide you need.

]]>In most of the cases we have discussed, from textbook problems like “stop when you got three heads in a row” to real world examples like “stop if the result at an interim analysis is statistically significant” or “it seemed like we had a pretty consistent result, so we stopped collecting data”, you thought that more complex models were required because the one with N fixed was not representing the data generation process properly.

It seems useful to be able to use the simple model instead, knowing that it will result in the same likelihood function (and inference depends on data only through the likelihood). Of course that’s not true unless the dependence of the stopping rule on the parameters is only through the data, but that seems a reasonable assumption in many cases.

]]>I think my confusion was thinking that somehow knowing the definition of “informative stopping rule” would allow an analyst to short-circuit certain difficulties in modeling, as in “oh, in this case I’ve got an uninformative stopping rule, so I can pretend N was fixed”.

If “uninformative stopping rule” was a true property of the rule like “does or does not contain quantities of lead measurable with instrument X” is a real physical property of baby food… then you could first figure out what the “truth” was, and then use that truth to select your model.

But, it doesn’t work like that. It’s just a classification system for after the fact. After you are in a position to determine whether the inference depends on the details of the stopping rule and/or choice of N, you can classify yourself into “stopping rule was informative” or “stopping rule wasn’t informative”. Useless before doing models, but after doing models you can see that what you did was basically one or the other.

I’m actually really happy with the idea that one way to deal with all of this is to look at the N and what you know about how it came about, and adjust your priors on parameters to explicitly account for what you think the N might tell you. That seems like a pragmatic useful rule of thumb, that it’s somehow legit to consider what caused the N in considering how to select the priors. I think that’s your 2nd and 3rd options.

This has been really helpful to hammer out a very subtle idea that is not usually addressed at this level in textbooks etc. My discussion of “textbook” problems is meant to refer to where someone poses the problem in such a way that the property “informative or uninformative” *really is* decidable before hand because there is no ambiguity in the model that might make something informative to one person and uninformative to another. There is an unambiguous shared comprehensive set of knowledge thanks to the extremely precise statement of the problem (ie. “flipping a perfect bernoulli coin with constant p”)

Once again Carlos, I really appreciate your persistence, even through what might have been some frustration. And I hope “crh” above and others get something out of this. In fact, I think I’ll put up a summary on my blog and point it at this thread.

]]>If a non-informative rule (fixed at the time when the experiment is designed) is based on prior data, the easiest way to proceed may be to incorporate that prior data into your prior distribution and use a model with N fixed. But of course you run the risk of using that data twice in that case!

Is the data used to choose N completely included in the data you will analyse later? Then you can treat N as fixed without changing the prior.

Is the data used to choose N completely absent from the data you will analyse later? Then you can treat N as fixed if you include that information in the prior.

Is the data used to choose N partially included in the data you will analyse later? Then you could treat N as fixed but you have to “partially” modify the prior (difficult) or you could create the full generative model accounting for the partial overlap (not any less difficult, and I would say more difficult).

]]>p(CostToRepair[5] | N[5]) p(N[5])

because “the reason people chose N[5] is because they knew that the cost to repair things that had a visual score of 5 was probably pretty damn high”

so in fact, yes, it could be very reasonable to use N to adjust your priors on other internal parameters in some models. This is particularly true when you have a kind of adversarial, ulterior-motive perception of the study, for example in a drug approval setting, or a clinical trial setting with authors who have a demonstrated bias in their prior papers, etc. If they’re choosing N or censoring data points or whatever for an ulterior motive, then even their “we excluded everyone with systolic BP > 150” or “we randomized the full recruited population 2:1 into treatment vs control” could tell you something, even though it looks deterministic on the surface.

]]>I think my point is that whether the stopping rule is random or not, and whether it’s correlated with the parameters of interest or not is *not a property of the stopping rule*

it’s a property of your model of the data generating process.

But, the goal of a definition like “a stopping rule which is not random, or is random and not correlated with the parameter of interest is uninformative” is to give you a method for quickly deciding whether to ignore the stopping rule.

“we sampled 20 rooms by random number generator” sure sounds like “deterministic stopping rule N=20” but when you learn how the negotiations were done to arrive at the number 20 it sounds like “random number vaguely associated with cost to repair”

and so, the heuristic of “first determine whether the stopping rule is random and probabilistically related to the parameter of interest, and then if it isn’t ignore it” is really no help at all. it is really equivalent to:

“first create the full generative model include a generative model for N (which by the way is how we got onto this whole thread in the first place, Andrew talked about needing a model for N) and then analyze it by ignoring the definition of informative stopping rules because they won’t be relevant when you have the right generating process”

By the time you’ve done all this work including modeling how N came about… you’re going to get the right answer and you’re not going to be able to answer the question of “informative or not?” until you do all that work anyway.

]]>Several people survey all the rooms in a fire-damaged building rating the degree of damage in each one on a 0-5 scale with anchored descriptions of the meaning of the scale.

After this, based on cost considerations we have two options:

1) collect N data points with N determined by cost to investigate vs cost to repair selected randomly from among all rooms using a computer RNG.

2) subset the rooms into those rated 0,1,2,3,4,5 and select N_i from each with the total number determined by cost to repair vs cost to investigate, and the sub-samples chosen by computer RNG from among those rated in each state.

Now, client likes 2 better, and wants more samples in the 4,5 range because they want to nail down costs and these are the more variable and expensive conditions.

Now, first off, is the choice of N random or deterministic? The truth is, the choice of N comes from some people sitting around a spreadsheet and saying “what if cost of repair of a 5 is C5 and cost to repair of 4 is C4, but cost to survey is ….” and then they negotiate together with their client, and everyone eventually comes down to “survey 20 total rooms over 2 days with a team of X people and we’re authorizing $Y for the survey”

All I can say about that is that it is in fact based on some kind of cost model which is a kind of random variable that describes the interaction of all the different negotiations etc. And, by the way, it’s associated with cost to repair, and cost to repair is an important unknown parameter in the model.

In part 2, not only do we have choice of N total somehow nebulously chosen and associated with cost to repair, but also we have how many to sample in each subset associated with the ratings, and there were several people who did the ratings, and later on in the model the biases of each individual will become a parameter we are going to care about, so that we’ll adjust all the ratings to some abstract average rating across the several surveyors so the number of rooms chosen to be investigated from within each rating category is probabilistically dependent on the biases of the individual raters.

Of course, when this is written up it will be described like in the first case: “we chose 20 rooms by computer random number generator” and in the second case we could say “we chose 20 rooms by computer random number generator with 4 chosen from among those rated 0-3 and 8 chosen for rating 4 and 8 chosen for rating 5”
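For concreteness, the mechanical part of option 2 as it would be written up can be sketched as below. This is a hedged illustration only: the building (120 rooms with ratings assigned in a repeating pattern), the `stratified_sample` helper, and the seed are all hypothetical; only the quotas (4 from ratings 0-3, 8 from rating 4, 8 from rating 5) come from the description above.

```python
import random

def stratified_sample(rooms, quotas, seed=0):
    """rooms: dict room_id -> rating (0-5); quotas: list of (set_of_ratings, count)."""
    rng = random.Random(seed)
    chosen = []
    for ratings, count in quotas:
        # Sort so the draw is reproducible for a given seed.
        stratum = sorted(r for r, score in rooms.items() if score in ratings)
        chosen.extend(rng.sample(stratum, count))
    return chosen

# Hypothetical building: 120 rooms, 20 rooms at each rating 0..5.
rooms = {room_id: room_id % 6 for room_id in range(120)}

# Quotas as in the write-up: 4 from ratings 0-3, 8 from rating 4, 8 from rating 5.
quotas = [({0, 1, 2, 3}, 4), ({4}, 8), ({5}, 8)]
sample = stratified_sample(rooms, quotas)
print(sorted(rooms[r] for r in sample))
```

Everything in this snippet is deterministic given the seed and the quotas. The part that carries information is exactly what the code takes as given: how the quotas and the total of 20 were negotiated, and whose ratings defined the strata.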

Person Q the counter-party who reads that description will say “there’s no stopping rule, N=20 is fixed”

but given the kind of information I’ve given you above, I certainly think it’s reasonable to call the choice of N random and dependent on the cost parameter.

So, your mileage may vary… a lot.

]]>> If we go back to “it looked pretty consistent so we stopped” is this not like “get a kind of gut instinct point estimate of some parameter, then make a random decision about whether to continue or stop based on what that point estimate was” ? which is sort of along the same lines as “randomly generate a value q* from the posterior, and then rbinom(1,q*) decides your stopping”

> This kind of thing is shockingly common, where a vague idea that the data looks ok based on some expectation of what you’re going to get leads you to randomly stop after you’ve fulfilled that expectation in some imprecise sense.

Is this a textbook example now? Do we agree that if the stopping rule is not correlated to the parameters in the model it won’t be informative? Why was that a very important problem yesterday and it’s not representative of the real world issues today? It would help with the discussion if you were able to stay with one of your own examples for more than 30 seconds.

In the “hidden” preliminary design study, the stopping rule is not informative because it’s fixed before the study and deterministic. I think the problem here is that your prior is not consistent with your knowledge if the N=3000 reflects some information that is not included in the prior. You could also think of the same study as beginning slightly before, with a stopping rule which is not fixed but established through a parallel survey. In that case it’s clear that the stopping rule is informative if that additional data is not included in the analysis.

> no one actually does textbook problems like “sample until you get 3 heads in a row” they all do things like “we first looked at XYZ and noticed that it had certain properties, and based on that, and cost considerations, we decided to sample N items, and after we did that we looked at the data and then we decided to sample K additional items in a certain sub-group… ” or whatever.

Again, is that stopping rule correlated to the parameters conditional on the data or not? (I hope the stars and capital letters are not required anymore)

If you can’t tell, you don’t know if the stopping rule is informative. Your solution to all of this is to always model the generating process as best as you understand it, ok. But if your understanding doesn’t allow you to decide whether the stopping rule is informative or not, I don’t see how it can be a solution.

> maybe I can determine if the stopping rule is dependent on the parameters of interest? Well, not until I have a model that tells me what the parameters are.

Do you find that surprising? You cannot do much hard core Bayesian analysis until you have a model either.

]]>In this case, even what looks like a fixed rule “sample until you get 3000 measurements” is in some sense a random rule that is correlated with stuff you care about, because whether you choose 3000 or 3300 or 2700 or whatever is dependent on what information went into that preliminary analysis.

]]>What I still don’t know is whether any of that is helpful to me in applied problems where the stopping rule is vague, or may or may not involve unknown quantities from the population (like the “hidden” preliminary design study in the Census example above)

These are really the cases that I actually run into. no one actually does textbook problems like “sample until you get 3 heads in a row” they all do things like “we first looked at XYZ and noticed that it had certain properties, and based on that, and cost considerations, we decided to sample N items, and after we did that we looked at the data and then we decided to sample K additional items in a certain sub-group… ” or whatever. And in these cases I think the “metaphysics” winds up mattering because it alters your data generating process, and whether that results in the same likelihood as if you’d made some alternative assumption, isn’t as clear cut and guaranteed as the textbook type example.

BUT: YES please accept my thanks for helping to clarify how to analyze the textbook cases at least.

]]>…oh no, wait, I get it — the samurai scenario has negligible probability. Right, so that’s actually beside the point that I neglected to explain.

]]>To me, a stopping rule is deterministic, if, given all the information I have about it, I can assign 0 or 1 as the probability of stopping given the data, ie. p(Stop | Data, WhatIKnow) in {0,1}

There’s no other meaningful way to handle this in my Cox/Jaynes Bayesian conception, and so for some people they’ll call a rule deterministic and for some people they won’t, and it isn’t a function of the rule, it’s a function of what they know about the whole process.

So, the census goes out and surveys some region with 1000 census tracts. They publish a data set, and some notes on data collection. In the notes on data collection they say “we used the Proprietary SuperEnsemble(tm) method developed by our partners at Booz Allen Hamilton (motto: we haven’t been indicted… yet!) to stop sampling when we were virtually guaranteed to have sampled at least 10% of all people in this region”

Now, given what you and I know about the SuperEnsemble method, is it deterministic or “random”?

Imagine an alternative world, instead the census says “we did a preliminary study and performed a design calculation that determined that we should sample 3000 households” is this a “fixed N” or is it random? What if their preliminary study was to survey 1 census tract and calculate several sample statistics and do some mathematical calculations, and then round off, and the whole thing spit out 3000 ? Since we’ll assume the preliminary study wasn’t part of the final data set, then as far as I’m concerned, selection of this number 3000 was a random process that was dependent on the parameters of interest (because it was calculated from a sample from the population that isn’t part of the data). So even “sample 3000” can secretly be a random and parameter dependent rule.
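A small simulation makes the "secretly random and parameter dependent" claim concrete. It is a sketch under loudly stated assumptions: the population is Normal with an unknown spread sigma drawn from a uniform prior, the pilot has 30 observations, and the design calculation is the usual textbook sample-size formula for a mean. None of these specifics come from the census example; `design_n` and `pearson` are illustrative helpers.

```python
import math
import random

def design_n(pilot, margin):
    # Textbook sample size for estimating a mean to within +/- margin at 95%.
    m = sum(pilot) / len(pilot)
    var = sum((x - m) ** 2 for x in pilot) / (len(pilot) - 1)
    return math.ceil(1.96 ** 2 * var / margin ** 2)

def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

rng = random.Random(1)
sigmas, ns = [], []
for _ in range(2000):
    sigma = rng.uniform(1.0, 5.0)                       # unknown parameter (assumed prior)
    pilot = [rng.gauss(0.0, sigma) for _ in range(30)]  # pilot data, not in the final data set
    sigmas.append(sigma)
    ns.append(design_n(pilot, margin=0.2))

# Correlation between the hidden parameter and the "fixed" N across simulated worlds.
corr = pearson(sigmas, ns)
print(round(corr, 2))
```

If the pilot data were included in the analysed data set, N would be a function of data you already condition on. When it is thrown away, as assumed here, N is the only trace of it that reaches the analyst.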

Finally, assuming I can’t assign 0,1 to the stopping rule given the information I have, at least maybe I can determine if the stopping rule is dependent on the parameters of interest? Well, not until I have a model that tells me what the parameters are. Once I have that, then I need to do the best I can to guess about the stopping rule, and determine if given my knowledge, I might have assigned different probabilities of stopping if I were given the actual values of the parameters in my model than if I weren’t.

Of course, if you don’t take this cox/jaynes view, you might have a very different idea of what it means to be random vs deterministic.

My solution to all of this is to always model the generating process as best as I understand it. I don’t think I’ll go wrong with this. I’ll get the inference that corresponds to what I *really think* about the process. Whereas, if I repeat the mantra “stopping rules don’t matter” and I don’t do this full analysis of the generating process, then I’ll sometimes get the same thing that I would have, and sometimes I won’t.

]]>> the Bayesian analysis will, conditional on data, give you a deterministic value for the posterior distribution of the parameter and so if you stop when Pr(q = 0) < 10^-3 this is deterministic given the data, and the model. (if you do your analysis in Stan, there's always MCMC error too… but we can make this small).

Even if the Stan analysis is non-deterministic, this non-determinism will be irrelevant unless it's correlated with the parameter. (No need to tell me that it could happen, I can figure out an example myself: let's say we have a textbook coin-flipping example using a bimetallic coin, so the parameter theta changes with the temperature, and let's say the analysis is performed using a computer known to malfunction when the temperature increases, biasing the response).

]]>In principle, I don’t see the problem. Under model 1, is the stopping rule that was applied correlated with the parameters? If not, it’s not informative. Under model 2, is the stopping rule that was applied correlated with the parameters? If not, it’s not informative. Of course if you don’t know the answer to that question you won’t know whether it’s informative or not. But if you cannot get the answer to that question you won’t be able to write a more comprehensive model either…

> This seems to me to be very relevant to the design of stopping experiments. It’s safe to consider a Bayesian stopping rule as dependent only on the data *and the model* and not the parameter, provided you never ever want to use a different model to analyze this data. This is a kind of risk you take by deciding to use a Bayesian stopping rule. The stopping rule may make re-analyzing your data using a different model more complicated.

A stopping rule that depends only on the data *and the model*, as you say, depends only on the data. The model in the stopping rule is not a random variable, so I think the *and the model* qualification makes no sense. If you want to analyse the data with another model, a stopping rule that was deterministic will still be deterministic, so it cannot be correlated to the parameters in the new model.

Of course if the stopping rule is not deterministic given the data it might be unrelated to the parameters in model 1 but be related to the parameters in model 2. Do your “10 random qs” depend on the parameters of model 2? (See the temperature-mediated example of correlation between the stopping rule and model parameters above).

> the q* value is correlated to q_real (that’s why we got the posterior in the first place)

The q* value is not correlated to q_real ****** CONDITIONAL ON THE DATA ******.

> If we go back to “it looked pretty consistent so we stopped” is this not like “get a kind of gut instinct point estimate of some parameter, then make a random decision about whether to continue or stop based on what that point estimate was” ? which is sort of along the same lines as “randomly generate a value q* from the posterior, and then rbinom(1,q*) decides your stopping”

Is that decision correlated to the model parameters ****** CONDITIONAL ON THE DATA ******? Otherwise, it’s non-informative.
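The "conditional on the data" point can be sketched for the quoted rule "draw q* from the posterior, then rbinom(1,q*) decides your stopping". The assumptions here are mine: a flat Beta(1,1) prior inside the stopping rule, a made-up flip sequence, and a grid for the analyst's posterior. With those, the probability of stopping after a given prefix is the posterior mean of q*, which is computed from the flips alone.

```python
def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def stop_probability(flips):
    # q* ~ Beta(1 + heads, 1 + tails), then stop ~ Bernoulli(q*).
    # P(stop | flips) = E[q* | flips] = posterior mean; q_real does not appear.
    heads = flips.count("h")
    return (1 + heads) / (2 + len(flips))

def rule_factor(flips):
    # Probability of continuing after every earlier prefix and stopping at the end.
    factor = 1.0
    for k in range(1, len(flips)):
        factor *= 1 - stop_probability(flips[:k])
    return factor * stop_probability(flips)

flips = "hhthhhthhh"  # hypothetical observed sequence
heads, tails = flips.count("h"), flips.count("t")
grid = [i / 100 for i in range(1, 100)]

ignoring_rule = normalize([q**heads * (1 - q)**tails for q in grid])
modeling_rule = normalize([rule_factor(flips) * q**heads * (1 - q)**tails for q in grid])

max_diff = max(abs(a - b) for a, b in zip(ignoring_rule, modeling_rule))
print(max_diff)
```

The stopping factor multiplies every grid point by the same number, so it cancels on normalization. That holds only as long as the stop probability is a function of the flips and nothing else; a rule that also looked at something tied to the parameters would not cancel.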

]]>Bayesians love this one because you can make the stopping rule contingent on all sorts of things the experimenter might not actually be aware of. (For example, a samurai is hiding nearby and plans to swoop in and slice the coin in half with his sword on the second toss, but you stop on the first and he never gets a chance. Should this affect your inference?)

Not sure if I got your point but Bayes’ rule for N+1 hypotheses labeled 0:N, given evidence E, is:

P(H[0]|E) = P(H[0])*P(E|H[0])/sum( P(H[0:N])*P(E|H[0:N]) )

All hypotheses with relatively low P(H[i])*P(E|H[i]) can be dropped from the denominator as an approximation. It is just like x ~ x + 1/inf, which everyone accepts.
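A quick numeric check of the approximation, with purely illustrative numbers (three hypotheses, the last one standing in for the samurai scenario with prior 1e-9); `posterior_h0` is a hypothetical helper that just evaluates the formula above.

```python
def posterior_h0(priors, likelihoods, keep=None):
    """P(H[0] | E); `keep` optionally restricts which hypotheses enter the denominator."""
    indices = range(len(priors)) if keep is None else keep
    denominator = sum(priors[i] * likelihoods[i] for i in indices)
    return priors[0] * likelihoods[0] / denominator

# Illustrative numbers only: H[2] has a tiny prior but explains E well.
priors = [0.5, 0.5 - 1e-9, 1e-9]
likelihoods = [0.3, 0.1, 0.9]

full = posterior_h0(priors, likelihoods)                  # all hypotheses in the sum
approx = posterior_h0(priors, likelihoods, keep=[0, 1])   # H[2] dropped from the sum
print(full, approx)
```

The two answers differ by a few parts in a billion here. The approximation is only as good as the claim that P(H[i])*P(E|H[i]) is small; a tiny prior can be overwhelmed by a likelihood ratio that is large enough.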

Anyway this is all fiddling while Rome burns as long as most people are testing “some other hypothesis” rather than their hypothesis.

]]>Carlos: Here’s where I think we can both agree. We agree that in the presence of a stopping rule, the data generating process is different (this is what you said about how the likelihood as a function of abstract symbolic quantities is different, it has to be capable of handling different length vectors etc). After we see data and plug it in, it may or may not be the case that a model which assumes a fixed N equal to the observed N, and a model which models the data generating process directly will arrive at the same likelihood function over the parameters. In many cases, it will. but *you won’t go wrong by modeling the data generating process* whereas under informative stopping rules you *will* go wrong by ignoring the stopping rule and assuming fixed N.

Is that all fair to say? I think this much we agree on. I admit to confusion about how to evaluate whether a stopping rule is informative by the “random and probabilistically dependent on the parameter conditional on the data” definition because I’ve hyper-internalized the cox/jaynes conception of all probabilities being conditional on some background information, and so without specifying what background information we are using, I don’t have a good idea what it means for a stopping rule to be “random”, hence things like the PRNG seed suddenly becoming relevant to the question or whether a rule can be considered informative under one model and uninformative under another model that has different parameters. In many of these textbook examples, the exactness of the data generating process is assumed, ie “flip a coin that has a constant p” is *really true* about the world, whereas in the “here’s some data, here’s what we did, can you model it” scenario, there are competing models and all of them are known to be not literally true. The parameter is not a feature of the world, it’s a feature in our head that helps us explain the world.

]]>ththhh

your likelihood is actually based on the sequence

ththhhh

which is the same as 4 in a row deterministic stopping.

which just goes to show that in fact the hhh + flip stopping rule… is informative

I’m still not sure how to deal with the idea of analyzing a dataset under model2 which had a stopping rule based on model1 and random samples from the model1 posterior.

good times though.

]]>See I like this example a lot, because if we assume that in essentially all real world cases, we’re estimating probabilities via sampling, then a rule like “if p(q<0.1) < 0.1 stop” which is deterministic in the case of perfect symbolic calculation, is random without knowledge of the RNG seed in the case of real world calculation (but deterministic if you know the seed).

If you later want to analyze this under some more realistic model involving more factors you’re in a weird position, you’ll be asking:

“what is the probability under model 2 that when you calculate probability of q under model 1 using Stan, you will stop, and is this independent of the value of all the parameters *in model 2*”

How about this for a stopping rule:

Flip your coin; if you get 3 heads in a row, flip your coin again and if it’s a head, stop.

Clearly that final flip is like rbinom(1,p), and the definition says that if the decision rule is random given the data and dependent on p then it matters… but this decision rule is random given the data (up to the 3 heads in a row) and dependent on p, yet it produces exactly the same decisions as “if you get 4 heads in a row stop”, which is not.

Weird.
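To make the equivalence concrete, here is a minimal Python sketch (Python standing in for R; the function names are made up for illustration). It assumes that a tail on the extra flip resets the run of heads, and it checks every flip sequence up to a given length to see whether the two rules ever stop at different points.

```python
from itertools import product

def stop_three_heads_then_head(flips):
    """Rule A: after 3 heads in a row, flip once more and stop if it is a head.
    Returns the number of flips made when stopping, or None if never stopped."""
    run = 0
    i = 0
    while i < len(flips):
        run = run + 1 if flips[i] == "h" else 0
        if run == 3:
            if i + 1 >= len(flips):
                return None          # the extra flip has not happened yet
            if flips[i + 1] == "h":
                return i + 2         # extra flip was a head: stop
            run = 0                  # extra flip was a tail: start over
            i += 2
            continue
        i += 1
    return None

def stop_four_heads(flips):
    """Rule B: stop as soon as there are 4 heads in a row."""
    run = 0
    for i, f in enumerate(flips):
        run = run + 1 if f == "h" else 0
        if run == 4:
            return i + 1
    return None

def rules_agree(max_len):
    """True if both rules give the same stopping point on every sequence."""
    return all(
        stop_three_heads_then_head(seq) == stop_four_heads(seq)
        for n in range(max_len + 1)
        for seq in product("ht", repeat=n)
    )
```

Calling `rules_agree(10)` compares the two rules on every sequence of up to 10 flips. Under the reset assumption the "random" rule and the "deterministic" rule are the same function of the flips once the extra flip is counted as data.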

]]>Now, let’s go back to my example where you do a bayesian pre-analysis and then if you can reject q=0 for some level of probability you stop. You might argue that as you get your data, and do your bayesian pre-analysis before deciding to stop, the Bayesian analysis will, conditional on data, give you a deterministic value for the posterior distribution of the parameter and so if you stop when Pr(q = 0) < 10^-3 this is deterministic given the data, and the model. (if you do your analysis in Stan, there's always MCMC error too… but we can make this small).

But after you've stopped, and a few weeks later you discover there was some issue you hadn't considered, say like a drifting measurement error in some instrument, or a consistent bias in the polling methods that favored Hillary, or whatever, then

p(Stop|Data_so_far,Model2,additionalParameter2) does vary depending on the additionalParameter2 (say the size of the bias correction)

Is the argument then that the rule "calculate p(q| Data_so_far,Model1) and stop when it's small or you've sampled at least N" is deterministic given the data, and so it's no longer of any relevance under Model2?

How about if you modify the rule to be "generate 10 random qs from the posterior under Model1 and if none are less than 0.1 stop"
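A rough sketch of that modified rule, in Python rather than R. Everything specific here is an assumption for illustration: Model1 is taken to be coin flips with a uniform prior on q (so the posterior is a Beta distribution), and `stop_decision` is a hypothetical name. The point is only that the decision is a function of the data *and* the seed.

```python
import random

def stop_decision(heads, tails, seed, n_draws=10, cutoff=0.1):
    """Draw n_draws values of q from the (assumed) Beta posterior under
    Model1 and stop only if none of them falls below the cutoff."""
    rng = random.Random(seed)  # knowing the seed makes the rule deterministic
    draws = [rng.betavariate(1 + heads, 1 + tails) for _ in range(n_draws)]
    return all(q >= cutoff for q in draws)
```

With the same data and the same seed, `stop_decision` always returns the same answer; someone who does not know the seed can only give a probability of stopping, and that probability depends on the Model1 posterior.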

This seems to me to be very relevant to the design of stopping experiments. It's safe to consider a Bayesian stopping rule as dependent only on the data *and the model* and not the parameter, provided you never ever want to use a different model to analyze this data. This is a kind of risk you take by deciding to use a Bayesian stopping rule. The stopping rule may make re-analyzing your data using a different model more complicated.

This case where we're assessing the posterior using simulation seems interesting. The posterior density for q is deterministic given the data, but the q* value itself isn't, and clearly the q* value is correlated to q_real (that's why we got the posterior in the first place)

If we go back to "it looked pretty consistent so we stopped" is this not like "get a kind of gut instinct point estimate of some parameter, then make a random decision about whether to continue or stop based on what that point estimate was" ? which is sort of along the same lines as "randomly generate a value q* from the posterior, and then rbinom(1,q*) decides your stopping"

This kind of thing is shockingly common, where a vague idea that the data looks ok based on some expectation of what you're going to get leads you to randomly stop after you've fulfilled that expectation in some imprecise sense.

]]>+1

]]>If it happens to be random, the second condition for the stopping rule to be informative is that it has to be correlated with the parameters of interest. If the parameter of interest is theta, stopping depending on a random rbinom(1, 0.5) will not be informative. Stopping depending on the output of rbinom(1, theta) would be informative.
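Here is a small sketch of the difference, in Python rather than R, under assumptions that are mine and not from the thread: a uniform prior on theta, and a record of how many times the stop/continue draw came up "stop" and "continue". If that draw is rbinom(1, theta), each draw contributes a factor of theta or (1 - theta) to the likelihood; if it is rbinom(1, 0.5), it contributes a constant and drops out.

```python
from fractions import Fraction

def posterior_mean(heads, tails, stop_yes=0, stop_no=0, informative=False):
    """Posterior mean of theta under a uniform prior.

    informative=False: the stop draws are rbinom(1, 0.5), a constant factor,
                       so only the coin flips count.
    informative=True:  the stop draws are rbinom(1, theta), so they act like
                       extra Bernoulli(theta) observations.
    """
    a = 1 + heads + (stop_yes if informative else 0)
    b = 1 + tails + (stop_no if informative else 0)
    return Fraction(a, a + b)  # mean of Beta(a, b)
```

For example, with 3 heads and 2 tails and one "stop" draw, `posterior_mean(3, 2, stop_yes=1)` ignores the draw, while `posterior_mean(3, 2, stop_yes=1, informative=True)` shifts the estimate upward because the stop itself is evidence about theta.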

]]>But to be honest, I tend to work in scenarios like where one of my friends comes to me and they’ve got a really complicated biological experiment that’s been carried out over 2.5 years by 4 or 5 different people, and by the time I’m done talking to them about the situation involving how they first started measuring things with instrument 1 and then the guy who ran that instrument quit, and so they moved to instrument 2, and then they couldn’t source reagent q anymore so the second half of the experiments were done with reagent z and blablabla, and deep into a half hour discussion of how the experiment was run they tell me things like “it seemed like we had a pretty consistent result, so we stopped collecting data”

it’s not so convincing to me to just repeat the mantra “stopping rules don’t matter” and move on.

And that kind of thing has been basically bread and butter over the last few years. I have these conversations multiple times a year.

“it seemed pretty consistent so we stopped” is that a random stopping rule or not? Is it dependent on the parameters of interest or not?

]]>As for the definition of the stopping rule: see I have difficulty with the term Random. I think lots of people go along with the word Random without thinking too much about it. Let’s use for the definition of random that it does not have probability 1 or 0. But we’re Bayesian, so we have to ask at what point in time and for whom or more generally, conditional on what knowledge?

Our stopping rule is “stop after 3 heads in a row”.

I am about to start flipping coins, what is the probability I will stop after the 3rd as of this moment calculated by me? We both agree there is zero probability for me to stop after 1 or 2. What knowledge is this conditional on? Assuming I know p = 1/2 I could say the probability I will stop after the 3rd is 1/2^3 but if I don’t know p then all I can say is p^3 for some unknown p and I can put a prior on it … etc. So, at the start of the whole thing the stopping rule is random and dependent on p.

Now I get h,h,t

what is the probability I will stop after 3 at this point in time? the probability is zero at this point, because I know I don’t have h,h,h
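The three states of knowledge can be put side by side in a short Python sketch. The helper names are hypothetical, and the prior-averaged version assumes a Beta(a, b) prior on p (uniform by default), which is my choice for illustration. The chance of three heads in a row is p^3.

```python
from fractions import Fraction

def prob_stop_at_third(p):
    """Before any flips, with p known: three heads in a row."""
    return p ** 3

def prior_averaged_prob(a=1, b=1):
    """Before any flips, with p unknown: E[p^3] under a Beta(a, b) prior."""
    num = a * (a + 1) * (a + 2)
    den = (a + b) * (a + b + 1) * (a + b + 2)
    return Fraction(num, den)

def prob_stop_at_third_given(flips_so_far, p):
    """Part-way through: condition on the flips already seen."""
    seen = flips_so_far[:3]
    if any(f != "h" for f in seen):
        return Fraction(0)            # a tail already rules out h,h,h
    return p ** (3 - len(seen))       # still need the remaining heads
```

With p = 1/2 the first function gives 1/8, the uniform prior gives 1/4, and after h,h,t the conditional probability is exactly 0: same rule, three different answers depending on what is being conditioned on.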

Let’s change the rule to illustrate further confusion: after getting 3 heads in a row, ask Joe if you should stop, and if he says yes, stop.

I now get h,h,h what is the probability I will stop as calculated by you? what is the probability I will stop as calculated by Joe?

Or if you like, after 3 h values, call rbinom(1,0.5) and stop if it’s 1, what is the probability of stopping if you know the RNG seed, what is the probability of stopping if you don’t know the seed?

So, to me, “random” is extremely problematic. It’s this feature of the definition that I find confusing.

So let’s go back to the definition, when I read “random” I assumed it meant “not determinable at the *start* of the experiment” whereas you are reading it as “not determinable at the point in time when the stopping decision is made” and it’s clear that there are rules such as the RNG seed or Joe example where for some people, the rule is random, and for some people the rule isn’t random…

I have problems with understanding that.

]]>http://pcl.missouri.edu/sites/default/files/Rouder-PBR-2014.pdf ]]>

It’s still useful to understand simple textbook models. It’s useful to understand why the standard position of bayesians about stopping rules is “they don’t matter” (if you remain unconvinced, see for example http://www.ejwagenmakers.com/2007/StoppingRuleAppendix.pdf ). You will still be able to invent contrived examples where this is not true, and you will know exactly how contrived they need to be!

As Corey said, stopping rules are a favorite example of both frequentists and Bayesians to show how the methods of the other camp lead to absurd conclusions.

]]>A stopping rule, = “stop after three heads”

given data, = “+ + – – – + + – + + +”

is informative relative to parameters of interest if it is random and statistically dependent on those parameters. = “the rule says STOP, which is not random (let alone dependent on the parameter)”

Given data ” [+ – +] [+ – -] [- – -] [- + +] [- + -] [+ + -] [+ + +] [- + -] [+ + +] [+ + +] [+ + +] ” the stopping rule

]]>Also, question for you regarding interpretation, since maybe you have some more experience with this question than I do. When in the UC Berkeley paper they say: “A stopping rule, given data, is Informative relative to parameters of interest if it is random and statistically dependent on those parameters.”

So, we collect some data and each time we do an analysis to get a posterior distribution of parameter p, adding one more data point will produce a “random” new posterior. As soon as that posterior “rules out” p = 0 sufficiently, we stop, so probabilistically speaking it certainly seems that the stopping rule is dependent on p, in the sense that Pr(N = n | p) is a function of p.

I have problems translating all of this into the language of Cox/Bayes because the Berkeley original seems to be pretty firmly in the Probability = Frequency interpretation. But I think you can potentially rewrite it as something like “partial knowledge of the parameter of interest alters your knowledge of the stopping point”

But, then, that would seem to apply to the “stop after 3 heads” case as well. If p is very small, then you expect to flip many times, and if p is large you expect to flip few times. But I think you’ve convinced me that this results in the same likelihood. Is that just an accident of the Bernoulli/negative binomial symmetry? Possibly. Or possibly we need a more carefully constructed definition of “informative stopping rule”.
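The "same likelihood" claim can be checked directly. This Python sketch reads "stop after 3 heads" as "stop at the k-th head in total" (the negative binomial case; that reading is an assumption) and compares it with a fixed-N binomial model for the same data. The function names are illustrative.

```python
from fractions import Fraction
from math import comb

def binomial_lik(p, n, k):
    """Fixed N: probability of k heads in n flips."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def neg_binomial_lik(p, n, k):
    """Stop at the k-th head: probability that it arrives on flip n."""
    return comb(n - 1, k - 1) * p**k * (1 - p)**(n - k)

def likelihood_ratio(p, n=10, k=3):
    """Ratio of the two likelihoods; it should not depend on p."""
    return binomial_lik(p, n, k) / neg_binomial_lik(p, n, k)
```

The ratio works out to comb(n, k) / comb(n - 1, k - 1) = n / k for every p, so as functions of p the two likelihoods differ only by a constant and give the same posterior. Even though Pr(N = n | p) clearly depends on p, that dependence is already fully carried by the p^k (1 - p)^(n - k) term.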

]]>I think in the real world, people who don’t have a pre-determined sample size tend to stop their sampling for all kinds of “real world” reasons like they got tired, they ran out of the reagent, the person they hired to do the work quit for a better job, the person they hired to do the work got sick, the people who were recruited to the study found it extremely uncomfortable, blablabla and often there is some feedback on the outcome… you run out of reagent because the experimenter was sloppy and tended to spill it, the worker quit for a better job because they hated the boss, and because they spent so much time looking for a new job, they did poorer quality work, the recruits found it uncomfortable because the side-effects were pretty severe, and thus there was really no blinding, and there was differential drop-out in the control vs experiment group… etc etc. In the real world when you see a design where N is not pre-determined you should be looking for the causes of the N, because usually it’s not “sample until you get 3 in a row” it’s “do some stuff, and then something happens, and then decide to cut the study short” and the reason why you cut the study short is often informative about what was going on.

]]>For extra fun times: even if it’s true that par=0, if we sample long enough we can put 0 arbitrarily far into the tails of the posterior distribution. This is called “sampling to a foregone conclusion”; the theorem is the law of the iterated logarithm.
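A toy simulation of this, in Python, under assumptions chosen purely for convenience: normal data with known sd 1, a true mean of exactly 0, and a flat prior, so the posterior for the mean is Normal(xbar, 1/n) and 0 sits |sum| / sqrt(n) posterior standard deviations from the center. The function name, the cutoff, and the cap on n are all made up.

```python
import math
import random

def sample_until_excluded(z_crit=1.96, max_n=10_000, seed=0):
    """Keep sampling from N(0, 1) until 0 is more than z_crit posterior
    standard deviations from the posterior mean, or until max_n samples.
    Returns (n, z, stopped)."""
    rng = random.Random(seed)
    total = 0.0
    z = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(0.0, 1.0)
        z = total / math.sqrt(n)   # posterior mean / posterior sd
        if abs(z) > z_crit:
            return n, z, True      # 0 is now "in the tails"
    return max_n, z, False         # cap reached without excluding 0
```

The cap is only there so the loop terminates; the law of the iterated logarithm says that without it the threshold is eventually crossed with probability 1, although the wait can be very long.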

]]>For the second scenario, is the stopping rule based just on the data? This is what it seems to me: parameters -> data -> interim analysis -> stopping decision. In that case, how is the stopping rule dependent on the parameters conditional on the data?

]]>