Ok, but dealing with office politics is totally different from a community telling you to literally produce BS or leave (my paraphrase of how you described research earlier). Ie, the BS *is* the job, not tangential to it. Also, consider that most people who go into research start out actually liking science and against destroying it.

There must be some more motivation I am missing.

]]>One of the things I enjoy about your work and blog is how straightforward you are about this stuff. OTOH your standards are pretty high so you might’ve reached the point of diminishing returns on raising them.

]]>Don’t expect to be believed. Present your results as if skepticism, not credulity, is the natural posture of your reader.

The curious thing about liars is that they expect you to believe them.

]]>Sad to hear, but I believe you, since you are in the field and I am not.

]]>To counter this tendency, we may need to allow people to describe the limitations of a study. At other times, it means we should ask people to keep working on a topic for a few more years before publishing a really good study. The latter approach is hard to follow in the current academic environment, though. Search committees pay attention to the number of publications, and they have a difficult time judging the quality of the work.

]]>https://www.youtube.com/watch?v=wvVPdyYeaQU

Could almost be our very own Nick Brown ….

]]>One thing I am wondering is why people who recognize all the BS going on, and know that they are partaking in the BS, still want to be part of these research communities that demand it? It isn’t a job that necessarily pays well or anything, and it has to be a constant stress to scientifically minded people to be knowingly producing BS… What is the motivation to stay in the community?

]]>Personally I think one of the ways to move in that direction is to create real early-career non-faculty positions rather than pretending “post-docs” are training positions.

]]>Indeed, life would be pretty boring if all my own work lived up to my standards. If this were ever the case, I’d take it as a signal to raise my standards!

]]>(I’m also talking about psychology)

I think we should proceed on two tracks:

1. Reduce incentives to do bad science. Methods of achieving goal 1 include setting up norms of preregistration, estimation of type M and type S errors, and Bayesian or hierarchical models to pool noisy estimates toward 0. P-value thresholds have traditionally been considered as a method to reduce incentives to bad science! They haven’t worked so well (forking paths and all that), but we should remember that as one of the goals.

2. Help scientists do better. Methods of achieving goal 2 include better measurement and better connections of theory to measurement, larger sample sizes and integration of data from multiple sources, within-person designs, and multilevel models.

We need both.
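For concreteness, here is a minimal simulation sketch (all numbers hypothetical) of the type M (magnitude) and type S (sign) errors mentioned under point 1: with a small true effect and a noisy estimate, the statistically significant results exaggerate the effect and sometimes get its sign wrong.

```python
import random, statistics

random.seed(1)

true_effect = 0.1   # hypothetical small true effect
se = 0.35           # hypothetical standard error of the estimate
n_sims = 100_000

sig = []  # estimates that reach "significance" (|z| > 1.96)
for _ in range(n_sims):
    est = random.gauss(true_effect, se)
    if abs(est / se) > 1.96:
        sig.append(est)

power = len(sig) / n_sims                                    # chance of a significant result
type_s = sum(e < 0 for e in sig) / len(sig)                  # wrong sign among significant results
type_m = statistics.mean(abs(e) for e in sig) / true_effect  # exaggeration ratio

print(power, type_s, type_m)
```

In this regime power is low, and the significant estimates overstate the true effect several-fold, which is exactly why selecting on significance is so misleading.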

]]>– “playing the game” is not even considered to be “bad” by a lot of people, although it seems like the view about this is changing.

– many institutions back their scientists

– many colleagues back their “friends”

– those who are being “caught playing the game” (e.g., the Wansink or Cuddy cases?) are mostly senior researchers with tenure and/or influence who will experience no real negative consequences. They may state that they “have lost grants” or something like that, but this doesn’t really matter, I reason. They still have tenure or book deals, and a nice paycheck to go with that.

– when all else fails, they can start talking about how everything is “context-dependent”, that “more research is needed”, and stuff like that.

So in my reasoning, a) there are no clear rules about what is bad about “playing the game”, and “dubious” previous actions can all be accounted for by providing some (pseudo?) scientific reason, and b) those who “get caught playing the game” are protected by the current system (e.g., tenure, the influence of institutions and senior colleagues, people being given credit/taken seriously/listened to not because of what they say or do, but just because they are professors) and suffer no real negative consequences.

]]>“Ten Simple Rules for Effective Statistical Practice”

Robert E. Kass, Brian S. Caffo, Marie Davidian, Xiao-Li Meng, Bin Yu, Nancy Reid

http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004961

Their proposed rules (elaborated for 1-2 paragraphs each in the article) are:

“Rule 1: Statistical Methods Should Enable Data to Answer Scientific Questions

Rule 2: Signals Always Come with Noise

Rule 3: Plan Ahead, Really Ahead

Rule 4: Worry about Data Quality

Rule 5: Statistical Analysis Is More Than a Set of Computations

Rule 6: Keep it Simple

Rule 7: Provide Assessments of Variability

Rule 8: Check Your Assumptions

Rule 9: When Possible, Replicate!

Rule 10: Make Your Analysis Reproducible”

I agree with what you’ve said about many Frequentist methods being inefficient when it comes to multiple comparisons, and that standard Bayesian methods often provide tighter inference in the face of multiple comparisons, especially when one can borrow information about related effect sizes. In fact, standard MLE estimates look positively worthless if one has a model with a large number of nuisance parameters for which we already have some insight into their effect sizes.

But I’ll also note that regularization methods, such as LASSO + ridge regression, are completely justified in the Frequentist framework, so regularization is not purely owned by the Bayesian camp. One of the nicest things about looking at it from the Bayesian perspective is that it gives us real insight about how much to penalize; the LASSO prior looks a little silly if you have good understanding of all your variables.
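As a toy sketch of that correspondence (hypothetical single-predictor data, unit noise variance assumed): the ridge estimate with penalty λ coincides with the posterior mode under a zero-mean Gaussian prior with variance 1/λ.

```python
import random

random.seed(0)

# Hypothetical single-predictor data with unit noise variance
n = 200
beta_true = 0.5
x = [random.gauss(0, 1) for _ in range(n)]
y = [beta_true * xi + random.gauss(0, 1) for xi in x]

sxy = sum(xi * yi for xi, yi in zip(x, y))
sxx = sum(xi * xi for xi in x)

tau = 1.0            # prior sd of beta: beta ~ Normal(0, tau^2)
lam = 1.0 / tau**2   # the equivalent ridge penalty (given noise sd = 1)

beta_ridge = sxy / (sxx + lam)        # argmin of sum (y - b x)^2 + lam * b^2
beta_map = sxy / (sxx + 1.0 / tau**2) # posterior mode under the Gaussian prior

print(beta_ridge, beta_map)
```

The two numbers agree exactly, which is the sense in which the penalty choice is really a prior-variance choice.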

]]>The issue with forking paths is not that there’s no correct way to account for them in a frequentist setting (there is), but rather that in practice, researchers don’t account for them, and very typically don’t realize they are making a mistake.

My point isn’t that Bayesian analysis *can’t* address this issue, but rather that I don’t think the typical researcher using Bayesian methods is any more likely to properly address it, especially as we push Bayesian methods toward users with weaker backgrounds. To illustrate:

“The Bayesian version of this is to provide weights over all the transformations you want to consider… and then come up with posterior weights. “

But do you know of anyone who does this? I would love to see published literature that discusses such a prior, but truth be told, I doubt I will. Moreover, exactly as with the p-value argument, it should be over all possible transformations you would ever consider… even if you really stopped at the first model you hypothesized!

So Bayesian methods provide a solution… that virtually no one will ever use. I would argue this is even worse, because (a) people think that they don’t have to worry about multiple comparisons since they have used Bayesian methods (but don’t realize that to really properly control for this, they would need to write a prior that accounted for every possible way they would ever look at the data, every possible way they would decide that a data point is corrupted and should be dropped, etc.), and (b) they will have even higher overconfidence in their findings because their inappropriate prior probably led to a tighter credible interval than the corresponding confidence interval.

]]>Thanks in advance. Regards, Sameera

“My point is that Frequentist techniques *do* provide valid methods of inference in the face of forking paths (e.g., you could use an alpha-spending function where you spend 0.025 on the first hypothesis, 0.0125 on the second, etc.)… but no one uses them, (a) because they don’t realize they have to or (b) because it would weaken their inference. Unless anyone has any evidence to the contrary, switching to Bayesian inference doesn’t answer either (a) or (b).”

Yes, to the extent that there are “valid” frequentist techniques for dealing with multiple inference. However, 1) Bayesian or other methods using shrinkage estimates are also valid ways of dealing with multiple inference, and have the advantage that they generally give tighter estimates than frequentist methods.

2) The problem of forking paths can also occur without intentional multiple inference, by letting the data influence the choice of analysis.

I do agree that there is a lot of ignorance of the problem (and of methods for dealing with it), and that reluctance to use methods that might give weaker “inference” than less valid methods is also a big problem.
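A toy illustration of point 1 (all numbers hypothetical): empirical-Bayes shrinkage of several noisy group estimates toward their grand mean, which also yields intervals narrower than the raw ±se ones.

```python
import random, statistics

random.seed(3)

# Hypothetical setup: 8 noisy group estimates, each with standard error 0.5,
# of true effects drawn from Normal(0, 0.2)
true = [random.gauss(0, 0.2) for _ in range(8)]
se = 0.5
est = [random.gauss(t, se) for t in true]

# Empirical-Bayes (normal-normal) shrinkage toward the grand mean
grand = statistics.mean(est)
tau2 = max(statistics.variance(est) - se**2, 1e-6)  # method-of-moments between-group variance
shrink = tau2 / (tau2 + se**2)                      # pooling factor in [0, 1]
pooled = [grand + shrink * (e - grand) for e in est]

# The pooled intervals are tighter: posterior sd = sqrt(shrink) * se < se
post_sd = (shrink * se**2) ** 0.5
print(shrink, post_sd)
```

The shrinkage typically reduces total estimation error relative to the raw estimates, and the posterior sd is strictly smaller than the raw standard error, which is the “tighter estimates” point above.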

]]>In social psychology or cell biology where you can simply replicate the experiment, then, sure, I can see that it could be reasonable to require a preregistered replication.

]]>) but not the comment I replied to originally (October 10, 2017 at 4:03 pm). I don’t disagree with your second comment.

At a fundamental level, these are, in a very real sense, equivalent problems.

Let TR(data) be a computational function implemented in a formal language, with length less than 10^6 bytes, for example. A “forking paths” analysis is one in which, instead of a single theoretically well-founded transformation, we have a nontrivial set of these transformations from which to choose, and we choose one after considering several.

The Bayesian version of this is to provide weights over all the transformations you want to consider… and then come up with posterior weights. So long as the posterior weight is near 1 on the one you finally choose, the “choose the one you like” version is equivalent to the full Bayesian analysis. But when the data don’t strongly pick out exactly one of the transforms, truncating the others out of the fit is a bad approximation to the posterior… it’s gaming the system in precisely the way that p-hacking games the system.

Now, although in theory we can set up this correspondence in the space of formal-language programs TR(data), actually doing this kind of thing in practice involves coding, say, a Stan program with a flexible representation of the transformation, parameterized by several parameters of interest.
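A crude, self-contained sketch of the posterior-weights idea, under strong assumptions (hypothetical data, two candidate transformations, slope-only models, and a BIC-style approximation to each model’s log marginal likelihood standing in for a full Stan implementation):

```python
import math, random

random.seed(2)

# Hypothetical data actually generated under the log transformation
n = 100
x = [random.uniform(1, 10) for _ in range(n)]
y = [2.0 * math.log(xi) + random.gauss(0, 0.5) for xi in x]

def loglik(features):
    """Slope-only least-squares fit; returns the Gaussian log-likelihood at the fit."""
    b = sum(f * yi for f, yi in zip(features, y)) / sum(f * f for f in features)
    s2 = sum((yi - b * f) ** 2 for f, yi in zip(features, y)) / n
    return -0.5 * n * (math.log(2 * math.pi * s2) + 1)

candidates = {
    "identity": list(x),
    "log": [math.log(xi) for xi in x],
}

# BIC-style approximation to each model's log marginal likelihood,
# then normalize to posterior weights under equal prior weights
k = 2  # slope + noise variance
scores = {name: loglik(f) - 0.5 * k * math.log(n) for name, f in candidates.items()}
top = max(scores.values())
unnorm = {name: math.exp(s - top) for name, s in scores.items()}
weights = {name: u / sum(unnorm.values()) for name, u in unnorm.items()}
print(weights)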

]]>I do understand the difficulty people have accepting the problem with NHST*, though. It was only after multiple years of searching for a reasonable argument in its favor that I finally accepted there is nothing of value there.

The difficulty is in accepting that so many highly educated people (often including themselves) have been wasting their careers like this, not in understanding the actual issues. Those I grasped (at least loosely) immediately upon being taught NHST, leading me to think I might be going insane… until I discovered Paul Meehl, who so clearly explained the problem.

*Usual Disclaimer: This refers to the most common use case where the “null” hypothesis is not predicted by any theory. When the “null” hypothesis is predicted by theory, that is a completely different procedure which needs to be considered separately.

]]>Oops, with the comparison-of-subgroups example, that WAS an example where coming up with the Bayesian prior was actually fairly straightforward… but at the same time, that’s also an example where having an extremely low p-value means the posterior probability that the sign of your estimate is right is very high (unless you really believe there is a high probability that the null is exactly true), so it falls under Gelman’s rule of “Why we (usually) don’t have to worry about multiple comparisons”.

Yes, I know having high certainty that you have the sign right is not particularly important… but I think *that* is the correct argument for why letting p-values decide publication is really stupid.

]]>My point is that Frequentist techniques *do* provide valid methods of inference in the face of forking paths (e.g., you could use an alpha-spending function where you spend 0.025 on the first hypothesis, 0.0125 on the second, etc.)… but no one uses them, (a) because they don’t realize they have to or (b) because it would weaken their inference. Unless anyone has any evidence to the contrary, switching to Bayesian inference doesn’t answer either (a) or (b).
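For what it’s worth, the geometric alpha-spending scheme described here is easy to write down: halving the threshold at each look keeps the total alpha spent below 0.05 no matter how many hypotheses are examined.

```python
# Geometric alpha-spending: halve the significance threshold at each look,
# so the total alpha spent over any number of hypotheses stays below 0.05
total_alpha = 0.05
thresholds = [total_alpha / 2 ** (k + 1) for k in range(10)]
print(thresholds[:3])  # [0.025, 0.0125, 0.00625]
```

The thresholds form a geometric series summing to just under 0.05, which is what makes the overall error rate controlled even with an open-ended sequence of looks.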

]]>This is a discussion about forking paths, not about how to fit non-linear functions. I used transformations of the data as an easy example, but it could have just as easily been using different exclusion criteria, comparing different subgroups, etc.

]]>The problem is we fetishize results like “handing out toothpaste before Halloween at public schools reduces cavities by age 12 p < 0.01”

whereas handing out toothpaste before Halloween at public schools alters cavities per person per year by between -0.002 and +0.004 is considered “failure to discover anything significant”

umm…. ಠ_ಠ
