So before I am accused of being sexist: women can repeat the above experiment. Just toss a coin to decide whether to wear red or not, and then tabulate the number of (fooled) men making a move. If you are very attractive, you'll have a lot of power even if the sample size is small.

Identify women wearing red, and those not wearing red.

Toss a coin and, depending on the outcome, try to seduce a woman wearing red or one who is not.

Repeat. You may want to control for the rounds of drinks.

According to the theory you may have more success with those wearing red.

Notes:

1. The more attractive you are the less power you’ll need.

2. Ladies: Apologies if this comment sounds sexist, but I had no time to wordsmith it better.

p <- matrix(nrow=10000, ncol=2)
eff <- matrix(nrow=10000, ncol=2)

for (i in 1:10000) {
  # Without measurement error
  x <- rnorm(10, 0, 1)
  y <- rnorm(10, .1, 1)
  eff[i, 1] <- mean(x) - mean(y)
  p[i, 1] <- t.test(x, y, var.equal=TRUE)$p.value

  # With measurement error added to both groups
  x <- x + rnorm(10, 0, 2)
  y <- y + rnorm(10, 0, 2)
  eff[i, 2] <- mean(x) - mean(y)
  p[i, 2] <- t.test(x, y, var.equal=TRUE)$p.value
}

length(which(p[, 1] < 0.05)) / nrow(p)  # Proportion significant w/o error
length(which(p[, 2] < 0.05)) / nrow(p)  # Proportion significant w/ error

par(mfrow=c(2, 1))
hist(eff[which(p[, 1] < 0.05), 1], xlab="Mean(x)-Mean(y)",
     main="Sig Results No Measurement Error")
hist(eff[which(p[, 2] < 0.05), 2], xlab="Mean(x)-Mean(y)",
     main="Sig Results With Measurement Error")

It is often as childish as choosing the narrowest error bars made available by your graphing software (which will be the SEM). There has been some research concluding that people do not really bother to make the distinction anyway.

“We conclude that very many researchers whose articles have appeared in leading journals in psychology, behavioral neuroscience, and medicine have fundamental and severe misconceptions about how CIs and SE bars can justifiably be used to support inferences from data.”

http://psycnet.apa.org.proxy.uchicago.edu/journals/met/10/4/389.pdf
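To put numbers on the SEM-vs-CI distinction: for n = 10 observations, the 95% CI half-width is about 2.26 standard errors (the t quantile with 9 degrees of freedom), so SEM bars are less than half as wide as the corresponding 95% CI bars. A minimal sketch, with made-up sample values:

```python
# SEM bars vs. 95% CI bars for the same data: for small samples the CI
# is more than twice as wide, which is why SEM bars look "tighter".
import math
from statistics import stdev

sample = [4.1, 5.0, 3.8, 4.6, 5.2, 4.4, 4.9, 3.9, 4.7, 4.3]  # made-up data
sem = stdev(sample) / math.sqrt(len(sample))
t_crit = 2.262          # t quantile for a 95% CI with 9 degrees of freedom
ci_half_width = t_crit * sem
print(round(ci_half_width / sem, 3))  # 2.262
```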

Their data show 51 women among the 100 surveyed as ovulating (i.e., at high conception risk). My naive calculation says 32% would be the expected number. Isn't 51% a bit too high? Or is that variability to be expected?

http://ubc-emotionlab.ca/wp-content/files_mf/bealltracyinpresspsychsci.pdf
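For reference, a back-of-envelope version of that 32% figure, assuming (my assumption, since definitions vary across papers) a high-conception-risk window of days 6–14 of a 28-day cycle and surveyed women uniformly distributed across cycle days:

```python
# Expected fraction of women falling in a "high conception risk" window,
# assuming days 6-14 of a 28-day cycle count as high risk and surveyed
# women are uniformly distributed over cycle days.
window_days = len(range(6, 15))   # days 6..14 inclusive -> 9 days
cycle_days = 28
expected_fraction = window_days / cycle_days
print(round(expected_fraction, 3))  # 0.321, i.e. about 32%
```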

The standard advice is that days 10-17 are most fertile, so based on day alone, a 1-day error is not such a big deal. But really the whole thing is hopeless. When studying effects that are so tiny and so variable, you need much more precision in every way, and it's a fatal mistake to try to study this via a between-person design. On the other hand, if a researcher just wants to come up with an endless string of statistically significant findings that can be taken to confirm a vaguely specified theory, the design they're using seems to work just fine.

In the context of this study, even a one-day error would be huge, right?

Ah! I see what you mean (I think).

Thanks!

Your intuition would be correct here if you had observed the correlation between these measurements in the population. The problem is that, in the study in question, the researchers observed the correlations in a small sample. The presence of large measurement error makes it less likely that these correlations in the sample correspond to anything relevant in the population.

Isn’t what you are describing similar? If so, why are they wrong?

Yes, they argued that, but they are mistaken about the implications of this fact. Eric Loken and I are writing a paper about this. The short answer is that misclassifications lower the expected effect size, which in turn makes it less likely that any comparison that happens to be statistically significant is actually telling us anything about reality in the larger population. In short, the probabilities of Type S and Type M errors become very high. Their problem is that they are taking statistical significance as a signal to take their effect as true.

To put it another way, they found that in their data using their definitions that women in those day-of-cycle categories were three times more likely to wear red or pink shirts. Do they or anyone else believe that if they’d defined the day-of-ovulation categories correctly, that they’d have found even larger effects? No. What they are doing is capitalizing on chance, and each source of measurement error just disconnects that chance one more step from the underlying phenomenon of interest.
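That Type M point can be sketched in a minimal simulation (my sketch, not from the paper; it uses a z test with known noise sd for simplicity, rather than the t test in the R code above, and the group sizes and effect are invented): among comparisons that happen to reach significance, the estimated effect drastically overstates a tiny true effect.

```python
# Type M ("magnitude") error: among simulated experiments that reach
# significance, the estimated effect greatly overstates the tiny true effect.
import math
from random import gauss, seed

seed(42)
true_effect = 0.1     # assumed tiny true difference in means
n, sd = 10, 1.0       # per-group sample size and known noise sd
se = sd * math.sqrt(2.0 / n)            # standard error of the difference
crit = 1.96 * se                        # two-sided z test at alpha = 0.05

significant_effects = []
for _ in range(20000):
    x = [gauss(true_effect, sd) for _ in range(n)]
    y = [gauss(0.0, sd) for _ in range(n)]
    diff = sum(x) / n - sum(y) / n
    if abs(diff) > crit:                # "statistically significant"
        significant_effects.append(abs(diff))

exaggeration = (sum(significant_effects) / len(significant_effects)) / true_effect
print(round(exaggeration, 1))  # significant estimates overstate the effect severalfold
```

Note the structural point: any significant |difference| must exceed the critical value (about 0.88 here), which is already nearly nine times the true effect of 0.1.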

My language is as malleable as my data.

http://www.wired.com/2014/03/quanta-freeman-dyson-qa/

Interviewer: You became a professor at Cornell without ever having received a Ph.D. You seem almost proud of that fact.

Dyson: Oh, yes. I’m very proud of not having a Ph.D. I think the Ph.D. system is an abomination. It was invented as a system for educating German professors in the 19th century, and it works well under those conditions. It’s good for a very small number of people who are going to spend their lives being professors. But it has become now a kind of union card that you have to have in order to have a job, whether it’s being a professor or other things, and it’s quite inappropriate for that. It forces people to waste years and years of their lives sort of pretending to do research for which they’re not at all well-suited. In the end, they have this piece of paper which says they’re qualified, but it really doesn’t mean anything. The Ph.D. takes far too long and discourages women from becoming scientists, which I consider a great tragedy. So I have opposed it all my life without any success at all.

I was lucky because I got educated in World War II and everything was screwed up so that I could get through without a Ph.D. and finish up as a professor. Now that’s quite impossible. So, I’m very proud that I don’t have a Ph.D. and I raised six children and none of them has a Ph.D., so that’s my contribution.

What I find strange about the graph is that the error bars represent the standard error of the mean, if you read the fine print. Is this common practice in this field? If I see error bars I immediately think it is some kind of 95% interval. If their approach is not standard in the field, this opens up a nice new avenue for lying with statistics, or statistical graphics anyway.

You are being pedantic.

My prior that A is true is 1.

The likelihood that A is true is 1. Why? Because I know how to torture data until it confesses my priors. Ergo, A is true because I say so.

Mental onanism about NHST and Popperian stuff won’t get you anywhere. Start torturing the data.

;-)

“Furthermore, if our categorization did result in some women being mis-categorized as low-risk when in fact they were high risk, or vice-versa, this would increase error and decrease the size of any effects found.”

At least this is how I read it. If they had meant that the expected effect goes down with random error, they would have said so.

H0 == A == Ovulating woman is wearing a red shirt

H1 == B == Not A == Ovulating woman is NOT wearing a red shirt

Transposing the Conditional: The probability A is true given B != the probability B is true given A.

Conclusion: Therefore, if we reject H0, this does not imply H1 is true. We would need to know the prior probabilities of H0 and H1 to say.
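A toy numerical illustration of the transposed conditional, with invented probabilities: even when P(B | A) is high, P(A | B) can be small once the prior on A is taken into account.

```python
# P(A | B) != P(B | A): all numbers below are invented for illustration.
p_a = 0.02                  # prior P(A)
p_b_given_a = 0.9           # P(B | A)
p_b_given_not_a = 0.1       # P(B | not A)

# Total probability of B, then Bayes' rule for P(A | B)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
p_a_given_b = p_b_given_a * p_a / p_b

print(round(p_b_given_a, 2), round(p_a_given_b, 2))  # 0.9 0.16
```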

This is why my laws are deterministic and irrefutable.

Trust me, I’ve seen it time and again. All you have to do is torture the data enough, and it will tell you what you knew all along to be the truth.

Statistics is for losers.

Using the word “probably” hurts my theoretical ego. I only deal in irrefutables.

Replicate with a cold-weather study using only unmarried females. Again, no go. Don't give up. Split them into pretty & non-pretty women (all self-identified, of course). Aha! There it appears again. Robust effect indeed.

Eventually, we might conclude: Pretty, unmarried, vegan, athletic women who vote Democrat are more likely to wear pink while ovulating. Only in cold weather, of course.
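The subgroup-hunting parody is easy to quantify: with no true effect anywhere, testing k independent post hoc splits at the 0.05 level yields at least one “significant” result with probability 1 − 0.95^k. A sketch with hypothetical values of k:

```python
# Probability of at least one p < 0.05 result when testing k independent
# post hoc subgroup comparisons of pure-noise data.
for k in (1, 5, 20):
    p_any = 1 - 0.95 ** k
    print(k, round(p_any, 2))
# 1 0.05
# 5 0.23
# 20 0.64
```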

But they followed up with a controlled experiment (collecting some data on a cold day and some on a warm day) that replicated the weather-by-ovulation interaction. Sure, you could ask for a larger sample size and more sophisticated statistics (and, from your Slate article, a better definition of ovulation), but I assume from your focus on “predictability” that your primary criticism is that the paper is post hoc data snooping.

Doesn’t the second experiment address that concern to soothe Karl Popper just a little (ignoring that they designed a study to test for support, not falsification)?
