Archive of posts filed under the Miscellaneous Statistics category.

How many data points do you really have?

Chris Harrison writes:

Rare name analysis and wealth convergence

Steve Hsu summarizes the research of economic historian Greg Clark and Neil Cummins: Using rare surnames we track the socio-economic status of descendants of a sample of English rich and poor in 1800, until 2011. We measure social status through wealth, education, occupation, and age at death. Our method allows unbiased estimates of mobility rates. [...]

“False-positive psychology”

Everybody’s talkin bout this paper by Joseph Simmons, Leif Nelson and Uri Simonsohn, who write: Despite empirical psychologists’ nominal endorsement of a low rate of false-positive findings (≤ .05), flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an [...]

Philosophy of Bayesian statistics: my reactions to Wasserman

Continuing with my discussion of the articles in the special issue of the journal Rationality, Markets and Morals on the philosophy of Bayesian statistics: Larry Wasserman, “Low Assumptions, High Dimensions”: This article was refreshing to me because it was so different from anything I’ve seen before. Larry works in a statistics department and I work [...]

Adding an error model to a deterministic model

Daniel Lakeland asks, “Where do likelihoods come from?” He describes a class of problems where you have a deterministic dynamic model that you want to fit to data. The data won’t fit perfectly so, if you want to do Bayesian inference, you need to introduce an error model. This looks a little bit different from [...]

The more likely it is to be X, the more likely it is to be Not X?

This post is by Phil Price. A paper by Wood, Douglas, and Sutton looks at “Beliefs in Contradictory Conspiracy Theories.” Unfortunately the subjects were 140 undergraduate psychology students, so one wonders how general the results are. I found this sort of arresting: In Study 1 (n=137), the more participants believed that Princess Diana faked her [...]

Bayesian model-building by pure thought: Some principles and examples

This is one of my favorite papers: In applications, statistical models are often restricted to what produces reasonable estimates based on the data at hand. In many cases, however, the principles that allow a model to be restricted can be derived theoretically, in the absence of any data and with minimal applied context. We illustrate [...]

The inevitable problems with statistical significance and 95% intervals

I’m thinking more and more that we have to get rid of statistical significance, 95% intervals, and all the rest, and just come to a more fundamental acceptance of uncertainty. In practice, I think we use confidence intervals and hypothesis tests as a way to avoid acknowledging uncertainty. We set up some rules and then [...]

Convenient page of data sources from the Washington Post

Wayne Folta points us to this list.

Chris Schmid on Evidence Based Medicine

Chris Schmid is a statistician at New England Medical Center who is an expert on evidence-based medicine. I invited him to present an introductory overview lecture on the topic at last year’s Joint Statistical Meetings, and here are his slides. All 123 of them. I don’t know how he expected to go through all of [...]

Difficulties in publishing non-replications of implausible findings

Eric Tassone points me to this news article by Christopher Shea on the challenges of debunking ESP. Shea writes: Earlier this year, a major psychology journal published a paper suggesting that there was some evidence for “pre-cognition,” a form of ESP. Stuart Ritchie, a doctoral student at the University of Edinburgh, is part of a [...]

Advice on do-it-yourself stats education?

Dustin Palmer writes: I am a recent graduate looking for a bit of advice. While I took intro classes on math and statistics in my undergraduate degree as a political science major, I find myself university-less and seeking to develop my statistics toolkit. I work for an NGO in the international development field. I think [...]

Excellence in Statistical Reporting Award

The American Statistical Association is seeking nominations for its annual Excellence in Statistical Reporting Award. The award was created in 2004 to encourage and recognize members of the communications media who have best displayed an informed interest in the science of statistics and its role in public life. The award can be given for a [...]

What are the important issues in ethics and statistics? I’m looking for your input!

I’ve recently started a regular column on ethics, appearing every three months in Chance magazine. My first column, “Open Data and Open Methods,” is here, and my second column, “Statisticians: When we teach, we don’t practice what we preach” (coauthored with Eric Loken) will be appearing in the next issue. Statistical ethics is a wide-open [...]

Jobs in statistics research! In New Jersey!

Kenny writes: The Statistics Research group in AT&T Labs invites applications for full time research positions. Applicants should have a Ph.D. in Statistics (or a related field), and be able to make major, widely-recognized contributions to statistics research: theory, methods, computing, and data analysis. Candidates must demonstrate a potential for excellence in research, a knowledge [...]

Intro to splines—with cool graphs

Ido Rosen pointed me to this page by Mike Kamermans.

Unconvincing defense of the recent Russian elections, and a problem when an official organ of an academic society has low standards for publication

Last month we reported on some claims of irregularities in the recent Russian elections. Just as a reminder, here are a couple graphs: Yesterday someone pointed me to two online articles: Mathematical proof of fraud in Russian elections unsound and US elections are as ‘non-normal’ as Russian elections. I know nothing about Russian elections and [...]

What are the standards for reliability in experimental psychology?

An experimental psychologist was wondering about the standards in that field for “acceptable reliability” (when looking at inter-rater reliability in coding data). He wondered, for example, if some variation on signal detectability theory might be applied to adjust for inter-rater differences in criteria for saying some code is present. What about Cohen’s kappa? The psychologist [...]
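
For readers who haven’t met Cohen’s kappa, here is a minimal sketch of how it is usually computed for two raters coding the same items: observed agreement, corrected for the agreement you would expect by chance from each rater’s marginal rates. The function name and the ten made-up codes are assumptions for illustration only; nothing here comes from the psychologist’s data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each label the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(rater_a)
    # Observed agreement: share of items given the same code by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal rates, summed over codes.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used one identical code throughout
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical codes: 1 = code present, 0 = code absent.
rater_a = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
rater_b = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 3))
```

In this toy example the raters agree on 8 of 10 items, but because both say “absent” 60% of the time, about half of that agreement would be expected by chance, so kappa comes out well below the raw agreement rate.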

Martin and Liu: Probabilistic inference based on consistency of model with data

What better way to start the new year than with some hard-core statistical theory? Ryan Martin and Chuanhai Liu send along a new paper on inferential models: Probability is a useful tool for describing uncertainty, so it is natural to strive for a system of statistical inference based on probabilities for or against various hypotheses. [...]

Using factor analysis or principal components analysis or measurement-error models for biological measurements in archaeology?

Greg Campbell writes: I am a Canadian archaeologist (BSc in Chemistry) researching the past human use of European Atlantic shellfish. After two decades of practice I am finally getting a MA in archaeology at Reading. I am seeing if the habitat or size of harvested mussels (Mytilus edulis) can be reconstructed from measurements of the [...]

“The difference between . . .”: It’s not just p=.05 vs. p=.06

The title of this post by Sanjay Srivastava illustrates an annoying misconception that’s crept into the (otherwise delightful) recent publicity related to my article with Hal Stern, The difference between “significant” and “not significant” is not itself statistically significant. When people bring this up, they keep referring to the difference between p=0.05 and p=0.06, making [...]

Mr. Pearson, meet Mr. Mandelbrot: Detecting Novel Associations in Large Data Sets

Jeremy Fox asks what I think about this paper by David N. Reshef, Yakir Reshef, Hilary Finucane, Sharon Grossman, Gilean McVean, Peter Turnbaugh, Eric Lander, Michael Mitzenmacher, and Pardis Sabeti which proposes a new nonlinear R-squared-like measure. My quick answer is that it looks really cool! From my quick reading of the paper, it appears [...]

CrossValidated: A place to post your statistics questions

Seth Rogers writes: I [Rogers] am a member of an online community of statisticians where I burn a great deal of time (and a recovering cog sci researcher). Our community website is a peer-reviewed Q and A spanning stats topics ranging from applications to mathematical theory. Our online community consists of mostly university faculty, grad [...]

More frustrations trying to replicate an analysis published in a reputable journal

The story starts in September, when psychology professor Fred Oswald wrote me: I [Oswald] wanted to point out this paper in Science (Ramirez & Beilock, 2010) examining how students’ emotional writing improves their test performance in high-pressure situations. Although replication is viewed as the hallmark of research, this paper replicates implausibly large d-values and correlations [...]

I Am Too Absolutely Heteroskedastic for This Probit Model

Soren Lorensen wrote: I’m working on a project that uses a binary choice model on panel data. Since I have panel data and am using MLE, I’m concerned about heteroskedasticity making my estimates inconsistent and biased. Are you familiar with any statistical packages with pre-built tests for heteroskedasticity in binary choice ML models? If not, [...]