Archive of posts filed under the Miscellaneous Statistics category.

My final exam for Design and Analysis of Sample Surveys

We had 28 class periods, so I wrote an exam with an approximate correspondence of one question per class. Rather than dumping the exam in your lap all at once, I’ll post the questions once per day. Then each day I’ll post the answer to yesterday’s questions. So it will be 29 days in all. [...]

Modeling y = a + b + c

Brandon Behlendorf writes:

Systematic review of publication bias in studies on publication bias

Via Yalda Afshar, a 2005 paper by Hans-Hermann Dubben and Hans-Peter Beck-Bornholdt: Publication bias is a well known phenomenon in clinical literature, in which positive results have a better chance of being published, are published earlier, and are published in journals with higher impact factors. Conclusions exclusively based on published studies, therefore, can be misleading. [...]

Understanding simulations in terms of predictive inference?

David Hogg writes: My (now deceased) collaborator and guru in all things inference, Sam Roweis, used to emphasize to me that we should evaluate models in the data space — not the parameter space — because models are always effectively “effective” and not really, fundamentally true. Or, in other words, models should be compared in [...]

Bad news about (some) statisticians

Sociologist Fabio Rojas reports on “a conversation I [Rojas] have had a few times with statisticians”: Rojas: “What does your research tell us about a sample of, say, a few hundred cases?” Statistician: “That’s not important. My result works as n → ∞.” Rojas: “Sure, that’s a fine mathematical result, but I have to estimate the [...]

Let’s play “Guess the smoother”!

Andre de Boer writes: In my profession as a risk manager I encountered this graph: I can’t figure out what kind of regression this is, would you be so kind to enlighten me? The points represent (maturity,yield) of bonds. My reply: That’s a fun problem, reverse-engineering a curve fit! My first guess is lowess, although [...]

Modeling probability data

Rafael Huber writes:

More proposals to reform the peer-review system

Chris Said points us to two proposals to fix the system for reviewing scientific papers. Both the proposals are focused on biological research. Said writes:

Best lottery story ever

Kansas Man Does Not Win Lottery, Is Struck By Lightning. Finally, a story that gets the probabilities right.

Dispute about ethics of data sharing

Several months ago, Sam Behseta, the new editor of Chance magazine, asked me if I’d like to have a column. I said yes, I’d like to write on ethics and statistics. My first column was called “Open Data and Open Methods” and I discussed the ethical obligation to share data and make our computations transparent [...]

Further thoughts on nonparametric correlation measures

Malka Gorfine, Ruth Heller, and Yair Heller write a comment on the paper of Reshef et al. that we discussed a few months ago. Just to remind you what’s going on here, here’s my quick summary from December: Reshef et al. propose a new nonlinear R-squared-like measure. Unlike R-squared, this new method depends on a [...]

Controversy about a ranking of philosophy departments, or How should we think about statistical results when we can’t see the raw data?

Jeff Helzner writes: A friend of mine and I cited your open data article in our attempts to persuade a professor at another institution [Brian Leiter] into releasing the raw data from his influential rankings of philosophy departments. He is now claiming the national security response: . . . disclosing the reputational data would violate [...]

Inference = data + model

A recent article on global warming reminded me of the difficulty of letting the data speak. William Nordhaus shows the following graph:

“All Models are Right, Most are Useless”

The above is the title of a talk that Thad Tarpey gave at the Joint Statistical Meetings in 2009. Here’s the abstract: Students of statistics are often introduced to George Box’s famous quote: “all models are wrong, some are useful.” In this talk I [Tarpey] argue that this quote, although useful, is wrong. A different [...]

Multiple comparisons dispute in the tabloids

Yarden Katz writes: I’m probably not the first to point this out, but just in case, you might be interested in this article by T. Florian Jaeger, Daniel Pontillo, and Peter Graff on a statistical dispute [regarding the claim, "Phonemic Diversity Supports a Serial Founder Effect Model of Language Expansion from Africa"]. Seems directly relevant [...]

Confusion from illusory precision

When I posted this link to Dean Foster’s rants, some commenters pointed out this linked claim by famed statistician/provocateur Bjorn Lomborg: If [writes Lomborg] you reduce your child’s intake of fruits and vegetables by just 0.03 grams a day (that’s the equivalent of half a grain of rice) when you opt for more expensive organic [...]

“Apple confronts the law of large numbers” . . . huh?

I was reading this news article by famed business reporter James Stewart:

A statistician’s rants and raves

Not from me, from Dean Foster, who maybe was in the same stochastic processes course with me, thirty years ago.

How many data points do you really have?

Chris Harrison writes:

Rare name analysis and wealth convergence

Steve Hsu summarizes the research of economic historians Greg Clark and Neil Cummins: Using rare surnames we track the socio-economic status of descendants of a sample of English rich and poor in 1800, until 2011. We measure social status through wealth, education, occupation, and age at death. Our method allows unbiased estimates of mobility rates. [...]

“False-positive psychology”

Everybody’s talkin bout this paper by Joseph Simmons, Leif Nelson and Uri Simonsohn, who write: Despite empirical psychologists’ nominal endorsement of a low rate of false-positive findings (≤ .05), flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an [...]

Philosophy of Bayesian statistics: my reactions to Wasserman

Continuing with my discussion of the articles in the special issue of the journal Rationality, Markets and Morals on the philosophy of Bayesian statistics: Larry Wasserman, “Low Assumptions, High Dimensions”: This article was refreshing to me because it was so different from anything I’ve seen before. Larry works in a statistics department and I work [...]

Adding an error model to a deterministic model

Daniel Lakeland asks, “Where do likelihoods come from?” He describes a class of problems where you have a deterministic dynamic model that you want to fit to data. The data won’t fit perfectly so, if you want to do Bayesian inference, you need to introduce an error model. This looks a little bit different from [...]

The more likely it is to be X, the more likely it is to be Not X?

This post is by Phil Price. A paper by Wood, Douglas, and Sutton looks at “Beliefs in Contradictory Conspiracy Theories.” Unfortunately the subjects were 140 undergraduate psychology students, so one wonders how general the results are. I found this sort of arresting: In Study 1 (n=137), the more participants believed that Princess Diana faked her [...]

Bayesian model-building by pure thought: Some principles and examples

This is one of my favorite papers: In applications, statistical models are often restricted to what produces reasonable estimates based on the data at hand. In many cases, however, the principles that allow a model to be restricted can be derived theoretically, in the absence of any data and with minimal applied context. We illustrate [...]