
Rasmussen and Williams never said that Gaussian processes resolve the problem of overfitting

Apparently there’s an idea out there that Bayesian inference with Gaussian processes automatically avoids overfitting. But no, you can still overfit.

To be precise, Bayesian inference by design avoids overfitting—if the evaluation is performed by averaging over the prior distribution. But to the extent this is not the case—to the extent that the frequency-evaluation distribution and the prior distribution are different things—we do need to worry about overfitting (or, I guess, underfitting).
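To make this concrete, here is a minimal sketch in base R (my toy example, nothing from Rasmussen and Williams): a GP regression whose prior hyperparameters are chosen badly, with a short length-scale and a near-zero assumed noise level, will happily interpolate pure noise. The Bayesian machinery alone does not save you when the prior and the data-generating process disagree.

```r
# Toy example: a GP posterior under a mismatched prior interpolates noise.
set.seed(1)
n <- 10
x <- sort(runif(n))
y <- rnorm(n)                  # pure noise: there is no signal to recover

ell <- 0.1                     # assumed length-scale (too short)
sigma2 <- 1e-6                 # assumed noise variance (far too small)
k <- function(a, b) exp(-outer(a, b, "-")^2 / (2 * ell^2))

K <- k(x, x) + sigma2 * diag(n)
mu_train <- k(x, x) %*% solve(K, y)  # posterior mean at the training points
max(abs(mu_train - y))               # near zero: the "fit" reproduces the noise
```

Held-out draws from the same noise distribution would expose the overfit; averaging over the assumed prior would not, because the data didn’t come from that prior.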

But what do I know about Gaussian processes? Let’s hand the mike over to the experts.

I searched in the book and found exactly four mentions of “overfitting”:

Anybody want a drink before the war?

Your lallies look like darts, and you’ve got nanti carts, but I love your bona eke – Lee Sutton (A near miss)

I’ve been thinking about gayface again. I guess this is for a bunch of reasons, but one of the lesser ones is that this breathless article by JD Schramm popped up in the Washington Post the other day. This is not just because it starts with a story I relate to very deeply about accidentally going on a number of dates with someone. (In my case, it was a woman, I was 19, and we bonded over musical theatre. Apparently this wasn’t a giveaway. Long story short, she tried to kiss me, I ran away.)

There is no other Troy for me to burn

The main gist of Schramm’s article is that the whole world is going to end and all the gays are going to be rounded up into concentration camps or some such things due to the implications of Wang and Kosinski’s gayface article.

I think we can safely not worry about that.

Success has made a failure of our home

For those of you who need a refresher, the Wang and Kosinski paper had some problems. They basically scraped a bunch of data from an undisclosed dating site that caters to men and women of all persuasions and fed it to a deep neural network face recognition program to find facial features that were predictive of being gay or predictive of being straight. They then did a sparse logistic regression to build a classifier.

There were some problems.

  1. The website didn’t provide sexual orientation information, but only a “looking for” feature. Activity and identity overlap but are not actually the same thing, so we’re already off to an awkward start.

To go beyond that, we need to understand what these machine learning algorithms can actually do. The key thing is that they do not extrapolate well. They can find deep, non-intuitive links between elements of a sample (which is part of why they can be so successful for certain tasks), but they can’t imagine unobserved data.

For example, if we were trying to classify four legged creatures and we fed the algorithm photos of horses and donkeys, you’d expect it to generalize well to photos of mules, but less well to photos of kangaroos.

To some extent, this is what we talk about when we talk about “generalization error”. If a classifier does a great job on the data it was trained on (and holdout sets thereof), but a terrible job on a new dataset, one explanation would be that the training data is in some material way different from the new data set. This would turn classification on the new data set into an extrapolation task, which is an area where these types of algorithms struggle.

  2. The training data set is terribly unrepresentative of the population. The testing set is even worse.

There are other problems with the paper. My favourite is that they find that facial brightness is positively correlated with the probability of being gay, and they posit as a possible reason that an overabundance of testosterone darkens skin. Essentially, they argue that straight people are a bit dull because they’ve got too much testosterone.

As much as I enjoy the idea that they’ve proposed some sort of faggy celestial navigation (you’re never lost if there’s a gay on the horizon to light your way to safety), it’s not that likely. More likely, gay men use more filters in their dating profile shots, and we really should sound the correlation-is-not-causation klaxon.

How I be me (and you be you)?

But the howling, doom-laden tone of the Schramm piece did make me think about whether building the sort of AI he’s warning against would even be possible.

Really, we’re talking about passive gaydar here, where people pick up on whether you’re gay based solely on information that isn’t actively broadcast. Active gaydar is a very different beast–this requires a person to actively signal their orientation. Active gaydar is known to confuse whales and cause them to beach themselves, so please avoid using it near large bodies of water.

To train an AI system to be a passive gaydar, you would need to feed it with data that covered the broad spectrum of presentation of homosexuality. This is hard. It differs from place to place, among cultures, races, socioeconomic groups. More than this, it’s not stable in time. (A whole lot more straight people know drag terminology now than a decade ago, even if they do occasionally say “cast shade” like it’s a D&D spell.)

On top of this, the LGB population is only a small fraction of the whole population. This means that even a classifier that very accurately identifies known gay people or known straight people will be quite inaccurate when applied to the whole population. This is the problem with conditional probabilities!
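A quick back-of-the-envelope in R, with made-up accuracy numbers (not Wang and Kosinski’s), shows how badly the base rate bites:

```r
# Hypothetical classifier: 90% sensitivity, 90% specificity,
# applied to a population in which 5% of people are gay.
sens <- 0.90; spec <- 0.90; prev <- 0.05
ppv <- (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
ppv  # ~0.32: about two thirds of "gay" classifications are false positives
```

Even a classifier that is right 90% of the time on both groups is wrong about two times out of three when it flags someone in the general population.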

I think we need to be comfortable with the idea that among all of the other reasons we shouldn’t try to build an AI gaydar, we probably just can’t. Building an AI gaydar would be at least as hard as building a self-driving car. Probably much harder.

“If I wanted to graduate in three years, I’d just get a sociology degree.”

From an interview with a UCLA QB who’s majoring in economics:

Look, football and school don’t go together. They just don’t. Trying to do both is like trying to do two full-time jobs. . . . No one in their right mind should have a football player’s schedule, and go to school. It’s not that some players shouldn’t be in school; it’s just that universities should help them more—instead of just finding ways to keep them eligible.

Any time any player puts into school will take away from the time they could put into football. . . .

But some players do manage to do it, even to graduate in three years. Here’s what the UCLA QB / econ major has to say about that:

If I wanted to graduate in three years, I’d just get a sociology degree.

Ha! But I can’t really laugh too hard. If sociology is a good choice of major for people who can’t handle the rigors of econ, I have a feeling that statistics is a good choice of major for people who can’t handle the rigors of math or CS.

The best part of the interview, though, is this:

I want to get my MBA. I want to create my own business. When I’m finished with football, I want a seamless transition to life and work and what I’ve dreamed about doing all my life. I want to own the world. Every young person should be able to have that dream and the ability to access it. I don’t think that’s too much to ask.

Hey, wait a minute. If everyone gets to “own the world,” who would do the work?? I guess he hasn’t reached that stage in his economics education, where he learns that this really is “too much to ask.”

P.S. Commenters convinced me that I missed the point: when the QB said “own the world,” he didn’t literally mean “own,” he just meant something like “feel like you’re on top of the world.” And, sure, no reason why everyone can’t feel that way.

“Have Smartphones Destroyed a Generation?” and “The Narcissism Epidemic”: How can we think about the evidence?

Jay Livingston points to this hypey article, “Have Smartphones Destroyed a Generation?”, by Jean Twenge, who writes:

I’ve been researching generational differences for 25 years . . . Typically, the characteristics that come to define a generation appear gradually, and along a continuum. . . . [But] Around 2012, I noticed abrupt shifts in teen behaviors and emotional states. The gentle slopes of the line graphs became steep mountains and sheer cliffs, and many of the distinctive characteristics of the Millennial generation began to disappear. In all my analyses of generational data—some reaching back to the 1930s—I had never seen anything like it. . . .

At first I presumed these might be blips, but the trends persisted, across several years and a series of national surveys. The changes weren’t just in degree, but in kind. . . .

What happened in 2012 to cause such dramatic shifts in behavior? It was after the Great Recession, which officially lasted from 2007 to 2009 and had a starker effect on Millennials trying to find a place in a sputtering economy. But it was exactly the moment when the proportion of Americans who owned a smartphone surpassed 50 percent. . . .

OK, interesting. But Livingston recalled he’d seen some mention of Twenge’s research before. Here’s Livingston, from 2016:

There it was again, the panic about the narcissism of millennials as evidenced by selfies. This time it was NPR’s podcast Hidden Brain. . . . The show’s host Shankar Vedantam chose to speak with only one researcher on the topic – psychologist Jean Twenge, whose even-handed and calm approach is clear from the titles of her books, Generation Me and The Narcissism Epidemic. She is obviously not alone in worrying about the narcissistic youth of America. In 2013, a Time Magazine cover on “The Me Me Me Generation” showed a millennialish woman taking a selfie. . . .

Twenge, in the Hidden Brain episode, uses individualism and narcissism as though they were interchangeable. She refers to her data on the increase in “individualistic” pronouns and language, even though linguists have shown this idea to be wrong (see Mark Liberman at Language log here and here). . . .

Then there’s the generational question. Are millennials more narcissistic than were their parents or grandparents? Just in case you’ve forgotten, that Time magazine cover was not the first one focused on “me.” In 1976, New York Magazine ran a similarly titled article by Tom Wolfe. . . . And maybe, if you’re old enough, when you read the title The Narcissism Epidemic, you heard a faint echo of a book by Christopher Lasch published thirty years earlier. . . .

Then Livingston brings in the data:

Since 1975, Monitoring the Future (here) has surveyed large samples of US youth. It wasn’t designed to measure narcissism, but it does include two relevant questions:

  • Compared with others your age around the country, how do you rate yourself on school ability?
  • How intelligent do you think you are compared with others your age?

It also has self-esteem items including

  • I take a positive attitude towards myself
  • On the whole, I am satisfied with myself
  • I feel I do not have much to be proud of (reverse scored)

A 2008 study compared 5-year age groupings and found absolutely no increase in “egotism” (those two “compared with others” questions). The millennials surveyed in 2001-2006 were almost identical to those surveyed twenty-five years earlier. The self-esteem questions too showed little change.

Another study by Brent Roberts, et al., tracked two sources for narcissism: data from Twenge’s own studies; and data from a meta-analysis that included other research, often with larger samples. The test of narcissism in all cases was the Narcissistic Personality Inventory – 40 questions designed to tap narcissistic ideas.

A sample from a 16-item version of the Narcissistic Personality Inventory. Narcissistic responses are in boldface. (It’s hard to read these and not think of Donald Trump.)

1.  __ I really like to be the center of attention
    __ It makes me uncomfortable to be the center of attention

2.  __ I am no better or no worse than most people
    __ I think I am a special person

3.  __ Everybody likes to hear my stories
    __ Sometimes I tell good stories

5.  __ I don’t mind following orders
    __ I like having authority over people

7.  __ People sometimes believe what I tell them
    __ I can make anybody believe anything I want them to

10. __ I am much like everybody else
    __ I am an extraordinary person

13. __ Being an authority doesn’t mean that much to me
    __ People always seem to recognize my authority

14. __ I know that I am good because everybody keeps telling me so
    __ When people compliment me I sometimes get embarrassed

16. __ I am more capable than other people
    __ There is a lot that I can learn from other people

Their results look like this:

Twenge’s sources, taken alone, justify her conclusion that narcissism is on the rise. But include the other data and you wonder if all the fuss about kids today is a bit overblown. You might not like participation trophies or selfie sticks or Instagram, but it does not seem likely that these have created an epidemic of narcissism.


Given all this, I’m skeptical about Twenge’s claims of big changes in 2012. I’d like to see the data, or, better still, given my own time constraints, to see the data analyzed by some independent source.

Why so negative?

At this point, you might ask: Why do I have to be so negative? Why reflexively disbelieve Twenge’s claims?

My answer: I’m not saying I disbelieve or that I think Twenge’s claims are wrong. I just don’t see that she’s provided convincing evidence for these claims.

I will say, though, that there’s something refreshing about an article saying that today’s kids are screwed up because they don’t have enough sex.

The topic is important

To put it another way, I write about this because I care; I share Twenge’s concern. If it’s really true that “the number of teens who get together with their friends nearly every day dropped by more than 40 percent from 2000 to 2015,” that’s interesting. But I’d have to see it to believe it. And I don’t really know what I’m supposed to do with statements such as, “Those who spend an above-average amount of time with their friends in person are 20 percent less likely to say they’re unhappy than those who hang out for a below-average amount of time.” Also this: “three times as many 12-to-14-year-old girls killed themselves in 2015 as in 2007, compared with twice as many boys,” which is indeed a concern, but then I googled *teen suicide rates* and found this: “The rate of Americans ending their own lives has risen to its highest level in decades, according to a new study from the Centers for Disease Control, and the increase is especially pronounced among women. . . . The suicide rate increased for women of all ages, but the spike was especially pronounced for women aged 45-64. And although such incidents are comparatively rare, suicides of girls aged 10-14 increased 200% in that period, to 150 in 2014.” So, yes it’s a concern, but nothing particular about the younger generation.

There should be a way for researchers and journalists to float interesting hypotheses without feeling the need to act as if all their findings point in the same direction.

“Deeper into democracy: the legitimacy of challenging Brexit’s majoritarian mandate”

There’s no reason that we should trust someone’s thoughts on politics just because he’s a good chess player, or even a good writer. That said, I found this opinion piece by Jonathan Rowson on Britain and the EU to be worth reading.

Also I came across this short post by Rowson on “virtue signaling” which made some reasonable points.

P.S. I’d heard of Rowson from reading his book Seven Deadly Chess Sins, which is great (if a bit over my head, chess-wise), and which came out when he was only 23!

Zoologist slams statistical significance

Valentin Amrhein writes that statistical significance and hypothesis testing are not really helpful when it comes to testing our hypotheses. I’m not quite sure I like the title of Amrhein’s post—“Inferential Statistics is not Inferential”—as I think of parameter estimation, model checking, forecasting, etc., all as forms of inference. But I agree with his general points, and it’s good to see these ideas being expressed all over, by researchers in many different fields.

The graphs tell the story. Now I want to fit this story into a bigger framework so it all makes sense again.

Paul Alper points us to these graphs:

Pretty stunning. I mean, really stunning. Why are we just hearing about this now, given that the pattern is a decade old?

And what’s this: “Data for the U.S. ends in 2007”? Huh?

Also, it’s surprising how high the rates were for Japan, Italy, and Germany in the 1970s. Whassup with that?

The whole thing is tough to understand; I just don’t know how to think about it.

One lesson from all of this, I think, is that our public space (newspapers, TV, etc.) doesn’t have enough experts on demography and public health; thus these sorts of basic statistics come as a surprise to us. Consider: compared to most people, I’m an expert on demography and public health, but these graphs came as a complete surprise to me.

In all seriousness, I think our public discussion space needs fewer doctors and fewer economists, and more demographers and public health experts. I have no problem with doctors and economists, considered individually as experts; there just seems to be an imbalance in aggregate exposure, comparing these professions to others with relevant expertise in the same questions.

Let’s face it, I know nothing about spies.

I saw this news article:

Multiple federal agencies investigated claims that former Indiana basketball coach Bobby Knight groped and verbally sexually harassed several female employees when he gave a speech at the National Geospatial-Intelligence Agency in July 2015, according to newly-released documents. . . . he had slapped a “senior woman … on [her] butt” and “fondled a woman’s breast.”

And my first thought is, hey, this guy is what, 75 years old, and he’s groping spies?? Isn’t he afraid one of them will give him a karate chop to the head?

I guess I’ve watched too many episodes of The Americans. Real spies, it seems, just let old guys grope them. In real life, it seems that politeness is part of the spy code of conduct. Who knew? I’d’ve thought one of these ladies would’ve decked him.

Low power and the replication crisis: What have we learned since 2004 (or 1984, or 1964)?

I happened to run across this article from 2004, “The Persistence of Underpowered Studies in Psychological Research: Causes, Consequences, and Remedies,” by Scott Maxwell and published in the journal Psychological Methods.

In this article, Maxwell covers a lot of the material later discussed in the paper Power Failure by Button et al. (2013), and the 2014 paper on Type M and Type S errors by John Carlin and myself. Maxwell also points out that these alarms were raised repeatedly by earlier writers such as Cohen, Meehl, and Rozeboom, from the 1960s onwards.

In this post, I’ll first pull out some quotes from that 2004 paper that presage many of the issues of non-replication that we still wrestle with today. Then I’ll discuss what’s been happening since 2004: what’s new in our thinking in the past fifteen years.

I’ll argue that, yes, everyone should’ve been listening to Cohen, Meehl, Rozeboom, Maxwell, etc., all along; and also that we have been making some progress, that we have some new ideas that might help us move forward.

Part 1: They said it all before

Here’s a key quote from Maxwell (2004):

When power is low for any specific hypothesis but high for the collection of tests, researchers will usually be able to obtain statistically significant results, but which specific effects are statistically significant will tend to vary greatly from one sample to another, producing a pattern of apparent contradictions in the published literature.

I like this quote, as it goes beyond the usual framing in terms of “false positives” etc., to address the larger goals of a scientific research program.

Maxwell continues:

A researcher adopting such a strategy [focusing on statistically significant patterns in observed data] may have a reasonable probability of discovering apparent justification for recentering his or her article around a new finding. Unfortunately, however, this recentering may simply reflect sampling error . . . this strategy will inevitably produce positively biased estimates of effect sizes, accompanied by apparent 95% confidence intervals whose lower limit may fail to contain the value of the true population parameter 10% to 20% of the time.

He also slams deterministic reasoning:

The presence or absence of asterisks [indicating p-value thresholds] tends to convey an air of finality that an effect exists or does not exist . . .

And he mentions the “decline effect”:

Even a literal replication in a situation such as this would be expected to reveal smaller effect sizes than those originally reported. . . . the magnitude of effect sizes found in attempts to replicate can be much smaller than those originally reported, especially when the original research is based on small samples. . . . these smaller effect sizes might not even appear in the literature because attempts to replicate may result in nonsignificant results.

Classical multiple comparisons corrections won’t save you:

Some traditionalists might suggest that part of the problem . . . reflects capitalization on chance that could be reduced or even eliminated by requiring a statistically significant multivariate test. Figure 3 shows the result of adding this requirement. Although fewer studies will meet this additional criterion, the smaller subset of studies that would now presumably appear in the literature are even more biased . . .

This was a point raised a few years later by Vul et al. in their classic voodoo correlations paper.

Maxwell points out that meta-analysis of published summaries won’t solve the problem:

Including underpowered studies in meta-analyses leads to biased estimates of effect size whenever accessibility of studies depends at least in part on the presence of statistically significant results.

And this:

Unless psychologists begin to incorporate methods for increasing the power of their studies, the published literature is likely to contain a mixture of apparent results buzzing with confusion.

And the incentives:

Not only do underpowered studies lead to a confusing literature but they also create a literature that contains biased estimates of effect sizes. Furthermore . . . researchers may have felt little pressure to increase the power of their studies, because by testing multiple hypotheses, they often assured themselves of a reasonable probability of achieving a goal of obtaining at least one statistically significant result.

And he makes a point that I echoed many years later, regarding the importance of measurement and the naivety of researchers who think that the answer to all problems is to crank up the sample size:

Fortunately, an assumption that the only way to increase power is to increase sample size is almost always wrong. Psychologists are encouraged to familiarize themselves with additional methods for increasing power.

Part 2: (Some of) what’s new

So, Maxwell covered most of the ground in 2004. Here are a few things that I would add, from my standpoint nearly fifteen years later:

1. I think the concept of “statistical power” itself is a problem in that it implicitly treats the attainment of statistical significance as a goal. As Button et al. and others have discussed, low-power studies have a winner’s curse aspect, in that if you do a “power = 0.06” study and get lucky and find a statistically significant result, your estimate will be horribly exaggerated and quite possibly in the wrong direction.

To put it another way, I fear that a typical well-intentioned researcher will want to avoid low-power studies—and, indeed, it’s trivial to talk yourself into thinking your study has high power, by just performing the power analysis using an overestimated effect size from the published literature—but will also think that a low power study is essentially a roll of the dice. The implicit attitude is that in a study with, say, 10% power, you have a 10% chance of winning. But in such cases, a win is really a loss.
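Here is a quick simulation in R of what “a win is really a loss” means, with made-up numbers: a true effect of 0.1 with standard error 1, which gives power of roughly 5 percent.

```r
set.seed(123)
true_effect <- 0.1
est <- rnorm(1e6, mean = true_effect, sd = 1)  # a million replicated estimates
sig <- abs(est) > 1.96                         # "statistically significant"
mean(sig)                                      # ~0.05: the power
mean(abs(est[sig])) / true_effect              # type M: ~20x exaggeration
mean(est[sig] < 0)                             # type S: ~0.4 have the wrong sign
```

Conditional on crossing the significance threshold, the estimate is not just noisy; it is guaranteed to be a gross overestimate, and a substantial fraction of the time it points the wrong way.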

2. Variation in effects and context dependence. It’s not about identifying whether an effect is “true” or a “false positive.” Rather, let’s accept that in the human sciences there are no true zeroes, and relevant questions include the magnitude of effects, and how and where they vary. What I’m saying is: less “discovery,” more exploration and measurement.

3. Forking paths. If I were to rewrite Maxwell’s article today, I’d emphasize that the concern is not just multiple comparisons that have been performed, but also multiple potential comparisons. A researcher can walk through his or her data and only perform one or two analyses, but these analyses will be contingent on data, so that had the data been different, they would’ve been summarized differently. This allows the probability of finding statistical significance to approach 1, given just about any data (see, most notoriously, this story). In addition, I would emphasize that “researcher degrees of freedom” (in the words of Simmons, Nelson, and Simonsohn, 2011) arise not just in the choice of which of multiple coefficients to test in a regression, but also in which variables and interactions to include in the model, how to code data, and which data to exclude (see my above-linked paper with Loken for several examples).

4. Related to point 2 above is that some effects are really, really small. We all know about ESP, but there are also other tiny effects being studied. An extreme example is the literature on sex ratios. At one point in his article, referring to a proposal that psychologists gather data on a sample of a million people, Maxwell writes, “Thankfully, samples this large are unnecessary even to detect minuscule effect sizes.” Actually, if you’re studying variation in the human sex ratio, that’s about the size of sample you’d need! For the calculation, see pages 645-646 of this paper.
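Here is the back-of-the-envelope version of that calculation in R, with my round numbers rather than the exact ones from the paper: 80% power to detect a 0.3 percentage-point shift in Pr(girl) around a base rate of 0.49, comparing two groups.

```r
# Standard two-group comparison of proportions: z values 1.96 (5% two-sided
# test) and 0.84 (80% power), p ~ 0.49, difference d = 0.003.
p <- 0.49; d <- 0.003
n_per_group <- 2 * (1.96 + 0.84)^2 * p * (1 - p) / d^2
n_per_group  # ~435,000 per group, i.e., close to a million people in total
```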

5. Flexible theories: The “goal of obtaining at least one statistically significant result” is only relevant because theories are so flexible that just about any comparison can be taken to be consistent with theory. Remember sociologist Jeremy Freese’s characterization of some hypotheses as “more vampirical than empirical—unable to be killed by mere evidence.”

6. Maxwell writes, “it would seem advisable to require that a priori power calculations be performed and reported routinely in empirical research.” Fine, but we can also do design analysis (our preferred replacement term for “power calculations”) after the data have come in and the analysis has been published. The purpose of a design calculation is not just to decide whether to do a study or to choose a sample size. It’s also to aid in interpretation of published results.
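For reference, here is a version of the retrodesign function along the lines of the one in my paper with Carlin; treat it as a sketch rather than the canonical implementation. Given a postulated true effect A and the standard error s of a published estimate, it returns the power, the type S error rate, and the expected exaggeration factor:

```r
retrodesign <- function(A, s, alpha = 0.05, df = Inf, n.sims = 10000) {
  z <- qt(1 - alpha / 2, df)
  p.hi <- 1 - pt(z - A / s, df)       # prob. of significance in the + direction
  p.lo <- pt(-z - A / s, df)          # prob. of significance in the - direction
  power <- p.hi + p.lo
  typeS <- p.lo / power               # share of significant results, wrong sign
  estimate <- A + s * rt(n.sims, df)  # simulate replicated estimates
  significant <- abs(estimate) > s * z
  exaggeration <- mean(abs(estimate)[significant]) / A
  list(power = power, typeS = typeS, exaggeration = exaggeration)
}

retrodesign(A = 0.1, s = 1)  # recovers the low-power example from point 1
```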

7. Measurement.

Bob likes the big audience

In response to a colleague who was a bit scared of posting some work up on the internet for all to see, Bob Carpenter writes:

I like the big audience for two reasons related to computer science principles.

The first benefit is the same reason it’s scary. The big audience is likely to find flaws. And point them out. In public! The trick is to get over feeling bad about it and realize that it’s a super powerful debugging tool for ideas. Owning up to being wrong in public is also very liberating. Turns out people don’t hold it against you at all (well, maybe they would if you were persistently and unapologetically wrong). It also provides a great teaching opportunity—if a postdoc is confused about something in their speciality, chances are that a lot of others are confused, too.

In programming, the principle is that you want routines to fail early. You want to inspect user input and if there’s a fatal problem with it that can be detected, fail right away and let the user know what the error is. Don’t fire up the algorithm and report some deeply nested error in a Fortran matrix algorithm. Something not being shot down on the blog is like passing that validation. It gives you confidence going on.

The second benefit is the same as in any writing, only the stakes are higher with the big audience. When you write for someone else, you’re much more self critical. The very act of writing can uncover problems or holes in your reasoning. I’ve started several blog posts and papers and realized at some point as I fleshed out an argument that I was missing a fundamental point.

There’s a principle in computer science called the rubber ducky.

One of the best ways to debug is to have a colleague sit down and let you explain your bug to them. Very often halfway through the explanation you find your own bug and the colleague never even understands the problem. The blog audience is your rubber ducky.

The term itself is a misnomer in that it only really works if the rubber ducky can understand what you’re saying. They don’t need to understand it, just be capable of understanding it. Like the no free lunch principle, there are no free pair programmers.

The third huge benefit is that other people have complementary skills and knowledge. They point out connections and provide hints that can prove invaluable. We found out about automatic differentiation through a comment on the blog to a post where I was speculating about how we could calculate gradients of log densities in C++.

I guess there’s a computer science principle there, too—modularity. You can bring in whole modules of knowledge, like we did with autodiff.

I agree. It’s all about costs and benefits. The cost of an error is low if discovered early. You want to stress test, not to hide your errors and wait for them to be discovered later.

Of rabbits and cannons

When does it make sense to shoot a rabbit with a cannon?

I was reminded of this question recently when I happened to come across this exchange in the comments section from a couple years ago, in the context of finding patterns in the frequencies of births on different days:

Rahul: Yes, inverting a million element matrix for this sort of problem does have the feel of killing mice with cannon.

Andrew: In many areas of research, you start with the cannon. Once the mouse is dead and you can look at it carefully from all angles, you can design an effective mousetrap. Red State Blue State went the same way: we found the big pattern only after fitting a multilevel model, but once we knew what we were looking for, it was possible to see it in the raw data.

The curse of dimensionality and finite vs. asymptotic convergence results

Related to our (Aki, Andrew, Jonah) Pareto smoothed importance sampling paper, I (Aki) have received a few times the comment: why bother with Pareto smoothing, when you can always choose the proposal distribution so that the importance ratios are bounded, and then the central limit theorem holds? The curse of dimensionality here is that the papers they refer to used low-dimensional experiments, and the results do not work so well in high dimensions. Readers of this blog should not be surprised that things do not look the same in high dimensions. In high dimensions the probability mass is far from the mode; it is spread thinly over the surface of a high-dimensional sphere. See, e.g., Mike’s paper, Bob’s case study, and this blog post.

In importance sampling, one working solution in low dimensions is to use a mixture of two proposals: one component tries to match the mode, and the other makes sure the tails go down more slowly than the tails of the target, ensuring bounded ratios. In the following I look only at the behavior with a single component that has thicker tails, so that the importance ratios are bounded (but I have run a similar experiment with a mixture, too).

The target distribution is a multidimensional normal distribution with zero mean and unit covariance matrix. In the first case the proposal distribution is also normal, but with scale 1.1 in each dimension. The scale is just slightly larger than for the target, and we are often lucky if we can guess the scale of the proposal with 10% accuracy. I draw 100000 draws from the proposal distribution.
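If you would rather see the collapse than take my figures on faith, here is a small R sketch of the same experiment (my code, not the paper’s, and with fewer draws than the 100000 in the post, to keep memory modest):

```r
set.seed(1)
S <- 1e4  # fewer draws than in the post, for speed
for (D in c(1, 16, 256, 1024)) {
  x <- matrix(rnorm(S * D, sd = 1.1), S, D)  # draws from the proposal
  # log importance ratios: target N(0, I) over proposal N(0, 1.1^2 I)
  logw <- rowSums(dnorm(x, log = TRUE) - dnorm(x, sd = 1.1, log = TRUE))
  w <- exp(logw - max(logw))                 # stabilize before exponentiating
  cat(sprintf("D = %4d   ESS = %8.1f\n", D, sum(w)^2 / sum(w^2)))
}
# (loo::psis(logw) would additionally give the Pareto khat diagnostic.)
```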

The following figure shows what happens when the number of dimensions goes from 1 to 1024.

The upper subplot shows the estimated effective sample size. By D=512, only a few of the 100000 importance weighted draws have practically non-zero weights. The middle subplot shows the convergence rate compared to independent sampling, i.e., how fast the variance goes down. By D=1024 the convergence rate has dropped dramatically, and getting any improvement in the accuracy requires more and more draws. The bottom subplot shows the Pareto khat diagnostic (see the paper for details). The dashed line is k=0.5, which is the limit for the variance being finite, and the dotted line is our suggestion for practically useful performance when using PSIS. But how can khat be larger than 0.5 when we have bounded weights? The central limit theorem has not failed here; we just have not yet reached the asymptotic regime where the CLT kicks in!

The next plot shows in more detail what happens with D=1024.

Since humans are lousy at looking at 1024-dimensional plots, the top subplot shows the one-dimensional marginal densities of the target (blue) and the proposal (red) as a function of the distance from the origin, r=sqrt(sum_{d=1}^D x_d^2). The proposal density has only 1.1 times larger scale than the target, but most of the draws from the proposal are away from the typical set of the target! The vertical dashed line shows the 1e-6 quantile of the proposal, i.e., when we draw 100000 draws, 90% of the time we don’t get any draws below it. The middle subplot shows the importance ratio function, and we can see that the highest value is at 0, but that value is larger than 2*10^42! That’s a big number. The bottom subplot scales the y axis so that we can see the importance ratios near that 1e-6 quantile. Check the y-axis: it’s still from 0 to 1e6. So if we are lucky we may get a draw below the dashed line, but then it’s likely to get all the weight. The importance ratio function is practically this steep everywhere we could get draws within the age of the universe: the 1e-80 quantile is at 21.5 (1e80 is the estimated number of atoms in the visible universe), and it’s still far away from the region where the boundedness of the importance ratio function starts to matter.

I have more similar plots with thick-tailed Student’s t, mixtures of proposals, etc., but I’ll save you from more plots. As long as there is some difference between the target and the proposal, taking the number of dimensions high enough breaks IS and PSIS (with PSIS giving a slight improvement in performance and, more importantly, diagnosing the problem and improving the Monte Carlo estimate).

In addition to taking into account that many methods which work in low dimensions can break in high dimensions, we need to focus more on finite-sample performance. As seen here, it doesn’t help us that the CLT holds if we can never reach the asymptotic regime (just as the Metropolis algorithm in high dimensions may require close to infinite time to produce useful results). The Pareto khat diagnostic has been empirically shown to provide very good finite-sample convergence rate estimates, which also match some theoretical bounds.

PS. There has been a lot of discussion in the comments about the typical set vs. the high probability set. In the end Carlos Ungil wrote

Your blog post seems to work equally well if you change
“most of the draws from the proposal are away from the typical set of the target!”
to
“most of the draws from the proposal are away from the high probability region of the target!”

I disagree with this, and to show evidence for it I add here a couple more plots I didn’t include before (I hope this blog post will not eventually have as many plots as the PSIS paper).

If the target is multivariate normal, we get bounded weights by using a Student-t distribution with finite degrees of freedom nu, even if the scale of the Student-t is smaller than the scale of the target distribution. In the next example the target is the same as above, and the proposal distribution is multivariate Student-t with degrees of freedom nu=7 and variance 1.

The following figure again shows what happens when the number of dimensions goes from 1 to 1024.

The upper subplot shows the estimated effective sample size. By D=64, only a few of the 100000 importance weighted draws have practically non-zero weights. The middle subplot shows the convergence rate compared to independent sampling, i.e., how fast the variance goes down. By D=256 the convergence rate has dropped dramatically, and getting any improvement in the accuracy requires more and more draws. The bottom subplot shows the Pareto khat diagnostic, which predicts the finite-sample convergence rate well (even though asymptotically, with bounded ratios, k<1/2).

The next plot shows in more detail what happens with D=512.

The top subplot shows the one-dimensional marginal densities of the target (blue) and the proposal (red) as a function of the distance from the origin, r=sqrt(sum_{d=1}^D x_d^2). The proposal density has the same variance but a thicker tail. Most of the draws from the proposal are away from the typical set of the target, and in this case towards higher densities than the density in the typical set. The middle subplot shows the importance ratio function, and we can see that the highest value is close to r=47.5, after which the ratio function starts to decrease again. The highest value of the ratio function is larger than 10^158. The region with the highest value is far away from the typical set of the proposal distribution. The bottom subplot scales the y axis so that we can see the importance ratios near the proposal and target distributions. The importance ratio goes from very small values up to 10^30 in a very narrow range, and thus it’s likely that the largest draw from the proposal will get all the weight. It’s unlikely that in practical time we would get enough draws for the asymptotic benefits of bounded ratios to kick in.

PPS. The purpose of this post was to illustrate that bounded ratios and asymptotic convergence results are not enough for practically useful performance of IS. But there are also special cases where, due to the special structure of the posterior, we can get practically good performance with IS, and especially with PSIS, even in high-dimensional cases (the PSIS paper has a 400-dimensional example with khat<0.7).


“Write No Matter What” . . . about what?

Scott Jaschik interviews Joli Jensen (link from Tyler Cowen), a professor of communication who wrote a new book called “Write No Matter What: Advice for Academics.”

Her advice might well be reasonable—it’s hard for me to judge; as someone who blogs a few hundred times a year, I’m not really part of Jensen’s target audience. She offers “a variety of techniques to help . . . reduce writing anxiety; secure writing time, space and energy; recognize and overcome writing myths; and maintain writing momentum.” She recommends “spending at least 15 minutes a day in contact with your writing project . . . writing groups, focusing on accountability (not content critiques), are great ways to maintain weekly writing time commitments.”

Writing is non-algorithmic, and I’ve pushed hard against advice-givers who don’t seem to get that. Jensen does seem to get it, so, based on this quick interview, my impression is that she’s on the right track.

I’d just like to add one thing: If you want to write, it helps to have something to write about. Even when I have something I really want to say, writing can be hard. I can only imagine how hard it would be if I was just trying to write, to produce, without something I felt it was important to share with the world.

So, when writing, imagine your audience, and ask yourself why they should care. Tell ’em what they don’t know.

Also, when you’re writing, be aware of your audience’s expectations. You can satisfy their expectations or confound their expectations, but it’s good to have a sense of what you’re doing.

And here’s some specific advice about academic writing, from a few years ago.

P.S. In that same post, Cowen also links to a bizarre book review by Edward Luttwak who, among other things, refers to “George Pataki of New York, whose own executive experience as the State governor ranged from the supervision of the New York City subways to the discretionary command of considerable army, air force and naval national guard forces.” The New York Air National Guard, huh? I hate to see the Times Literary Supplement fall for this sort of pontificating. I guess that there will always be a market for authoritative-sounding pundits. But Tyler Cowen should know better. Maybe it was the New York thing that faked him out. If Luttwak had been singing the strategic praises of the New Jersey Air National Guard, that might’ve set off Cowen’s B.S. meter.

How to think about the risks from low doses of radon

Nick Stockton, a reporter for Wired magazine, sent me some questions about radiation risk and radon, and Phil and I replied. I thought our responses might be of general interest so I’m posting them here.

First I wrote:

Low dose risk is inherently difficult to estimate using epidemiological studies. I’ve seen no evidence that risk is not linear at low dose, and there is evidence that areas with high radon levels have elevated levels of lung cancer. When it comes to resource allocation, we recommend that measurement and remediation be done in areas of high average radon levels but not at a national level; see here and here and, for a more technical treatment, here.

Regarding the question of “If the concerns about the linear no-threshold model for radiation risk are based on valid science, why don’t public health agencies like the EPA take them seriously?” I have no idea what goes on within the EPA, but when it comes to radon remediation, the effects of low dose exposure aren’t so relevant to the decision: if your radon level is low (as it is in most homes in the U.S.) you don’t need to do anything anyway; if your radon level is high, you’ll want to remediate; if you don’t know your radon level but it has a good chance of being high, you should get an accurate measurement and then make your decision.

For homes with high radon levels, radon is a “dangerous, proven harm,” and we recommend remediation. For homes with low radon levels, it might or might not be worth your money to remediate; that’s an individual decision based on your view of the risk and on whether you can afford the $2000 or whatever it costs to remediate.

Then Phil followed up:

The idea of hormesis [the theory that low doses of radiation can be beneficial to your health] is not quackery. Nor is LNT [the linear no-threshold model of radiation risk].

I will elaborate.

The theory behind LNT isn’t just ‘we have to assume something’, nor ‘everything is linear to first order’. The idea is that twice as much radiation means twice as many cells with damaged DNA, and if each cell with damaged DNA has a certain chance of initiating a cancer, then ceteris paribus you have LNT. That’s not crazy.

The theory behind hormesis is that your body has mechanisms for dealing with cancerous cells, and that perhaps these mechanisms become more active or more effective when there is more damage. That’s not crazy either.

Perhaps exposure to a little bit of radiation isn’t bad for you at all. Perhaps it’s even good for you. Perhaps it’s just barely bad for you, but then when you’re exposed to more, you overwhelm the repair/rejection mechanisms and at some point just a little bit more adds a great deal of risk. This goes for smoking, too: maybe smoking 1/4 cigarette per day would be good for you. For radiation there are various physiological models and there are enough adjustable parameters to get just about any behavior out of the models I have seen.

Of course what is needed is actual data. Data can be in vitro or in vivo; population-wide or case-control; etc.

There’s fairly persuasive evidence that the dose-response relationship is significantly nonlinear at low doses for “low linear-energy-transfer radiation”, aka low-LET radiation, such as x-rays. I don’t know whether the EPA still uses a LNT model for low-LET radiation.

But for high-LET radiation, including the alpha radiation emitted by radon and most of its decay products of concern, I don’t know much about the dose-response relationship at low doses and I’m very skeptical of anyone who says they do know. There are some pretty basic reasons to expect low-LET and high-LET radiation to have very different effects. Perhaps I need to explain just a bit. For a given amount of energy that is deposited in tissue, low-LET radiation causes a small disruption to a lot of cells, whereas high-LET radiation delivers a huge wallop to relatively few cells.

An obvious thing to do is to look at people who have been exposed to high levels of radon and its decay products. As you probably know, it is really radon’s decay products that are dangerous, not radon itself. When we talk about radon risk, we really mean the risk from radon and its decay products.

At high concentrations, such as those found in uranium mines, it is very clear that radiation is dangerous, and that the more you are exposed to the higher your risk of cancer. I don’t think anyone would argue against the assertion that an indoor radon concentration of, say, 20 pCi/L leads to a substantially increased risk of lung cancer. And there are houses with living area concentrations that high, although not many.

A complication is that the radon risk for smokers seems to be much higher than for non-smokers. That is, a smoker exposed to 20 pCi/L for ten hours per day for several years is at much higher risk than a non-smoker with the same level of exposure.

But what about 10, 4, 2, or 1 pCi/L? No one really knows.

One thing people have done (notably Bernard Cohen, who you’ve probably come across) is to look at the average lung cancer rate by county, as a function of the average indoor radon concentration by county. If you do that, you find that low-radon counties actually have higher lung cancer rates than high-radon counties. But: a disproportionate fraction of low-radon counties are in the South, and that’s also where smoking rates are highest. It’s hard to completely control for the effect of smoking in that kind of study, but you can do things like look within individual states or regions (for instance, look at the relationship between average county radon concentrations and average lung cancer rates in just the northeast) and you still find a slight effect of higher radon being associated with lower lung cancer rates. If taken at face value, this would suggest that a living-area concentration of 1 pCi/L or maybe even 2 pCi/L would be better than 0. But few counties have annual-average living-area radon concentration over about 2 pCi/L, and of course any individual county has a range of radon levels. Plus people move around, both within and between counties, so you don’t know the lifetime exposure of anyone. Putting it all together, even if there aren’t important confounding variables or other issues, these studies would suggest a protective effect at low radon levels but they don’t tell you anything about the risk at 10 pCi/L or 4 pCi/L.

There’s another class of studies, case-control studies, in which people with lung cancer are compared statistically to those without. In this country the objectively best of these looked at women in Iowa. (You may have come across this work, led by Bill Field). Iowa has a lot of farm women who don’t smoke and who have lived in just a few houses for their whole lives. Some of these women contracted lung cancer. The study made radon measurements in these houses, and in the houses of women of similar demographics who didn’t get lung cancer. They find increased risk at 4 pCi/L (even for nonsmokers, as I recall) and they are certainly inconsistent with a protective effect at 4 pCi/L. As I recall — you should check — they also found a positive estimated risk at 2 pCi/L that is consistent with LNT but also statistically consistent with 0 effect.

So, putting it all together, what do we have? I, at least, am convinced that increased exposure leads to increased risk for concentrations above 4 pCi/L. There’s some shaky empirical evidence for a weak protective effect at 2 pCi/L compared to 0 pCi/L. In between it’s hard to say. All of the evidence below about 8 or 10 pCi/L is pretty shaky due to low expected risk, methodological problems with the studies, etc.

My informed belief is this: just as I wouldn’t suggest smoking a little bit of tobacco every day in the hope of a hormetic effect, I wouldn’t recommend a bit of exposure to high-LET radiation every day. It’s not that it couldn’t possibly be protective, but I wouldn’t bet on it. And I’m pretty sure the EPA’s recommended ‘action level’ of 4 pCi/L is indeed risky compared to lower concentrations, especially for smokers. As a nonsmoker I wouldn’t necessarily remediate if my home were at 4 pCi/L, but I would at least consider it.

For low-LET radiation, I think the scientific evidence weighs against LNT. If public health agencies don’t take LNT seriously for this type of radiation, it’s possible that this is because they acknowledge this evidence.

For high-LET radiation, such as alpha particles from radon decay products, there’s more a priori reason to believe LNT would be a good model, and less empirical evidence suggesting that it is a bad model. It might be hard for the agencies to explicitly disavow LNT in these circumstances. At the same time, there’s not compelling evidence in favor of LNT even for this type of radiation, and life is a lot simpler if you don’t take LNT ‘seriously’.

“Service” is one of my duties as a professor—the three parts of this job are teaching, research, and service—and, I guess, in general, those of us who’ve had the benefit of higher education have some sort of duty to share our knowledge when possible. So I have no problem answering reporters’ questions. But reporters have space constraints: you can send a reporter a long email or talk on the phone for an hour, and you’ll be lucky if one sentence of your hard-earned wisdom makes its way into the news article. So much effort all gone! It’s good to be able to post here and reach some more people.

Research project in London and Chicago to develop and fit hierarchical models for development economics in Stan!

Rachael Meager at the London School of Economics and Dean Karlan at Northwestern University write:

We are seeking a Research Assistant skilled in R programming and the production of R packages.

The successful applicant will have experience creating R packages accessible on github or CRAN, and ideally will have experience working with Rstan. The main goal of this position is to create R packages to allow users to run Rstan models for Bayesian hierarchical evidence aggregation, including models on CDFs, sets of quantiles, and tailored parametric aggregation models as in Meager (2016) (https://bitbucket.org/rmeager/aggregating-distributional-treatment-effects/src). Ideally the resulting package will keep RStan “under the hood” with minimal user input, as in rstanarm. Further work on this project is likely to involve programming functions to allow users to run models to predict individual heterogeneity in treatment effects conditional on covariates in a hierarchical setting, potentially using ensemble methods such as Bayesian additive regression trees. Part of this work may involve developing and testing new statistical methodology. We aim to create comprehensively-tested packages that alert users when the underlying routines may have performed poorly. The application of our project is situated in development economics with a focus on understanding the potentially heterogeneous effects of the BRAC Graduation program to alleviate poverty.

The ideal candidate will have the right to work in the UK, and be able to make weekly trips to London to meet with the research team. However, a remote arrangement may be possible for the right candidate, and for those without the right to work in the UK, hiring can be done through Northwestern University. The start date is flexible but we aim to hire by the middle of March 2018.

The ideal candidate would commit 20-30 hours a week on a contracting basis, although the exact commitment is flexible. Pay rate is negotiable and commensurate with academic research positions and the candidate’s experience. Formally, the position is on a casual basis, but our working arrangement is also flexible with the option to work a number of hours corresponding to full-time, part time or casual. A 6-12 month commitment is ideal, with the option to extend pending satisfactory performance and funding availability, but a shorter commitment can be negotiated. Applications will be evaluated beginning immediately until the position is filled.

Please send your applications via email, attaching your CV and the links to any relevant packages or repositories, with the subject line “R programmer job” to rachael.meager@gmail.com and cc sstephen@poverty-action.org and ifalomir@poverty-action.org.

Use multilevel modeling to correct for the “winner’s curse” arising from selection of successful experimental results

John Snow writes:

I came across this blog by Milan Shen recently and thought you might find it interesting.

A couple of things jumped out at me. It seemed like the so-called ‘Winner’s Curse’ is just another way of describing the statistical significance filter. It also doesn’t look like their correction method is very effective. I’d be very curious to hear your take on this work, especially this idea of a ‘Winner’s Curse’. I suspect the airbnb team could benefit from reading some of your work when it comes to dealing with these problems!

My reply: Yes, indeed I’ve used the term “winner’s curse” in this context. Others have used the term too.

Also here.

Here’s a paper discussing the bias.

I think the right thing to do is fit a multilevel model.
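To illustrate, here is a toy simulation in R (my made-up numbers, not Airbnb’s setup): run many experiments, select the “significant” ones, and compare the raw estimates to partially pooled ones.

```r
set.seed(42)
J <- 1000
theta <- rnorm(J, 0, 0.5)      # true effects across experiments
se <- 1
est <- rnorm(J, theta, se)     # each experiment's noisy estimate

winners <- est > 1.96 * se     # the "significant" positive results
mean(est[winners] - theta[winners])      # raw estimates: large upward bias

# Partial pooling toward the grand mean, with tau^2 estimated by moments
tau2 <- max(var(est) - se^2, 0)
shrunk <- mean(est) + tau2 / (tau2 + se^2) * (est - mean(est))
mean(shrunk[winners] - theta[winners])   # shrunken estimates: bias near zero
```

A full multilevel model (for example, in Stan) does this shrinkage coherently, propagating the uncertainty in tau instead of plugging in a point estimate.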

The Lab Where It Happens

“Study 1 was planned in 2007, but it was conducted in the Spring of 2008 shortly after the first author was asked to take a 15-month leave-of-absence to be the Executive Director for USDA’s Center for Nutrition Policy and Promotion in Washington DC. . . . The manuscript describing this pair of studies did not end up being drafted until about three years after the data for Study 1 had been collected. At this point, the lab manager, post-doctoral student, and research assistants involved in the data collection for this study had moved away. The portion of the data file that we used for the study had the name of each student and the location where their data was collected but not their age. Four factors led us to wrongly assume that the students in Study 1 must have been elementary school students . . .

The conclusions of both studies and the conclusions of the paper remain strong after correcting for these errors.”

— Brian Wansink, David R. Just, Collin R. Payne, Matthew Z. Klinger, Preventive Medicine (2018).

This reminds me of a song . . .

Ah, Mister Editor
Mister Prof, sir
Did’ya hear the news about good old Professor Stapel
No
You know Lacour Street
Yeah
They renamed it after him, the Stapel legacy is secure
Sure
And all he had to do was lie
That’s a lot less work
We oughta give it a try
Ha
Now how’re you gonna get your experiment through
I guess I’m gonna fin’ly have to listen to you
Really
Measure less, claim more
Ha
Do whatever it takes to get my manuscript on the floor
Now, Reviewers 1 and 2 are merciless
Well, hate the data, love the finding
Food and Brand
I’m sorry Prof, I’ve gotta go
But
Decisions are happening over dinner
Two professors and an immigrant walk into a restaurant
Diametric’ly opposed, foes
They emerge with a compromise, having opened doors that were
Previously closed
Bros
The immigrant emerges with unprecedented citation power
A system he can shape however he wants
The professors emerge in the university
And here’s the pièce de résistance
No one else was in
The lab where it happened
The lab where it happened
The lab where it happened
No one else was in
The lab where it happened
The lab where it happened
The lab where it happened
No one really knows how the game is played
The art of the trade
How the sausage gets made
We just assume that it happens
But no one else is in
The lab where it happens
Prof claims
He was in Washington offices one day
In distress ‘n disarray
The Uni claims
His students said
I’ve nowhere else to turn
And basic’ly begged me to join the fray
Student claims
I approached the P.I. and said
I know you have the data, but I’ll tell you what to say
Professor claims
Well, I arranged the meeting
I arranged the menu, the venue, the seating
But
No one else was in
The lab where it happened
The lab where it happened
The lab where it happened
No one else was in
The lab where it happened
The lab where it happened
The lab where it happened
No one really knows how the
Journals get to yes
The pieces that are sacrificed in
Ev’ry game of chess
We just assume that it happens
But no one else is in
The lab where it happens
Meanwhile
Scientists are grappling with the fact that not ev’ry issue can be settled by committee
Meanwhile
Journal is fighting over where to put the retraction
It isn’t pretty
Then pizza-man approaches with a dinner and invite
And postdoc responds with well-trained insight
Maybe we can solve one problem with another and win a victory for the researchers, in other words
Oh ho
A quid pro quo
I suppose
Wouldn’t you like to work a little closer to home
Actually, I would
Well, I propose the lunchroom
And you’ll provide him his grants
Well, we’ll see how it goes
Let’s go
No
One else was in
The lab where it happened
The lab where it happened
The lab where it happened
No one else was in
The lab where it happened
The lab where it happened
The lab where it happened
My data
In data we trust
But we’ll never really know what got discussed
Click-boom then it happened
And no one else was in the lab where it happened
Professor of nutrition
What did they say to you to get you to sell your theory down the river
Professor of nutrition
Did the editor know about the dinner
Was there citation index pressure to deliver
All the coauthors
Or did you know, even then, it doesn’t matter
Who ate the carrots
‘Cause we’ll have the journals
We’re in the same spot
You got more than you gave
And I wanted what I got
When you got skin in the game, you stay in the game
But you don’t get a win unless you play in the game
Oh, you get love for it, you get hate for it
You get nothing if you
Wait for it, wait for it, wait
God help and forgive me
I wanna build
Something that’s gonna
Outlive me
What do you want, Prof
What do you want, Prof
If you stand for nothing
Prof, then what do you fall for
I
Wanna be in
The lab where it happens
The lab where it happens
I
Wanna be in
The lab where it happens
The lab where it happens
I
Wanna be
In the lab where it happens
I
I wanna be in the lab
Oh
Oh
I wanna be in
The lab where it happens
The lab where it happens
The lab where it happens
I wanna be in the lab
Where it happens
The lab where it happens
The lab where it happens
The art of the compromise
Hold your nose and close your eyes
We want our leaders to save the day
But we don’t get a say in what they trade away
We dream of a brand new start
But we dream in the dark for the most part
Dark as a scab where it happens
I’ve got to be in
The lab (where it happens)
I’ve got to be (the lab where it happens)
I’ve got to be (the lab where it happens)
Oh, I’ve got to be in
The lab where it happens
I’ve got to be, I’ve gotta be, I’ve gotta be
In the lab
Click-boom

(Apologies to Lin-Manuel Miranda. Any resemblance to persons living or dead is entirely coincidental.)

P.S. Yes, these stories are funny—the missing carrots and all the rest—but they’re also just so sad, to think that this is what our scientific establishment has come to. I take no joy from these events. We laugh because, after a while, we get tired of screaming.

I just wish Veronica Geng were still around to write about these hilarious/horrible stories. I can’t do them justice.

One data pattern, many interpretations

David Pittelli points us to this paper: “When Is Higher Neuroticism Protective Against Death? Findings From UK Biobank,” and writes:

They come to a rather absurd conclusion, in my opinion, which is that neuroticism is protective if, and only if, you say you are in bad health, overlooking the probability that neuroticism instead makes you pessimistic when describing your health.

Here’s the abstract of the article, by Catharine Gale, Iva Cukic, G. David Batty, Andrew McIntosh, Alexander Weiss, and Ian Deary:

We examined the association between neuroticism and mortality in a sample of 321,456 people from UK Biobank and explored the influence of self-rated health on this relationship. After adjustment for age and sex, a 1-SD increment in neuroticism was associated with a 6% increase in all-cause mortality (hazard ratio = 1.06, 95% confidence interval = [1.03, 1.09]). After adjustment for other covariates, and, in particular, self-rated health, higher neuroticism was associated with an 8% reduction in all-cause mortality (hazard ratio = 0.92, 95% confidence interval = [0.89, 0.95]), as well as with reductions in mortality from cancer, cardiovascular disease, and respiratory disease, but not external causes. Further analyses revealed that higher neuroticism was associated with lower mortality only in those people with fair or poor self-rated health, and that higher scores on a facet of neuroticism related to worry and vulnerability were associated with lower mortality. Research into associations between personality facets and mortality may elucidate mechanisms underlying neuroticism’s covert protection against death.

The abstract is admirably modest in its claims; still, Pittelli’s criticism seems reasonable to me. I’m generally suspicious of reading too much into this sort of interaction in observational data. The trouble is that there are so many possible theories floating around, so many ways of explaining a pattern in data. I think it’s a good thing that the Gale et al. paper was published: they found a pattern in data and others can work to understand it.

Testing Seth Roberts’ appetite theory

Jonathan Tupper writes:

My organization is running a group test of Seth Roberts’ old theory about appetite.

We are running something like a “web trial” as discussed in your Chance article with Seth. And in fact our design was very much inspired by your conversation… For one, we are using a control group that takes light olive oil *with* meals, as you mentioned. We are also testing the mechanism of hunger rather than the outcome of weight loss. This is partly for pragmatic reasons about the variability of the measures, but it’s also an attempt to address the concern you raised that the mechanism is the two-hour flavorless window itself. Not eating for two hours probably predicts weight loss, but it wouldn’t seem to predict less hunger!

Here’s how to sign up for their experiment. I told Tupper that I found the documentation at that webpage to be confusing, so they also prepared this short document summarizing their plan.

I know nothing about these people, but I like the idea of testing Seth’s diet, so I’m sharing this with you. (And I’m posting it now rather than putting it at the end of the queue, so they can get their experimental data sooner rather than later.) Feel free to post your questions/criticisms/objections/thoughts in comments.

3 quick tricks to get into the data science/analytics field

John McCool writes:

Do you have advice getting into the data science/analytics field? I just graduated with a B.S. in environmental science and a statistics minor and am currently interning at a university. I enjoy working with datasets from sports to transportation and doing historical analysis and predictive modeling.

My quick advice is to avoid interning at a university, as I think you’ll learn more by working in the so-called real world. If you work at a company doing analytics, try to do your best work rather than just being satisfied with getting the job done, and if you’re lucky you’ll interact with enough people that you’ll find the ones you like and can work with. You can also go to local tech meetups to stay exposed to new ideas.

But maybe my advice is terrible, I have no idea, so others should feel free to share their advice and experience in the comments.