Solomon Hsiang writes:
Two small follow-ups based on the discussion (the second/bigger one is to address your comment about the 95% CI edges).
1. I realized that if we plot the confidence intervals as a solid color that fades (e.g., using the “fixed ink” scheme from before), we can make sure the regression line also has heightened visual weight where confidence is high by plotting the line in white. This makes the contrast (and thus the visual weight) between the regression line and the CI highest where the CI is narrow and dark. As the CI fades near the edges, so does the contrast with the regression line. This is a small adjustment, but I like it because it is so simple and it makes the graph much nicer (see “visually_weighted_fill_reverse” attached). My posted code has been updated to do this automatically.
2. You and your readers didn’t like that the edges of the filled CI were so sharp and arbitrary. But I didn’t like that the contrast between the spaghetti lines and the background had so much visual weight. So to meet in the middle, I smoothed the spaghetti plot to get a nonparametric estimate of the probability that the conditional mean is at a given value (see “visually_weighted_fixed_ink_smoothed_spaghetti” attached). To do this, after generating the spaghetti through bootstrapping, I estimate a kernel density of the spaghetti in the Y dimension for each value of X. I set the visual-weighting scheme so it still “preserves ink” along a vertical line-integral, so the distribution dims where it widens since the ink is being “stretched out”. To me, it kind of looks like a watercolor painting — maybe we should call it a “watercolor regression” or something like that.
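The smoothing in item 2 can be sketched roughly as follows. This is not the code inside vwregress: a quadratic polyfit stands in for the regression, the bandwidth h is held fixed for simplicity, and every variable name is an illustrative assumption. It bootstraps the spaghetti, estimates a Gaussian kernel density in Y for each value of X, and then rescales each column so that it carries the same total ink.

```matlab
% Sketch only: quadratic polyfit and fixed bandwidth are simplifying assumptions.
n = 100;
x = randn(n,1);
y = 2*x + x.^2 + 4*randn(n,1);

resamples = 200;                          % bootstrap replicates
xg = linspace(min(x), max(x), 50);        % x grid (columns of the image)

% Step 1: bootstrap the fitted curve to get the spaghetti.
spag = zeros(resamples, numel(xg));
for b = 1:resamples
    idx = randi(n, n, 1);                 % resample rows with replacement
    p = polyfit(x(idx), y(idx), 2);
    spag(b,:) = polyval(p, xg);
end

% Step 2: Gaussian kernel density in Y, separately for each value of X.
h = 0.5;                                  % fixed bandwidth (an assumption)
yg = linspace(min(spag(:)) - 3*h, max(spag(:)) + 3*h, 120)';
dens = zeros(numel(yg), numel(xg));
for j = 1:numel(xg)
    d = bsxfun(@minus, yg, spag(:,j)') / h;   % grid points minus curve values
    dens(:,j) = sum(exp(-0.5*d.^2), 2);
end

% Step 3: preserve ink. Every column sums to 1, so a column whose
% density is spread over many cells is paler in each cell.
ink = bsxfun(@rdivide, dens, sum(dens, 1));
```

To look at the result, something like imagesc(xg, yg, ink) followed by axis xy and colormap(flipud(gray)) should show a dark band where the bootstrapped curves agree and a pale wash where they fan out.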
The watercolor regression turned out to be more of a coding challenge than I expected, because the bandwidth for the kernel smoothing has to adjust to the width of the CI. And since several people seem to like R better than Matlab, I attached 2 figs to show them how I did this. Once you have the bootstrapped spaghetti plot (step1.jpg), you define a new coordinate system that spans the range of bootstrapped estimates for each value of X (step2.jpg). The kernel smoothing is then executed along the vertical columns of this new coordinate system.
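One possible reading of that coordinate change, sketched in base Matlab: rescale each column so that 0 is the lowest bootstrapped estimate and 1 is the highest, smooth with a single bandwidth in those rescaled units, and map back to Y. The quadratic polyfit, the bandwidth hu, and all variable names are assumptions made for illustration; the step1/step2 figures and the posted code are the authority on what vwregress actually does.

```matlab
% Sketch only: one interpretation of the rescaled coordinate system.
n = 100;
x = randn(n,1);
y = 2*x + x.^2 + 4*randn(n,1);
resamples = 200;
xg = linspace(min(x), max(x), 50);
spag = zeros(resamples, numel(xg));
for b = 1:resamples
    idx = randi(n, n, 1);                 % resample rows with replacement
    spag(b,:) = polyval(polyfit(x(idx), y(idx), 2), xg);
end

% New coordinates: per column, 0 = lowest estimate, 1 = highest estimate.
lo = min(spag, [], 1);
hi = max(spag, [], 1);
u  = bsxfun(@rdivide, bsxfun(@minus, spag, lo), hi - lo);

% One fixed bandwidth in rescaled units. In Y units this is hu*(hi - lo),
% so the effective bandwidth widens and narrows with the spaghetti.
hu = 0.05;                                % assumed value
ug = linspace(-0.2, 1.2, 100)';           % rescaled vertical grid
dens = zeros(numel(ug), numel(xg));
for j = 1:numel(xg)
    d = bsxfun(@minus, ug, u(:,j)') / hu;
    dens(:,j) = sum(exp(-0.5*d.^2), 2);
end
ink = bsxfun(@rdivide, dens, sum(dens, 1));

% Cells in wide columns are taller in Y, so divide by the column height
% to keep the vertical line-integral of darkness constant.
shade = bsxfun(@rdivide, ink, hi - lo);

% Map the rescaled grid back to Y: one column of Y values per x.
Y = bsxfun(@plus, lo, bsxfun(@times, ug, hi - lo));
X = repmat(xg, numel(ug), 1);
```

Because the grid is curvilinear in Y, pcolor(X, Y, shade) with shading flat is a more natural way to draw this than imagesc.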
I’ve updated the code posted online to include this new option. This Matlab code will generate a similar plot using my vwregress function:
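x = randn(100,1);        % predictor
e = randn(100,1);        % noise
y = 2*x+x.^2+4*e;        % quadratic signal plus noise
bins = 200;              % number of bins
color = [.5 0 0];        % RGB triple (dark red)
resamples = 500;         % number of bootstrap resamples
bw = 0.8;                % bandwidth
vwregress(x, y, bins, bw, resamples, color, 'SMOOTH');  % 'SMOOTH' requests the new smoothed option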
This has been a really helpful and fun process. Thanks to you and your readers for all the feedback. I don’t think I’ll ever plot a simple, solid regression line again :)