Jeremy Neufeld writes:

I’m an undergraduate student at the University of Maryland and I was recently referred to this paper (Vine Regression, by Roger Cooke, Harry Joe, and Bo Chang, along with an accompanying summary blog post by the main author) as potentially useful in policy analysis. With the big claims it makes, I am not sure if it passes the sniff test. Do you know anything about vine regression? How would it avoid overfitting?

My reply: Hey, as a former University of Maryland student myself I’ll definitely respond! I looked at the paper, and it seems to be presenting a class of multivariate models, a method for fitting the models to data, and some summaries. The model itself appears to be a mixture of multivariate normals of different dimensions, fit to the covariance matrix of a rank transformation of the raw data. I think they’re ranking each variable on its marginal distribution but I’m not completely sure, and I’m not quite sure how they deal with discreteness in the data. Then somehow they’re transforming back to the original space of the data; maybe they do some interpolation to get continuous values. I’m also not quite sure what happens when they extrapolate beyond the range of the original ranks.
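To make the rank-transform idea concrete, here is a minimal sketch of the simplest member of this family: a single Gaussian copula, not the vine construction in the paper. It ranks each variable on its marginal, converts ranks to normal scores, estimates a correlation matrix, predicts on the normal scale, and interpolates back through the empirical quantiles of the outcome. The function names and the interpolation rule are assumptions for illustration, not the authors' procedure. Note that `np.interp` clamps at the endpoints, so predictions never leave the observed range of the outcome, which is one crude answer to the extrapolation question.

```python
import numpy as np
from statistics import NormalDist

def normal_scores(x):
    """Map a 1-D sample to normal scores via its marginal ranks."""
    n = len(x)
    ranks = np.argsort(np.argsort(x)) + 1       # ranks 1..n (ties broken arbitrarily)
    u = ranks / (n + 1)                         # pseudo-observations strictly inside (0, 1)
    inv = NormalDist().inv_cdf
    return np.array([inv(p) for p in u])

def gaussian_copula_predict(X, y, x_new):
    """Conditional-median prediction of y at x_new under a Gaussian copula (illustrative)."""
    Z = np.column_stack([normal_scores(X[:, j]) for j in range(X.shape[1])])
    zy = normal_scores(y)
    R = np.corrcoef(np.column_stack([Z, zy]), rowvar=False)
    Rxx, rxy = R[:-1, :-1], R[:-1, -1]
    # Normal score of each new predictor value, interpolated within the observed range.
    z_new = np.array([
        np.interp(x_new[j], np.sort(X[:, j]), np.sort(Z[:, j]))
        for j in range(X.shape[1])
    ])
    zy_hat = rxy @ np.linalg.solve(Rxx, z_new)  # conditional mean on the normal scale
    # Back-transform through the empirical quantiles of y (clamped at min and max of y).
    return float(np.interp(zy_hat, np.sort(zy), np.sort(y)))

# Hypothetical data: a skewed outcome that depends monotonically on one predictor.
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = np.exp(x + 0.3 * rng.normal(size=2000))
pred_at_zero = gaussian_copula_predict(x.reshape(-1, 1), y, np.array([0.0]))
```

Because the back-transform is monotone, the value returned is a conditional median rather than a conditional mean; getting the mean would require integrating over the conditional distribution.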

The interesting part of the model is the mixture of submodels of different dimensions. I’m generally suspicious of such approaches, as continuous smoothing is more to my taste. That said, the usual multivariate models we fit are so oversimplified that I could well imagine that this mixture model could do well. So I’m supportive of the approach. I think maybe they could fit their model in Stan; if so, that would probably make the computation less of a hassle for them.

The one thing I really don’t understand at all in this paper is their treatment of causal inference. The model is entirely associational (that’s fine, I love descriptive data analysis!) and they’re fitting a multivariate model to some observational data. But then in section 3.1 of their paper they use explicit causal language: “the effect of breast feeding on IQ . . . If we change the BFW for an individual, how might that affect the individual’s IQ?” The funny thing is, right after that they again remind the reader that this is just descriptive statistics: “we integrate the scaled difference of two regression functions which differ only in that one has weeks more breast feeding than the other.” But then they snap right back to the causal language. So that part just baffles me. They have a complicated, flexible tool for data description but for some reason they then seem to make the tyro mistake of giving a causal interpretation to regression coefficients fit to observational data. That’s not really so important, though; I think you can ignore the causal statements and the method could still be useful. It seems worth trying out.

I wouldn’t really consider it an undergraduate topic. The student should get familiar with copulas and vine copulas first.

I know of vines from vine copulas. Copulas, in general, are a way to focus solely on the dependence structure of a multivariate distribution. Most significantly, copulas are useful when the dependence structure is non-normal. For instance, if the S&P 500 decline is in the 1st percentile, will other stocks experience declines worse than that (on a percentile basis)? The earlier work focused on parametric distributions, such as the normal copula or the t copula. But what if the degrees of freedom of a t copula are different between different parts of your distribution (consider a distribution including returns of several stock indices plus commodity indices)? Even more generally, what if some pairs of variables have a t copula and another set of pairs has a Gumbel copula? Vine copulas are a flexible way to handle situations like this.
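The 1st-percentile question can be checked by simulation. The sketch below draws pairs with the same correlation from a bivariate normal and from a bivariate t with 3 degrees of freedom, converts each margin to ranks so only the copula matters, and estimates the chance that the second variable is in its worst 1% given that the first one is. The correlation of 0.5, the 3 degrees of freedom, and the function names are all assumptions chosen for illustration.

```python
import numpy as np

def to_uniform(x):
    """Rank-transform a sample to pseudo-observations in (0, 1)."""
    return (np.argsort(np.argsort(x)) + 1) / (len(x) + 1)

def lower_tail_prob(x, y, q=0.01):
    """Estimate P(y in its lowest q fraction | x in its lowest q fraction)."""
    u, v = to_uniform(x), to_uniform(y)
    in_tail = u < q
    return float(np.mean(v[in_tail] < q))

def simulate_pair(n, rho, df=None, seed=0):
    """Draw n pairs from a bivariate normal, or a bivariate t if df is given."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    if df is not None:
        # A shared chi-square scale turns the normal draws into multivariate t draws.
        w = rng.chisquare(df, size=n) / df
        z = z / np.sqrt(w)[:, None]
    return z[:, 0], z[:, 1]

gauss_tail = lower_tail_prob(*simulate_pair(200_000, rho=0.5))
t_tail = lower_tail_prob(*simulate_pair(200_000, rho=0.5, df=3))
```

Under these assumptions the t copula should show noticeably more joint extreme declines than the normal copula, even though both share the same correlation parameter.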

It is possible to do copula regression. This would allow the researcher to separate the marginals of the distributions from the dependence structure between the independent and dependent variables. Here is a good presentation on the topic:

http://www-1.ms.ut.ee/tartu07/presentations/kolev.pdf

I would only use this approach if I had a really good reason to.

As such, vine copulas are an obvious extension. They would allow greater flexibility in fitting models.

Guess I was just taken aback with the author’s description that the approach “promises to dispel” questions like “What about ‘multi-collinearity’? transformations? bias? convergence? efficiency?”

Jeremy:

Yes, I was just ignoring the tacky hype.

The causal inference confusion gets even worse at the linked blog post, where the author explicitly invokes counterfactual-type language:

“Once we have a joint density we can compute everything. We can compute the expected IQ of a child who was breast fed for 2 weeks, born in 1992 to a 29 year old mother who completed the 12th grade, with an IQ of 100.13 and a yearly family income of $21,500. That child’s expected IQ is 94.6. Had that same child been breast fed for 12 weeks, its expected IQ would be 96.1, and for 22 weeks, 96.8.”

Samuel:

Good catch: yes, that quote is horrible!

I personally like the idea that we can measure IQ to two decimal places, but income to only $500 increments. It highlights the fact that IQ is a real number, and income is just a concept.

Jrc:

The going rate is $500 per 0.01 point of IQ. You can trade in the income for IQ at any time except on certain restricted dates. It’s like frequent flyer miles.

When I was in elementary school two friends and I were buddies at the top of the class. One of us had access to the raw IQ test scores in the school office. She found out why our adjusted scores were linearly ordered when the raw scores were the same. Jim was born in April, Mary Ella in May; my birthday is in June. So I came out smartest.

Enjoying this discussion immensely. Is Ethan Bolker THE Ethan Bolker? Didn’t you prove that two continuous probability measures with the same set of sets with probability 1/2 are the same measure? I’ve spent too much time in philosophy to get too worked up about causality. The point is, regression IMHO is just a poor man’s way of conditionalizing a joint distribution. Without the simplifying assumption (SA, that conditional copulae don’t depend on the values of the conditioning variables), vine copulae can represent every continuous distribution. I asked myself, what would I do if I had the true density from which the data were sampled? Even with SA we will have a very large and malleable class of multivariate distributions. Why not pick the one that best fits your multivariate data, work with that and exit the epicycles of regression? Is the SA-fit not good enough? SA can easily be relaxed without losing tractability and folks are working on that. NB, all this applies to continuous densities. That’s why I liked the breastfeeding dataset.

Roger:

You write, “I’ve spent too much time in philosophy to get too worked up about causality.”

Ummm, I don’t buy that alibi. Causal claims have real implications. This can be seen in simple examples of observational data with selection: Run a regression and you find that people who got the treatment did better than people who got the control, conditioning on the other predictors in your model. Then do an experiment and you find the treatment makes people worse. It’s easy to set up simulations where this behavior occurs, and apparently it happens in the real world too.
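Here is a minimal sketch of one such simulation. Everything in it is made up for illustration: an unobserved "health" variable drives both who takes the treatment and how well people do, the analyst only sees a noisy proxy of it, and the true treatment effect is set to -1. The same regression is run on the observational data and on a randomized version.

```python
import numpy as np

def ols_coefs(y, *columns):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def simulate_selection(n=20_000, true_effect=-1.0, seed=1):
    """Return the treatment coefficient from observational and from randomized data."""
    rng = np.random.default_rng(seed)
    health = rng.normal(size=n)                    # unobserved confounder
    x = health + 2.0 * rng.normal(size=n)          # observed, but only a noisy proxy

    def outcome(t):
        return true_effect * t + 3.0 * health + rng.normal(size=n)

    # Observational setting: healthier people are more likely to take the treatment.
    t_obs = (health + rng.normal(size=n) > 0).astype(float)
    # Experiment: treatment assigned by coin flip, independent of health.
    t_exp = rng.integers(0, 2, size=n).astype(float)

    obs = ols_coefs(outcome(t_obs), t_obs, x)[1]   # coefficient on treatment
    exp = ols_coefs(outcome(t_exp), t_exp, x)[1]
    return float(obs), float(exp)

obs_coef, exp_coef = simulate_selection()
```

With these settings the observational regression, even after conditioning on `x`, should give a positive treatment coefficient, while the randomized data should recover something close to the true negative effect.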

Regarding your “exit the epicycles” question: Linear regression with no interactions is a model. Linear regression with interactions is a model. Your procedure is a model. All models have assumptions. Different assumptions will be appropriate in different settings. Where your method works well, that’s great; where it works less well, that will be in situations where its model is further from reality. When data are sparse, you can’t necessarily just pick the model that best fits your multivariate data; this can overfit and give you noisy predictions and poor estimates.
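The sparse-data point can be illustrated with a toy stand-in, using polynomials rather than vine copulas. In the sketch below the true regression function is a straight line (an assumption, as are the sample size of 12 and the noise level), and a degree-9 polynomial, which fits the observed points more closely, is compared with a degree-1 fit on how far each lands from the known truth.

```python
import numpy as np
from numpy.polynomial import Polynomial

def true_regression(x):
    """The known regression function used to generate the toy data."""
    return 1.0 + 2.0 * x

def rmse_against_truth(degree, n=12, noise_sd=1.0, seed=0):
    """Fit a polynomial to n noisy points and measure error against the true function."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    y = true_regression(x) + noise_sd * rng.normal(size=n)
    fit = Polynomial.fit(x, y, deg=degree)
    grid = np.linspace(-1.0, 1.0, 201)
    return float(np.sqrt(np.mean((fit(grid) - true_regression(grid)) ** 2)))

def average_rmse(degree, n_reps=20):
    """Average the error over several simulated datasets to reduce seed dependence."""
    return float(np.mean([rmse_against_truth(degree, seed=s) for s in range(n_reps)]))

simple_rmse = average_rmse(degree=1)
flexible_rmse = average_rmse(degree=9)
```

The flexible fit wins on in-sample error by construction, but with so few points it should do worse against the true function, which is the sense in which picking the best-fitting model can backfire.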

Ha Andrew

Thanks for your response and btw nice getting back in touch – you may remember that we worked together on a project by Jim Hammitt in 2001.

Re causality I agree with your model comments – it’s all model based. Hume pointed out that we never perceive necessity, only association. Wrt a given model the best definition I could give is something like this: Vbl X is a cause of vbl Y in model (X,Y,Z1…Zn) if \rho(X,Y)>>0 and for no A \subset Z1…Zn are X and Y conditionally independent given A. Of course there may be conditional independence for some values of A and not for others. I’m reminded of dry deposition, which is influenced on scales from the mean free path of Brownian motion to the mixing layer of the atmosphere. One guy identified 80 influencing vbls, any of which could be dominant depending on values of the others. What would a regression based causality claim be worth?

Yes all models are models and all models are wrong. But that does not discharge us from attempting validation and model criticism as best we can. In regression-land I miss the attempt to ground-truth their heuristics. Compare performance on datasets for which we know the true regression function. In the BF case, we build a density with the same margins as the data and a similar dependence structure, and compare results on this set for which we know the regression. The standard heuristics do not do stunningly well. IMHO this should be standard practice.