Greg Campbell writes:

I am a Canadian archaeologist (BSc in Chemistry) researching the past human use of European Atlantic shellfish. After two decades of practice I am finally getting a MA in archaeology at Reading. I am seeing if the habitat or size of harvested mussels (Mytilus edulis) can be reconstructed from measurements of the umbo (the pointy end, and the only bit that survives well in archaeological deposits) using log-transformed measurements (or allometry; relationships between dimensions are more likely exponential than linear).

Of course multivariate regressions in most statistics packages (Minitab, SPSS, SAS) assume you are trying to predict one variable from all the others (a Model I regression), and use ordinary least squares to fit the regression line. For organismal dimensions this makes little sense, since all the dimensions are (at least in theory) free to change their mutual proportions during growth. So there is no predictor and predicted, mutual variation of all the dimensions is the response (a Model II regression), and the fitted regression line must give equal weight to all the dimensions: common methods are major-axis (perpendicular distances between the line and all the points are minimised, in a principal-component-analysis way) and reduced major axis or standard-major-axis (perpendicular distances between the standardised points and the line are fitted, and then unstandardised).I see that you literally wrote the book on regression. Do you know if it is possible to carry out major-axis or reduced-major-axis fitting in multiple linear regressions in SPSS, SAS or Systat (I know that it can’t be done in Minitab)?

Do you know if there are applications in R that carry out this type of analysis?

My reply: I’m a sucker for any email that begins, “I am a Canadian archaeologist.” I think there are various models out there that could work here, including factor analysis and measurement-error models. I’m no expert on this particular set of models, but they get used in psychometrics when there are many variable measurements. Maybe some commenters could help?

I never used this (or tested this), but I did some exploring and found in a recent spssx discussion thread that you can use reduced major axis regression through the constrained nonlinear regression command (CNLR). See here about halfway down: http://spssx-discussion.1045642.n5.nabble.com/Model-II-regression-analysis-td4286435.html

Another commenter brought up an interesting approach that I think (if I understand your problem correctly) translates well here–why not consider SEM? That would be my first approach.

And here are the commands for both major axis regression and reduced major axis regression, http://www.listserv.uga.edu/cgi-bin/wa?A2=ind9603&L=spssx-l&F=&S=&P=17532

Finally Jack Weiss mentions an algebraic alternative to OLS when you have ecological data http://www.unc.edu/courses/2007spring/biol/145/001/docs/lectures/Nov3.html#maxlikelihoodform. Not sure if it’s useful, but the background in the lecture notes helped me understand this problem more fully anyway

Hi Andrew,

I’m a somewhat longtime lurked (via Deborah Mayo’s blog) but have never commented before. Anyway, seems like a latent class model might be just the ticket, although I don’t think that they are available in SAS, SPSS, or Minitab… I typically use MPlus to model these, but Stata (GLLAMM…mmmm I forget the acronym) might also work.

One can be Bayesian about this: instead of fitting a p(y|x), one just needs to fit a p(x,y). The main issue is that by also fitting a distribution for p(x|y), the predictive quality of p(y|x) won’t be as good with the same number of parameters. But Lasserre’s paper “Principled Hybrids of Generative and Discriminative Models (2006)” talks about interpolating between both extremes.

All the improvements coming out of Bayesian priors and more complex models can be applied to this setting – rather than going to OLS.

The R package SMATR, (Standardised) Major Axis Estimation and Testing Routines.

I think you can analyze your data using HMLM (hierarchical multi-variate linear models). You stack up all the data in the outcome column and include indicators on the right side of the equation specifying the type of variable. The program calculates latent random variables for each of the variables, and the covariances of the random variables show the relationships among the variables.

This sounds like a problem in morphometrics. This website looks to have software and tutorials. http://life.bio.sunysb.edu/morph/index.html

Along the lines of Aleks’s earlier comment, factor-analysis type models have been used as a joint model to induce a reduced rank (uni- or multivariate) regression in a fully Bayesian way (see e.g. http://ftp.isds.duke.edu/WorkingPapers/02-12.html). For R packages, MCMCpack has factor analysis functions that will give the necessary output (after some post-processing), as does the bfa package. bfa includes shrinkage priors for factor loadings which may be of interest.

This feels like a community ecology problem where it’s called ‘direct gradient analysis’ and is a subset of ordination methods. (Indirect gradient analysis is when you don’t have covariates for the measurements you’re treating as a dependent variable.) Actually, ordination is also used for dating in archaeology, where the unobserved ‘gradient’ that summarises the vector of measurements is time, so you might have come across it there. One of the old-style, but still widely used methods for direct gradient analysis is canonical correspondence analysis. As for references, http://ordination.okstate.edu/overview.htm is a reasonable orienting introduction, and ter Braak 1988 provides a quick review of the basic methods. Things have moved on on the estimation and modeling front, but this should give you the basic toolset. The R package vegan implements most methods.

If he chooses to go the latent class analysis route, to me the best book remains Ronald Hayduks’s. The inner leaf of the book cover alone is worth looking at from your library because it diagrams out all the various matrices and relationship in the Lisrel model.

The ‘psych’ package in R works well if you want to do exploratory factor analysis [function: fa()] or principal component analysis [function: principal()]. ‘GPArotation’ has a host of rotation algorithms that may be of interest to improve interpretability.

Latent Class Analysis can be done in SAS…check the Methodology Center at Penn State. They have a great SAS add-on for LCA.