This puzzle comes in three parts. There are some hints at the end.
Part I: Constrained Parameter Definition
Define a Stan program with a transformed matrix parameter d that is constrained to be a K by K distance matrix. Recall that a distance matrix must satisfy the definition of a metric for all i, j:
* positivity : d[i,j] >= 0
* self distance : d[i,j] = 0 iff i = j
* symmetry : d[i,j] = d[j,i]
* triangle inequality : d[i,j] <= d[i,k] + d[k,j] for all k
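A minimal skeleton of the program shape, with the transform itself left as the exercise (the name d_raw and the choice of an unconstrained vector are illustrative, not part of the puzzle statement):

```stan
parameters {
  vector[K * (K - 1) / 2] d_raw;  // one raw value per off-diagonal pair
}
transformed parameters {
  matrix[K, K] d;  // the distance matrix
  // fill d from d_raw so the four metric conditions above hold:
  // zero diagonal, symmetry, positivity, triangle inequality
}
```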
Part II: Modeling Assumptions
Now suppose there are noisy measurements y[n] of the distance between points ii[n] and jj[n]. This corresponds to a Stan program with the following data block.
data {
  int K;          // num points
  int N;          // num observations
  int ii[N];      // first point of observation n
  int jj[N];      // second point of observation n
  real y[N];      // distance measurements (could bound below by 0)
}
A likelihood that assumes independent normal noise on the measurements is defined as follows.
model {
  for (n in 1:N)
    y[n] ~ normal(d[ii[n], jj[n]], sigma);
}
This assumes a positive-constrained parameter sigma for the noise scale.
parameters {
  real<lower=0> sigma;  // measurement noise scale
  ...
}
Feel free to give it a prior taking into account the distance scales involved in the problem and the measurement process.
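For instance, if the distances involved are expected to be on the order of 1, something like the following might be reasonable (the lognormal here is purely an illustrative choice):

```stan
model {
  sigma ~ lognormal(0, 1);  // example prior; rescale to match the measurement process
  ...
}
```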
There are then (K choose 2) free parameters, which may be (a) left unconstrained, (b) constrained to be positive, or (c) transformed to a proper distance matrix satisfying the distance metric conditions. What is the effect on the posterior of these three modeling assumptions?
Part III: Complexity
What’s the complexity of evaluating the likelihood with and without the distance-matrix constraint on the parameters?
Hints
Hint 1 (I) : There should be (K choose 2) [i.e., K * (K - 1) / 2] “raw” parameters (either unconstrained or constrained to be positive) which then need to be scaled.
Hint 2 (I) : See the manual chapter “Transformation of Constrained Variables” and in particular the section on lower and upper bounded variables if you want hints on the Jacobians necessary.
Hint 3 (I) : Order the transform so the Jacobian’s easy to compute.
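As a reminder of the mechanics (not the puzzle's solution), a hand-coded lower-bound transform and its Jacobian adjustment look like this in Stan, with illustrative variable names:

```stan
parameters {
  real x;                    // unconstrained raw parameter
}
transformed parameters {
  real<lower=0> z = exp(x);  // positive-constrained value
}
model {
  target += x;               // log Jacobian: log |dz/dx| = log exp(x) = x
}
```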
Hint 4 (II) : The only way I know how to do this is by simulation, but a bunch of the other Stan developers probably just know the answer.
Hint 5 (III) : The answer should be an expression involving N and K, such as $latex {\mathcal O}(f(N,K))$ for some function $latex f$.
Hint 6 (III) : Count the number of expressions that get evaluated in the transform and in the model block.
Strikes Eli that this is not too far from Doug Keenan’s challenge to separate series with deterministic trends and noise from those without. Given Keenan’s poorly defined statement of the problem, it probably would not be possible to solve completely, but Bayesian methods may at least narrow the field enough that one of multiple submissions would win the $100,000 prize (there is a $10 entry fee).
You have a typo in the triangle inequality which should be something like:
Triangle inequality : d[i,j] <= d[i,k] + d[k,j] for all i, j, k
Thanks — I fixed it in the body of the post.
Is there a solution of this puzzle? How does one efficiently implement the constraints on d?