Now suppose you have an independent prior for all 100 coefficients, let’s say independent normal(0, 2) for each coefficient for concreteness.

Any pair of those coefficients may be correlated in the posterior. The slope for two different states may be correlated, the slope and intercept for a single state may be correlated, etc.

If you take the posterior means for the 50 intercepts and the 50 slopes, it’s possible those two vectors have non-zero sample correlation (in fact it’s almost certain, given how it’s calculated).
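To see why an independent prior doesn’t give an independent posterior, here is a small sketch (my own toy numbers, not from the post): in a two-coefficient Bayesian linear regression with independent normal priors and known noise variance, the posterior precision is the prior precision plus X'X/sigma^2, so correlated predictor columns produce a non-zero off-diagonal in the posterior covariance.

```python
sigma2 = 1.0      # assumed known noise variance
prior_var = 4.0   # normal(0, 2) prior => variance 2^2 = 4

# Design matrix with two correlated columns (e.g. intercept and slope).
X = [[1.0, 0.5],
     [1.0, 1.0],
     [1.0, 1.5],
     [1.0, 2.0]]

# X'X
xtx = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]

# Posterior precision = I / prior_var + X'X / sigma2
prec = [[(1.0 / prior_var if i == j else 0.0) + xtx[i][j] / sigma2
         for j in range(2)] for i in range(2)]

# Invert the 2x2 precision matrix to get the posterior covariance.
det = prec[0][0] * prec[1][1] - prec[0][1] * prec[1][0]
cov = [[ prec[1][1] / det, -prec[0][1] / det],
       [-prec[1][0] / det,  prec[0][0] / det]]

corr = cov[0][1] / (cov[0][0] ** 0.5 * cov[1][1] ** 0.5)
print(corr)  # non-zero despite the independent prior
```

The prior contributed nothing off-diagonal; the correlation comes entirely from the likelihood.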

I think Charles is imagining a hierarchical model in which there is a group of pairs of parameters drawn from a hierarchical prior. In that case, we can look at the sample correlation among the draws. This isn’t the same as looking at posterior correlation. What Charles says is right in the hierarchical setting: a hierarchical prior that favors positive correlation will also make the estimates of the lower-level parameters more correlated.

My understanding is that if one does not put a prior on the covariation of two parameters, one is making “no assumptions” about their correlation, but one is not making the assumption that “their correlation is zero”.

For example, if one models regression weights without also modeling their covariance, one is not necessarily assuming that they are not correlated.
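The hierarchical-prior point above can be illustrated with a small simulation (my own sketch; the correlation value is an assumption): draw (intercept, slope) pairs for many groups from a bivariate normal whose prior correlation is positive, and the sampled pairs come out positively correlated.

```python
import random, math

random.seed(1)
rho = 0.7  # assumed positive correlation in the hierarchical prior
pairs = []
for _ in range(500):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    intercept = z1
    # Cholesky trick: build a draw with correlation rho against z1.
    slope = rho * z1 + math.sqrt(1 - rho * rho) * z2
    pairs.append((intercept, slope))

# Sample (Pearson) correlation among the drawn pairs.
n = len(pairs)
mx = sum(a for a, _ in pairs) / n
my = sum(b for _, b in pairs) / n
cov = sum((a - mx) * (b - my) for a, b in pairs) / n
sx = math.sqrt(sum((a - mx) ** 2 for a, _ in pairs) / n)
sy = math.sqrt(sum((b - my) ** 2 for _, b in pairs) / n)
r = cov / (sx * sy)
print(r)  # near the prior correlation of 0.7
```

With independent priors (rho = 0), the sample correlation would instead hover near zero, which is the distinction being drawn: no covariance model is a weaker statement than a zero-correlation prior.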

You can see the coding deets in the package vignettes, which are now on CRAN: https://cran.r-project.org/web/packages/idealstan/index.html.

http://discourse.mc-stan.org/t/latent-factor-loadings/1483

The bottom of that thread includes a Stan solution (“test1.stan”) that would allow for both positive and negative discriminations, while also maintaining parameter identification. I personally prefer that solution because it allows discrimination parameters’ posteriors to overlap with 0, which may be needed during scale development (when some useless items might be considered).

I could not get a model with unconstrained discriminations to work back when I started on edstan because the chains would settle in different local maxima. Because I mainly work in an education context, this didn’t seem like the most important problem to solve, and so I let it go. I look forward to seeing how Bob coded up a solution.
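The mirror-image modes mentioned here come from a reflection invariance. A toy check (my own example, not from edstan): in a 2PL-style model, flipping the signs of every discrimination, ability, and difficulty leaves the likelihood unchanged, so with unconstrained discriminations the two reflected solutions fit the data equally well.

```python
import math

def loglik(alpha, theta, beta, y):
    """2PL log-likelihood: P(y=1) = logistic(alpha_j * (theta_i - beta_j))."""
    ll = 0.0
    for j, a in enumerate(alpha):
        for i, t in enumerate(theta):
            p = 1.0 / (1.0 + math.exp(-a * (t - beta[j])))
            ll += math.log(p) if y[i][j] else math.log(1 - p)
    return ll

# Arbitrary made-up parameters and responses.
alpha = [1.2, -0.4]
theta = [0.5, -1.0, 2.0]
beta = [0.0, 0.3]
y = [[1, 0], [0, 0], [1, 1]]

# Reflect everything through zero: the log-likelihood is identical.
flipped = loglik([-a for a in alpha], [-t for t in theta],
                 [-b for b in beta], y)
print(abs(loglik(alpha, theta, beta, y) - flipped) < 1e-12)  # True
```

Constraining the discriminations (or fixing the sign of one of them) breaks this symmetry, which is why the chains stop wandering between the two modes.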

Thanks for sharing your thoughts, very interesting points!

This belief is empirically unfounded for real-life educational tests. Just look at “Figura 3” (never mind the Portuguese text) here: https://arxiv.org/pdf/1802.09880.pdf, which is the characteristic curve for one question of a national Brazilian higher-education entrance exam.

Also, my experience at ETS led me to believe that well-written cognitive items have discriminations between .5 and 2, so I frequently use an N(0, sqrt(2)/2) prior for discrimination. This assumes that the items have been reviewed by competent reviewers to make sure they are in fact discriminating.

This is entirely application dependent. Attitude scales and other psychological constructs can have reverse keyed items which have negative discriminations.

I’d prefer to put the normal model on alpha, and then if the data aren’t consistent with negative-discrimination items, that should show up in the inferences. I’d rather have the data reveal this than impose it from the outside.

I was having lunch with Keith O’Rourke the other day and a similar point came up in conversation. The question was one of constraining an intercept to be zero in a simple linear regression on theoretical grounds. The issue is that if the data are such that the constrained fit is very far from the unconstrained fit, then the constraint is likely false, and all of the supposed gain from imposing it (in either improved parameter estimates or predictions) will actually be a distortion of the information in the data. By setting the intercept on theoretical grounds, we are robbing ourselves of one way to check our assumptions, and potentially degrading the output of the inference to boot.
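A quick numerical sketch of that lunch-conversation point (my own made-up data): when the true intercept is far from zero, forcing the regression line through the origin distorts the slope estimate relative to the unconstrained least-squares fit.

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [5.0 + 2.0 * x for x in xs]   # true intercept 5, true slope 2

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# Unconstrained OLS slope and intercept (recovers the truth exactly here).
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx

# Slope when the intercept is constrained to zero on "theoretical grounds".
slope0 = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

print(slope, intercept)  # 2.0, 5.0
print(slope0)            # well above 2: the constraint distorts the slope
```

The constrained fit has to absorb the missing intercept into the slope, which is exactly the “distortion of the information in the data” being described.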

My paper at StanCon in January gets into why discriminations are constrained to be positive in traditional IRT. It depends on the relationship between the data and the unobserved latent variable: if we believe that the relationship is monotonically increasing, then it makes sense to constrain alpha to be positive. For example, on a test, getting a question right *always* signals higher ability as a student. But in other situations, such as in the ideal point model, that may not be the case: voting “yes” on a bill could signal that you are higher or lower on the latent scale (conservative/liberal), because it depends on how the bill loads in the ideal point space. For that reason, in the ideal point model the discrimination parameters should be left unconstrained.

In some situations, it may be possible to fit either model. For example, in the paper I look at an Amazon food ratings database (1 to 5 scale) of coffee products. If I constrained alpha, the model would be interpreted as which coffee product receives the best ratings/has the highest ability. But if I don’t, which I don’t in the paper, then the model becomes about which coffee products tend to be most polarizing between raters.

It took me a long time to figure this out, so I wanted to share. The paper should be up on the Stan website along with the other conference papers soon.
