F-f-f-fake data

Tiago Fragoso writes:

Suppose I fit a two stage regression model

Y = a + bx + e
a = cw + d + e1

I could fit it all in one step by using MCMC, for example (my model is more complicated than that, so I’ll have to do it by MCMC). However, I could use MCMC only for the first regression, because those estimates are hard to obtain, and then perform the second regression using least squares or a separate MCMC.

So there’s a ‘one step’ inference based on doing it all at the same time and a ‘two step’ inference based on fitting one regression and using its estimates in the later steps. What is gained or lost between the two? Has anything been done on this question?

My response:

Rather than answering your particular question, I’ll give you my generic answer, which is to simulate fake data from your model, then fit your model both ways and see how the results differ. Repeat the simulation a few thousand times and you can make all the statistical comparisons you like.
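To make that concrete, here is a minimal sketch of such a fake-data comparison in Python with NumPy. It makes several assumptions that are not in the question: the data are grouped, with one intercept a and one predictor w per group; the parameter values, group sizes, and function names are invented for illustration; and plain least squares stands in for MCMC in both fits so that the loop runs quickly. The "one step" fit substitutes the second equation into the first and regresses Y on w and x together; the "two step" fit estimates the group intercepts first and then regresses those estimates on w.

```python
import numpy as np


def simulate(rng, n_groups=30, n_per=20, b=2.0, c=1.5, d=0.5,
             sd_e=1.0, sd_e1=0.5):
    """Draw one fake dataset from Y = a + b*x + e, a = c*w + d + e1.

    All default parameter values are hypothetical, chosen for illustration.
    """
    w = rng.normal(size=n_groups)                             # group-level predictor
    a = c * w + d + rng.normal(scale=sd_e1, size=n_groups)    # group intercepts
    g = np.repeat(np.arange(n_groups), n_per)                 # group index per row
    x = rng.normal(size=g.size)                               # individual-level predictor
    y = a[g] + b * x + rng.normal(scale=sd_e, size=g.size)
    return y, x, w, g


def fit_one_step(y, x, w, g):
    """Least-squares stand-in for a joint fit: regress Y on 1, w, x at once."""
    X = np.column_stack([np.ones_like(x), w[g], x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]                                            # estimate of c


def fit_two_step(y, x, w, g):
    """Stage 1: group intercepts and a common slope. Stage 2: intercepts on w."""
    n_groups = w.size
    dummies = (g[:, None] == np.arange(n_groups)).astype(float)
    coef1, *_ = np.linalg.lstsq(np.column_stack([dummies, x]), y, rcond=None)
    a_hat = coef1[:n_groups]                                  # estimated intercepts
    X2 = np.column_stack([np.ones(n_groups), w])
    coef2, *_ = np.linalg.lstsq(X2, a_hat, rcond=None)
    return coef2[1]                                           # estimate of c


def compare(n_sims=1000, seed=0, c=1.5):
    """Repeat the simulation and summarize both estimators of c."""
    rng = np.random.default_rng(seed)
    est = np.empty((n_sims, 2))
    for i in range(n_sims):
        data = simulate(rng, c=c)
        est[i] = fit_one_step(*data), fit_two_step(*data)
    return {
        "mean": est.mean(axis=0),                             # [one step, two step]
        "sd": est.std(axis=0, ddof=1),
        "rmse": np.sqrt(((est - c) ** 2).mean(axis=0)),
    }


if __name__ == "__main__":
    for name, values in compare().items():
        print(name, values)
```

To answer the actual question, replace the two fitting functions with the real one-step and two-step procedures (MCMC included) and compare interval coverage as well as point estimates, since the two approaches may differ more in their uncertainty statements than in their estimates.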


  1. Rahul says:

    Computers have made a statistician’s life so much easier…. :)

  3. Mark Palko says:

    I’ve toyed with the idea of doing something like this with a flexible model such as a decision tree. Use resampling to create multiple trees on the real and fake data, then compare the frequency and importance of different variables and combinations of variables.

    I’ve never gotten around to trying it but I always thought it would be interesting. Has anyone else out there tried real/fake analysis using flexible models?

  4. zbicyclist says:

    Excellent advice, widely applicable.

  5. o.k. says:

    This kind of thing comes up very often in econometric applications. Googling “generated regressor” will give you a lot of material.