An undergraduate econ student asks about how to learn Bayesian statistics

Matt Stephenson writes:

I am currently an undergraduate student in economics . . . In order to better facilitate graduate study in Bayesianism next year, I’m arranging an independent study on Bayesian econometrics with a (frequentist) professor, and I am expected to set up the course. Would you mind if I asked you a few questions about directions to take?

1.a. If my likely use of Bayesian stats is econometrics, do you think it’d be good to use an “Intro to Bayesian Econometrics” textbook (like Lancaster’s) or a general introduction textbook like your “Bayesian Data Analysis”?

1.b. If you think the latter (and I’m inclined to think a general introduction would be better) would you recommend supplementing your textbook with any other book, perhaps one on BUGS, or another general textbook? I’m not trying to put you in the difficult spot here of either listing the shortcomings of the textbook or sounding over-confident. My question really comes from the fact that I won’t have a teacher to bolster my understanding.

2. You’ve mentioned a few times that “to do statistical research [now]… you have to be a computer programmer.” Are there any programming languages I could supplement my “course” with to better prepare me?

3. As a side note, I recently read your older review of Axelrod and the misapplication of the “prisoner’s dilemma” model. Is it a coincidence that such a critique came from a Bayesian? It seems indeed that thinking hard about the model with which one is working, and said model’s applicability to the subject, is one of the great strengths of applied Bayesianism.

My reply:

1. I like Tony Lancaster’s book, and, of course, I like my own. I think either book would be fine, if your advisor is comfortable with it. If you want stuff on Bugs, I’d recommend looking into my book with Hill.

2. R or Matlab for statistics, but Stata is what’s popular in economics. And then of course there are C and Python. I think the best way to learn a language is to have to learn it; that is, to have a problem that you need to program to solve. Fortunately (or unfortunately), there are a lot of problems like that. Just about any applied statistics problem requires programming if you want to do it right.

3. I did the prisoner’s dilemma stuff as a senior in college, before I knew much about Bayesian statistics. I certainly didn’t perceive any connection at the time. (And, back then, people didn’t casually throw around the term “Bayesian” as a synonym for “rationality” the way they do today.) But perhaps the same seriousness-about-models that inspired me to criticize Axelrod also made me sympathetic to Donald Rubin’s approach to Bayesian statistics.
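As one small illustration of the kind of "problem that you need to program to solve" mentioned in point 2, here is a minimal sketch of a random-walk Metropolis sampler in Python. Everything in it is an assumption made for illustration and not taken from any of the books mentioned: the model (normal data with known standard deviation and a normal prior on the mean), the simulated data, and the function names `log_posterior`, `metropolis`, and `exact_posterior_mean`. The model is chosen because its posterior mean is known in closed form, so the sampler can be checked against it.

```python
import numpy as np

def log_posterior(mu, y, sigma=1.0, prior_mean=0.0, prior_sd=10.0):
    """Unnormalized log posterior of mu, assuming
    y_i ~ Normal(mu, sigma) and mu ~ Normal(prior_mean, prior_sd)."""
    log_lik = -0.5 * np.sum((y - mu) ** 2) / sigma**2
    log_prior = -0.5 * (mu - prior_mean) ** 2 / prior_sd**2
    return log_lik + log_prior

def metropolis(y, n_iter=20000, step=0.5, seed=0):
    """Random-walk Metropolis with a symmetric normal proposal."""
    rng = np.random.default_rng(seed)
    draws = np.empty(n_iter)
    mu = float(np.mean(y))          # start at the sample mean
    lp = log_posterior(mu, y)
    for t in range(n_iter):
        proposal = mu + step * rng.normal()
        lp_prop = log_posterior(proposal, y)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < lp_prop - lp:
            mu, lp = proposal, lp_prop
        draws[t] = mu
    return draws

def exact_posterior_mean(y, sigma=1.0, prior_mean=0.0, prior_sd=10.0):
    """Closed-form posterior mean for the conjugate normal-normal model."""
    precision = len(y) / sigma**2 + 1.0 / prior_sd**2
    return (np.sum(y) / sigma**2 + prior_mean / prior_sd**2) / precision

# Simulated data, purely for illustration.
rng = np.random.default_rng(1)
y = rng.normal(loc=2.0, scale=1.0, size=50)

draws = metropolis(y)[2000:]        # discard the first 2000 draws as warm-up
print(draws.mean(), exact_posterior_mean(y))
```

The two printed numbers should agree closely. In real problems there is usually no closed form to compare against, which is where general-purpose tools such as Bugs earn their keep.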

12 thoughts on “An undergraduate econ student asks about how to learn Bayesian statistics”

  1. Answer for 2:

    As a current student in an Econ PhD program, I can confirm that MATLAB is currently the most popular instruction software in most graduate level econometrics courses at most north american universities. Matlab is also used in graduate finance and macroeconomics courses. So, there is no way around having to learn Matlab, and ideally learning it well. (An amusing story: one day some students came late to class and said they didn't have their homework ready because their bootstrap simulation was still running in Matlab and had been running for hours. Had they done it the right way, it could have taken only 10-20 minutes.)

    Once people start doing actual empirical work (not homeworks), they generally prefer to use SAS or STATA. Almost no one uses R.

    People who work in econometric methods or anything that does not have a canned routine in SAS or STATA, continue using Matlab or switch to another matrix language like GAUSS or Ox. Almost no one uses R. I heard there are some good Bayesian packages for R though.
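The comment above does not say what "the right way" to run the bootstrap was. One plausible reading, and it is only an assumption, is that the slow version looped over resamples one at a time, while a matrix language lets you draw all the resampling indices at once. A minimal sketch of that idea in Python with NumPy (the function name `bootstrap_means` and the simulated data are hypothetical, and the same approach carries over to Matlab or R):

```python
import numpy as np

def bootstrap_means(x, n_boot=2000, seed=0):
    """Bootstrap distribution of the sample mean, with no explicit loop.

    Draws an (n_boot, n) matrix of indices in one call, so each row
    is one resample of x with replacement.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    idx = rng.integers(0, n, size=(n_boot, n))
    return x[idx].mean(axis=1)

# Simulated skewed data, purely for illustration.
rng = np.random.default_rng(42)
x = rng.exponential(scale=1.0, size=200)

boot = bootstrap_means(x)
se_boot = boot.std(ddof=1)                      # bootstrap standard error
se_formula = x.std(ddof=1) / np.sqrt(len(x))    # textbook standard error
print(se_boot, se_formula)
```

For the sample mean the two standard errors should be close, which makes this a convenient sanity check before moving on to statistics that have no simple formula.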

  2. Jacob: I don't see that being a student at one place necessarily tells you what is going on at (presumably) several hundred others. What are your data on MATLAB use, precisely? (If these are your impressions, perhaps others have different impressions.)

    This posting implies that Stata isn't programmable, which is very far from being the case. For example, since Stata 9 it has included a matrix-oriented, C-like language called Mata. Conversely, if you want bootstrapping in Stata, you don't even need to write a program.

    N.B. a footnote on upper/lower case: I understand that "MATLAB" is still the official name. I know that "Stata" has been the name used over almost all of that program's history.

  3. A comment about the R vs Matlab thing. I used Matlab as a doctoral student, mainly because it's what my professors and advisors used. But as I started to delve deeper into Bayesian statistics, I realized that there were so many R packages available for Bayesian stats that I made the switch to R. Fortunately, the basic programming concepts for R and Matlab are the same (interpreted, matrix-based, etc), so aside from differences in syntax, making the leap from Matlab to R was a piece of cake. My point is that if you find yourself learning Matlab because of good "institutional" reasons, don't feel like it's something you are locked into for life (and you may never even need to make the switch).

    Another advantage to R is that it's easy to write your own modules in C, and pass data and results back and forth from R. I write my MCMC algorithms in C (for speed), but process data and analyze results in R using existing packages. I don't know if that's easy to do in Matlab.

  5. I very strongly disagree with both "they generally prefer to use SAS or STATA. Almost no one uses R" and "MATLAB is currently the most popular instruction software in most graduate level econometrics courses at most north american universities".

    Stata is definitely the dominant software in econometrics courses when dealing with cross-sectional data. I have never heard of anyone using Stata for a time series econometrics course. SAS used to be popular, but it has lost many users to Stata.

    Matlab is the dominant tool for the evaluation of theoretical macro models. I have seen many more econometrics courses use GAUSS than Matlab.

    Finally, R is rapidly becoming popular in econometrics. Just look at all the econometrics packages that are available! There's even a CRAN task view for econometrics. If I were learning Bayesian econometrics, I would definitely go with R, given the existing packages. Sharing of software in Matlab, to put it in the best possible terms, is a mess.

    The best software to use is the one that does the job. Nobody really cares what you use as long as your results are correct.

    Also, check out Koop (2003) for an introduction to Bayesian econometrics, in addition to the books already mentioned.

  6. Is the "almost no one uses R" statement specific to economics? GAUSS used to be the big thing in political science, but as far as I can tell it's been eclipsed by R. I can confirm that the economists I know are unfamiliar with R.

  7. Nick,

    Of course, my comment on MATLAB's popularity is based on my personal opinion and not some kind of scientific study of North American economics departments. So, why did I claim what I claimed?

    I have interacted with junior faculty who come from all over the place (Berkeley, UCLA, midwest universities, public and private, etc). Almost everyone who needs a matrix language is using MATLAB. Some people use GAUSS or Ox instead. Those who need canned routines use Stata or SAS (I never checked which one is more popular). I have never met a professor or grad student who is using R for econometrics, although I am sure they exist somewhere out there. Occasionally, people familiar with R say that they still prefer MATLAB for speed. Published journal articles, especially in time series econometrics, often cite the package that was used. Again, the usual suspects are MATLAB, GAUSS, and Ox. I have never seen an article that claims to have used R. Again, they may exist out there. Yes, I have heard that SAS and Stata have their own matrix programming facility, but I personally haven't heard of many economists using it, much less using it for instruction.

    I also meant to comment on the languages of instruction. MATLAB is almost always used for teaching basic econometrics. Advanced seminars might use other tools though (Stata, SAS, your choice). You can also try this: searching Google for "MATLAB econometrics syllabus" returns 37000 hits. Searching for "r-project econometrics syllabus" returns 2130 hits. So by now the situation with software choice in academic economics should be pretty clear.

  8. BTW, in no way am I putting down R. R is great. The point of my post was this: if you're heading into grad school to study economics and you want to learn ONE tool before the instruction starts, learn MATLAB. That's it. It's actually often an inconvenient tool to use for many practical tasks, but practicing macroeconomists, finance people, and econometricians still use it a lot.

  9. Jacob: Thanks for your extra comments. You'd have saved yourself from some misunderstanding and some flak by making it clearer at the outset that you were arguing from impressions.

    "Confirm" "most" "most" "most" are words that to me implied that you were arguing securely from some survey data or something else more solid.

Comments are closed.