Zipfian Academy, A School for Data Science

Katie Kent writes:

I’m with Zipfian Academy – we’re launching next week as the first 12-week immersive program to teach data science. The program combines the hard and soft skills of data science with introductions to the data science community out here in San Francisco.

The launch will be covered by a couple of big tech blogs, but we’d love to offer smaller, well-respected data science blogs like yours the opportunity to cover it too.

I don’t know anything about this but I took a look at the website and it looks pretty cool. Maybe in a future iteration of their course, they can teach Stan, once it has a few more useful features such as VB and EP.

10 thoughts on “Zipfian Academy, A School for Data Science”

  1. Speaking of Stan, will there be any explanation of it in your book that’s being published this fall? Any other changes to expect in the book? Just a few questions; a post on this may be more appropriate than a comment. Thanks.

  2. “What We Teach… Scrubbing… Data scientists spend ~80% of their time simply getting their data in the proper format. Cleaning data is an unavoidable but extremely important skill to know (and know well).”

    So true. Yet that’s actually the first time I’ve heard an education/training organization call it out explicitly.

    • Somewhere I have a comment from a data analyst in the US gov’t saying that he estimated that 90% or more of his time and his graduate students’ time was spent cleaning data.

      My much more limited experience agrees with him. I was really impressed to find that one of my drug users could run a 0.025-minute mile.

      Of course, the fact that I did have a person living in social housing while making $250K was also surprising.[1]

      1. This last was true and was a function of Canadian social housing policy: he just liked where he lived and paid market rent.
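Anecdotes like these are exactly what automated plausibility checks catch during data cleaning. A minimal Python sketch (the field names and plausible ranges here are hypothetical, chosen just to mirror the examples above):

```python
# Toy records mirroring the anecdotes above: an impossibly fast mile
# and a surprising-but-possible income.
records = [
    {"id": 1, "mile_minutes": 6.5,   "income": 40_000},
    {"id": 2, "mile_minutes": 0.025, "income": 55_000},    # data-entry error
    {"id": 3, "mile_minutes": 8.0,   "income": 250_000},   # surprising, but real
]

def flag_implausible(record, checks):
    """Return the names of fields whose values fall outside plausible ranges."""
    return [field for field, (lo, hi) in checks.items()
            if not lo <= record[field] <= hi]

# Hypothetical plausibility ranges: the world-record mile is ~3.7 minutes,
# so 0.025 minutes is certainly an error; a $250K income is unusual but valid.
CHECKS = {"mile_minutes": (3.5, 60.0), "income": (0, 10_000_000)}

flagged = {r["id"]: flag_implausible(r, CHECKS) for r in records}
# Record 2 is flagged automatically; record 3 passes the range check and,
# as in the footnote above, needs human review against other fields.
```

Range checks like this only catch the impossible values; the merely surprising ones still take the manual digging that eats up that 80–90% of analysis time.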

  3. I wouldn’t mind so much that the instructors all look like they are twelve years old, except that they are more than a little oblique about their credentials: are the degrees they list B.S., M.S., or Ph.D.s? It’s hard to believe that, say, a company or non-profit that wanted to train its employees in these skills would trust this place enough to pay the $14K for it. And an individual who pays their own way is taking a big risk if they can’t find a way to make the course a more reliable signal. And that’s assuming the classes are well organized and well taught.

    • My own experience in industry is that nobody really cares whether you have a Ph.D. or are self-taught. In fact, I experienced some degree of pushback for having a Ph.D., having been a professor, and having worked at Bell Labs (I paraphrase the VP of operations at SpeechWorks: “we’ve had a hard time in the past with people with Bell-shaped heads”).

      Much more credibility accrues to having been involved in a fundamental way with successful projects.

      I was amused that the last three weeks of their twelve-week course outline were devoted to finding a job.

  4. Slightly off topic, but can someone either explain, or provide a link explaining, the connection between Stan and VB/EP? I understand there is a somewhat fundamental connection between HMC and optimization, but I don’t fully comprehend how Stan can be used for optimization.

    • There’s no connection between Stan (as it stands in version 1.3 or the imminent version 2.0) and variational Bayes (VB) or expectation propagation (EP), other than that all three can be used for Bayesian model estimation.

      Stan does Markov chain Monte Carlo (MCMC) using the no-U-turn sampler (NUTS), an adaptive form of Hamiltonian Monte Carlo (HMC), and posterior mode finding using BFGS. Our focus is on MCMC.

      EP and VB both provide point estimates of distributions approximating the posterior.

      The next version of Gelman et al.’s Bayesian Data Analysis explains HMC, VB, and EP pretty fully and includes sample R programs for all of them to fit Andrew’s pet model, eight schools.

      • Right, but I was referring to Andrew’s comment: “they can teach Stan, once it has a few more useful features such as VB and EP.”

        It seems pretty clear that he expects Stan to feature EP and VB somehow. I don’t understand how an MCMC program can do that, which is what I was asking. Is it simply a completely separate feature?

        Also, any idea just how imminent 2.0 is?

        • Yes, we all expect Stan to feature VB. We’re in fact going to implement the non-conjugate VB of Wang and Blei:

          http://arxiv.org/abs/1209.4360

          along with the stochastic VB of Hoffman, Blei, Wang, and Paisley:

          http://arxiv.org/abs/1206.7051

          VB will be a separate command like optimization is currently. It’s not based on MCMC.

          As for Stan 2.0, we’ve frozen features as of this week’s meeting and are just waiting for integration tests to finish, which will probably take a week or so to run (we run LOTS of tests, and there are a dozen or so pull requests waiting to be merged). You can see what’s going in on our issue tracker: everything tagged with the 2.0.0 milestone will be in; everything else won’t.

          I’m less certain about EP and how to generalize it enough. We might build some specialized instances for Gaussian processes.
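To make the division of labor in the thread above concrete: HMC/NUTS draws samples from, BFGS finds the mode of, and VB fits an approximating distribution to the same unnormalized log posterior. Here is a minimal Python sketch of that function for the eight-schools model mentioned above (non-centered parameterization with flat priors on mu and tau; a sketch of the target density, not Stan code):

```python
import math

# Eight-schools data from Bayesian Data Analysis:
# estimated treatment effects y_j and their standard errors sigma_j.
y = [28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0]
sigma = [15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0]

def log_posterior(mu, log_tau, eta):
    """Unnormalized log posterior of the non-centered eight-schools model:
    theta_j = mu + tau * eta_j,  eta_j ~ N(0, 1),  y_j ~ N(theta_j, sigma_j),
    with flat priors on mu and tau, parameterized on log(tau)."""
    tau = math.exp(log_tau)
    theta = [mu + tau * e for e in eta]
    lp = -0.5 * sum(e * e for e in eta)                       # N(0,1) prior on eta
    lp -= 0.5 * sum(((yj - tj) / sj) ** 2
                    for yj, tj, sj in zip(y, theta, sigma))   # Gaussian likelihood
    lp += log_tau                                             # Jacobian of log(tau)
    return lp
```

MCMC explores this surface by simulating trajectories over it (using its gradient, in HMC's case), optimization climbs it to a single point, and VB replaces it with a tractable approximating family; that is why VB can live in Stan as a separate command alongside sampling and optimization.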
