
What is a pull request?

Bob explains:

A pull request (PR) is the minimal publishable unit of open-source development. It’s a proposed change to the code base that we can then review. If you want to see how the sausage is made, follow this link.

If you click on “files changed”, you’ll see what Sean is proposing to do with the code. Interspersed in there are 67 out-of-line comments and many more inline comments on the code. This is the PR that kicked off this discussion of how extreme we should be in reviewing (but you’ll also see that this pull request touched almost 200 code files).

As soon as a pull request is made, it kicks off testing on multiple platforms that takes nearly a day to run to completion.

10 Comments

  1. Jason Yip says:

    Only if the version control system is git (or similar distributed system). Otherwise, it’s known as a patch.

    • Andrew specifically asked me what “pull request” meant. Pull requests are specifically a GitHub feature, not part of Git itself. Pull requests provide a bit more than a simple patch (a diff you can apply to code) in that GitHub gives you a code-review workflow and also lets you hang continuous integration testing hooks on the pull requests. Git may be distributed, but we use GitHub as the origin, so it doesn’t feel very distributed other than that you have a complete copy of the history stored locally and you can accept pull requests from other organizations (fork) as well as internally (clone).

      Are there other systems that open source developers use that provide this kind of functionality? Pretty much everything I see these days is being developed on GitHub.
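To make “a simple patch (diff you can apply to code)” concrete, here is a minimal sketch that builds a unified diff with Python’s standard `difflib` module. The file name `area.py` and its two versions are made up for illustration; a real pull request wraps a diff like this in review comments and CI hooks.

```python
import difflib

# Hypothetical "before" and "after" versions of a tiny source file.
before = [
    "def area(r):\n",
    "    return 3.14 * r * r\n",
]
after = [
    "import math\n",
    "\n",
    "def area(r):\n",
    "    return math.pi * r * r\n",
]

# unified_diff yields the header and hunk lines of a patch;
# joining them gives text in the same format that `git diff` prints.
patch = "".join(
    difflib.unified_diff(before, after, fromfile="a/area.py", tofile="b/area.py")
)
print(patch)
```

Lines starting with `-` are removed and lines starting with `+` are added, which is the same view that the “files changed” tab renders.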

  2. roy says:

    But Bob missed out on the obvious picture link for the top of this post – no not a cat – how do you say Dr. Doolittle?

  3. Statsgirl says:

    Should be “kicks off” testing.

    /pedantry

    • Edited. That was a quick email reply to Andrew.

      I should also point out that this is a super complicated pull request that refactored the underlying way that operands and partial derivatives were being stored in all of Stan’s probability functions. So there was a ton of testing and auxiliary code that needed to be changed to maintain consistency. Often pull requests are only a few lines of code or sometimes even just a single token.

  4. cugrad says:

    To that end, I think the CS/ML field is coming on strong against the statistics field. A number of CS-ers are very good at statistics, even at the theoretical level. They get there mainly through examples and cases in point, for example ML, DL, RNNs, and GANs. (It’s like a bottom-up process.)

    Stat departments approach it differently: first theory, then practice. (A top-down process.)

    I am curious what the future of CS and STAT will be like.

    • As to teaching top-down or bottom-up, that depends on the instructor more than the discipline. You find CS people teaching highly theoretical machine learning classes focused on convergence proofs and asymptotic complexity and stats professors teaching very applied stats courses on ANOVA using R that never once throw down an integral.

      The difference among the academics is that you have to learn stats theory to get a Ph.D. in stats and you have to learn CS theory to get a Ph.D. in comp sci. Most Ph.D. programs in comp sci aren’t going to teach you to code any more than most stat Ph.D. programs are going to teach you to do applied stats.

      • cugrad says:

        Yes, for any Ph.D., you need your theory. But that’s not what I am talking about.

        For example, in scikit-learn, you can fit a machine learning model in only three lines. The first line calls the model: linear_model, svm, etc. The second line calls the fit: linear.fit, svm.fit, etc. The third line is the evaluation (e.g., L2, ROC).

        The code does it all for you. You don’t have to understand a thing. In fact, I have seen a number of people in industry run the code, watch a few tutorials on how to interpret the goodness-of-fit metrics, and then call themselves data scientists and get paid 120k.
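The three-line pattern described in this comment looks roughly like the sketch below. It assumes scikit-learn is installed, and the toy data (points on the line y = 2x + 1) is invented for illustration. `LinearRegression` and `mean_squared_error` stand in for the “linear_model” and “L2” mentioned above.

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Made-up toy data lying exactly on y = 2x + 1.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]

model = LinearRegression()                    # line 1: pick the model
model.fit(X, y)                               # line 2: fit it
mse = mean_squared_error(y, model.predict(X)) # line 3: evaluate (L2 error)

print(model.coef_[0], model.intercept_, mse)
```

Swapping in another estimator, such as `sklearn.svm.SVR`, changes only the first line, which is what makes the workflow so easy to run without understanding the model.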
