Exposure to Stan has changed my defaults: a non-haiku

Now when I look at my old R code, it looks really weird because there are no semicolons
Each line of code just looks incomplete
As if I were writing my sentences like this
Whassup with that, huh
Also can I please no longer do <-
I much prefer =
Please

28 thoughts on “Exposure to Stan has changed my defaults: a non-haiku”

    • Gregor:

      Using = does work, but not always. I’ve been told that it occasionally fails, so I’m loath to use = in the code I use in our books, because I fear that it might mess up if people try to include such code in functions or whatever.

      • The only case where it fails is when you’re trying to do assignment *inside* a function call – which is generally not done. For example, with <-, it's possible to take the mean of 1:10 and assign 1:10 to x all at once:

        z <- mean(x <- 1:10) ## this will create both z and x in the global environment, z is 5.5, x is 1:10

        z = mean(x = 11:20) ## this will create (or modify) z in the global environment, but not do anything to x

        I've never wanted to assign a variable inside a function call, so I don't miss that functionality. I actually see it as a benefit of using =. If I'm turning a script into a function and my assignments in the script were done with =, I can copy/paste them into the function arguments. If I used <- in the script and did the same copy/paste, the function might still work, but it would have the unintended consequence of all the external assignments.

        There can also be issues of precedence (associativity) if you mix and match = with <- in a compound assignment… `x <- y = 5` is different from `x <- y <- 5`, but `x <- y <- 5` is the same as `x = y = 5`. As long as you don't mix and match there is no issue. (And how often do you use compound assignment?)

        If you follow two easy good practices: (1) don't assign things inside function calls, (2) don't do compound assignment (at least not with different assignment operators), there won't be any other issues.
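
        A minimal sketch of the two function-call cases described above, wrapped in local() so that nothing leaks into the global environment (the wrapper and the name `res` are assumptions for illustration, not part of the original example):

        ```r
        res <- local({
          z1 <- mean(x <- 1:10)   # assigns 1:10 to x, then passes it to mean()
          z2 = mean(x = 11:20)    # here x = 11:20 is a named argument; x is not reassigned
          list(z1 = z1, z2 = z2, x = x)
        })
        res$z1   # 5.5
        res$z2   # 15.5
        res$x    # still 1:10
        ```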

        • It’s true that there are not so many cases in which it matters whether you use = or <- for assignment. I guess the biggest problem with = for assignment is not really a problem with using = for assignment but just that it results in R code that really does look foreign to most other R users, especially R users who aren't already well versed in other programming languages. For better or worse (probably worse) most R users are going to be using <- for the foreseeable future. So in a textbook, at least, I think it makes sense to go with the standard.

          I suppose another reason some people prefer <- is that = is used for other things like case statements and argument binding, so some people might prefer to not overload it further by using it for assignment too. Personally I like = (and I'm glad we made the switch from <- to = for Stan), but I continue to use <- in R code, at least for now.

        • I agree that in a textbook setting going with the far more common <- makes sense. But on blogs or on Stack Overflow I use = and hope that some users will see it and like it and follow suit.

        • Actually, it’s fairly common to use assignment in an R function call in a perfectly legitimate (and unavoidable) way. For example, say I have

          `foo <- brm (y ~ x1 + x2 + (1 | x3) + (1 | x4), data=bar)`

          and I decide I want to time it. I simply do:

          `system.time (foo <- brm (y ~ x1 + x2 + (1 | x3) + (1 | x4), data=bar))`

          and R is not confused by what I'm trying to do. If you use "=" instead, system.time could think you're trying to pass the parameter "foo", which it hopefully does not have.

          There are three distinct concepts here: equality testing, named parameters, and assignment. R chooses to use "==", "=", and "<-", while other languages make other choices. For example, languages in the Algol family used ":=" for assignment.

          R is very powerful in that functions, formulas, etc, are first-class objects so distinctions have to be made where lesser languages don't have to. (Or the language may have made other choices.)

          This argument sounds a lot like: "This whole mean, median, mode thing is tedious. Since I hate to use two syllables where one will do, from now on I'm going to use 'mean' in all three cases and let context distinguish."
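
          A small sketch of the system.time() case, assuming a cheap stand-in function (`slow_sum` is hypothetical and replaces the brm() fit so the example runs without extra packages):

          ```r
          slow_sum <- function(n) sum(sqrt(seq_len(n)))   # hypothetical stand-in for the model fit

          # With <-, the assignment happens in the calling environment while being timed.
          timing <- system.time(foo <- slow_sum(1e5))
          exists("foo")

          # With =, R reads foo2 as an argument name, which system.time() does not have,
          # so the call fails; tryCatch() captures the error message instead of stopping.
          attempt <- tryCatch(system.time(foo2 = slow_sum(1e5)),
                              error = function(e) conditionMessage(e))
          exists("foo2")
          ```

          Here `attempt` ends up holding an error message rather than a timing, and `foo2` is never created.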

        • Simply enclose the expression in (curly or round) brackets, and you’ll be perfectly fine:

          system.time({foo = brm (y ~ x1 + x2 + (1 | x3) + (1 | x4), data = bar)})

          Other than that, there are *no* differences between these two. Trust me :-)
          You should be consistent, though – not only because it’s encouraged by all coding styles, but because the two operators have (for some reason) different operator precedence. This will work:

          x = y = 5

          This too:

          x <- y <- 5

          And this:

          x = y <- 5

          But not this one (!):

          x <- y = 5
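
          One way to check the four combinations above without cluttering the workspace is to evaluate each in a fresh environment. This is only a sketch; `runs_ok` is a hypothetical helper, not a base R function:

          ```r
          # Hypothetical helper: parse and evaluate code in a new environment,
          # returning TRUE on success and FALSE if parsing or evaluation fails.
          runs_ok <- function(code) {
            env <- new.env()
            tryCatch({
              eval(parse(text = code), envir = env)
              TRUE
            }, error = function(e) FALSE)
          }

          ok_eq_eq       <- runs_ok("x = y = 5")
          ok_arrow_arrow <- runs_ok("x <- y <- 5")
          ok_eq_arrow    <- runs_ok("x = y <- 5")
          ok_arrow_eq    <- runs_ok("x <- y = 5")   # grouped as (x <- y) = 5, so it fails
          ```

          The last case fails because = binds less tightly than <-, so R groups the expression as (x <- y) = 5, and the left-hand side is not a valid assignment target.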

    • You can use =, but I would encourage you not to. Virtually all coders use <- and all style guides recommend <-. When I see = used for assignment in R code, it usually indicates a novice coder or someone who has just transitioned from another language.

      That said, there is an alternate more rational universe where <- doesn't exist and we all use =.

      • Well, I have used R on a daily basis for nearly ten years now, so I would guess I’m not a novice anymore. But I still prefer to use “=” as an assignment operator; the code looks much cleaner and more intuitive this way.

        • @paul I in no way want to imply that any particular person is not skilled based on their stylistic choices. It is just something that I tend to notice reading lots of people's code. As coders gain experience, they tend to work with many people's code in collaboration. Since almost everyone else uses <-, having one file or one subsystem use = is very jarring when you are managing a large codebase.

  1. Using “=” as an assignment operator doesn’t make sense. “” are more sensible. The problem with “<-" is that you might naturally type "x<-3" with the intention of having it mean "x is less than negative three". An assignment operator shouldn't depend on white space.
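
    A short sketch of the whitespace issue raised here (the variable names `assigned` and `compared` are assumptions for illustration):

    ```r
    x <- 10
    x<-3                    # no spaces: R reads this as assignment, so x is now 3
    assigned <- x
    compared <- x < -3      # with a space after <, this is a comparison: is 3 less than -3?
    ```

    After running this, `assigned` is 3 and `compared` is FALSE.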

    • The unsymmetric use of “=” as assignment is no less clear in R than in any other programming language. I switched over a few years ago and haven’t had any problem with it.

  2. I use <- basically because I learned that way and now it's just tradition. I like the way the code looks better (again, 100% because of tradition), even though it is kind of a pain in the ass to type (even worse now that I bought a new laptop with a different keyboard layout!).
    I would love to switch to =, but I just find it ugly. Damn me!

    • I simply use Autokey and assign a code to it. At the moment I use / + aa to get <-. It is actually as easy as typing =, which is a long reach, and I find the difference between <- and = worth maintaining. I still maintain it makes for clearer code. Thousands will disagree.

      I started out with Fortran so I am used to = as an assignment statement. Given R syntax I prefer <- .

    • Heretic! The Inquisition has been notified.

      RECANT, RECANT! It is not too late.

      I have never used -> but I think I have seen it. One does, sometimes, wonder about the people who wrote R.

      • It helps to remember how old S (err, R) is. It is a marvel how modern it is given its roots in the 1970s. Just thank the lord that the creators of R, in their infinite wisdom, abandoned S’s blasphemy of dynamic scope.

  3. for(int i = 0; i < —

    ah, I'm writing R, dang.

    public static double informationGain(doubl—

    feck, I should be writing R!

    *waiting for a non-vectorizable for-loop to be evaluated*

    Yes, I'm writing R.

  4. Yup. Turns out punctuation matters for ease of reading. Andrew originally mocked Stan’s syntax, calling it “BUGS with semicolons”.

    The secondary motivation for semicolons is that it renders the language whitespace insensitive in the sense that wherever one whitespace can occur it can be one or more of any type of whitespace (tab, return, line feed, space). R, in contrast, is sensitive to the distinction between a newline and an ordinary space character. For example, this script

    a <- b -
         c
    

    will assign b - c to a, whereas

    a <- b
         - c
    

    will assign b to a and return -c.

    Now mathematics typesetting standards always put the operator first in a line of text when continuing a line because it makes it way easier to scan the structure of a formula (such as a sequence of sums).
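
    The two layouts above can be run as written. The sketch below also adds a third, parenthesized form, which is one way to put the operator first on the continuation line in R (the names `a_trailing`, `a_leading`, and `a_paren` are assumptions for illustration):

    ```r
    b <- 10
    c <- 3        # used as a plain variable name here, as in the example above

    # Trailing operator: the expression is incomplete at the line break, so R keeps reading.
    a <- b -
      c
    a_trailing <- a

    # Leading operator: "a <- b" is already complete, so "- c" is a separate statement.
    a <- b
      - c
    a_leading <- a

    # Inside parentheses a newline cannot end the expression, so the leading operator works.
    a <- (b
          - c)
    a_paren <- a
    ```

    Here `a_trailing` and `a_paren` are both 7, while `a_leading` is 10.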

    • I like the focus on the fundamental principle of whitespace insensitivity. If you are going to have it, you need end-of-line markers. Being whitespace insensitive also means that writing code into string literals for execution, something done with Stan quite a lot, is less prone to error.

      Aesthetically, I like the other extreme of full whitespace utilization. I think the following looks nice for instance:

      transformed data:
        real y[5]
        y[1] = 2.0
        y[2] = 1.0
        y[3] = -0.5
        y[4] = 3.0
        y[5] = 0.25

      parameters:
        real mu

      model:
        for n in 1:5:
          y[n] ~ normal(mu,1.0)

  5. Would someone like to make the fuller version of the argument that I should begin to use semicolons to stack several “independent clauses” of R code onto the same line, rather than the standard one clause per line? I’m interested in hearing it and willing to switch, but I’ve generally worried that using semicolons would lead me to overlook certain lines of code, and that it would prevent me from easily moving code up or down as needed.
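
    For reference, R already accepts semicolons as statement separators, so the style being asked about would look roughly like this sketch (variable names are arbitrary):

    ```r
    x <- 1; y <- 2; z <- x + y     # three statements stacked on one line
    w <- z * 2;                     # a trailing semicolon is accepted and has no effect
    ```

    After these two lines, `z` is 3 and `w` is 6.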
