Migrating from dot to underscore

Posted on August 28, 2012 7:29 AM by Andrew

My C-oriented Stan collaborators have convinced me to use underscore (_) rather than dot (.) as much as possible in expressions in R. For example, I can name a variable n_years rather than n.years. This is fine. But I’m getting annoyed because I need to press the shift key every time I type the underscore.

What do people do about this? I know that it’s easy enough to reassign keys (I could, for example, assign underscore to backslash, which I never use). I’m just wondering what C programmers actually do. Do they reassign the key or do they just get used to pressing Shift?

P.S. In comments, Ben Hyde points to Google’s R style guide, which recommends dots, rather than underscores or camel case, in variable names (for example, “avg.clicks” rather than “avg_Clicks” or “avgClicks”). I think they recommend this for consistency with R coding conventions.

I am switching to underscores in R variable names to be consistent with C. Otherwise we were running into difficulties because Stan, following C, does not allow dots in variable names. I don’t want to have a variable that’s called sd.y in R and sd_y in Stan. Much easier to have the same name in both. We don’t want to be changing Stan’s rules (too much of a mess given that Stan is written in C++) so I have to change my R conventions. Then once I switch to underscores for variables that go into Stan models, I’m inclined to be consistent and use underscores throughout.
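For R code that already uses dotted names, the renaming can be done mechanically at the point where the data go to Stan. A minimal sketch, assuming the data are passed as a named list; `stan_data` and `dots_to_underscores` are hypothetical names used only for illustration:

```r
# Hypothetical data for a Stan model, named with R's traditional dots.
stan_data <- list(n.years = 10, sd.y = 2.5, y.obs = c(1.2, 3.4))

# Replace every literal dot with an underscore. fixed = TRUE makes "."
# a plain character rather than the regex wildcard.
dots_to_underscores <- function(x) gsub(".", "_", x, fixed = TRUE)

new_names <- dots_to_underscores(names(stan_data))

# Guard against collisions, e.g. "n.years" and "n_years" both present.
stopifnot(!anyDuplicated(new_names))

names(stan_data) <- new_names
names(stan_data)  # returns c("n_years", "sd_y", "y_obs")
```

Renaming at this one boundary works, but using underscores throughout, as the post proposes, avoids keeping two sets of names in sync.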

30 thoughts on “Migrating from dot to underscore”

Robert Kern on August 28, 2012 at 7:59 am said:

For the most part, we get used to pressing Shift and use an editor with completion support, so we rarely have to type most of a variable name anyway.
Chuck on August 28, 2012 at 9:03 am said:

I press shift and never think twice about it. Hmm. Maybe I should. (Of course, I don’t usually write C for a living.)

Chuck
Karthik on August 28, 2012 at 9:04 am said:

What argument convinced you to change?
John on August 28, 2012 at 9:12 am said:

I just press Shift, but I hate underscores because they are hard to reach and I have to move my hands. I use camel case instead, so n.years becomes nYears. It still takes a Shift, but it is almost automatic since I use capitals when writing papers. I think I picked camel case when I started programming in Java. (Remapping the backslash key would be death for me because it is used so much in LaTeX and for control characters like \n.)
yop on August 28, 2012 at 9:37 am said:

Could you convince us as well? I mean, why should we use underscores instead of dots? Honestly interested in the answer.
paul on August 28, 2012 at 9:42 am said:

In C, for commonly used functions, we use Emacs abbreviations, which let you type the function name without underscores or spaces and then expand it. We also just use the Shift key. One spends so much time pressing Ctrl that Shift does not seem like a burden.
Opher Donchin on August 28, 2012 at 10:27 am said:

The advantage of underscores is that your variable names can be used as variable names in a much wider variety of languages. At least, that’s what I can come up with. I’m curious what convinced Andrew.
Ben Hyde on August 28, 2012 at 10:54 am said:

Ah, switching costs; or are these shifting costs?

You know, if you immerse yourself sufficiently in another culture’s norms, you’re fated to suffer culture shock and a minor nervous breakdown.

Meanwhile, I found this interesting: http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html
Anonymous on August 28, 2012 at 11:00 am said:

Maybe a Programmer Dvorak Keyboard Layout?
Erich Nahum on August 28, 2012 at 11:40 am said:

We C programmers type at 320 WPM, so we don’t even notice.
kjetil halvorsen on August 28, 2012 at 11:50 am said:

My biggest keyboard headache (though I have a Latin American keyboard, with ñ) is pressing \ ~ ^ (mostly in LaTeX): each needs AltGr plus its own key, and I must use two fingers on the same hand. Any good tips for solving that?
Naadir Jeewa on August 28, 2012 at 12:05 pm said:

The dots really confused me about R when I first saw some code. I naturally assumed these were properties of an object. I guess I’m partial to camel case, coming from the Java world.
John on August 28, 2012 at 12:42 pm said:

This seems really inappropriate in R. R convention uses dots.
Kevin on August 28, 2012 at 12:59 pm said:

I use camelCase because I find it more readable, quicker to type, and more common in languages outside of R.
Adam on August 28, 2012 at 1:03 pm said:

This caused me headaches with reading R code, and still does. A dot looks like it is accessing a property or a method. I must know a dozen languages well, and using a dot as part of a variable name is just counter-intuitive for me. I used to use underscores, but now I prefer camel case after writing a lot of Objective-C.
Ryan J. Parker on August 28, 2012 at 1:48 pm said:

Why can’t Stan support “sd.y”? It’s just a string, after all. If you’re writing code directly in C++, then sure, you have to use an underscore, but really you could represent all of these as some specific data type with labels attached (thinking in terms of calling Stan from R, for instance). Either way, you’ll soon become comfortable with pressing two keys at once. :)
Radford Neal on August 28, 2012 at 2:31 pm said:

I got used to using dots when starting R, but really underscores are better. They’re easier for non-R programmers to read, they don’t cause ambiguity with S3 methods, and as you say, they’re better for compatibility with other languages. The only reason underscores aren’t used more in R is that years ago they weren’t allowed. The reason they weren’t allowed is that underscore was a synonym for the assignment operator. The reason underscore was a synonym for the assignment operator is that on old ASR33 teletype machines, the ASCII code for underscore was actually printed as a backarrow. Really, this isn’t a good reason to keep using dots in variable names.

As for CamelCase, it’s just an abomination. Plus you never can remember whether the first character is in upper or lower case (people do both).
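Radford’s point about S3 methods can be seen in a few lines. In S3, a function named generic.class is treated as the method of that generic for that class, so a dotted helper name can be picked up by dispatch without anyone intending it. A minimal sketch; `print.results`, `print_results`, and the class "results" are hypothetical:

```r
# Meant as an ordinary helper that "prints results"...
print.results <- function(x, ...) {
  cat("print.results was called\n")
  invisible(x)
}

# ...but S3 reads the name as "the print method for class 'results'",
# so print() now routes any object of that class to it.
obj <- structure(list(value = 42), class = "results")
print(obj)  # prints: print.results was called

# An underscore name cannot be mistaken for a method: print() never
# looks for print_results, so it stays an ordinary function.
print_results <- function(x) {
  cat("value: ", x$value, "\n", sep = "")
  invisible(x)
}
print_results(obj)  # prints: value: 42
```

Some long-standing base R names carry the same ambiguity: t.test, for instance, reads like a method of t() for a class called "test".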
Lord on August 28, 2012 at 2:36 pm said:

Dots are for object/structure components in C so n.years would be the years component of an n structure. I would say it is most common to eliminate redundant characters entirely and use nyears or nYears and keep them local to the function or subroutine. Global variables are a different story and you might want to lead with some designation where defined or define a structure to hold them, though underscores are often used in system interfaces.
Bob Carpenter on August 28, 2012 at 3:40 pm said:

My ears were burning, so I thought I’d clarify our reasoning.

The main reason we didn’t want to use dots in R parameter identifiers was that we wanted the RStan interface to have option names that matched those in Stan. We didn’t want function names with dots because of the ambiguity with S3 methods (as pointed out to us by Ben Goodrich, and above by Radford).

Because we wrote Stan in C++, we used C++ conventions.

Python, C++ and Java all forbid dots in variable and function identifiers because they would be confusable with method calls on objects. A typical requirement for variables is that they start with a letter or underscore, followed by any number of letters, digits or underscores.
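The identifier rule just described is easy to state as a regular expression, which is handy for checking which existing R names would carry over to a C-family language unchanged. A minimal sketch; `is_c_style_identifier` is a hypothetical helper, and it checks only the generic rule above, not reserved words or any one language’s extra restrictions:

```r
# Letter or underscore first, then any number of letters, digits, or
# underscores. perl = TRUE keeps [A-Za-z] to plain ASCII letters
# regardless of locale.
is_c_style_identifier <- function(x) {
  grepl("^[A-Za-z_][A-Za-z0-9_]*$", x, perl = TRUE)
}

candidates <- c("n_years", "nYears", "n.years", "sd.y", "2nd_year")
is_c_style_identifier(candidates)
# returns TRUE TRUE FALSE FALSE FALSE

# Names that would need renaming before use in such a language:
candidates[!is_c_style_identifier(candidates)]
# returns c("n.years", "sd.y", "2nd_year")
```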

I’m OK with camel case. It’s the standard in Java. For some reason, lots of people find it jarring. The C++ community doesn’t even like capital letters for class names, though you see it in some packages like Eigen.

The biggest bummer about underscores is that Emacs’s R mode tries to convert them to "<-". Personally, I prefer "=" to "<-" because it matches other programming languages, but "<-" is the convention in R. The top-row keys always do get short shrift in typing class, but underscores become second nature after decades of practice. And who notices typing Shift except on the sub-optimal iPad virtual keyboard?
- yop on August 29, 2012 at 2:08 am said:
  
  People with most European keyboards notice typing shift. The left shift key is usually shorter and requires more finger contortion.
- Zach on August 29, 2012 at 12:08 pm said:
  
  R uses arrow assignment because the equals sign is reserved for passing values to a function’s arguments.
  
  For example:
  
  print(x = "hello world") passes the string to print’s argument x.
  
  print(x <- "hello world") assigns the string to variable x in the global environment and passes the variable to the function.
  - Bob Carpenter on August 29, 2012 at 1:55 pm said:
    
    You can use the equals sign for assignment in R. Even in the print() function.
    
    > print(y = 7)
    Error in print.default(y = 7) : argument "x" is missing, with no default
    > print((y = 7))
    [1] 7
    
    I wouldn’t advise it, though.
  - jimmy on August 29, 2012 at 3:23 pm said:
    
    I prefer using = for the assignment operator. But is your comment the rationale for using <- instead of = in R? Can someone comment more on this?
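The difference this thread turns on can be checked directly: inside a function call, `=` names an argument, while `<-` performs an assignment in the caller’s environment. A minimal sketch; `double_it` is a hypothetical function:

```r
double_it <- function(n_years) n_years * 2

# `=` inside the call only matches the argument by name; it creates no
# variable called n_years in the calling environment.
double_it(n_years = 3)               # returns 6
exists("n_years", inherits = FALSE)  # FALSE, assuming n_years was not defined earlier

# `<-` inside the call is a real assignment, evaluated in the calling
# environment when double_it first uses its argument; the assigned
# value is what the function receives.
double_it(n_years <- 3)              # returns 6
n_years                              # now 3 in the calling environment

# At the top level (outside any call), both operators assign.
n_obs = 5
n_obs <- 5
```

This is also why Bob’s `print((y = 7))` works: the extra parentheses make `y = 7` an expression of its own, so `=` acts as assignment rather than naming an argument.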
Tom Moertel on August 28, 2012 at 6:16 pm said:

If you’re using ESS in Emacs to edit R code, you can get rid of the confusing underscore behavior as follows:

1. M-x customize-variable RET ess-S-assign RET
2. Change the variable’s value from " <- " to "_".
3. Choose "Save for Future Sessions" to keep your customization.

And then you're back to typing underscores as underscores :-)
- Andrew on August 28, 2012 at 6:26 pm said:
  
  I used to edit R in Emacs, but recently I’ve been using RStudio, which is just so convenient. I just wish RStudio used Emacs as its text editor!
  - Ben Bolker on August 28, 2012 at 10:57 pm said:
    
    Not quite as good as using Emacs internally, but apparently RStudio now lets you switch to Emacs keybindings (?)
    
    http://support.rstudio.org/help/discussions/suggestions/411-r-and-console-editing-command-suggestions-emacs
- Daniel on August 29, 2012 at 8:20 am said:
  
  Not sure you need to go to these lengths: in ESS, if you just tap the underscore twice, it will change back from "<-" to "_", which is almost as easy and also lets you take advantage of the default behaviour.
Jordan on August 29, 2012 at 11:00 am said:

You never use backslash? How do you TeX?
Clark on August 29, 2012 at 2:06 pm said:

As a programmer who became a statistician, going from C++ to R (and SAS), I have come to prefer camelCase, followed by underscore. I never use dot in a name — it just feels wrong, for all the reasons mentioned above (what were the R developers thinking when they thought dot was a good idea?). I prefer camelCase to underscore mainly because it yields shorter and more readable variable names, and the variable names work in most any language that I’m aware of. Occasionally I’ll use a blend of camelCase and underscore if it improves readability.

I also
{
    like
    {
        to
        {
            indent
            like
        }
    }
    this,
}
for vivid clarity of the blocking structure. Unfortunately, R constrains my options in this regard.

Gosh, remember early versions of languages like BASIC where we were limited to variable names of 8 alphanumeric characters or less, in all-caps? I’m still getting used to arbitrarily long variable names.
- Bob Carpenter on August 30, 2012 at 7:19 pm said:
  
  Yikes. I value my vertical space too much to devote whole lines to open curly braces.
  
  I hope we can all agree to just say “no” to tabs in code (other than Python and make, of course, speaking of “what were their designers thinking?”).
  
  I prefer camel case myself to underscores, but it’s not really done in C++, and style of punctuation/spelling is no place to innovate.
  
  In the R/S developers’ defense, S has been around longer than C++ or Java (since 1976, according to Wikipedia).
