I often run into people who’d like to learn how to program but don’t know where to start. Over the past few years, a number of interactive tutorial systems have emerged that walk a student through the basic examples and syntax.
- Try Ruby! will teach you Ruby, a Python-like language that’s extremely powerful when it comes to preprocessing data.
- WeScheme will teach you Scheme, a Lisp-like language that makes writing interpreters for variations of Scheme very easy.
- Lists and Lists by Andrew Plotkin is a computer game that requires you to program in Lisp. Lisp is the second-oldest high-level programming language still in use (after Fortran), but Ruby and Python now do most of what Lisp has traditionally been useful for.
Maybe there will be a similar tool for R someday!
Thanks to Edward for the pointers!
Aleks, don't bury Lisp just yet! See
http://www.stat.auckland.ac.nz/~ihaka/downloads/C…
and
http://incanter.org/
Berkeley has webcasts of their introductory computer science courses. It's how I learned to program:
http://webcast.berkeley.edu/courses.php
For python: http://www.skulpt.org/
There is a lot of material at MIT's OpenCourseWare site. Course 6.00 is an introductory programming course using Python. The site has lecture videos, homework assignments, exams, and readings, all free online. 6.001 is a slightly more advanced introductory programming class using Scheme; its lectures are also on video, and the textbook is online. Here is a full list of their electrical engineering and computer science course materials.
Oops, I submitted that last comment too soon.
Stanford has a sequence of three courses here. It looks like they're using Java.
For Java: http://java.sun.com/docs/books/tutorial/
So many Python mentions and no Python resources? Python is good for all kinds of scientific computing, not just statistics, and it is *especially* easy for the beginning programmer to learn!
A Python programming tutorial.
http://www.alan-g.me.uk/tutor/
The hub (as far as I'm concerned) for scientific programming in Python is SciPy (at its simplest a MATLAB replacement, but really much more):
http://www.scipy.org/
http://conference.scipy.org/
http://www.scipy.org/Topical_Software
Hopefully, Python will one day supersede even R for scientific programming, since it is a full-blown programming language with such simple and natural syntax.
Premature and self-promoting plug:
http://statsmodels.sourceforge.net/
I second jF's comment above: go to the MIT Open Courseware site.
O'Reilly (oreilly.com) has an excellent series of books on various languages: Java, Python, Perl, C#.
I highly recommend Steve McConnell's book Code Complete (2nd ed) for anybody who is serious about programming. Even though its intended audience is professional programmers, the book still contains a lot of useful advice for novices. The chapters on variable naming, conditionals, and looping would be good starting points for the beginning programmer. A big advantage to consulting this book: you'll avoid falling into sloppy habits from the start, and won't have to unlearn bad practices.
As far as languages go, I'd recommend an object-oriented language like Ruby, Python, or Java. Starting with an object-oriented programming approach will mean you don't have to unlearn a lot of things from procedural programming. Object-oriented programming makes it much easier to have a programming model that closely replicates the real-world problem you're trying to address. I'm not sure about the future of Java: since Oracle bought Sun, it's not certain how they're going to handle Java.
I'm looking into functional programming, which has been gaining a lot of attention in the past few years, although its roots go back decades. Erlang, Haskell, and F# are all available for free. F# will be included with the Microsoft Visual Studio 2010 development environment.
"Starting with an object-oriented programming approach will mean you don't have to unlearn a lot of things from procedural programming. Object-oriented programming makes it much easier to have a programming model that closely replicates the real-world problem you're trying to address."
I would like to believe this. I have tried several times to move into OO programming (I work mainly in SAS, so it is a spare-time thing) and have always failed.
I have never really understood how treating a set of data as objects helps me fit a linear model (for example), and in particular how it helps me get estimates that are numerically good. By "helps" I mean "gets me good results with less code".
I can understand how R (as an implementation of the S language) does this, because I have learned that too.
(BTW, I know later versions of S are somewhat OO, but even the earlier version was an advance over the DATA step language, so I do not think this proves OO is good for statistical programming.)
Dave – still on the path to enlightenment (R)
Indeed, I never quite understood the obsession many programmers have with OOP, especially for simple tasks. What exactly is wrong with procedural programming, and why does it not replicate the typical problems encountered in statistics? I can see a justification for OOP when teams of programmers are building large applications and libraries, but do I really have to make an object to print "hello world"? Please.
A typical estimation problem in statistics can be broken into these parts:
1. Import and optionally rearrange your data.
2. Estimate parameters and compute the standard errors.
3. Optionally compute some test statistics.
4. Print the results.
5. Optionally make some graphs.
In my view, such a workflow fits neatly within the good old structured programming paradigm (as long as you're implementing it in a language with matrix data types), which you can use in R, MATLAB, GAUSS, FORTRAN, etc.
Jacob, data is best represented as an object: a variable is an object, a model is an object, and results are an object.
With good object-oriented scaffolding, the procedures become easy to write. And if you swap in a new type of data that still behaves like a data object, everything else will keep working.
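To make the object-oriented argument concrete, here is a rough Python sketch of a simple regression written with objects. The class names (`DataSource`, `ColumnData`, `RowData`, `LinearModel`, `Results`) are hypothetical, invented for illustration, and the data are made up. The point is the last part: the model only asks its data for columns by name, so a different data representation that provides the same `column` method works without any change to the model.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Protocol

class DataSource(Protocol):
    """Anything that can hand back a numeric column by name."""
    def column(self, name: str) -> List[float]: ...

@dataclass
class ColumnData:
    """Data stored column-wise, e.g. {"x": [...], "y": [...]}."""
    cols: Dict[str, List[float]]

    def column(self, name: str) -> List[float]:
        return list(self.cols[name])

@dataclass
class RowData:
    """A different storage layout (list of row dicts) with the same interface."""
    rows: List[Dict[str, float]]

    def column(self, name: str) -> List[float]:
        return [r[name] for r in self.rows]

@dataclass
class Results:
    """Fitted coefficients plus the slope's standard error."""
    intercept: float
    slope: float
    se_slope: float

    def summary(self) -> str:
        return (f"intercept={self.intercept:.4f} slope={self.slope:.4f} "
                f"se(slope)={self.se_slope:.4f}")

@dataclass
class LinearModel:
    """Simple regression y ~ x by least squares; accepts any DataSource."""
    y: str
    x: str

    def fit(self, data: DataSource) -> Results:
        xs, ys = data.column(self.x), data.column(self.y)
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((u - mx) ** 2 for u in xs)
        b = sum((u - mx) * (v - my) for u, v in zip(xs, ys)) / sxx
        a = my - b * mx
        rss = sum((v - a - b * u) ** 2 for u, v in zip(xs, ys))
        return Results(a, b, math.sqrt(rss / (n - 2) / sxx))

if __name__ == "__main__":
    model = LinearModel(y="y", x="x")
    cols = ColumnData({"x": [1.0, 2.0, 3.0, 4.0, 5.0],
                       "y": [2.1, 3.9, 6.2, 7.8, 10.1]})
    rows = RowData([{"x": u, "y": v}
                    for u, v in zip(cols.cols["x"], cols.cols["y"])])
    # Same model object, two data representations, identical results.
    print(model.fit(cols).summary())
    print(model.fit(rows).summary())
```

The design choice is that `LinearModel.fit` never looks at how the data are stored; it only relies on the `column` interface, which is the property being described here.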
I see R as object-oriented. As an example of what can be done, see Fully Bayesian Computing.
Aleks, thanks for the pointers. I hear you about the OO scaffolding, but was distracted for a bit… As a Java programmer, I don't see R as OO in any sense I'm used to, but you're talking about data.frames, vectors, etc., as objects. In that frame (pun intended), R is definitely OO.
For R, I'd recommend the Learning R blog (http://learnr.wordpress.com/), the Programming in R website (http://zoonek2.free.fr/UNIX/48_R/02.html), and the R Tutorial (http://www.cyclismo.org/tutorial/R/). All useful sites. (FYI: I'm not associated with these sites in any way, just find them very useful.)
Thanks again for the post. I'll definitely check out the links you provided.
John
What is the best software and/or programming language to get familiar with for a research position in a typical "think tank"? Is Ruby good for this?
Thanks in advance to anyone who answers.
To do statistical research, I'd learn R first, and then either Python or Ruby, depending on where I can get help. After that, I'd learn a fast language like C or Java to make things run quickly, or web programming (which isn't a separate language but a set of techniques you can use from Python or Ruby) to share my work with the rest of the world.