Equal time for SAS

In a comment here, Steve Polili of SAS writes, “neither SAS nor Anne [Milley] hates R or open source. We run on Linux and we love Apache. For a little more info on SAS and R, take a look at a followup on Anne’s blog [entry],” where Anne writes:

First, SAS and I applaud the innovative contributions and passion of the R community, and users who apply R to solve problems. In a very real sense, we are grateful for R, as it provides a freely available venue for bleeding-edge and experimental data analysis methods, and underscores the increasing importance of advanced analytical and graphical methods in this age of massive data volumes.

Over the past three decades, SAS has made and continues to make many noteworthy contributions to advanced data computing. As a trusted supplier to a large and diverse set of organizations, SAS provides analysis software that has been refined over years of customer application and feedback. SAS software is fully supported and used daily in countless ways. SAS is also scalable to very large data sets (multi-threaded, grid-enabled, etc.). These issues remain very important to the organizations we serve.

This seems reasonable to me. There’s certainly room in the world for SAS, R, Stata, and SPSS, as well as Fortran, C, Python, and even Excel. I find it useful to have different software for different purposes. I’d like R to have better data input and merging tools, but until I become aware of such tools, I’ll continue to use Stata (indirectly) to do some of this.

So, in that case, how could I go around saying, “I just hate SAS”?

It goes back to my experiences. I’ve never actually logged into SAS or used it directly; my involvement comes from talking to people who’ve run SAS analyses and seeing students who’ve used SAS in their classes.

What have I seen there? The SAS users always seemed burdened by pages and pages of output, which they’d grab and paw through until they found the number they were looking for. Yes, I agree that SAS can fit multilevel models and was doing it way back before you could do it in R. But my impression is that it’s very difficult to post-process SAS inferences, for example for making plots. In the only example I know of with postprocessed SAS analyses, the statistician had to take the results, save them into a file, and port them into another statistics package. I don’t doubt that more could be done within SAS, but it doesn’t seem to happen much.

As for coursework: the classes I’ve seen where students use SAS seem to have more of a classical statistical approach, where you pick your analysis and do it. Less data and model exploration than I tend to see in R or Stata.

But I have little doubt that SAS can be a highly effective tool, and if you are already using SAS, maybe it makes sense to stick with it. Ideally the best features in SAS will eventually appear in R, and vice versa.

In all seriousness, there are a lot of days where I wish I was still doing everything in Fortran. It’s great to have control of what you’re doing, and, in using R, I have to give up some control. That’s fine if I’m fitting lowess or whatever, but it gets annoying when I’m painstakingly putting together graphs by hand. I don’t think SAS would solve my particular problems at all, but others have different needs.

P.S. I’m sorry for writing, “it’s good to hear that SAS is in trouble.” That wasn’t a nice thing to say. Let me rephrase to say, “It’s good to see SAS getting some healthy competition, which I hope will improve life for users of all these packages.”

12 thoughts on “Equal time for SAS”

  1. As someone who learned R and SAS simultaneously, while being introduced to statistics, I can say honestly that I hate SAS. The syntax was arcane, output was burdensome, and combing through the log file when something went wrong was difficult. I actually wrote my own user interface for SAS in perl, if you can believe it, for parsing the output and log files. New ODS features have rendered much of this moot, but R was (and is) lightyears better for almost everything I do.

  2. Great post! Your postscript makes good sense. As social scientists we're lucky to have great tools like SAS, R, STATA, and the others you mentioned in our toolboxes.

  3. What sort of "data input and merging tools" do you want? What problems are you having that you can't solve with existing tools? This is one of my research interests, so I'd like to hear more.

  4. SAS does heavy lifting very well. As long as your platform supports large files on disk. If I'm not mistaken, R relies on lots of memory space. This is becoming less of a problem these days, but not insignificant.

    But fully agree with frustration with pages of output, poorly presented. SPSS has the same problem (maybe "had", last used around 3 years ago). Fine for people looking for answers that fit a protocol, but not for those wanting more insight. R provides great flexibility. What I meant in a previous thread when I said R fills the niche of filling the niche.

  5. First, I love the civil and earnest discussion on this blog — your apology for a potentially discourteous statement being further proof of this idea rather than a counterexample.

    Second, I used R a little while a student, so I don't have much to say there, but I use both SAS and Stata extensively in my research now. And, though I have a strong preference for Stata, SAS has at least one really wonderful feature — the ability to handle arbitrarily large datasets. I regularly need to process datasets that exceed the system memory limits of even very large Unix servers, so I am glad that SAS is around at those times to chug away one line of data at a time. Surely, if I knew more programming (like perl), I could do the same thing in less time, but alas, I don't.

  6. I'm not sure I'm inclined to let the SAS people off the hook so easily. Anyone remotely familiar with open source debates _must_ know that the "unreliable", "not for airplane building" etc. is the most commonly held stereotype against open source software (of all kinds).
    Propagating that – and if it's just to "make a point" as Anne claims she did – is something I find pretty outrageous.
    Had she not used the term "freeware" I'd be much less angry – but a company running on Linux and Apache (as they claim) dissing "freeware" is just…

  7. We use SAS and Stata extensively here and the dividing line begins at data set size: we have some data sets with tens of millions of observations and lots of fancy merges which Stata would just choke on. But of course the next step is you get a divide between those who are more comfortable in SAS and those who are more comfortable in Stata, at which point the dividing line gets blurry. Arguing about statistical tools is a little like arguing Macs vs. PCs… a little experience trumps a lot of design difference.

  8. Chuck: 'SPSS has the same problem (maybe "had", last used around 3 years ago).'
    It still does. I have to use SPSS a lot at work and I hate it. My main gripes are the bloated output and the inconsistency. It wouldn't be so bad, but nearly everyone in my field teaches SPSS to undergraduates (for employability reasons – it isn't unusual to see job adverts requiring SPSS skills). SPSS is just appalling for teaching introductory statistics.

  9. I was a programmer from the late 70's to the 90's;

    And when I look at SAS;

    I see a tool designed for;

    punched cards ;

    and not for programming;

    RUN;

  10. There is a barrier to entry (learning curve) when learning any language. I think the slope of the curve for SAS is relatively flat. I spent 2 years in my early 20s learning the subtleties of SAS at work – chosen somewhat arbitrarily among the packages available – so I'm used to its idiosyncrasies. I have noticed over the years that people new to SAS often get intimidated by SAS because of its quirky syntax and voluminous output. SAS has a reputation for being hard to learn/use. Once I was able to get past this stage in the learning process, I found SAS to be a great environment for building repeatable data cleansing and analysis routines. They also maintain documentation that I learned to decipher.

    After a couple of years I found that I can read and re-use old code pretty efficiently as long as I'm disciplined about organization. Now I feel empowered to manipulate data (of all sizes) quickly, without error, and in every way I can imagine. For post-processing, output can be saved to SAS data sets, txt files and many other file types. The one big complaint that I have is that graphing in SAS is way too hard, so I usually move the data to SPlus or even Excel.

    The investment in learning SAS is high (in $ and time) so it may not be for everyone, especially those who may not pursue a long-term career analyzing data.

  11. Take a look at JMP, an interactive and highly visual statistics package from SAS. Graphic displays are automatic and interactive for almost every analysis. You can get a trial download of JMP 8 (the newest version) on their website www.jmp.com. (SAS users: JMP reads a SAS dataset directly.) Be sure to try the click and drop Graph Builder.
