Excel-bashing

Posted on April 17, 2013 4:48 PM by Andrew

In response to the latest controversy, a statistics professor writes:

It’s somewhat surprising to see Very Serious Researchers (apologies to Paul Krugman) using Excel. Some years ago, I was consulting on a trademark infringement case and was trying (unsuccessfully) to replicate another expert’s regression analysis. It wasn’t until I had the brainstorm to use Excel that I was able to reproduce his results – it may be better now, but at the time, Excel could propagate round-off error and catastrophically cancel like no other software!

Microsoft has lots of top researchers so it’s hard for me to understand how Excel can remain so crappy. I mean, sure, I understand in some general way that they have a large user base, it’s hard to maintain backward compatibility, there’s feature creep, and, besides all that, lots of people have different preferences in data analysis than I do. But still, it’s such a joke. Word has problems too, but I can see how these problems arise from its desirable features. The disaster that is Excel seems like more of a mystery.

65 thoughts on “Excel-bashing”

Stuart on April 17, 2013 4:55 PM at 4:55 pm said:

You know the old aphorism: Friends don’t let friends use Excel for statistics.
- Rahul on April 17, 2013 11:01 PM at 11:01 pm said:
  
  I think Excel bashing is silly. It is the people using Excel for the wrong things that are the problem. Excel works great for a quick estimate, or moving a few columns around. Or getting a first look at general graphical trends. etc.
  
  If you get to the point where you are manipulating hundreds of cross references or trying to compute a complex statistic, or using a nasty multi-layer vlookup / hlookup using Excel obviously you are in disaster territory. The trick is to know when to shift to R, Matlab, gnuplot, SQL etc.
  
  I’ll credit Excel with being a net help in the early stage of solving a lot of problems. Often it gives a good brainstorm about what strategy and tools to use and I can’t think of what other tool I’d start in.
Matt on April 17, 2013 5:04 PM at 5:04 pm said:

I’ve always wondered about this. As a statistician, I’m well aware that there are many many top statistical researchers at Microsoft Research. I can only assume that MS Research has some excellent people in more straight CS too (ie not in statistics renamed machine learning with some different foci). How they continuously fail at things like implementing regression without basic computational problems or designing a fast, secure, and stable operating system? Can they only hire good people in research, and not in the rest of the company?

On the face of it that’s hard to believe. Though the large majority of engineers and tech business side people I know would all reject a job offer from Microsoft very easily unless they were being offered a salary MUCH higher than anywhere else. Then again, my sample is biased cause really the only people I know in the tech industry are in the Bay Area, not Seattle/Redmond, but these days it seems possible. That’s not true of the statisticians and CS folks I know considering industry research lab positions.
- Rahul on April 17, 2013 11:02 PM at 11:02 pm said:
  
  If you think Excel is bad try explaining Notepad! :)
Entsophy on April 17, 2013 5:09 PM at 5:09 pm said:

I’ve had the unenviable experience of using excel vba to do serious statistics because it was the only programing language allowed on the computer network. There weren’t even any libraries or addins available. One can only assume that Excel was specifically designed to be unusable by the statistics community.

On the other hand, I have fond memories of using the circular reference capability of cells in excel to effortlessly solve Laplace’s equation. The cells formed the x-y plane and value in the cell was the z coordinate. You set up the boundary condition by specifying the z-values in a loop of cells and then set each cell equal to the average value of the four adjacent cells. Works like a charm.
- Chris G on April 17, 2013 8:26 PM at 8:26 pm said:
  
  > One can only assume that Excel was specifically designed to be unusable by the statistics community.
  
  Very true. But it’s not just that it’s near useless for non-trivial statistical analyses, the graphics are godawful too! I could forgive the lack of analysis capability if the graphics were halfway decent. I’ve often wondered if Excel is an inside joke at Microsoft. Maybe someday after Gates et al shuffle off this mortal coil someone will publish a tell-all book which explains it.
  - Shaun on April 17, 2013 10:00 PM at 10:00 pm said:
    
    I think you mean the default graphics are awful (which they really are). The thing that puzzles me is, why does Excel get such a bad rap for this, when the same is true for pretty much any statistics software available? Just like with other software you can make some decent plots with Excel that look as good as most of what is done with R, Stata, SAS etc. Is it just that bad graphics made using Excel defaults are so common, because so many people who don’t know better use them?
    - Andrew on April 17, 2013 10:05 PM at 10:05 pm said:
      
      Shaun:
      
      The default graphics in R are not so great but in my experience they’re much better than what I see people making in Excel.
    - Shaun on April 17, 2013 10:21 PM at 10:21 pm said:
      
      I agree, a lot of people make a lot of rubbish graphics in Excel. However, its probably easier to make good graphics in Excel than dedicated statistical software. Unfortunately most users appear too lazy (or ignorant) to utilise the options available to them. This isn’t the fault of the software, though.
    - Wayne on April 17, 2013 10:22 PM at 10:22 pm said:
      
      Which I believe is because R’s defaults tend toward simplicity and clarity, while Excel’s are towards what “looks cool”. Given that the individual elements are rendered well, there’s power in simplicity.
      
      Some of this goes back the the whole Powerpoint presentation versus Tufte idea. Powerpoint is a catalyst for boring talks that lean on reading slides instead of presenting interesting information. In like manner, “presentation graphics” that might go into a Powerpoint presentation are also mistakenly made to “be interesting” and hence fail at a fundamental level. It’s a Suite thing.
    - Shaun on April 17, 2013 10:29 PM at 10:29 pm said:
      
      Except Excel’s defaults don’t “look cool”. They look terrible, like most other defaults. Too big, terrible colours, too many labels along the axes. Bad, bad, bad; but not unique to Excel and not forced upon the user.
    - Wayne on April 18, 2013 10:57 AM at 10:57 am said:
      
      “Cool” is in the eye of the beholder. In this case “cool” is defined by bean counters and Microsoft engineers. By people who make typical Powerpoint presentations. Not you or me. That’s why they feature 3D pie graphs, 3D ribbon charts, etc.
      
      By definition, “default” is not forced on the user. It’s what you get if you don’t specify something else.
    - Antonio on May 8, 2013 5:56 PM at 5:56 pm said:
      
      http://www.prodomosua.eu/ppage02.html
    - Shaun on April 17, 2013 10:26 PM at 10:26 pm said:
      
      And despite my (limited) defence of Excel’s graphics capability, I have no idea what economists were doing using it for statistical analysis. Admittedly their methods were fairly simply, so dedicated software may have been overkill. Then again, perhaps considering the importance of the topic they should have been using a more sophisticated approach (predicted growth controlling for debt AND other factors with confidence intervals would have been a good start).
  - icedwater on April 18, 2013 4:01 AM at 4:01 am said:
    
    Sidebar on Excel and graphics: http://japanesevw.blogspot.com/2013/04/japanese-old-man-draws-splendid-picture.html
Wayne on April 17, 2013 5:17 PM at 5:17 pm said:

What I find disturbing about Excel is that it shows all of your results and hides all of your calculations. The opposite of something like R code.

There is some capability to give you a bigger picture of your calculations, but…
- OneEyedMan on April 17, 2013 5:23 PM at 5:23 pm said:
  
  You can toggle excel back and forth between showing formulas and showing results. In the ribbon for Formulas, click on the button “Show Formulas” to show formulas and again to return to showing results.
  - Wayne on April 17, 2013 6:12 PM at 6:12 pm said:
    
    A good tip! But I think the problem with Show Formulas is that it depends on rows and columns being named intelligently — though the same might be said for R variable names — and you really need to also use “Show Precedents”, which appears to not toggle as you select various cells. That’s one thing I really like about Apple’s Numbers: click and you see the various source cells highlighted in different colors.
    - Deramin on July 23, 2013 3:30 PM at 3:30 pm said:
      
      Code in any language is only as thoughtful as its writer. Excel does give you the ability to do things in a reasonable and sane manner (Named references, structured references using tables, etc), most people just don’t bother learning that part. Cell high-lighting is called Trace Precedents and it’s under the Formulas tab.
      
      I’m really curious how Excel 2013 with the new Analysis ToolPak fairs against statistics-specific packages. It sounds like Microsoft might be seeing the error of their ways, but how knows what that’s good for.
Chris on April 17, 2013 5:28 PM at 5:28 pm said:

Excel “remain(s) so crappy” because its not designed as a scientific analysis tool. We are not the target audience for Excel. It and the rest of the Office suite is tailored to satisfy the needs of corporate infrastructure; its a place to do business calculations and generate business graphics for reports, not to perform precise statistical calculations or build reproducible models. You do so at your peril.
- conchis on April 17, 2013 7:23 PM at 7:23 pm said:
  
  This. Excel isn’t built for statistics, but it’s actually pretty good at what it does. (Which includes building reproducible models, just not *statistical* models).
Daniel Lakeland on April 17, 2013 5:51 PM at 5:51 pm said:

The main use for Excel is designing bridges and building structures. This is why we use such large factors of safety in Civil Engineering formulas.

I wish I were less tongue in cheek. They really do use a ton of opaque custom Excel worksheets in most engineering offices.
Steve Sailer on April 17, 2013 6:01 PM at 6:01 pm said:

I built sales forecasting systems using only the most rudimentary of statistical tools (e.g., weighted averages) over a couple of decades ago, first in the late Lotus 123 3.0 and second in Excel. Lotus was vastly better for sales forecasting, which you might think was a core mission of Excel. I haven’t seen much evidence that Excel today is as good as 123 3.0 was in the early 1990s.

Similarly, around 1995, Microsoft hired James Fallows, the talented magazine writer, to help improve on the spelling and grammar checker in Microsoft Word. We had long email discussions of style-checking features that could be added to Word. Nothing ever came of it and Fallows returned to The Atlantic.

The simplest explanation for this is the MS Office is close to a monopoly, so MS has little incentive to improve its Office products.
- Entsophy on April 17, 2013 6:28 PM at 6:28 pm said:
  
  Joel Spolsky of “Joel on Software” fame who headed the development of VBA had this relevant observation which explains a lot of this:
  
  “Everybody thought of Excel as a financial modeling application. It was used for creating calculation models with formulas and stuff. You would put in your assumptions and then calculate things like “if interest rates go up by 0.00001% next year, what percentage of Las Vegas homeowners will plunge into bankruptcy?” For example.
  
  Round about 1993 a couple of us went on customer visits to see how people were using Excel.
  
  We found a fellow whose entire job consisted of maintaining the “number of injuries this week” spreadsheet for a large, highly-regulated utility.
  
  Once a week, he opened an Excel spreadsheet which listed ten facilities, containing the name of the facilities and the number 0, which indicated that were 0 injuries that week. (They never had injuries).
  
  He typed the current date in the top of the spreadsheet, printed a copy, put it in a three-ring binder, and that was pretty much his whole, entire job. It was kind of sad. He took two lunch breaks a day. I would too, if that was my whole job.
  
  Over the next two weeks we visited dozens of Excel customers, and did not see anyone using Excel to actually perform what you would call “calculations.” Almost all of them were using Excel because it was a convenient way to create a table….
  
  What was I talking about? Oh yeah… most people just used Excel to make lists. Suddenly we understood why Lotus Improv, which was this fancy futuristic spreadsheet that was going to make Excel obsolete, had failed completely: because it was great at calculations, but terrible at creating tables, and everyone was using Excel for tables, not calculations.
  
  Bing! A light went off in my head.
  
  The great horizontal killer applications are actually just fancy data structures.
  
  Spreadsheets are not just tools for doing “what-if” analysis. They provide a specific data structure: a table. Most Excel users never enter a formula. They use Excel when they need a table. The gridlines are the most important feature of Excel, not recalc.”
  
  http://www.joelonsoftware.com/items/2012/01/06.html
  - Chris G on April 17, 2013 8:20 PM at 8:20 pm said:
    
    I think Spolsky’s “tables” observation probably explains it.
  - Steve Sailer on April 17, 2013 8:21 PM at 8:21 pm said:
    
    Right, spreadsheets are a great way to keep track of stuff in two dimensions. For example, a budget with line items on the vertical dimension and departments on the horizontal dimension.
    
    So, around 1990, Lotus introduced 123 3.0, which seamlessly integrated a third dimension. This was extremely useful in standard corporate uses, such as as building a 13 sheet workbook, one for each month’s budget and one summing up the annual budget based on the monthly budget. 123 3.0 made it simple to do anything you could do in two dimensions, such as graphing, in three dimension.
    
    I built a big sales forecasting system for a corporation using Lotus 123 3 in 1991. A couple of years later, I was hired by first firm’s archrival to build the same system, only this time it had to be in Excel. It took three times as long to build in Excel a second time as in Lotus 3 the first time. Excel has workbooks with separate sheets, but almost no effort was put into making them seamlessly integrated along a third dimension.
Steve Sailer on April 17, 2013 6:04 PM at 6:04 pm said:

This fiasco reminds me of Steven D. Levitt’s famous abortion-cut-crime theory, which turned out to be the result of his own programming errors. Here’s the 2005 Wall Street Journal article revealing Foote and Goetz’s attempt at replication of the most celebrated aspect of “Freakonomics.”

http://online.wsj.com/public/article/SB113314261192407815-7O0CuSR0RArhWpc9pxaKd_paZU0_20051228.html?mod=tff_article

What tool did Levitt (mis)use?
- Andrew on April 17, 2013 6:06 PM at 6:06 pm said:
  
  Hey—I thought the sumo wrestling story was the most celebrated part of that book!
- Anonymous on April 18, 2013 8:38 PM at 8:38 pm said:
  
  My guess is he used Stata…
Entsophy on April 17, 2013 6:21 PM at 6:21 pm said:

RExcel has worked pretty well for taking quick looks at data. You can do quick data cleaning in Excel and all the statistics in R. The only problem with it is that it seems to be limited to 34 bit versions.
Foster Boondoggle on April 17, 2013 6:34 PM at 6:34 pm said:

Just to amplify on what Chris said. Unlike statistical software packages, Excel has a relatively easy learning curve (though there’s a lot of power in there once you get into the weeds). It’s easy to quickly produce various calculations that might come up in business, etc. And, for better or worse, you can hand off your spreadsheet to someone else and have them using and modifying it very quickly.

There’s certainly plenty of risk due to the opacity of how it shows calculations. But ultimately perhaps no worse than with other programming languages. It’s just that Excel is ubiquitous in the business world, so there are many more opportunities for high profile debacles. But there have been plenty of other non-Excel screwups due to programming errors. Nicola Bradley at Algorithmics (now part of IBM) put together a nice collection of them — mostly from the finance world — in their March 2012 newsletter. They include coding errors related to Moody’s ratings of CDOs in the bubble years and AXA Rosenberg’s problem with their risk model in 2010.

Like the saying goes: “To Err is Human. To Really Screw Up, You Need a Computer”. And any old software will do.

Definitely surprising though to see quantitative academics using Excel…
- Denny on April 18, 2013 5:50 PM at 5:50 pm said:
  
  @Foster: Is there a way you can provide a link to the March 2012 newsletter on the collection of financial debacles. Thanks.
  - foster boondoggle on April 20, 2013 9:58 AM at 9:58 am said:
    
    If you reply with an email I can send it to you. I got it from a friend. It may not be on their site.
David on April 17, 2013 6:42 PM at 6:42 pm said:

The newer versions of excel actual have some nice sub-table features (so that parts of a sheet can be marked as a subtable and lookups can be into the subtable), but most people don’t know how to use them and they are rather inconvenient.
- Steve Sailer on April 17, 2013 8:23 PM at 8:23 pm said:
  
  True, MS continues to provide a trickle of innovation, but nothing compared to what we saw when there was actual competition in the field.
Stuart Buck on April 17, 2013 8:53 PM at 8:53 pm said:

A good time to revisit this 2008 paper on Excel: http://people.stern.nyu.edu/jsimonof/classes/1305/pdf/excelreg.pdf
SBH on April 17, 2013 9:27 PM at 9:27 pm said:

Just last week an email went around Statalist talking about the dangers of using Excel for statistical work:

Thought this would be useful to know. Background: NIST (National
Institute of Standards) provides a suite of tests
(http://www.nist.gov/itl/sed/gsg/strd.cfm) that push the limits of
computational accuracy in routine statistical computation, e.g.,
having a regression model with a poorly conditioned matrix, or a
nonlinear regression problem with a sharp long ridge near the maximum
of the objective function that a poorly calibrated optimization
algorithm may miss. More serious software packages use these data sets
to gauge how they are doing against them
(http://www.stata.com/support/cert/nist/)… but Excel developers seem
to be oblivious to these standards.

And this:

Apologies if this has been circulated already and I missed it, but I
found this article about serious weaknesses in Excel’s statistical
functionality rather alarming. This looks to go well beyond “rare bug”
type issues: for instance, when given 27 NIST standard test problems
on nonlinear least squares, it gave badly wrong answers for 11 (but
still claimed to have converged to a correct solution).

http://www.pages.drexel.edu/~bdm25/excel2007.pdf

The article is based on Excel 2007, but it mentions that some of those
problems had been identified > 15 years earlier and remained unfixed.
brian g on April 17, 2013 9:31 PM at 9:31 pm said:

excel is awesome at what it does, which is organize and manipulate “large” amounts of data. it provides a tactile connection to data that is difficult to replicate in a database setting. and it is convenient to perform arithmetic calculations and build simple graphics in the same program. it also facilitates the transfer of results among users with varying levels of technical expertise. (and yes, it makes tables.)

i would not trade excel for anything. nor would i ever try to use it to perform a regression. much like i would never write a book in powerpoint. or race my honda in the indy 500.

no reason to bash excel, just accept that it doesn’t serve your purposes.
- Wayne on April 17, 2013 10:28 PM at 10:28 pm said:
  
  Yes, but if you used Powerpoint to write a book, no one (excpting you of course) would be damaged in the process. Wonky Excel spreadsheets, on the other hand, can have severe real-world effects. The reason this topic came around again was because of famous economists making fairly substantial mistakes that have real-world implications once turned into policies. It makes me wonder how many people have been mistakenly laid off because of Excel errors.
  
  On a sidenote, I actually got a brochure for a charitable organization that was done in Powerpoint. I assume that the marketing person who made the brochure worked all the time in Powerpoint and hadn’t a clue about Word, so they used their screwdriver as a hammer.
whoa on April 17, 2013 10:33 PM at 10:33 pm said:

Would echo what was said many times past that Excel is popular for tables. As far as functions, the most and only important thing is backwards compatibility — Microsoft is really the only company completely dedicated to backwards compatibility. When you have complex financial companies that have been in business creating software or encoding formulas, rigorously defining complexity however it is, the only important thing is that all those years are work are not silently invalidated or that they develop strange edge cases when you aren’t looking.

See apple and google’s type of operating system, and apps are lucky to remain compatible for a whole year. Open source OSs are totally splintered. It’s easy to take for granted backwards compatibility, or just chuck it aside if you are some one man show that just needs to relearn his little part for the next project. This is not so easy to do if your existing systems represent the accumulation of decades of work of thousands of employees.

If you want to seriously create or manipulate data, you use a real programming language. If you want to then display that data in a tabular format that people recognize, you can put it in excel.
- Scott on April 17, 2013 11:17 PM at 11:17 pm said:
  
  “the most and only important thing is backwards compatibility” Res ipsa loquitur.
Tom on April 18, 2013 12:20 AM at 12:20 am said:

I’d echo the comments on excel being used for data display rather than analysis. Generally a lot of people want to organise data so they can put it in a report rather than analyse it. For huge numbers of people excel is viewed as a *really* good tool. The fact that excel encourages poor data organisation, analysis and visualisation is really only known to those who use more appropriate tools for this.
loveactuary on April 18, 2013 7:26 AM at 7:26 am said:

I can imagine Excel is not reliable for professional statisticians, but for finance folk it is the foremost tool in our kit.

Because a valuable tool can be misused is not the fault of the tool.

It takes years (maybe approaching Malcolm Gladwell’s magic 10k mark) for actuarial analysts to get proficient in Excel to where the work is organized, replicable, and produces reports and results that are meaningful. The visual display of financial models – to be able to change various inputs and watch all dependent results fluctuate instantly on the screen in front of you (imagine a string of cash flow projections) – is critical to our business. There is no other tool remotely in this ballpark.

Actuaries and insurance companies have evolved to more secure and less error-prone software (MoSes, MG-ALFA, AXIS, Prophet) for core actuarial functions such as pricing, valuation, and standard projections. Even then, in general we have to show that these software produce valid results by … replicating in Excel :)
- Nick on April 18, 2013 10:10 AM at 10:10 am said:
  
  And it shouldn’t be the foremost tool in finance either. It is able to store data and perform calculations, and therefore ends up getting used as a combination of database and analytical tool, but does neither very well. I know people argue that it is a good tool for prototyping but I don’t buy that either; although it is flexible enough to quickly change the analysis on the fly, it is very easy to break things in imperceptible ways when you make any change to a spreadsheet. The error in R&R is exactly the kind of thing that you get when you change the number of data lines that you are looking at.
  
  While I agree that in general you blame the user, not the tool, the fact that Excel gets “abused” in so many different ways in so many different settings leads me to believe that there is something more fundamental going on. It’s like the teapot with the handle and spout on the same side on the cover of “The Design of Everyday Things.” It just isn’t designed well.
  - Rahul on April 18, 2013 11:02 AM at 11:02 am said:
    
    Do you have a better teapot in mind?
    - Nick on April 18, 2013 11:38 AM at 11:38 am said:
      
      It depends on what you are doing. For a one-person analysis you should be using a scripting language (e.g. R, Python) with a database. You can look at the script that cleans the data going into the database, you can ensure uniqueness where necessary in the database, and you can look at the analytical scripts to ensure that they are doing what they should be doing. The important needs here are data quality and fast and flexible analytics.
      
      For analytical tools that need to be used by several people, I think web apps are the right tools. You can centralize the code, eliminating the proliferation and forking that occurs as spreadsheets spread throughout the organization. You can customize the input forms to reduce data entry error, and can store whatever you want in the back-end database. The important needs here are consistency and organization.
      
      For truly scratch work type of analyses, use a calculator and a piece of paper. If you need more permanence than that, then it’s not really scratch work :)
      
      The pushback I get from spreadsheet defenders is that spreadsheets are good tools when you don’t really need to clean up your data, and when the analysis to be done is lightweight, and if the results are not intended to be permanent. An analysis that satisfies that criteria doesn’t seem worth the opportunity cost of the time someone has to spend building it.
  - J. Rusnak on April 18, 2013 12:35 PM at 12:35 pm said:
    
    Excel makes fudging numbers in your trading book to hide loses especially convenient. Even better if risk management pulls prices from an Excel sheet that you have access to.
Andrew on April 18, 2013 10:23 AM at 10:23 am said:

Here’s what I think of Excel.

I use to use Excel a lot (everday 8 hours). Then I learnt R. Now if I need a spreadsheet I use Gnumeric or LibreCalc.
loveactuary on April 18, 2013 11:16 AM at 11:16 am said:

Andrew (not Gelman, I presume, as there is no hyperlink in your name),

R is necessary for certain uses and spreadsheets are necessary for others. I have opened up Gnumeric and played around for a bit; what benefits do you find (in addition to its $0 price tag) in this application over Excel?
Lord on April 18, 2013 11:50 AM at 11:50 am said:

Accounting has double entry. If users programmed Excel with dual calculations, cross checks, and validations they could much better, but its casual use makes avoiding these easy and few want to take the time or make the effort.
- Hank Roberts on April 25, 2013 1:16 PM at 1:16 pm said:
  
  > If users programmed Excel with dual calculations, cross checks, and validations …
  
  My beloved makes a decent living this way — dropping into one business after another to temporarily provide Excel expertise. She routinely makes it part of her task create and use, and teach, these kinds of cross-checks.
  
  (Not to mention improvements like eliminating steps where someone reads numbers off a computer screen and types them into an Excel window — often they’ve never figured out how to copy and paste successfully, or they trust their typing more than they trust the computer — hey, nobody proofreads any longer ….)
K? O'Rourke on April 18, 2013 12:10 PM at 12:10 pm said:

Here’s what I am currently wondering about Excel.

Can it safely be used to increase potential audience?

For instance, it’s easy to explain certain things to folks who get basic R code (9 out of 11 who filled out an evaluation agreed).

But the folks who would most benefit (I did not say appreciate) are not going to learn R but already (mis)use Excel.

Thanks to Steve Sailer I no longer think it should be as easy as what I did with Lotus 123 many years ago :-(

As an aside, an academic who wishes to maximize publications and reputation (assuming no real penalty for publishing claims based on computational errors) should use Excel as these kinds of errors often lead to interesting, highly publishable claims ;-)
Pingback: Somewhere else, part 47 | Freakonometrics
Pingback: An argument for why careful coding in R is better than an Excel spreadsheet | Sam Clifford
Fernando on April 19, 2013 10:47 AM at 10:47 am said:

Some time ago I became concerned about the way researchers involved in randomized program evaluations where generating random numbers. In particular most were using Excel. At the time I did a literature search and came up with this:

author = {Yalta, A.T.},
title = {The accuracy of statistical distributions in Microsoft® Excel 2007},
journal = {Computational Statistics and Data Analysis},

author = {B.D. McCullough and David A. Heiser},
title = {On the accuracy of statistical procedures in Microsoft Excel 2007},
journal = {Computational Statistics \& Data Analysis},

author = {B.D. McCullough},
title = {Special section on Microsoft Excel 2007},
journal = {Computational Statistics \& Data Analysis},

author = {Altman,Micah and McDonald,Michaell P.},
title = {Choosing Reliable Statistical Software},
journal = {PS: Political Science \& Politics},

author = {B. D. McCullough and Berry Wilson},
title = {On the accuracy of statistical procedures in Microsoft Excel 97},
journal = {Computational Statistics \& Data Analysis},

Enjoy!
- Fernando on April 19, 2013 11:13 AM at 11:13 am said:
  
  PS from the first paper above:
  
  The random number generator has always been inadequate. With Excel 2003, Microsoft attempted to implement the Wichmann–Hill generator and failed to implement it correctly. The “fixed” version appears in Excel 2007 but this “fix” was done incorrectly. Microsoft has twice failed to implement correctly the dozen lines of code that constitute the Wichmann-Hill generator;this is something that any undergraduate computer science major should be able to do.The Excel random number generator does not fulfill the basic requirements for a random number generator to be used for scientific purposes: (1) it is not known to pass standard randomness tests, e.g., L’Ecuyer and Simard’s (2007) CRUSH tests (these supersede Marsaglia’s (1996) DIEHARD tests—see Altman et al. (2004) for a comparison); (2) it is not known to produce numbers that are approximately independent in a moderate number of dimensions; (3) it has an unknown period length; and (4) it is not reproducible. For further discussion of these points, see the accompanying article by McCullough (2008); the performance of Excel 2007 in this area is inadequate.
  - jrkrideau on April 19, 2013 1:26 PM at 1:26 pm said:
    
    From a recent discussion on the R-help mailing list.
    —————————————————–
    “The idea that the Excel solver “has a good reputation for being fast andaccurate” does not withstand an examination of the Excel solver’s ability to solve the StRD nls test problems. Solver’s ability is abysmal. 13 of 27 “answers” have zero accurate digits, and three more have fewer than two accurate digits — and this is after tuning the
    solver to get a good answer.
    …
    Excel solver does have the virture that it will always produce an answer, albeit one with zero accurate digits.
    Bruce McCullough
    http://r.789695.n4.nabble.com/Re-NLS-results-different-from-Excel-td4659167.html
    
    B. D. McCullough and Berry Wilson (2002).”On the Accuracy of Statistical Procedures in Microsoft Excel 2000 and Excel XP,” /Computational Statistics and Data Analysis/ *40*(4), 713-721
    ——————————————————
    While it is fun to knock Excel I think it would be better to suggest that any spreadsheet of maybe one sheet and twenty lines of data is suspect. I have this recurring nightmare of someone using a spreadsheet to calculate a drug dose and accidently using my telephone number in col B rather than my weigh in col C.
    
    Readers with strong nerves may want to see http://www.eusprig.org/horror-stories.htm.
    - Fernando on April 19, 2013 1:45 PM at 1:45 pm said:
      
      I think the problem extends beyond Excel. Many social scientist want interesting research that has real impact in peoples’ lives. Yet, in my own very limited view, I sense very few incentives on quality control.
      
      Of course everyone thinks reliability is important, and we all put effort into it, but journals are not that interested in the topic, and the subject is hidden away in the laundry pile. After all, who wants to look into the kitchen sink when we can behold the latest Crab Imperial?
      
      Research practice — or the study and assessment of tools and lab practices with a view to improving the reliability, replicability, and integrity of research findings — needs to become a discipline in the social sciences. The more so if we want to be a consequential science with practical implications.
      
      Methodology, in my opinion, is much more than newfangled estimators. It is 90% craft, and that craft needs to be made explicit, systematized, published, and researched.
Blaise Egan on April 19, 2013 2:23 PM at 2:23 pm said:

I have the strong impression that Excel is written by people who are very conversant in Visual Basic but who have never encountered any mathematics beyond high school algebra. Things like numerical analysis are completely unknown to them. I can imagine that they heard about certain statistical techniques like regression from user requests for new features. When considering how to implement it they went to the library and found a book that had the formula and just implemented it as is. To this day you can put 3E30 into three adjacent cells and tell Excel to compute the variance and it will give you a number of the order of 10^28. A lot of bad bugs were fixed in Excel 2003 but many remained. None were fixed for Excel 2007, as far as I can see, as they were too busy making dramatic changes to the user menus. Many more bugs have been fixed in Excel 2010 and we now have a working random number generator. There is progress, but it is slow and far too many people assume that if there is a feature there it has been implemented correctly.
- jrkrideau on April 20, 2013 9:13 AM at 9:13 am said:
  
  Well I have heard of Excel giving a negative R^2 so you may well be right.
  
  In general, rather than just in the research area I think a (the?) major problem is that Excel (and other spreadsheets) are being used for purposes far beyond what a spreadsheet was originally intended for.
  
  It originally was an electronic version of a large paper page, actually I think two pages, in a big ledger, filled in by Bob Cratchet using a quill pen.
  
  Now we have lots of people in industry using them to do complex financial modeling, engineering calculations, psuedo-databases and who knows what else.
  
  It can be frightening to have someone ask how to do a SQL Join operation on a sheet with 500,000 rows of data.
  
  It seems to be a case of ‘when your only tool is a hammer, every problem looks like a nail” and most people just do not know that there are better tools available. I have to admit that a few years ago I was doing Gantt type charts in a spreadsheet, mea culpa.
Daniel Lakeland on April 20, 2013 4:30 PM at 4:30 pm said:

For what it’s worth, it seems to me that Gnumeric is implemented by people who actually want to do real reliable calculations. For example it uses the Mersenne Twister RNG, it uses R as a library to do many of its statistical calculations. It still doesn’t solve the basic problem that you’re working with a spreadsheet (ie. nonrepeatable, opaque analysis) but if you MUST use a spreadsheet to do some calculation, please use Gnumeric, available on Linux and Windows. There’s some kind of MacOS port but not sure where or how to get it.
Manuel on April 23, 2013 10:13 AM at 10:13 am said:

Hi. What an interesting post and blog,
I have just graduated as a mathematician and I’ve started working at a RF metrology group, in what it would be the equivalent of the NIST (National Institute of Standards and Technology) in my country.
The thing is all my co workers use Excel for statistical calculations, but that’s quite limitating and frustating. Does anyone know any publication (preferably by the NIST,PTB, or any national institute) that recomends using R or Pyhton instead of Excel for statistics calculations? It would be of great help.
Pingback: In Praise of Excel | Compromise and Conceit
Pingback: Braveheart & Rogaine « MetaSD
Pingback: The Week That Was – 4/22/2013

Comments are closed.