## Designing a study to see if “the 10x programmer” is a real thing

Lorin H. writes:

One big question in the world of software engineering is: how much variation is there in productivity across programmers? (If you google for “10x programmer” you’ll see lots of hits).

Let’s say I wanted to explore this research question with a simple study. Choose a set of participants at random from a population of programmers. The participants will write a computer program according to a specification. We’ll measure how long it takes for them to complete a correct program.

(Let’s put aside for now the difficulties of using task completion time alone as a measure of productivity, or the difficulty of verifying that a program is “correct”).

I know that I can estimate the population variance from the sample variance. But the problem is that there’s some “noise” in this measurement: there are factors other than their underlying ability that can affect how they perform: maybe some of them skipped lunch, or they’re just having a bad day.

My question is: how do I design my studies and analysis to try to identify the amount of variation that represents “individual ability” rather than these other factors that I can’t control for?

My reply: I’d think the thing to do would be to give multiple tasks to each programmer and then fit a model allowing for varying task difficulties, varying programming abilities, and task*programmer interaction. An additive model (on the log scale) with interactions should work just fine, I’d think.
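
A minimal simulation sketch of this design (all numbers are invented) shows how the programmer effects can be read off a balanced programmers-by-tasks grid:

```python
import numpy as np

# Hypothetical simulation of the suggested design: each of I programmers
# completes each of J tasks, and completion times are additive on the log
# scale (programmer ability + task difficulty + day-to-day noise).
rng = np.random.default_rng(0)
I, J = 30, 8                          # programmers, tasks (illustrative sizes)
ability = rng.normal(0, 0.5, I)       # programmer effects, log-hours
difficulty = rng.normal(0, 0.8, J)    # task effects, log-hours
noise = rng.normal(0, 0.3, (I, J))    # skipped lunches, bad days, etc.
log_time = 1.0 + ability[:, None] + difficulty[None, :] + noise

# With a balanced design, the least-squares estimates of the programmer
# effects are just the row means centered at the grand mean.
est_ability = log_time.mean(axis=1) - log_time.mean()

corr = np.corrcoef(est_ability, ability)[0, 1]
ratio = np.exp(est_ability.max() - est_ability.min())  # "Nx" spread, time scale
print(f"correlation with true ability: {corr:.2f}; fastest/slowest: {ratio:.1f}x")
```

With several tasks per programmer, the noise averages down and the estimated spread between the fastest and slowest programmers stops being dominated by any one bad day.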

1. Rahul says:

Why not just average over four days or something? Kinda unlikely that he skipped lunch on all four days you randomly chose?

• Phil says:

Sure, it’s always better to have more data. If you have a bunch of programmers who are willing to spend an hour or two per day for a couple of weeks, or an hour or two once per week for a few months, you can do a lot better than if each programmer only has one 1- to 2-hour session or whatever.

I think there are several sources of variation and it would be good to think about the size of each and design your experiment accordingly.

Lots of variation:
From programmer to programmer for a given task
From programming task to programming task for any individual programmer (everyone is better at some things than at others).

Medium variation:
From start of the workday to end of the workday

A little bit of variation:
From day to day for a given programmer

If I’m right then I’m not so concerned about whether the programmer had lunch that day. I’d want to give each programmer several tasks that require different skills, and I’d want all of the participants to not be too tired from hours of concentrating on work or school. (Actually there is interesting research to be done on this issue. I think I’m above average at many analytical tasks if I do them while I’m fresh, but that I can’t concentrate for as long as a lot of people do. If you give me a few short tasks to do I might look like a star, but for a larger effort a relative tortoise will beat me in spite of my hare-like start).

2. George says:

If you know what these “other factors” are, then you can probably make some sensible assumptions (no guarantees of correctness here), and/or design adaptations; e.g., administer the test at different times of day to account for variation in performance due to time of day. If you don’t know what these “other factors” are, there is not much you can do. I would first try to find out what programmers think affects their performance (perhaps by asking a number of them) and then build a design that accounts for these factors. It’s better to address confounding in the design phase than in the analysis.

3. babar says:

your experiment should include some concept of negative productivity, of making the overall situation worse for others. this makes the “10x” question kind of moot (if someone can be negatively productive, then someone else can be barely productive at all, and so someone who is good can be much, much better).

• Paul says:

+1 on recognizing negative productivity. Our lab once hired one of these guys for a semester. We spent the next semester undoing everything he ‘did’.

4. There are a few other issues to consider. First, it may not be the scale of variation so much as the shape of the distribution. At a minimum, fit a t distribution and let the degrees of freedom be a parameter.
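
For instance, a quick sketch with SciPy (the data here are simulated, not real completion times): fit a Student-t with the degrees of freedom left free, and a small fitted df flags a heavy tail.

```python
import numpy as np
from scipy import stats

# Simulate heavy-tailed log completion times, then fit a Student-t with the
# degrees of freedom as a free parameter; a small fitted df signals fat tails.
rng = np.random.default_rng(1)
log_times = stats.t.rvs(df=3, loc=1.0, scale=0.5, size=500, random_state=rng)

df_hat, loc_hat, scale_hat = stats.t.fit(log_times)
print(f"fitted df = {df_hat:.1f}, loc = {loc_hat:.2f}, scale = {scale_hat:.2f}")
```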

Second, in my experience the 10x effect is most likely to happen in large, complex projects where there is a straightforward but inefficient way to design a system, and there is also a higher level of abstraction that is more difficult to take advantage of but much easier to solve the problem with if you know how.

For example people who know how to use metaprogramming tools will write a parser in a few minutes that another programmer would take a week of stream manipulation code to write.

So you are unlikely to find a 10x-type effect on “toy” problems that take less than a month to do in the straightforward way.

5. >Let’s put aside for now the difficulties of using task completion time alone as a measure of productivity, or the difficulty of verifying that a program is “correct”

With respect, I feel the testing and verification processes shouldn’t be set aside from consideration and discussion of your study design. Otherwise, how different really is your study from a typing-speed test? I think a good analogy would be measuring the time variation among Rubik’s cube solvers, where you have near-instant visual confirmation of a correct solution. (BTW, we now see several orders of magnitude in times to completion. Rubik himself took about a month, while “speed-cubers” can solve within ~10 seconds.)

If you use one of the xUnit-type frameworks, you can very quickly run each subject’s solution through the test harness and check for correctness. This is what MathWorks does in their ongoing Cody programming contests. Imposing this kind of rigor on your study should allow those various effects you mentioned (mood, etc.) to come through more clearly, I would think.
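
The harness itself can be tiny. A sketch (the spec and submissions here are invented toys, nothing to do with Cody’s actual setup):

```python
# A toy harness: each subject submits a function, and we verify it against
# the spec's test cases before their completion time counts at all.

def check_submission(fn, cases):
    """Return True only if fn passes every (input, expected) pair."""
    # Pass a copy of each input so a mutating submission can't corrupt cases.
    return all(fn(list(x)) == expected for x, expected in cases)

# Spec: return a new list containing the input's values in sorted order.
spec_cases = [([3, 1, 2], [1, 2, 3]), ([], []), ([5], [5])]

def subject_a(xs):                  # a correct submission
    return sorted(xs)

def subject_b(xs):                  # a buggy submission: reverses instead
    return list(reversed(xs))

print(check_submission(subject_a, spec_cases))  # True
print(check_submission(subject_b, spec_cases))  # False
```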

• Rahul says:

The Rubik’s cube record times are amazing. The record is like 6 seconds. I can never understand this.

How many moves can one even humanly do in 6 seconds?! Leave aside perception times, cognition lag, reaction times, etc.

6. Grobstein says:

The 10x legend originates from actual experimental work starting in the 1960s. A good overview of the studies can be found here.

7. Matt S says:

I think it’s very unlikely a meaningful study can easily be designed, at least not without being very expensive. Much higher programmer efficiency for some individuals is mostly due to them being able to reduce and deal with very complex software / requirements.

I’m sure there is a lot of time variability in 1-2 hour toy programming tasks, but I would think the variability has a lot more to do with whether the programmer has happened to deal with very similar problems before, or, if the task is poorly specified, with just how well they can solve puzzles.

Anyway, that is just my perspective as a long-term software developer. The slowest and most time-wasting work happens with poor or unclear specifications and insufficient error checking/correction on very complex tasks, where there definitely will be errors.

8. jonathan says:

Productivity in known or unknown tasks? Those are two different domains. You can test for productivity in known tasks by assigning tasks. I don’t see how this solves for productivity in unknown tasks and unknowns can be in design, in implementation, in actual unknowns until you run into them. If you test for knowns, then you should also consider efficiency of the code as part of productivity; x may be able to get through an issue quickly but the code may cause issues in other areas, may use resources inefficiently, etc. You’d need to be careful designing this testing.

9. D.O. says:

I think it turns out to be an interesting stats problem after all. The problem, as I see it, is not to estimate the central tendency and variation with as little confounding as possible (though the e-mail that started this does ask for that), but how fat the tail of the distribution is, which is what 10x is about, after all. Of course, a very large sample size would cure it, but it might be inefficient. What is a better way? Maybe organize a sort of survival tournament with escalating difficulty of tasks, select a group of, say, the 5% “survivors,” and compare them on a final task against a random control group? I guess biologists do experiments like that when studying selection.
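
A rough simulation of the cost trade-off (every number here is an invented placeholder):

```python
import numpy as np

# Screen a large pool with a cheap, noisy task; spend the expensive,
# accurate final task only on the top 5% "survivors."
rng = np.random.default_rng(2)
N = 2000
ability = rng.normal(0, 1, N)                        # true (latent) ability

cheap_score = ability + rng.normal(0, 1, N)          # say, 1 hour per subject
survivors = np.argsort(cheap_score)[-N // 20:]       # keep the top 5%

final_score = ability[survivors] + rng.normal(0, 0.2, len(survivors))  # 8 hours each
best = survivors[np.argmax(final_score)]

tournament_hours = N * 1 + len(survivors) * 8        # screen + finals
flat_hours = N * 8                                   # everyone does the long task
print(f"tournament: {tournament_hours} hours, flat contest: {flat_hours} hours")
print(f"true ability of the tournament winner: {ability[best]:.2f}")
```

The catch is that the noisy screen drops some of the true top 5%, so the savings are paid for in missed tail cases.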

• Rahul says:

How is that different from just a flat contest provided your sample captured the 5% in both cases?

• D.O. says:

Fewer man-hours spent doing it. Or, from another angle, your initial sample can be much bigger if you are willing to narrow it down later.

10. question says:

“Designing a study to see if “the 10x programmer” is a real thing…I’d think the thing to do would be to give multiple tasks to each programmer and then fit a model allowing for varying task difficulties, varying programming abilities, and task*programmer interaction. An additive model (on the log scale) with interactions should work just fine, I’d think.”

Give multiple tasks to multiple programmers, and if any meet the specs 10x faster than the others, the answer is yes. If it doesn’t happen, perhaps you just didn’t happen to have one in your study.

• nah says:

Well, there are problems with this… If you give problem(s) to a programmer that they’ve solved in the past, it’s absolutely trivial for them to retype a program. At least the time and effort is trivial as compared to what programmers typically spend their time doing, which is coming up with new (to them) solutions to unique problems.

I mean, if you ask a database developer to solve a simple SQL problem and he gives you an answer in 10 seconds, while some UI programmer takes 10 minutes to look up SQL usage specifics, it hardly means the DB developer is a 60x programmer. If you picked a UI problem instead, you might conclude the other developer is a 60x programmer instead.

This is aside from the fact that, as some others have mentioned, the productivity of a software developer is about a lot more than simply the code they write… The work is collaborative and sometimes open-ended.

11. Colin says:

A lot of the productivity may be down to the long term nature of a product.

A programmer may bang out a solution quickly, without much thought to long term architecture. But over time the design will likely cause more maintenance issues. A better software engineer may have seen many of the design choices before, and know where the likely pitfalls lie. He would likely start off a lot more slowly, but have far fewer maintenance issues over the lifetime of a product.

12. CartmanBrah says:

Hi,

My first thought is that the problem of designing your study is the same as designing an effective interview process, but your study is much more ambitious than that.
The “10x” concept is easy to immediately deconstruct because it is well known that a programmer’s ability is always relative to the task.
You want to make a generalization that eliminates the influence of task specific knowledge, which is impossible.
You could moderate your hypothesis to make it easier to test, for example, “Can I design a test that will identify a ’10x’ C++ programmer?”. One could easily imagine further restrictions, until your study is equivalent to an interview process designed for a specific job.
If you don’t like that and need something that is very general, then you might run the risk of designing something that looks like an IQ test. And with IQ tests, the problem remains: how do I decouple “intelligence” from the subject’s ability to quickly solve certain types of puzzles that I happened to put on the test? My answer is: you can’t.
Absolute determinations of “10x” and “intelligent” don’t really need to exist.

• Keith O'Rourke says:

> how do I decouple “intelligence” from the subject’s ability to quickly solve certain types of puzzles that I happened to put on the test?

That’s certainly a major challenge in assessing _important_ learning in a statistics course.

• nah says:

Yeah, I was really struck by how this was identical to the very difficult question of measuring programming aptitude at all during interview processes.

Just look at the results Google has published about their own extremely expensive and time-consuming prospective hiring process. Reasonable and popular tests often aren’t able to predict whether one of two programmers is even a slightly better programmer, let alone a “10x” better one.

I almost think it would be better to just do this as an observational survey, either of companies’ or testing centers’ existing hiring or performance metrics.

• bxg says:

> The “10x” concept is easy to immediately deconstruct because it is well known that a programmer’s ability is always relative to the task.

I don’t believe this is either well known or even true, at least not when you are looking at the “superstar” extremes of programmers in real environments.

Your example of a 10x C++ programmer is an especially bad one. If I interview someone and they establish themselves as one of the 10x category of programmers, it’s not going to matter what particular language they have used or will be asked to use. They will learn what is necessary; the language is a shallow skill relative to everything else that goes into high productivity, and it’s really not that important over anything but the very, very short term.

When unusual or unusually deep domain knowledge is involved it’s probably a different story, and perhaps sometimes this involves the computer language as well (I wouldn’t assume a guru generalist programmer will be an R expert in a week), but generally language proficiency is only a relevant decision variable for comparing “near average” programmers.

13. To qualify myself, I went to graduate school for Industrial-Organizational Psychology and now work in people analytics.

Lorin has forgotten that one of the reasons research uses random assignment is to solve this problem. Random assignment ensures the noise is randomly distributed. You can never completely remove noise in studies of human behavior. A study with random assignment, multiple treatments and a control group will eliminate any need to worry about noise.

As to the design of the study, you’re on the right track. Using only programming tasks to measure performance implicitly assumes that productivity, and the subsequent monetary gains from increased productivity, are completely reliant on how fast and how well a software engineer programs. However, there are other aspects of performance that could contribute to productivity and thus monetary gains, such as how well they review others’ code, whether or not they can coach other programmers to program better, whether or not they have cross-functional relationships they can leverage to improve the product or service the business offers, etc. What you could do is design what is called an assessment center. Assessment centers are job simulations, usually lasting a half-day to a full workday, and are often used in the public sector and to evaluate executive candidates.

14. Alex says:

Access to the source control databases of a few large companies might be a good start without having to do a study. You’d expect that digging through all the checked-in changes would quite easily identify people who are clocking in ten times the changes of the rest. The data should, I’d expect, also make it possible to filter for rework caused by bad changes.

One of the troubles I’d expect with any such study is that different programmers are likely to be creating most of the difference through their use of additional tools; for example, whether you already have your usual test infrastructure established, so you develop incrementally and test as you go, versus merely hacking out code.

So, stage one: identify a range of developers using existing data. Sample them randomly and, in a given environment of available tools, have them produce solutions to an identical set of problems. You then need to test those solutions to control for quality.

Then, stage two: give the group a new set of problems to develop in their own usual working configuration.

15. Let’s take a case study I had some experience with 12 years ago or so, Top Coder, because it’s representative (in my experience) of the overall trends and is easy to explain. It’s also an excellent source of observational data and of really, really great programming exercises and quiz questions if you want to become a better programmer or just hire one.

Top Coder runs (or at least ran) regular programming competitions with very well-defined problems and unit-tested answers. You could compete using your choice of Java, C++, or C#. Top Coder makes old competitions available online, so you can run them with timers and everything. They let anyone compete. The competitions came in two divisions, and each division provided three problems, with the easy problem in the first division being the hard problem in the second division, so there’s a total of five problems overall in order of increasing difficulty. Most programming quizzes of the kind given by the likes of Google target what look like Top Coder level 3 or level 4 problems in their interviews, and sometimes level 5. Most other companies stick to level 2 or 3 problems, or they’d wind up failing all their examinees.

The level 4 problems were challenging and usually involved recursion or its iterative unfolding and a couple of levels of index fiddling. By the time you got to level 5, you almost always needed to not only fiddle indexes but also break out dynamic programming (caching/memoization) or some other specialized graph or factoring algorithm to solve the problem in the allotted time. An example that I recall is to compute the shortest path through a maze. The maze was provided as a 2D character array, with ‘o’ being the start, ‘x’ being the end, ‘.’ being an impassable square, and ‘+’ being a passable square. You had to return the sequence of moves from the start to the end as coordinates (so there are all the where’s-the-top-left and how-do-I-index-from-there problems). You couldn’t just do a blind search, because your answers were resource-bound (they had to run in 2 seconds or something). Other problems were constraint-based things, amounting to writing a Sudoku solver, which again couldn’t be done by trial and error but needed 8-queens-like strategies of depth-first search with efficient backtracking to solve.
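
That maze problem makes a nice self-contained example. A sketch from memory (the real Top Coder spec differed in its details):

```python
from collections import deque

# 'o' = start, 'x' = end, '+' = passable, '.' = impassable. Breadth-first
# search finds a shortest path well within a 2-second budget, where a blind
# exhaustive search over move sequences would not.
def shortest_path(maze):
    rows, cols = len(maze), len(maze[0])
    start = next((r, c) for r in range(rows) for c in range(cols)
                 if maze[r][c] == "o")
    prev = {start: None}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if maze[r][c] == "x":                 # reached the end: walk back
            path, node = [], (r, c)
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]                 # coordinates, start to end
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and maze[nr][nc] != "." and (nr, nc) not in prev):
                prev[(nr, nc)] = (r, c)
                queue.append((nr, nc))
    return None                               # maze has no solution

maze = ["o+..",
        ".++.",
        "..+x"]
print(shortest_path(maze))  # [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 3)]
```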

Most of the programmers in the higher-tier competition couldn’t solve the level 4 difficulty problems in 90 minutes, and very few people actually solved the level 5 difficulty problems in the allotted time (sometimes I did, but usually they’d take me three or four hours to get right, and some I just gave up on, though I probably could’ve tackled them in a day or two). The top performers in the competition would solve all three hard problems in 10 or 15 minutes consistently, competition after competition, including live competitions where they were brought to a different location (as hard as cooking in someone else’s kitchen).

So for well-defined problems among top programmers, there’s an easily measurable 10x effect that I saw week after week. But remember, most of the programmers were less than 1/10 the speed of the top programmers because they’d never be able to even solve the problems. And as babar points out above, sometimes a programmer’s contribution is negative.

• Krzysztof Sakrejda says:

Do you think the 10x thing is really relevant in practice (not from the perspective where we all like to see a genius at work but in terms of managing a small or large group of people working together on a programming project)? To me it looks like it’s more relevant to get the top 30% working well together and if you get somebody magical that’s icing on the cake. I’m not really sure why we focus on this top ability question.

• Phil says:

About twenty years ago I read the book “The Mythical Man-Month”, which I recommend to anyone and which I really should read again. My recollection is that the author (who was in charge of writing one of the first multi-user operating systems, back in the 1960s) had quite a bit to say about the 10x programmer. The author said there’s a tendency to put this person in charge of a big task and give him (or her) a team to manage. This impulse must be resisted, indeed you should do the exact opposite. You want your best programmers to do nothing but program. Hire someone else to sharpen their pencils (they used pencils in the 1960s), to answer their phones, to attend meetings for them. Just let them do what they do best. It’s especially hard to resist this temptation (the author said) because these programmers are _also_ better than average at other aspects of development, like overall system design. But they are not 10x at those things.

Some advice in Mythical Man-Month may be less relevant today — so much has changed about programming, and not just the languages — but at least at the time, the author suggested focusing on the 10x programmer because on a big team you will probably have two or three of these people and you want them to write absolutely as much code as they possibly can. Maybe in the ideal project you would hire 30 programmers, 3 of them would be superstars, and you’d have the three superstars write all of the production code while the other 27 people decide on the system architecture, manage the distributions, write documentation, write unit tests, and so on.

• I think it depends on what the project is. Getting super fast programmers isn’t as critical as hiring ones who can get the job done at the level of quality needed.

The other major factor is deciding what to do and what not to do, particularly deciding what not to do.

• Rahul says:

On really big & complex projects apart from code quality & speed what’s critical is making the right big / soft decisions e.g. what data structure, which methods to expose, where your interfaces lie, whether to break compatibility, the intuitiveness of your functions etc.

These are decisions outside your 10x raw speed framework but they can really make or break a project. It must be mighty hard to measure these skills in a quick test.

16. Sean Matthews says:

I’m surely repeating what people said above, but I doubt that this sort of study is sensible. In my experience, a significant part of the programmer productivity difference is in the ability of some programmers to successfully tackle problems where others would simply fail, not in doing the same thing 10 times faster. I spent a significant part of my life (now in the past, alas) in major CS research departments. I encountered lots of people there who were _patently_ radically better than the people I encounter in commercial IT shops. But they were tackling non-commensurable problems. Such people do not, when they leave academia, seem to be interested in moving to commercial IT shops.

17. Sean Matthews says:

Hi, and I see that I am basically repeating what Bob Carpenter just wrote.
Must say that I would never attempt to tackle a hard problem in any of Java, C++ or C#.

• Ha! That was my thought as well (about the choice of languages). For the depth-first-search-with-backtracking type problems that Bob mentions, I’d probably choose Prolog.

18. Jack says:

This strikes me as a weird post. The 10x programmer thing comes, originally, from empirical data. It’s discussed in the book, Peopleware.

• Andrew says:

Jack:

There’s definitely been empirical research on the topic; for example, a quick Google search turns up this. But there are many variables here, and I don’t think it’s weird to raise the question of how to design new research. Nor do I think weird for my correspondent to want to explore these questions using a simple study. We can often learn by designing new studies and analyzing the resulting data.

• Rahul says:

Yes but it might still do some good to look at prior work before reinventing the wheel?

• Andrew says:

Rahul:

All I did is respond to an email query with some general advice about measuring variation in a population. This is not a reinvention of the wheel; it’s a pointer toward known techniques of wheel construction.

• Rahul says:

No, not you, but the correspondent. IOW, all I’m saying is that pointers to previous work are super useful. And one might benefit from reading up on those before actually designing / doing any new analysis.

There’s a prior study on almost everything, & almost everything one thinks of, someone else mostly already has.

19. Tom Harrison says:

It is not lines of code. It is not numbers of bugs. It is not how fast (for how well matters more). It’s not how quickly one can solve a coding problem, or 10.

A “programmer” of the studies done in the 80s and 90s was a far different beast than today. I was one then, and am one now. Then I wrote algorithms and debugged memory allocation problems. I did battle with operating systems. Today I deal with a whole different set of the same problems — 90% of time spent determining why one thing that should work does not. Googling. Reading StackOverflow answers. Knowing when to ask. Making sure I am solving the right problem.

This week I had a simple new feature to implement. It wasn’t as simple as predicted. There were multiple edge cases. There were unsolvable aspects of the problem. There were human factors. There were tradeoffs between “close enough for now” and “do it right”. How many tests should I write? Who is the audience? What’s the risk if it fails?

I was thinking about this all week. I was doing many other things. I identified one solution on the train home. Another problem to consider in a morning shower. A possible super-simple solution as I was reading an unrelated blog post. On Friday afternoon, I committed code that used an existing library in a suitable way, adding no lines of executable code (and removing scores of lines of dead code no longer needed), and 20 lines of code in four tests. I wrote some doc about the scope of the problem and solution in the bug that had been opened, and referenced it in the commit message. I solved the whole problem, not just “got it to work”. I coded and tested for around 40 minutes. Little else of my time in the office was spent on this problem.

How is this measured? Compared to the three prior solutions that had all failed and cost our company time. Good will with customers. Frustration with the engineering team. And maintainability.

Even though my first software was written in the ’70s, it’s still the case today that writing software is a holistic art of communication, thought, consideration, craftsmanship and delight. Technical skill is essential, of course, but it’s just one part.

I am not sure the question is any more answerable than “Who is the most productive artist?” Yet we all agree that Monet, The Beatles, Linus Torvalds, and so few others are brilliant.

• Elin says:

+1

• Andrew says:

Tom:

Sure, but I think there are two situations being considered here.

You’re talking about multiple dimensions of problem-solving ability, and the idea that different people can be at different places on the efficient frontier by virtue of having different, perhaps complementary, sets of abilities.

But I think the original question is about the also-relevant observation that lots of people aren’t close to the efficient frontier.

To put it another way, forget about the “10X” programmer for a moment. Instead, label the best programmers for some job as having the ability “X.” And we’ll accept that “X” can manifest itself in different ways, some combination of communication, thought, craftsmanship, etc. Fine. But now the question is, are there “0.1X” programmers, people whose skills are at a much lower pitch, so they’re 1/10th as effective as the top programmers, by any measure? This is an interesting question, and I take it to be the question being asked by my original correspondent.

• D.O. says:

Don’t think so. 10x and 0.1x are completely different problems, because what is being multiplied is some sort of a norm, like average ability. So the two questions would be “are there people with much higher productivity/ability/expertise/aptitude/etc. than a typical professional?” and “are there people with much lower productivity/ability/expertise/aptitude/etc. than a typical professional who are still making a living at it?”

• Andrew says:

D.O.:

Let me say it another way, to be clear. Suppose the original hypothesis is that if programmer A has “typical” productivity (in some sense, defined among some population of professional programmers), that there’s an awesome programmer B out there who is 10 times as productive as the already-productive programmer A.

I’m just flipping it around: I’m supposing that there’s some awesome programmer B, and then the claim is that the typically-productive professional programmer A is 1/10th as productive as B. That is, in my comparison, my “norm” is awesome programmer B. I agree that it’s not particularly interesting to learn that there are programmers who are 1/10th as productive as programmer A.

• D.O. says:

Prof. Gelman, thank you for the reply. I think I’ve got your meaning. But this “inversion” is interesting as a statistical (not logical) approach. Let’s take some fat-tailed distribution. OK, not programmers; let’s say wealth distribution in a country with high inequality (ahem). We can take a median household, multiply by 10, and call it “rich,” and then have super-rich and hyper-rich and so forth. But if we try to take a few rich persons, take the median, and divide by 10, it would hardly be a meaningful procedure.

• Rahul says:

I think what Tom is saying is that the days when you could test relevant aspects of programmer ability by testing him on a toy problem are gone.

The original question rests critically on measuring “how long it takes to complete a correct program.” If that criterion itself has become only marginally relevant to actual performance on real projects, the original question itself becomes moot.

20. Phil says:

Maybe this is a good thread to mention that I’m having some programming problems myself at the moment.

Although I do a lot of programming as part of my day-to-day work, the output of my research is nearly never the program itself: what I’m usually interested in is the outcome of some sort of statistical analysis or engineering analysis. The code that I generate, almost always in R, will usually never be seen by anyone but me, or maintained by anyone but me. I’ve learned over the years that I should try to be at least somewhat clean in the way I do things because I may come back to the same code months or years later when I want to do a new analysis, but my code is often kludgy or inefficient in various ways and that’s usually fine.

But right now I’m working on a project whose code will probably be released open source, and we are making a website to demonstrate the code. My part of the project is to perform automated analysis of commercial building electricity data (at 15-minute or 1-hour intervals) in conjunction with weather data, to try to identify energy savings opportunities. For instance, most commercial buildings have heating, ventilating, and air conditioning systems that operate on a schedule. If they are operating for more hours per day than they need to, just resetting the schedule can save a lot of energy. So I have an algorithm that tries to determine when these systems are turning on and off each day. Simple stuff like that.

All of it seems pretty straightforward, but there are all kinds of special cases to worry about. Many of the issues are algorithmic details rather than programming problems per se, but there is a lot of overlap between these categories. A simple one is: how do you handle holidays or other outlier days?

An illustrative problem that I ran into is recognizing when a building turns on and off. My initial attempt was, for each day separately, to find the first hour during the day in which the building used substantially more electricity than in the previous hour, and call that the start time. Then find the last time during that day when the building uses more energy than in that first high hour, and call that the end of the day. This works just fine for most buildings, most of the time. But it does not work for buildings that shut down after midnight: movie theaters, some restaurants, some car rental places, etc.
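The day-by-day heuristic described above could be sketched roughly as follows; the function name, the hourly-list input, and the 1.2 jump threshold are illustrative assumptions, not Phil’s actual code:

```python
def detect_start_end(hourly_load, jump_ratio=1.2):
    """Return (start_hour, end_hour) for one day of hourly loads,
    or None if no clear morning jump is found."""
    start = None
    for h in range(1, len(hourly_load)):
        # "Substantially more than the previous hour" = a fixed ratio here.
        if hourly_load[h] > jump_ratio * hourly_load[h - 1]:
            start = h
            break
    if start is None:
        return None
    # End of day: last hour using more energy than the first high hour.
    end = start
    for h in range(start + 1, len(hourly_load)):
        if hourly_load[h] > hourly_load[start]:
            end = h
    return start, end

# A stylized weekday office: low overnight, ramp at 7am, occupied to 5pm.
office_day = [10] * 7 + [30] + [50] * 10 + [12] * 6
print(detect_start_end(office_day))  # (7, 17)
```

Nothing in this sketch can look past midnight, which is exactly the movie-theater failure mode described: a building whose “day” straddles the calendar boundary confuses both the start and the end detection.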

I have many, many, many more examples. I wouldn’t quite call them “corner cases”; some of them are too common to be characterized that way. For better or for worse my standard procedure has been to make my code increasingly complicated to deal with this kind of thing. I’ll put in some if-then-elses to deal with different cases; loop through hours or days or months one at a time to compare them to their predecessors; have lots of conditions like if (currentLoad > previousLoad) & (currentLoad > 1.1*minLoadToday) & (maxLoadToday > 1.25*minLoadToday) {…} and so on. At some point I finally end up with something that kinda-sorta works adequately. And then half the time I scrap it all and rewrite it so it takes 1/3 as much code and runs in 1/4 the time; the other half the time I live with it.

I think a good programmer would get it right the first time, or at least the second or third time. I forgive myself the struggles over the algorithm: to a large extent that just goes with the territory of coping with a high-dimensional space for which you don’t know what behavior you’ll see until you’ve looked at a lot of data. But I’m kind of ashamed at how inefficiently I end up writing a lot of the code.

• Phil says:

Huh, so many people have complained about greater-than and less-than signs being interpreted as html that I thought I was being smart to use &gt and &amp. Ah, well, I tried.

• D.O. says:

Not a programmer here, but the first thing that leaps off the page is that if a “day” is a reasonable concept that is important to your application, you should make a structure or an array or something like that with meaningful parameters and adjust them for each case (I assume you are not working in an object-oriented language where that sort of thing is the first thing you do, but rather in some procedural language where you have to invent it every time). You can also make a “building type” structure (or whatever) to store your knowledge about what sorts of buildings you are dealing with. It will not solve problems with multiple comparisons, but at least it will compartmentalize them.
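D.O.’s suggestion might look something like this minimal sketch; the field names and default values are invented for illustration, not taken from the actual project:

```python
from dataclasses import dataclass

@dataclass
class BuildingProfile:
    """Per-building parameters gathered in one place instead of
    scattered through the analysis code as special cases."""
    name: str
    day_start_hour: int = 0        # when this building's "day" begins
    jump_ratio: float = 1.2        # load increase that counts as "turning on"
    closes_after_midnight: bool = False

office = BuildingProfile("office")
theater = BuildingProfile("movie theater", day_start_hour=6,
                          closes_after_midnight=True)
print(theater.day_start_hour)  # 6
```

The point is compartmentalization: the if-then-else sprawl collapses into parameters looked up per building, so adding a new building type means adding a profile, not another branch.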

• Phil says:

Yes, a day is a reasonable concept, and usually it makes sense to ask how the building behaves on Monday versus Sunday or whatever. And most commercial buildings are in their low-energy state during the night and early morning, so thinking of a day as running midnight to midnight makes sense. It is also super convenient in R because R’s time classes “know” what day of the week it is. You’re right, of course, that I can define my own days of the week so that, say, Monday runs from 6pm to 3am for a nightclub or whatever. I wouldn’t even say that’s hard to do. But it does mean changing the way I handle “day of the week” in a bunch of places. And it leaves me with the problem of designing an algorithm to recognize when the day should start…maybe that’s easy, like I could pick the whole hour that is most frequently the minimum-energy hour, but…well, this is exactly what I’m talking about: it’s easy to think of an idea that will capture most of the cases but not all of them, and this may well be an example.

So, you’re right that your idea is a good one…but that just makes it also a good example of a complexity that comes up, and that a real programmer would probably handle better than me.
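Phil’s “hour that is most frequently the minimum-energy hour” idea is simple enough to sketch; all the names and the flat test profile here are hypothetical:

```python
from collections import Counter

def typical_min_hour(days):
    """days: list of 24-element hourly load lists; returns the hour
    that is most often the day's minimum-energy hour."""
    argmins = [min(range(24), key=lambda h: day[h]) for day in days]
    return Counter(argmins).most_common(1)[0][0]

def flat_day(min_hour):
    """A flat 24-hour profile dipping at one hour, for illustration."""
    day = [10.0] * 24
    day[min_hour] = 1.0
    return day

print(typical_min_hour([flat_day(3), flat_day(3), flat_day(4)]))  # 3
```

As Phil notes, this captures most cases but not all: a building with a flat overnight load has its argmin hour set by noise, and a handful of atypical days can shift the mode.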

• Elin says:

I think great programmers do better the first time than the rest of us, but they don’t by any means “get it right” the first time. In fact the great programmers I know are always thinking about how to do things better. It is no different than writing. http://stancarey.wordpress.com/2010/12/17/william-james-on-rewriting/ is a good post that ends with: “Ernest Hemingway told the Paris Review that he rewrote the last page of A Farewell to Arms 39 times before he was satisfied with it. Asked what the problem was, he replied, ‘Getting the words right.’”

• Fit a periodic function with period 24 hours to several days of data, find the minimum energy usage point of the periodic function, call that the “start time” for the building. Give every building its own start-time. Then energy usage on “day x” for building y is the energy used from the start time for building y that occurred on calendar day x, until the start time on the following day.
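A minimal version of this suggestion, assuming hourly data over whole days and a single-harmonic fit (a real implementation might add more harmonics or use a proper regression):

```python
import math

def fitted_min_hour(loads):
    """loads: hourly readings spanning whole days; returns the hour in
    [0, 24) at which a fitted 24-hour sinusoid is minimized."""
    n = len(loads)
    w = 2 * math.pi / 24
    # Least-squares coefficients of a + b*cos(wt) + c*sin(wt); over whole
    # periods of hourly data the cos and sin terms are orthogonal, so the
    # fit reduces to these correlation sums.
    b = 2 / n * sum(y * math.cos(w * t) for t, y in enumerate(loads))
    c = 2 / n * sum(y * math.sin(w * t) for t, y in enumerate(loads))
    # The sinusoid is minimized half a period away from its peak phase.
    return ((math.atan2(c, b) + math.pi) / w) % 24

# Three days of a noiseless profile whose minimum is at hour 4.
loads = [10 - 5 * math.cos(2 * math.pi / 24 * (t - 4)) for t in range(72)]
print(round(fitted_min_hour(loads), 1))  # 4.0
```

Because the fit pools several days, a single odd day (a holiday, a skipped shutdown) barely moves the estimated start time, which is the advantage over the day-by-day heuristic.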

21. John Mashey says:

Back when Fred Brooks wrote the first edition of The Mythical Man-Month, he was keen on “Chief Programmer Teams”, using Harlan Mills as an example. In this case, the Chief tends to be architect and write a lot of the code.

At Bell Labs (which certainly had a few 10X or better programmers by any metric), we tended to think differently:

a) We had some terrific programmers, but it was so hard to find Chiefs that it made no sense to try to organize that way in general, and if Bell Labs didn’t think it could find many of them… that might be a problem.

b) We argued for building better tools, like automating away the Program Librarian job with Source Code Control System for configuration management, shell scripts for procedural automation, automated test harnesses, and especially re-use of existing code, and always working at the highest language level that was fast enough.

c) And then, try to keep teams as small as practical (which might be as big as a few hundred), but adapt the organizational structure to the people you had or thought you could get. Some great software was built with core groups of 2-3 people, with some helpers of various kinds, but that simply did not work for some of the big electronic switching machine software.

d) I sometimes lectured for the internal Software Project Management course, especially taken by supervisors and department heads accustomed to running hardware projects faced with increasing fractions of software work. One of the main points was to avoid applying the latest software engineering methodology fads and think instead about the differing types of projects, and organize appropriately. At one point, “top-down design” was a big buzzword, and sometimes it was a good idea, and sometimes it caused projects to fail.

• Phil says:

I saw a talk about 25 years ago (!) by some guy from IBM who looked like Mr. Clean and who happened to be giving a talk about an experiment IBM had tried, called the “cleanroom” approach to computer programming. Ah, interesting, according to Wikipedia this approach was “originally developed by Harlan Mills and several of his colleagues including Alan Hevner at IBM.” But Alan Hevner does not look like Mr Clean so he’s not the one who gave the talk I saw.

Anyway, Mr Clean said IBM had used this approach to develop a telephone billing program or something, writing it from scratch. I remember a few aspects of the talk. He said the team writing the code wasn’t even given access to compilers: they didn’t want programmers relying on compiler errors or run-time errors to catch problems: no fixing your off-by-one problems through trial and error. The programmers would write code, sign off on it, and hand it off to another team that would compile it and run it against a bunch of test cases. The team writing the test cases was a different group from the ones writing the code. Several other unusual things too.

Mr Clean said the project was about typical in terms of how much it cost to deliver the software. But in the three or four years that had passed since the program had been put into use, they had experienced huge benefits in reduced maintenance costs and bug fixes. The program was so much cleaner than their typical programs that it was costing something like 1/3 as much per year as they expected based on similar projects that had gone through conventional development.

After the talk, someone said “Is it safe to say you’re going to use this approach for all of your big software development projects from now on?” Mr Clean said “No, we’re never going to do it again. All of our programmers said they’d quit if we forced them to keep working that way.”
