Here's a theory for y'all:
"The stuff that's harder to test is the stuff that's likelier to break."
More than once I've written a bunch of tests and skipped some case because it was Special. Of course, Special often means Touchy, and later it'll turn out to be that one case that breaks.
Just got my copy of Iain Simons' *Inside Game Design*. There's an interview with me where I talk about Torpex and Schizoid and say funny things like, "By the time your readers see this, Schizoid will have shipped."
There's also a series of screenshots from the early days of Schizoid, from the days of circles-and-squares programmer art to when we were about halfway done, with lots of enemies and backgrounds that we ended up cutting. Good stuff.
Oh yeah, and there are other studios in there like Valve and Harmonix, but you'll get it for the Torpex spread, right?
So, yeah, hey, credit where credit's due: Clinton Keith's post was what first got me thinking about this - you can even see my comment on that post, from when Schizoid was young but we were already seeing diminishing returns.
So, what do we do about drag and diminished returns?
One reaction is - "Let's get rid of it!" Particularly if we use the tractor-pull metaphor; let's get out of the tractor, clean everything up, and get ready for another pull. That idea doesn't fit so well with the snowball-up-a-hill metaphor, though. You've been working hard to collect that snow, you're not going to get rid of it just so you can push faster.
But still, yes, good idea - let's periodically refactor and pay down our technical debt.
That'll help mitigate the drag somewhat, but it's not going to solve it.
(Another way to mitigate is to bring back waterfall - bigger designs up-front, more rigorous planning up-front. You'll have a less werewolfy curve, because you A) go slow at first while you do all that planning and B) probably reduce the discovered work. The disadvantages - waste on speculative generality, and building the product-you-thought-you-wanted-but-didn't-really-want - probably aren't worth it, though, so I'm not recommending this.)
2) Bring Back The Contingency Buffer
Nothing new here. Goldratt in *Critical Chain* spends hundreds of easy-to-read pages explaining why this is a good thing, and he's echoed much more tersely by Tom DeMarco in *Slack.*
Their message: "Don't adjust your estimates by a fudge factor. Instead, use that fudge factor to create a contingency buffer or pad at the end of the project." Because when you set pessimistic goals, people take their time. I'm not a fan of Parkinsonian scheduling (setting unrealistically tight dates), but I'm not a fan of unrealistically loose dates either. My favorite thing about the contingency buffer is that it helps keep scope in check: "Sure, you think you can get it done before our ship date, but can you get it done before we hit alpha?"
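To make Goldratt and DeMarco's point concrete, here's a toy Python sketch - the task estimates and the 50% fudge factor are numbers I've invented for illustration:

```python
# Hypothetical task estimates in days (invented numbers, not from any real schedule).
estimates = [5, 3, 8, 2, 6]
fudge = 1.5  # a 50% fudge factor

# Padding every estimate: each task's date is loose, inviting Parkinson's Law.
padded_total = sum(e * fudge for e in estimates)

# Goldratt/DeMarco style: keep the aggressive per-task estimates and pool
# all the padding into one shared contingency buffer at the end.
aggressive_total = sum(estimates)
buffer = aggressive_total * (fudge - 1)

print(aggressive_total, buffer, padded_total)  # 24 12.0 36.0
```

Either way the calendar total is the same; the difference is that the team works to the aggressive dates, and slips visibly eat the shared buffer instead of disappearing into per-task padding.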
Speaking of alpha, game development typically already has a kind of contingency buffer built in, which we call "alpha" or "beta" or "content complete" depending on where you work. The point where you stop "adding features" and just fix bugs or maybe polish. (Of course, the area between bug and feature is a gray one. (The dial is analog, was how I heard Mike McShaffry put it once.))
I have problems with alpha. I have problems with it being defined differently everywhere you go. I have problems with the definition frequently being murky even within a team: ask different guys on your team what alpha means and you might get different answers.
Another problem I have with alpha is that it has become too small. Back when I did six-month projects, we tried to stop adding features a month before we shipped. Now that we're doing three-year projects, if we were being proportionate we'd reserve six months - but I don't know anyone who does.
But my biggest problem with alpha is that it encourages sloppiness - an "add features now, we'll fix the bugs later, in alpha" attitude. Steve Maguire, in *Debugging The Development Process*, goes to great lengths to explain why that's Bad.
But one thing he doesn't say is, when alpha is treated that way, it is no longer a contingency buffer. For it to be a contingency buffer, your goal has to be to be done by then. Not done-except-for-bugs done. Zero-bugs-done. Done-done-put-it-in-the-box-done.
Schwaber doesn't really do a contingency buffer, either. On the release burndown chart, each estimate is multiplied by a fudge factor. Just what Goldratt says not to do. So what happens? At the beginning of the project, when velocity is high, those fudge factors are canceled out by your high velocity. It looks like you're going to eat through that burndown chart in no time, and Schwaber's method predicts an optimistic release date. Then as velocity decreases the date continually slips back.
So here's my modification to Schwaber. Track your backlog just like Schwaber suggests. (And include all bugs that you intend to fix in your backlog - at Torpex, we use the rule of thumb that 3 bugs equals one story point. YMMV.) But rather than multiplying all the estimates by a fudge factor, add a contingency buffer at the end.
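As a back-of-the-envelope sketch of that modification - the point totals, velocity, and buffer fraction below are invented, not Schizoid's actual numbers:

```python
def remaining_points(story_points, open_bugs, bugs_per_point=3):
    """Backlog size in points, counting the bugs we intend to fix.
    Rule of thumb from the post: 3 bugs is roughly 1 story point."""
    return story_points + open_bugs / bugs_per_point

def projected_weeks(remaining, velocity, buffer_fraction):
    """Burn the backlog at current velocity, then add the contingency
    buffer at the end instead of inflating each estimate."""
    return (remaining / velocity) * (1 + buffer_fraction)

remaining = remaining_points(story_points=120, open_bugs=45)  # 135.0 points
print(projected_weeks(remaining, velocity=9, buffer_fraction=0.5))  # 22.5
```

The estimates stay honest and aggressive; the pessimism lives in one visible number at the end.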
At the beginning of the project, when you have no idea what your progress curve is going to look like, you could use Schwaber's "complexity assessment" to guess how big the contingency buffer should be. I could see it going anywhere from 20% for a well-understood-requirements, simple-technology project all the way to 300% for the chaos of poorly-understood requirements and crazy tech. (Schizoid needed at least a 150% contingency buffer, we know now.)
A problem with the contingency buffer at the end is - if you make your scheduling process visible to your publisher, which most say is a Good Thing - and your publisher sees your large contingency buffer, they're going to ask, "What the hell? Looks like you're billing us for a game much larger than the one you've scheduled!"
So, for the schedule you show your publisher, keep the fudge factors in the original estimates rather than in a large pad at the end. I said this at my talk and Ben Hoyt called me out - "Isn't that dishonest?" he said. At the time I said, "Yes, I suppose," because I tend to cave during Q&A. But in hindsight I don't think it is, really, dishonest - the fudged estimates are your "because this is a project of significant complexity" estimates, and I'd be perfectly willing to share that with a publisher. I'd also be perfectly willing to tell them that, internally, we have aggressive goals we're striving to hit - for internal purposes we don't multiply in the fudge factors, but leave them as a large contingency buffer at the end - but that those aggressive goals are not ones we want to be contractually bound to. Trent Oster said similar things in his talk: double your estimates, and apparently at Bioware they also try to work well ahead of the milestones they've promised their publisher.
Now, suppose you're partway into the project, and you're experiencing drag, and you realize that even your contingency buffer was an underestimate - you're going to sail right past that final deadline with an unfinished game. At that point, a couple things need to happen:
- You have to have that hard conversation with your boss / publisher. The "Please don't sue us for breach, but..." conversation.
- Add a bigger contingency buffer to your next estimate. It's tempting to assume that everything that can go wrong already has and it's going to be smooth sailing for the rest of the way, but then you'll just have to have that hard conversation again with the next slip.
We're trying to approximate this curve:
On Schizoid, at first, our trajectory had us finishing in April, but we told Microsoft June. Call that our 50% contingency buffer. After our first slip, it looked like we could still get it in under the wire. Then we started optimizing and we slipped again. Our contingency buffer is used up. But it was just a blip, right? We'll start making progress again any day now! No. The blip dragged out and became a plateau.
If we had been honest with ourselves at that point, say, by plotting out where our new trajectory was going to hit and adding a larger buffer to that, we would have ended up with a much more realistic estimate about when we'd be done. An incredibly disheartening estimate, but realistic.
And still wrong - here we are, almost February of next year, and still not done yet! But it would have been much less wrong than what we were going with back then.
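That "being honest with ourselves" re-projection is just arithmetic. A sketch, with invented numbers:

```python
def reestimate(remaining_points, recent_velocity, buffer_fraction):
    """Weeks to done using *recent* velocity - the plateau, not the
    early burst - plus a fresh contingency buffer on the new trajectory."""
    return (remaining_points / recent_velocity) * (1 + buffer_fraction)

# Early on we'd have said: 90 points left at 10 points/week -> 9 weeks.
print(reestimate(90, 10, 0.0))   # 9.0 - the optimistic dead-reckoning

# After the slip, recent velocity is 4 points/week, and this time
# we apply a bigger buffer:
print(reestimate(90, 4, 0.75))   # 39.375 - disheartening, but realistic
```

The key moves are using the post-slip velocity instead of the early one, and growing the buffer rather than assuming the bad luck is behind you.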
Sadly, "Less Wrong" is all I can offer.
Next time (notice I didn't say next week - we just did more focus testing and bug triage and we can't bring ourselves to mark as much stuff "Will Not Fix" as maybe we should and now blogging is starting to feel like a luxury): Just how do you calculate "velocity" anyway?
Before I go on to Part 3 and what we can do about drag let me deal with some of the discovered work unearthed by the comments on Part 2.
* Graph Quality
Mark Nau calls me out and says he doesn't see the curves. If we still worked in the same office I probably would ask him to help me do an actual rigorous statistical analysis of the data. The curves are there but they look shallow and I admit I was pretty disappointed after generating the graphs that they weren't more obvious.
As for not seeing much of a bump when we brought the new programmer on, I can explain that. He's only part-time - I'd estimate he increases our manpower by 25% or so. Factor in Fristrom's Law and it's less. He needed a ramp-up period. And his first task was huge but greatly underestimated. It could have shown on the graph as a simultaneous increase in scope and work, but instead it shows as an increase in neither.
But that goes to prove Mark's point - the graph isn't accurate. I'm not able to truly tease apart work and scope, and probably won't even bother on future projects. (Though it was nice to see that we actually have been working all these months, something that isn't obvious from the first graph.)
This graph is simply the delta between the two lines in the other graph - though they don't plot out to the same point in time; I made the graphs at different times. If the lines in the other graph were straight, this graph's line would be straight as well - and it clearly isn't.
* Other curves?
Jake Simpson and Simon Cooke say they see different shaped curves on their projects - they both see sine waves, or something approximating sine waves. It's possible that on this project we're just looking at a portion of a big sine wave with a really large period. Which would be a good thing, because it would mean at some point we'll take off like a rocket and ship! I wouldn't be surprised, actually, if we do get a burst at the end, as we put our foot down about feature creep and the remaining bugs become highly detailed spot fixes. If only there were some way to predict when that change would come.
Cooke sees the opposite from us, in a way: he sees little visual progress in the beginning, during ramp-up, later followed by a lot of visual progress. (Much like our programmer with his big, underestimated task.) He may be more waterfallish than we are - we tend to get something up-and-running *now* and refactor later. For us, the "up and running now" shows up as a lot of progress, and then the refactoring shows up as drag. (Maybe this agile stuff isn't all it's cracked up to be.)
* Feature Creep vs. Discovered Work?
How much scope increase was stuff we wanted vs. stuff we needed?
FWIW. Red & most of green are needed. Blue, yellow, and pink are wanted. Call it 60-40.
Though I hesitate to call that 40% literal 'feature creep' - most of our unnecessary fixes were not "add X" but "X would be better if you do Y". The feature was already there and we were tweaking it.
More interesting would be to see how this changes over time, but I didn't know until late in the project that you could set up Bugzilla to generate snapshot data that you can later graph. It would be fraught with inaccuracy anyway - at the beginning of the project, p3 meant must-fix and later came to mean nice-to-fix.
Busse's point, I'm guessing: if you tighten the reins on feature creep, you can mitigate the effects of drag.
I'll try to get back on track next week, unless there are more interesting comments.
So where does this long tail come from? Why doesn't velocity average out so that after you've been working on your game for a couple of months you've got something you can dead-reckon with?
Some would say it's because of lazy developers. We set a goal - we realize we're going to miss the goal - so we set an easier goal - and then we slack off since we've set that easier goal - and then, before we know it, we're going to miss that goal, too - etcetera. Parkinson's Postulate. Freshman Syndrome. I'm not a big fan of this theory.
Another cause is plain old underestimation. Discovered work, for one thing. "We just forgot we'd have to do that", etcetera. But - shouldn't Discovered Work average out after a while? Maybe not in two months, but after several months shouldn't you have a pretty good idea of the ratio of discovered work to originally planned work? No. The further in the future the work is, the murkier it is, so you're going to tend to see more and more discovered work as you get towards the end of the project. In a way, your post-alpha bug list is *all* discovered work.
Underestimation can show up on the schedule graph as peaks ("We forgot we had to do this! Better add it to the schedule!") and plateaus ("You're *still* working on that?"). I can look at the peaks on the schedule graph for Schizoid and say, "That was when we realized we'd have to multithread"; "That was when we started working on our certification requirements and discovered just how hellish they are this time around". [Side note: Oh, Mr. Allard, you promised! You promised it would be easier this time!] [Side side note: for those of you doing 360 development, don't think of that 130 line item TCR list as the real list. Go straight to the test-cases list.] and "That was when we entered QA."
A pernicious kind of underestimation is technical debt - when you think you've finished a feature but there are still lingering issues in it that get caught later. This shows up on the schedule as good velocity now for a creeping increase in bug count later.
Underestimation is most of the story. A lot of people think it's the whole story - that's part of the waterfall "gather *all* requirements and outlaw feature creep" thing. "If only we knew what the client really wanted / hadn't changed our minds / hadn't forgotten x,y, and z, we'd be done on time!"
To get the rest of the story, we need to split the graph into two: a graph of scope increasing and a graph of work done. Schwaber's burndown charts only deal with 'work remaining', and that's what we used for Schizoid, so it took me a while to massage our data to show both:
The green line is scope - as we discover things we forget (we need to multithread; the designers want the macro game to work differently; we should have known we'd have to do *that* for TCR) and as bugs get discovered the green line goes up.
The red line is work done: features completed and bugs fixed.
Visually it looks like the green line is curving one way (the rate of discovering work increases as we go) and the red line is curving the other (the rate of work done is decreasing.)
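Here's roughly how the massaging works - rebuilding both lines from a running log of backlog events. The event log below is invented, standing in for our actual Bugzilla data:

```python
# (week, points of newly discovered scope, points completed) - invented data.
events = [
    (1, 40, 8), (2, 10, 9), (3, 15, 8), (4, 20, 6), (5, 25, 5),
]

scope, done = 0, 0
for week, discovered, completed in events:
    scope += discovered      # green line: everything discovered so far
    done += completed        # red line: features completed and bugs fixed
    remaining = scope - done # the delta is Schwaber's burndown line
    print(week, scope, done, remaining)
```

A plain burndown chart only keeps the `remaining` column, which is why it can't tell you whether a flat week means no work or lots of work canceled out by lots of discovery.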
Something the graph doesn't show is we added a programmer halfway through. You'd think we'd see a bump in the red line there but there isn't much of a sign. So even with the extra manpower we're getting less work done. (You might think this means he's actually slowing us down but trust me, he's not; I don't know where we'd be without him.)
And that bowing of the red line, despite having that extra programmer, is the evidence of drag. Patrick Hughes predicted where I was going with this in the last post - as the system gets larger, your efficiency working on it decreases.
It's like a tractor pull - or perhaps a better metaphor would be pushing a snowball up a hill.
We see this in a variety of ways when working on videogames:
* The components of the system have to play nice together. As you add components, each future component becomes that much harder to create, because it has to work with all the previous components. Boehm's cost-of-change curve is supposedly for unplanned features - I believe it applies to all features, whether they're planned or not! I'm too lazy to graph our lines-of-code against time, but you know what we'd see - the same bowed curve - which doesn't necessarily mean we're doing less work, but it's a telling sign.
* As the game gets bigger turnaround times increase. It takes longer to build, and to run, and to load a level. It takes longer to play through a level to get to that point where the bug is. (You can implement a cheat that lets you zip straight to that point; or write a custom test that lets you execute the code without running the whole project; but doing that takes time, too.)
* QA gets tougher on the larger system. Finding and reproducing bugs becomes more and more difficult as you get fewer needles in a larger haystack.
So that's Drag. Common sense, you're probably thinking. But if it's common sense, why do we schedule as if it doesn't exist?
Why does Mike Cohn, in *Agile Estimating and Planning*, say we can use our velocity for long-term estimation?
Why does Mark Cerny say we can use what we've learned prototyping to accurately estimate how many levels we'll be able to make in production? (Not to diss Cerny - his Method is awesome and I wish it hadn't been seemingly overshadowed by Scrum.)
Why does Joel Spolsky say you can simply compare estimates to actuals to adjust future estimates? (BTW - *something* must have been wrong with Spolsky's old system, otherwise why would he have felt the need to create a new one? But the new one doesn't solve the problem.)
We're all using linear math to deal with this nonlinear problem.
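To see how far off the linear math can get, here's a toy drag model - velocity decaying a fixed fraction each week as the codebase grows. All the numbers are invented, not fitted to any real project:

```python
# Toy model of drag: velocity decays a little every week. Invented numbers.
initial_velocity = 10.0   # points/week at project start
drag = 0.03               # fractional velocity loss per week
backlog = 300.0           # total points, including work yet to be discovered

# Linear dead-reckoning from early velocity:
linear_weeks = backlog / initial_velocity   # 30.0

# With drag, each week completes a little less than the last:
done, week = 0.0, 0
while done < backlog:
    week += 1
    done += initial_velocity * (1 - drag) ** week

print(linear_weeks, week)  # 30.0 vs 87
```

With these particular made-up numbers, a mere 3%-per-week drag nearly triples the linear estimate - the werewolf curve in miniature. And if the drag rate were a little higher, the project would never converge at all.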
Credit to Ken Schwaber - he never said his methods could be used to make an accurate long-term estimate...he just cops out and says you can't predict anything more than a couple months away. Not too useful for a studio that might like to, say, sign a contract promising a certain delivery date. We want to be able to hit goals and keep promises!
Coming up: we're basically screwed, but here's some things we can do to mitigate the damage.