I’m a big fan of the Donald Reinertsen approach: measure queue length.
Simply track the average time it takes to complete a task in the team queue, then multiply that by the number of tasks remaining in the queue.
Each team will habitually slice things into sizes they feel are appropriate. Rather than investing time trying (and failing) to accurately estimate each one, simply update your average every time a task is completed.
The bonus with this approach is that the sheer number of tasks in the queue will give you a leading indicator, rather than trailing indicators like velocity or cycle time.
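A minimal sketch of the arithmetic in Python, assuming you keep a simple list of per-task completion times in calendar days (the numbers below are made up for illustration):

    # Rolling queue-length forecast: average completion time x tasks remaining.
    # completion_days holds how long each finished task took, in calendar days.

    def forecast_days(completion_days, queue_length):
        if not completion_days:
            return None  # no history yet, nothing to forecast
        average = sum(completion_days) / len(completion_days)
        return average * queue_length

    # Append to the history every time a task completes, then re-forecast:
    history = [3.0, 5.5, 2.0, 4.0]
    print(forecast_days(history, queue_length=12))  # 3.625 * 12 = 43.5 days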
Strongly seconding this. For anyone still hesitant, I further recommend the following experiments:
----
Sample a few activities your team has completed. Check how long the smallest 90% of activities take on average, and compare that to the average of the biggest 10%. Or the median compared to the maximum, or whatever. You'll probably find the difference is about an order of magnitude or less. In the grand scheme of things, every activity is the same size. You can estimate it as exp(mean(log(size))) and be within an order of magnitude almost every time.
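As a quick illustration of that formula, here's a toy computation in Python; the sizes are invented, but the point is that the geometric mean lands within roughly an order of magnitude of almost every activity:

    # exp(mean(log(size))): the geometric mean of completed-activity sizes.
    import math

    sizes = [1.5, 2.0, 3.0, 4.0, 2.5, 6.0, 1.0, 8.0]  # calendar days, made up
    geo_mean = math.exp(sum(math.log(s) for s in sizes) / len(sizes))
    print(round(geo_mean, 1))  # one "typical" size to use for every activity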
Once your team has accepted that something is "an" activity and not a set of activities, don't bother estimating. For all practical intents, size is effectively constant at that point. What matters is flow, not size.
----
For the above sample, also look at how much time passed between the "go" decision on a task and when it was actually released to customers. In a stable team, this number will be eerily close to the theoretical value given by Little's law, referenced in the parent comment.
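If you want to run the comparison yourself, the Little's-law arithmetic is just one division; the numbers here are made up:

    # Little's law: average lead time ~= average work in progress / average throughput.
    avg_wip = 12.0        # tasks sitting in the queue, on average
    throughput = 1.5      # tasks finished per calendar day, on average
    predicted_lead_time = avg_wip / throughput  # 8.0 calendar days
    # Compare this to the observed go-to-release time per task;
    # in a stable team the two land eerily close to each other.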
Oh, and you shouldn't focus on man-hours. Work with calendar days. Not only does that simplify mental arithmetic for everyone, it's also the only thing that matters in the end. Your customer couldn't care less that you finished their functionality in "only" 6 man-hours if it took you six weeks to get it through your internal processes.
----
Fun follow-up to the size experiment: now ask someone intimately familiar with your customers to estimate the dollar value of each activity. You might find that while all activities are practically the same size, they'll have very different dollar values. That's what you ought to be estimating.
This. I've found that, whatever the project is, the velocity of a team in terms of what they define as a "task" is pretty constant, surprisingly so even. In the end, just counting the outstanding tasks proved to be a good estimator of where we'd end up at the deadline.
This is great for a simple Monte Carlo simulation!
Pick a finished task's completion time at random, once per remaining task in the queue, and add them up to get a single estimate. Do this 1000 times or so and you get an estimated distribution of completion times for the current queue.
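A rough sketch of that simulation in Python, assuming you have a list of historical task durations; all numbers are placeholders:

    # Monte Carlo forecast: resample historical task durations (with
    # replacement), once per remaining task, and sum them into one estimate.
    # Repeat many times to build up a distribution of completion times.
    import random

    historical_days = [3.0, 5.5, 2.0, 4.0, 7.0, 1.5]  # finished-task durations
    queue_length = 12                                  # tasks still in the queue
    n_runs = 1000

    totals = sorted(
        sum(random.choice(historical_days) for _ in range(queue_length))
        for _ in range(n_runs)
    )
    print("p50:", totals[int(0.50 * n_runs)])  # median completion estimate
    print("p85:", totals[int(0.85 * n_runs)])  # a more conservative commitment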
This type of thing is covered extensively in Evidence Based Scheduling [0], and is one of the reasons I still think FogBugz's power is misunderstood.
Nice, I just recently left a company where this approach would have been tremendously useful and fairly easy to build - we already had all the required data, but we were just comparing averages of past performance against budgets and using that as a multiplier on the schedule rather than working with a distribution.
That being said, I also despise tracking time!
I suppose you could move that technique to story points (or whatever unit of measurement), though you would lose a ton of precision.
What do you do when your future tasks are unknown or ambiguous?
For example, at my day job my task is to implement banking. The day to day tasks change... day to day. There aren't a "number of tasks remaining in the queue," since whatever I'm doing is what I'm doing.
One could say this is poor planning. But due to the nature of Big Banks, each task is usually blocking the next one -- in other words, it's not possible to discover or plan what you need to do next, until you've finished the current one.
An example of this is when we realized we didn't need to run our banking API using $BigBank's test environment. Their test environment was ... uh ... well, let's just say, when we realized that we could simply switch on "production mode" and bypass their test environment altogether, we collectively facepalmed while rejoicing.
It wouldn't have been possible to add "switch to the production environment" to the queue several days ago, because we didn't discover that we could do that until yesterday during our biweekly sync call.
I'm sympathetic to your writeup, and I like your recommended approach. But I just wanted to point out a realistic case of it failing. But in fairness, I think every estimation approach would fail us, so don't feel singled out. :)
Perhaps your approach will work in most cases though, and I'm merely stuck in a twilight zone special case.
> The day to day tasks change... day to day. There aren't a "number of tasks remaining in the queue," since whatever I'm doing is what I'm doing.
What you are describing is not ambiguity, it's total variability. If your future is 100% random, it is, by definition, impossible to predict. Such a state would also mean a total absence of direction/vision. Predicting dates is then not only impossible, it's not even a question you can ask, since you don't know what's next.
What I'm going to challenge is the idea that you're effectively in such a case, because I don't think you are.
> One could say this is poor planning. [...] in other words, it's not possible to discover or plan what you need to do next, until you've finished the current one. [...] because we didn't discover that we could do that until yesterday during our biweekly sync call.
The example you're giving *is* poor planning. You're going into execution without validating base assumptions. That you discover the specifics of a dependency that late in the game is a sign that you're going in without a plan. I'm not judging; in your case maybe no one is asking for any sort of accountability, and just executing is the recourse with the lowest overhead. But the fact that you can't estimate isn't due to the environment, it's due to the fact that you don't have a plan. Some of the companies I've worked with are fine with that; most are not.
It's like this for day-to-day operations when leadership is absent and no CSI (continual service improvement) work ever gets prioritized over new development. You can say the org is dysfunctional, but there's little leverage workers can use to change such situations. Especially when efficiency measures get rewarded with layoffs.
What happens when you complete your work before you know what you need to do next?
If this never happens, then you have some invisible queue, as you do have things to do next.
As for your example, that's a great example of a task that seemed like it would take a long time and ended up being very, very short. Can you describe why it would be bad to add this to your task system?
- Add task: Run banking API in $BigBank test environment
- Start the work time clock
- Find out we don't need to do it, and can switch to prod mode instead
- Switch to prod mode
- Close the task and stop the time clock
This is now data for your estimates of future tasks, as this will probably happen randomly from time to time in the future.
Switching to prod mode takes 5 to 7 business days, because we have to order certs from DigiCert and then upload them to $BigBank, whose team requires 5 to 7 business days to activate said certs.
We expected to turn on prod once testing was finished. But we ended up discovering that prod was the only correct test environment, because their test environment is rand() and fork()ed to the point that it doesn't even slightly resemble the prod environment. Hence, "prod am become test, destroyer of estimates."
So for 5 to 7 business days, we'll be building out our APIs by "assuming a spherical cow," i.e. assuming that all the test environment brokenness is actually working correctly (mocking their broken responses with non-broken responses.) Then in 5 to 7 business days, hopefully we'll discover that our spherical-cow representation is actually closer to the physical cow of the real production environment. Or it'll be a spherical cow and I'll be reshaping it into a normal cow.
By the way, if you've never had the pleasure of working with a $BigBank like Scottrade, Thomson Reuters, or $BigBank, let's just say it's ... revealing.
Maybe I'm missing your point. It seems you're attempting to answer the wrong question: is this particular task's estimate accurate, given all the changes that have happened? That's irrelevant for large-scale estimation.
The question for scheduling prediction is: what distribution of time will it take to mark any task in this queue as FIXED/INVALID/WONTFIX/OBSOLETE/etc? The queue can have any amount of vagueness you want in it.
Regardless of the embedded work, regardless of whether or not it changes, becomes invalid, doesn't exist, etc - these are all probability weights for any given task/project.
This is interesting. A lot of machine learning works exactly like this [1], in the sense that you're biasing your prediction of each ticket's time towards the average, except when there is strong evidence that the ticket is special.
(You'd arrive at exactly this method if you found that there were no easy-to-find characteristics of tickets that could distinguish them as extra-long or extra-short.)
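A toy version of that bias-toward-the-average idea, with made-up numbers; the weight stands in for how much evidence you have that a ticket really is special:

    # Shrinkage toward the global average: with no evidence a ticket is
    # special, you predict the average; with strong evidence, you trust
    # the ticket-specific estimate more.
    def shrunk_estimate(ticket_estimate, global_mean, evidence_weight):
        # evidence_weight in [0, 1]: 0 = nothing distinguishes this ticket,
        # 1 = strong evidence it is genuinely extra-long or extra-short.
        return evidence_weight * ticket_estimate + (1 - evidence_weight) * global_mean

    print(shrunk_estimate(ticket_estimate=20.0, global_mean=4.0, evidence_weight=0.2))
    # -> 7.2: mostly the average, nudged up a little by the scary-looking ticket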
This probably works for a large team of tens or hundreds of developers.
On a team of 4 or 5 people, where people get sick, take vacations, leave the company, and new members join... each of these events has a big impact on those metrics. Which, again, means a lot of wasted work and effort.
But yes, probably this is a viable option on larger teams.