Metrics for Agile Teams

Chris Szymansky
May 23, 2016

Is your team improving? How can you quantify the impact of the team and process improvements that have been put in place over time?

Some will be quick to say that shipping working software is really the only metric that matters. Others will be concerned that metrics have the potential to burden teams with red tape and minute details that don’t really matter.

With respect to both positions, there are ways to sensibly capture the activities that go into shipping software, specifically around cost, quality, and time. If a project was a huge success, we should be able to understand why it was successful. Likewise, if a project struggled, there should be an opportunity to learn from it. Metrics help measure both cases over time.

Velocity

Work items in a sprint (a timeboxed period) are measured in units called story points. A team’s velocity is the number of story points completed per sprint, averaged over a number of sprints.

Velocity is a key metric that enables product owners to understand how quickly a team can work through the product backlog. For example, if the team has a velocity of 100 points, works on a two week sprint cadence, and the product backlog for a given feature is 200 points, it can be expected that the feature can be completed in two sprints (one calendar month). An accurate velocity is essential for forecasting and long range planning.
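As a rough sketch, the forecast is simple arithmetic (the numbers below are the illustrative figures from the example above):

import math

velocity = 100            # average story points completed per sprint
sprint_length_weeks = 2   # two-week sprint cadence
backlog_points = 200      # remaining backlog for the feature

sprints_needed = math.ceil(backlog_points / velocity)
weeks_needed = sprints_needed * sprint_length_weeks

print(f"{sprints_needed} sprints (~{weeks_needed} weeks)")  # 2 sprints (~4 weeks)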

“What is a point?”

Anyone who has worked with points has probably clumsily answered the question “What is a point?”.

I like to compare one point to one US dollar (or any other unit of currency). A dollar is really just a piece of paper, but it’s also a unit that contributes to the purchase of some type of product. Over time, the value of a dollar has been established based on what it can be exchanged for (the price of a good). People who have dollars believe that they will be worth roughly the same tomorrow as they are today.

A point is really just a measuring stick. It’s a unit that represents the effort required to complete a story. Over time, teams become more confident with estimating and defining the number of units of effort that go into the completion of a story. This level of confidence means that points become worth roughly the same tomorrow as they are today and they can then be used as an accurate forecasting tool.

Uneven velocities

Expect your team’s velocity to level out over time as the team gels and gets more skilled with estimation. Newer teams or teams with less mature processes will typically see velocity increase over time until it settles at a fairly consistent number.

Even in very mature teams, there will always be ebbs and flows in velocity on a sprint-to-sprint basis. There will be good sprints where everything went right, bad sprints where everything went wrong, sprints that are interrupted by holidays and vacations, and everything in between. This is why velocity is calculated using a rolling average, rather than just taking the velocity of the most recent sprint.
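One common way to compute this is a simple rolling average over the most recent sprints. A minimal sketch, assuming a three-sprint window and made-up sprint totals (the window size is a team choice):

def rolling_velocity(points_per_sprint, window=3):
    # Average completed points over the most recent `window` sprints
    recent = points_per_sprint[-window:]
    return sum(recent) / len(recent)

history = [82, 95, 110, 88, 104]   # completed points for the last five sprints (illustrative)
print(f"{rolling_velocity(history):.1f}")   # 100.7 -- use roughly 100 for forecasting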

Huge swings in velocity are often an anti-pattern worth examining further, as they can indicate that stories are too large or that teams are overcommitting and carrying stories over to the following sprint, where they are then completed.

Cycle time

Cycle time is the amount of actual time in work hours that it takes to complete a story. Cycle time starts when the team begins work on a story and ends when the story is completed.

For example, assume that we have two sprints with the following stories:

Sprint 1
Story 1 – Completed in 16 hours
Story 2 – Completed in 13 hours
Story 3 – Completed in 19 hours
Story 4 – Completed in 2 hours

Sprint 1 average cycle time: 12.5 hours

Sprint 2
Story 1 – Completed in 4 hours
Story 2 – Completed in 7 hours
Story 3 – Completed in 8 hours
Story 4 – Completed in 26 hours

Sprint 2 average cycle time: 11.3 hours

Although cycle time is averaged across all stories, a higher cycle time generally points to larger stories. The calculation can be sliced by points to confirm this. It can be helpful to know, for example, that x-point stories tend to take y hours.
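Here is a minimal sketch of both the overall average and the per-point breakdown. The data is hypothetical (story points are not given in the sprint example above, so they are assumed here purely for illustration):

from collections import defaultdict

# (story points, cycle time in work hours) for completed stories -- illustrative data only
stories = [(3, 16), (2, 13), (5, 19), (1, 2), (1, 4), (2, 7), (2, 8), (5, 26)]

average_cycle_time = sum(hours for _, hours in stories) / len(stories)
print(f"Average cycle time: {average_cycle_time:.1f} hours")

# Slice by story points: "x-point stories tend to take y hours"
by_points = defaultdict(list)
for points, hours in stories:
    by_points[points].append(hours)

for points in sorted(by_points):
    avg = sum(by_points[points]) / len(by_points[points])
    print(f"{points}-point stories: {avg:.1f} hours on average")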

Ideally, cycle time improves until it hits a point where it is steady and predictable.

If cycle time trends upwards, it means that the team is less productive on a per-story basis. It may be necessary to look at:

  1. How efficiently the team is working through stories. Are there other things going on that are inhibiting productivity? These could be personal, environmental, or technical. For example, when a very complex sprint is beginning, expect to see higher cycle times for stories; as the sprint progresses and the work is better understood, cycle time should improve.
  2. How well written and how well decomposed the stories are. Sometimes the product backlog is not well groomed; in other words, there are stories that the team is less confident in or hasn’t spent adequate time understanding. Other times, stories turn out to be too broad or too vague. Slowing cycle times may warrant a look at how the backlog and stories are groomed.

Both of these items are perfect for teams to tackle in their retrospective. A benefit of self-organizing teams is that they often know what the problem is and are able to correct it.

Cost per point

Quantifying the cost of a point is crucial for establishing hurdle rates on projects. Making this cost visible to the business can help guard against scope creep. Seeing that “bell and whistle x” actually costs $8,000 can be a gut check and help weed out non-essential items that have limited ROI.

Cost metrics can be broken down as follows:

Annual cost of team. Total (or average) salary of team members times the number of team members. For team members who are not full time, or who are in management functions and not dedicated to the team, apply a proportional multiplier at your discretion based on the share of their time spent on the team.

Cost per sprint. Annual cost of team divided by number of sprints per year.

Cost per point. Cost per sprint divided by velocity (average points per sprint).

Cost per feature. For a given feature, add up the total number of points in the feature and multiply by the cost per point.
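A quick sketch of how these figures chain together (the team size, salaries, sprint count, velocity, and feature size below are made-up numbers for illustration):

annual_team_cost = 8 * 120_000   # 8 team members at an assumed fully loaded cost of $120k each
sprints_per_year = 26            # two-week sprints
velocity = 100                   # average points completed per sprint

cost_per_sprint = annual_team_cost / sprints_per_year
cost_per_point = cost_per_sprint / velocity
feature_points = 200
cost_per_feature = feature_points * cost_per_point

print(f"Cost per sprint:  ${cost_per_sprint:,.0f}")    # ~$36,923
print(f"Cost per point:   ${cost_per_point:,.0f}")     # ~$369
print(f"Cost per feature: ${cost_per_feature:,.0f}")   # ~$73,846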

Defect Criticality Index

Everyone wants the trifecta of “fast, cheap, and good”. We’ve looked at efficiency and cost above, so the final metric hinges on quality. Even if the axiom holds that we can only pick two of these, we can at least measure improvement across all of them.

Defect Removal Efficiency (DRE) is an old software testing metric for gauging the percentage of defects (bugs) the team found during development versus the number that made their way into the production release. I’ve found it difficult to compute and also somewhat limiting, as it doesn’t account for the severity of bugs. It’s also highly discretionary: some teams find and report more bugs than others during development, while other teams simply fix them as part of their definition of done for a story (and, if someone wanted to, they could game the metric by creating and fixing more bugs during development to improve the DRE).

So rather than use DRE, I’ve found another metric to be more useful. I call this Defect Criticality Index (DCI).

DCI focuses only on bugs that make their way into the production application. Each bug that is found in production is assigned a weight on a criticality scale:

0 - Trivial - Purely cosmetic, like a grammatical error or typo
1 - Minor - Minor visual issue or inhibited functionality, but workarounds exist
2 - Major - Inhibits functionality and no real workaround exists, but the functionality is not core to the product
3 - Critical - Inhibits core functionality
4 - Blocker - Prevents essential functionality from working, such as an entire core feature or set of core features

The initial determination of criticality depends on your organization’s support structure, but can most often be set by the level 1 support team.

With a criticality scale in place, DCI is a simple calculation. Let’s say that Feature X went live in version 4.2.0. Due to the release of 4.2.0, a number of bugs were introduced. They were classified as:

Bug 1 - Major (criticality 2)
Bug 2 - Major (criticality 2)
Bug 3 - Minor (criticality 1)
Bug 4 - Trivial (criticality 0)

The DCI is the sum of the bugs’ criticality values: 2 + 2 + 1 + 0 = 5.
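In code, DCI is just a sum over the criticality scale above. A minimal sketch (the severity names mirror the scale, and the bug list mirrors the 4.2.0 example):

CRITICALITY = {"trivial": 0, "minor": 1, "major": 2, "critical": 3, "blocker": 4}

def defect_criticality_index(production_bug_severities):
    # Sum the criticality weights of the bugs introduced by a release
    return sum(CRITICALITY[severity] for severity in production_bug_severities)

release_4_2_0_bugs = ["major", "major", "minor", "trivial"]
print(defect_criticality_index(release_4_2_0_bugs))  # 5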

DCI allows features to be compared head-to-head and helps teams see how they are doing with testing and quality assurance.

Over time, we can see the average DCI for features of various sizes. For example, we might see that when the team works on a 200 point feature, the DCI is typically 5. If we then see a 100 point feature with a DCI of 10, the team should seek to understand why this occurred in their retrospective.

It is also helpful to see if DCI correlates to the type of feature that is being worked on. For example, we might see that one team tends to have a DCI of 8 when working on UI-heavy features, but another team working on similar sized UI-heavy features averaged a DCI of 4. This could indicate that one team is missing a skill set.

Summary

This post covered delivery speed (velocity and cycle time), cost, and quality metrics for agile teams.

These metrics tend to be useful both within teams and across teams. Individual teams can see how they are improving over time and use them to focus on continuous improvement in their retrospectives. Looking across teams in larger organizations allows managers to compare teams to spot inefficiencies.

Collectively, agile metrics demystify the product development process, highlight business value, and measure delivery of quality software in a timely manner.


Chris Szymansky

CTO at Fieldguide (https://fieldguide.io). Prev. engineering and product at Atrium and JazzHR.