What’s wrong with using SPACE to measure developer productivity
Measuring developer productivity lets you identify workflow optimizations to improve your team’s throughput. Knowing what and how to measure is hard, however, because software development success can be gauged using a wide array of metrics. Are you most interested in tickets closed, incidents avoided, staff turnover rates, or an internal standard of your own creation?
SPACE is a framework that attempts to reconcile these factors into one consistent system. It describes a multidimensional solution for collecting and analyzing developer productivity data. This article will explain how you can use SPACE to understand your team’s performance, before exploring some of the problems with the framework.
SPACE was proposed in a 2021 paper authored by researchers at GitHub, the University of Victoria, and Microsoft. Rather than reducing developer productivity to a single number, it describes productivity across five different measurement categories:
- Satisfaction and well-being – This considers how individual developers feel about their work. Satisfied, healthy, and happy developers with efficient tools and a good work-life balance are likely to be more productive during office hours.
- Performance – Performance looks at whether development efforts are producing the required outcomes for the business. A high-performing team should be creating a quality product that attracts new customers, increases sales, and causes incidents to become less common.
- Activity – Activity encompasses the quantifiable development metrics that are often already familiar to devs and project managers. Many of them are counts of developer actions, such as issues closed, pull requests accepted, and CI/CD pipelines passed vs failed.
- Communication and collaboration – Prompt code reviews, clear explanations of required changes, and easily discoverable documentation all contribute to effective development environments. Without efficient access to information, or the ability to request it, developers can hit frustrating roadblocks that drag down metrics in the other categories. This topic examines weaknesses in the team’s collaboration methods.
- Efficiency and flow – Development work quickly becomes bottlenecked when changes have to repeatedly pass between different people, tools, and synchronous processes. Devs are inherently less productive if they have to wait before getting feedback about a change. The workspace should also be structured so devs have time to deeply focus on tasks, without being interrupted.
Collectively, these topics incorporate both objective metrics, such as pull requests accepted, and subjective ones like employee satisfaction and burnout levels. This produces a holistic view of individual and team performance. It considers the components which affect development output, in addition to the volume and quality of work produced.
Measuring SPACE Performance
To actually utilize SPACE, you need to identify the metrics which matter to your team, then devise accurate measurements across each of the five categories. It’s important you collect data for all the topics so you get a complete picture of your productivity. Here’s how to approach each one.
The Satisfaction and Well-Being category is the most subjective part of SPACE. It’s highly dependent on each developer’s perspective, so you should use surveys, polls, and in-person conversations to canvass different views. Don’t be tempted to make assumptions based on objective data available to you, such as pay grade, number of hours, or the person’s apparent activity, because their internal satisfaction could differ dramatically.
Start by asking devs what they enjoy about their role, then inquire which aspects they find frustrating. This can uncover opportunities to fine-tune processes and adopt more efficient tools. Asking about fatigue, burnout, and perceived ability to operate autonomously can also help reveal problems, but remember that not everyone will share their true opinions.
Performance is where you link developer productivity to business outcomes. Some common metrics used for this topic include:
- Number of deployments completed
- Number of bugs reported by customers
- New customers gained
- Increase in sales (e.g. month over month)
- Number of people still using a new feature after its launch
The purpose of this section is to gauge whether work completed is actually fulfilling your business requirements by delivering value to customers. You should augment the list with your own business objectives, then set up data-gathering solutions that allow you to measure the trends in your results.
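To make a trend such as the month-over-month sales increase concrete, here is a minimal Python sketch of the arithmetic. The function name and the sales figures are hypothetical and only illustrate the calculation; your own data source and reporting period will differ.

```python
def month_over_month_change(values):
    """Return the percentage change between each pair of consecutive months."""
    changes = []
    for previous, current in zip(values, values[1:]):
        if previous == 0:
            # A change from zero has no meaningful percentage
            changes.append(None)
        else:
            changes.append((current - previous) * 100 / previous)
    return changes


# Hypothetical monthly sales counts, oldest first
monthly_sales = [200, 250, 225]
print(month_over_month_change(monthly_sales))  # [25.0, -10.0]
```

The same calculation works for any of the counts listed above, such as customer-reported bugs or deployments completed.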
Activity data can usually be collected from your existing source control and CI/CD systems. Productive developers will regularly interact with the resources held on these platforms. Their actions loosely signal how much engineering progress is being made.
Try plotting changes in the following metrics:
- Issues closed
- Pull requests merged
- Meetings held
- New work specifications approved
- Design files created
- Messages exchanged
These metrics are highly visible and felt by everyone each day. Consequently, teams can become fixated on them as seemingly positive changes are relatively easy to achieve.
You shouldn’t place too much emphasis on activity alone, however, because it’s prone to undue influence from “busy” work that doesn’t contribute to your product’s value. Closing 100 bug reports by merging 150 pull requests might look impressive, but it may deliver no more value than 10 pull requests that implement one big feature ticket.
Communication and collaboration
This topic is more subjective and can be measured differently across organizations. It hinges on how your teams are structured, the development models you use (such as agile and lean), and the support that individual developers require to do their best work.
A good place to start is by analyzing how systemized your operations are. Clearly defined procedures for capturing, disseminating, and accessing information allow team members to sustain their performance, even when they’re temporarily stuck. The following data points can be representative:
- Time that changes are stuck waiting for input from others
- How quickly developers can start a new environment
- Number of approvals required to provision infrastructure for testing changes
- Comprehensiveness of documentation, and the number of revisions and clarifications requested by developers
- Time required for another developer to take over an individual’s work, when they go away or leave the organization
All these metrics will suffer if your processes are vague and undocumented. This creates barriers to collaboration which reduce both individual and team performance.
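As one way to quantify the time that changes spend waiting for input from others, the sketch below computes how long each pull request waited for its first review. The record layout (`opened_at`, `first_review_at`) is an assumption made for illustration; in practice you would export these timestamps from whichever source control platform you use.

```python
from datetime import datetime
from statistics import median


def review_wait_hours(pull_requests):
    """Return the hours each reviewed pull request waited for its first review."""
    waits = []
    for pr in pull_requests:
        if pr["first_review_at"] is None:
            continue  # still waiting, so there is no completed wait to measure
        opened = datetime.fromisoformat(pr["opened_at"])
        reviewed = datetime.fromisoformat(pr["first_review_at"])
        waits.append((reviewed - opened).total_seconds() / 3600)
    return waits


# Hypothetical records; field names are illustrative only
prs = [
    {"opened_at": "2024-03-04T09:00:00", "first_review_at": "2024-03-04T11:00:00"},
    {"opened_at": "2024-03-04T10:00:00", "first_review_at": "2024-03-05T10:00:00"},
    {"opened_at": "2024-03-05T14:00:00", "first_review_at": "2024-03-05T18:00:00"},
    {"opened_at": "2024-03-06T09:00:00", "first_review_at": None},
]

waits = review_wait_hours(prs)
median_wait = median(waits)
print(f"Median wait for first review: {median_wait} hours")
```

The median is used here instead of the mean so that one unusually slow review doesn't dominate the result, although either could be a reasonable choice.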
Efficiency and flow
Efficiency is about condensing your processes to the minimum number of steps required to apply a change. Making it simpler to move work from idea through to production optimizes productivity by removing opportunities for developers to get distracted or frustrated. Capturing the following metrics offers visibility into this SPACE dimension:
- The number of reviews required for each change
- How quickly required reviews are completed, and the number of stakeholders involved
- How long developers can focus before being interrupted or having to wait for a tool
- Use of automated systems which augment developer capabilities
- The amount of time “lost” to clarification of specs that could have been resolved before developers started work
Long feedback loops with many friction points make it harder for developers to get and stay focused. This drags down efficiency and can cause work to be wasted, such as when a complex change ends up failing an extended test suite.
Problems with SPACE
SPACE facilitates a comprehensive analysis of the different components of developer productivity. The metrics you collect are still mere data, though: you have to decide which to use, how to analyze them, and the changes to make in response. SPACE can create more confusion than clarity if you don’t use it to shape improvements to your productivity.
1. It doesn’t tell you why
SPACE doesn’t tell you why certain measurements have a particular value, nor can it directly reveal the events that have triggered a change. You might be able to infer a reason by looking at metrics alongside other dimensions, but correlation does not imply causation.
To fully explore the why behind your metrics, you need to consciously analyze your data and assess it against your broader knowledge of your organization’s operations. This will lead to a more informed understanding of productivity that’s less susceptible to short-term changes in individual measurements.
2. Lack of actionable information
SPACE metrics aren’t immediately actionable. You need to decide how you’ll respond to changes in different dimensions, such as an increasing incident rate or a reduction in developer satisfaction.
It can be helpful to define triggers and thresholds that alert you when to take action. You could redirect teams to fixing bugs and solving performance issues when support ticket volumes grow, for example, or hold a team meeting to talk through problems if collaboration is suffering.
Planning how you’ll respond is vital to ensure your metrics have an effect on your organization. SPACE won’t contribute any value if you never act upon your data.
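The triggers and thresholds described above could be sketched as a small set of rules checked against your latest measurements. Everything in this example is an assumption for illustration: the metric names, the limits, and the `Threshold` structure are hypothetical, and SPACE itself doesn't prescribe any of them.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    metric: str      # name of the measurement to watch
    limit: float     # value at which the rule fires
    direction: str   # "above" or "below"
    action: str      # planned response when the rule fires


def triggered_actions(thresholds, current_values):
    """Return the planned actions for every threshold that has been crossed."""
    actions = []
    for rule in thresholds:
        value = current_values.get(rule.metric)
        if value is None:
            continue  # no measurement collected for this metric yet
        if rule.direction == "above":
            breached = value > rule.limit
        else:
            breached = value < rule.limit
        if breached:
            actions.append(rule.action)
    return actions


# Hypothetical rules and measurements
thresholds = [
    Threshold("open_support_tickets", 50, "above", "Redirect the team to bug fixing"),
    Threshold("collaboration_score", 3.5, "below", "Hold a team meeting"),
]
current = {"open_support_tickets": 64, "collaboration_score": 4.1}

print(triggered_actions(thresholds, current))
```

Writing the response next to the threshold keeps the plan and the measurement together, so a breached limit always maps to an agreed action.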
3. Changes to one metric can affect the others
SPACE’s dimensions are interconnected. This can be a source of confusion and uncertainty. There’s a risk that trying to optimize one area will negatively impact others. Pushing for higher throughput, for example, can increase the risk of developer burnout.
The five dimensions must be assessed together to mitigate these challenges. When you need to improve in one area, evaluate the expected impact on the others. If the actual effect of the change falls short of your estimates, you might need to iterate upon your plan. Otherwise, you won’t reach the overall productivity level you intended.
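One simple way to assess the dimensions together is to compare a score for each one before and after a change, then flag any that got worse. This sketch assumes you've already reduced each dimension to a single score where higher is better, which is a simplification; the dimension keys and the numbers are hypothetical.

```python
DIMENSIONS = ["satisfaction", "performance", "activity", "communication", "efficiency"]


def regressed_dimensions(before, after, tolerance=0.0):
    """Return the dimensions whose score dropped by more than the tolerance."""
    return [d for d in DIMENSIONS if after[d] < before[d] - tolerance]


# Hypothetical scores around a push to increase throughput
before = {"satisfaction": 7.5, "performance": 6.0, "activity": 5.0,
          "communication": 6.5, "efficiency": 6.0}
after = {"satisfaction": 6.0, "performance": 6.0, "activity": 7.0,
         "communication": 6.5, "efficiency": 6.5}

print(regressed_dimensions(before, after))  # ['satisfaction']
```

In this made-up example, activity improved while satisfaction dropped, which is the kind of trade-off that would prompt another iteration on the plan.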
4. Developer pushback
Developers are often wary of performance analysis systems. While good implementations of SPACE can act in their favor by highlighting everyday struggles, devs can still be reluctant to reveal their true opinions. They might be naturally introverted, fearful of any repercussions if they offer negative feedback, or simply unaware of the real difficulties they encounter.
Meaningful use of SPACE is dependent on developer participation. The framework is designed to provide a true picture of what’s happening at both the individual and team level. If individuals aren’t contributing, then the insights you obtain will be unreliable. Gain the trust of developers by explaining why the framework’s being used and how it’ll inform tangible workflow improvements.
5. Unclear prioritization of subjective issues
Individual teams and developers can assign different priorities to subjective issues. One engineer could thrive in the same high-pressure, hard-working atmosphere that’s pushing a peer towards burnout, for example. SPACE obscures these details by only presenting the consensus view in your organization.
Balance the relative importance of meeting business objectives and maintaining a healthy work environment before you make changes to your metrics. Agree on priorities with team members to ensure you’re moving in the right direction, without cutting anyone off and leaving them behind.
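To see how a consensus view can hide individual differences, consider this small sketch. It summarizes hypothetical 1-to-5 survey responses for two teams that share the same average but have very different spreads. The scale, the team data, and the function name are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def summarize_responses(scores):
    """Summarize survey scores with their average, spread, and range."""
    return {
        "mean": mean(scores),
        "spread": pstdev(scores),  # population standard deviation
        "min": min(scores),
        "max": max(scores),
    }


# Hypothetical satisfaction scores on a 1-5 scale
team_a = [3, 3, 3, 3]   # everyone feels about the same
team_b = [1, 5, 1, 5]   # half thriving, half struggling

print(summarize_responses(team_a))
print(summarize_responses(team_b))
```

Both teams average 3, but only the spread and range show that half of the second team is struggling. Reporting a measure of spread alongside the average is one possible way to keep those individual views visible.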
6. Too many metrics
With five different dimensions and multiple metrics for each, SPACE can quickly produce an overwhelming amount of data. To be helpful, your metrics need to be precise, relevant, and actionable. Otherwise, you’ll waste effort optimizing for aspects that don’t affect your product or developer satisfaction.
Select only the specific measurements you need to gauge your engineering success. It’s important to include some for each dimension though, to ensure you get the balanced picture of developer and business performance that SPACE is designed to provide.
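A lightweight check can help keep a metric selection both small and balanced. The sketch below flags dimensions with no metrics and dimensions with too many. The limit of three per dimension is an arbitrary assumption, not something SPACE specifies, and the metric names are hypothetical.

```python
SPACE_DIMENSIONS = ("satisfaction", "performance", "activity",
                    "communication", "efficiency")


def check_metric_selection(selected, max_per_dimension=3):
    """Return a list of problems with a per-dimension metric selection."""
    problems = []
    for dimension in SPACE_DIMENSIONS:
        count = len(selected.get(dimension, []))
        if count == 0:
            problems.append(f"{dimension}: no metrics selected")
        elif count > max_per_dimension:
            problems.append(
                f"{dimension}: {count} metrics, limit is {max_per_dimension}"
            )
    return problems


# Hypothetical selection that over-weights activity and skips efficiency
selected = {
    "satisfaction": ["quarterly survey score"],
    "performance": ["customer-reported bugs", "feature retention"],
    "activity": ["pull requests merged", "issues closed",
                 "deployments", "messages exchanged"],
    "communication": ["time waiting for review"],
}

for problem in check_metric_selection(selected):
    print(problem)
```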
SPACE is a holistic framework for measuring developer productivity. It goes beyond simple counts of commits and pull requests to encapsulate everything that affects development output, from effective collaboration to the level of satisfaction engineers feel in their role.
These characteristics make SPACE a more rounded assessment than standalone sets of engineering metrics such as DORA. Nonetheless, SPACE isn’t a perfect representation of developer performance, as no single set of dimensions can capture everything that affects your output. SPACE is inherently subjective and may not reveal the true cause of productivity issues.
SPACE is most useful when you collect a small number of high-value metrics within each dimension. This can help you understand how success in different areas is impacting others, without overwhelming you with unactionable data.