AI Engineering is Just Systems Engineering

About Wayne Duso

Wayne Duso is Vice President of Platform and Infrastructure at 1Password. He is an engineering and product executive with a track record of building and operating enterprise and cloud services from inception to multi-billion-dollar outcomes. Prior to his current role, Wayne spent nearly 12 years at Amazon Web Services as General Manager and Vice President of Engineering and Product.

1Password started its AI journey in earnest in Q4 of last year. That is a relatively recent start compared to many companies, but the progress in two-plus quarters has been remarkable. Today, 85–86% of developers are using the toolset week over week, and 85% of code commits are either assisted, augmented, or native through AI tooling. The bottleneck has shifted from writing code to planning and verification, and the company has built adversarial agent workflows to handle much of the review burden.

Starting the Journey the Right Way

We didn't approach this from the angle of how do we use AI tools to produce magic. We asked: how do we think about engineering discipline, how have the tools evolved our discipline, and how do we want to apply that? We're a company responsible for the privacy and security of millions of customers. So we can't just think about going faster or using the latest technology. How do we leverage where engineering is headed while not giving up on any of our core ethos and principles?

What I usually say to people is: if you think you have this all figured out, you should be a lot more paranoid. And if you think you're behind everybody, you should be a lot less paranoid.

The 20-60-20 Split

You have the typical 20-60-20 split with humans. The leading 20% love new tools, love new shiny objects, so away they go. The middle 60% are very focused on their work, and when you introduce change, they ask, "I'm busy right now; do I really need to do this?" And then you have the 20% on the other side who say, "I've seen change happen before; it doesn't go well."

The conversation became: the front 20% will adopt these tools; how do we learn from them? How do we take what they're doing and have it adopted by the middle 60%, and even start bringing along the trailing 20%, so they understand this is not simply about AI but about the evolution of our engineering discipline?

The Bottleneck Has Shifted to Planning

Think about how engineering is generally done. You have a few engineers who are amazing at the system design and architecture piece — they start on the left-hand side of the whiteboard and work all the way to the right. There are not a lot of those folks. The majority of engineers start in the middle of the whiteboard and start drawing. Eventually you ask questions, and sometimes they move right, sometimes left.

What happens when you're working at AI speed or agentic speed? If you just jump into the middle of a problem with what is essentially a set of agents that are really enthusiastic about meeting your needs, you are probably not going to end up on either the left-hand or right-hand side where you expected.

The Monolith Refactor Experiment

We ran a six-week proof of value on how we can use AI tools to perform refactoring. We quickly discovered that it doesn't work well if you just give the tools simple instructions. So we stepped back and asked: what would the most system-thinking engineers do? How would they break this problem into its constituent pieces and assign those to various individuals, or in this case, various agents?

One of our engineers tried the easy experiment first and realized the models quickly fell apart within the context they could understand. So they stepped back and said, "What we need is a deterministic understanding of our code base, not a probabilistic one."

They used the tooling to build tooling: an abstract syntax tree (AST) tool that goes through our entire code base in a very deterministic way. They were able to generate that tool in a matter of hours. The result was an amazing spec of our entire system: all of our current services, all of our more than 900 endpoints, and their relationships to our data sources.
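To make the "deterministic, not probabilistic" idea concrete, here is a minimal sketch of a syntax-tree walk that inventories endpoints. It is not the tool the team built. It assumes Python source and a route-decorator convention such as `@app.get("/path")`; the sample source, `find_endpoints`, and `HTTP_METHODS` are all hypothetical names chosen for illustration.

```python
import ast

# Hypothetical sample source. The decorator names (`app.get`, `app.post`) are
# illustrative assumptions, not taken from any real code base.
SAMPLE = '''
@app.get("/vaults")
def list_vaults():
    return db.query("vaults")

@app.post("/vaults/{vault_id}/items")
def create_item(vault_id):
    return db.insert("items", vault_id)

def helper():
    return 1
'''

HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def find_endpoints(source: str) -> list[tuple[str, str, str]]:
    """Return (METHOD, path, handler_name) for every decorated route handler."""
    endpoints = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for deco in node.decorator_list:
            # Match decorators shaped like `<something>.<method>("<path>", ...)`.
            if (
                isinstance(deco, ast.Call)
                and isinstance(deco.func, ast.Attribute)
                and deco.func.attr in HTTP_METHODS
                and deco.args
                and isinstance(deco.args[0], ast.Constant)
                and isinstance(deco.args[0].value, str)
            ):
                endpoints.append(
                    (deco.func.attr.upper(), deco.args[0].value, node.name)
                )
    # Sorting makes the inventory identical on every run over the same source.
    return sorted(endpoints)


if __name__ == "__main__":
    for method, path, handler in find_endpoints(SAMPLE):
        print(method, path, handler)
```

The point of the sketch is that the parser either finds a route or it doesn't. Run it twice on the same source and you get the same inventory, which is what makes the output usable as the foundation of a spec.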

Within about seven days, we had an incredibly well-written spec and Epic. The tooling, working with the principal engineer, produced a plan for the 17 agents that would be required to refactor one component of our monolith. We did all of that work upfront, and when the plan was executed, it took four hours. It was so fascinating that we had to write a case study about it.

Human in the loop from a planning, architecture, and design standpoint. Human in the loop for verification of the actual PRs before they were merged into production. And then the mechanics of the work, the stuff that was so important for most of our careers, was automated.

This Is Just Systems Engineering

This sounds like systems engineering, doesn't it? It's no different from what you would have done 10 years ago as an engineering manager or principal engineer by assembling five or ten engineers into a room, breaking down the problem, deciding what the architecture should look like, and how it will meet the end requirements. It is the same process, but now it can be done by a single engineer.

I couldn't be more excited about this era. There are many narratives out there around not needing engineers anymore, just having agents automate everything away. The reality is that the discipline of software engineering and systems engineering is actually coming to the fore because it's necessary to build these specifications so the agents can do what we need them to do without stepping too far left or right.

What we produce for agents and what we produce in these specifications is good for humans and agents alike. We're building a collaborative exercise between the human and the agents by making sure things are well-specced.

Managing Specifications

The specifications — the markdown files that capture the what, the why, the how, and the so what — are all captured in Jira. For any story, any Epic, any initiative, those are now well-documented. As an engineering leader, that is my primary source of understanding, my primary source of auditing and learning, and my primary source of staying close to the work. It doesn't just keep you close to the technical work. It keeps you close to how the work is being done. It's very holistic.

Once you carry that ticket into the check-in, you now have a tether, a connection back to the original intent. It would be nice if that were brought into the change management system and into source control. It will happen.
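One lightweight way to picture that tether is a check that every commit message names a ticket. The sketch below is an assumption about how such a check could look, not a description of 1Password's tooling. It assumes issue keys follow the common `PROJECT-123` shape; `ticket_keys`, `is_tethered`, and the project keys in the usage note are hypothetical.

```python
import re

# Assumed key shape: an uppercase project key, a hyphen, and a number.
# This pattern is deliberately loose and will also match strings like "UTF-8".
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def ticket_keys(commit_message: str) -> list[str]:
    """Return the distinct issue keys in a commit message, in first-seen order."""
    seen: list[str] = []
    for key in ISSUE_KEY.findall(commit_message):
        if key not in seen:
            seen.append(key)
    return seen


def is_tethered(commit_message: str) -> bool:
    """A commit counts as tethered if it names at least one ticket."""
    return bool(ticket_keys(commit_message))
```

For example, `ticket_keys("PLAT-42: split auth module (see also SEC-7)")` returns `["PLAT-42", "SEC-7"]`, while `is_tethered("fix typo")` is `False`. A real check would likely validate the keys against the tracker rather than trust the pattern alone.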

Code Review Under Pressure

PR review became more daunting. We have a dedicated tiger team of developers and leaders who look at these problems and say, "We know the bulk of PRs are coming too fast; how do we improve this process?"

If we can't have quality reviews and get changes into production in a quality fashion, we haven't really helped ourselves at all.

We had to start looking at the size of PRs and at how many people needed to approve each one. As a security company, we're very specific about how we approve every PR before it goes into production: from a security standpoint, a resiliency standpoint, and a functional standpoint. So we had to invent ways of queuing up these PRs and thinning down the set of folks who need to review, not in terms of quality, but in terms of focus. The people doing each code review are very aware of the code area and can quickly assess what the reviewing agents are telling them.

We actually have an adversarial set of agents on our code reviews. It's not just one agent producing the results; it's a set of agents in a specific workflow, from simple linting all the way through to very adversarial thoughts on whether this is the right way of implementing a particular change. When a human comes in for the final review, it becomes a much faster process, on the order of an hour versus a day.
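The shape of that workflow can be sketched as an ordered list of review stages that all contribute findings before a human looks at the PR. This is a toy illustration under stated assumptions. Each stage here is a plain function with a hard-coded rule, standing in for what would be an agent call in practice, and `Finding`, `lint_stage`, `adversarial_stage`, `run_review`, and `needs_rework` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    stage: str
    severity: str  # "info", "warn", or "block"
    message: str


# A stage takes the diff text and returns its findings.
Stage = Callable[[str], list[Finding]]


def lint_stage(diff: str) -> list[Finding]:
    """Cheap mechanical check: flag added lines with trailing whitespace."""
    findings = []
    for line in diff.splitlines():
        if line.startswith("+") and line.rstrip() != line:
            findings.append(Finding("lint", "warn", "trailing whitespace"))
    return findings


def adversarial_stage(diff: str) -> list[Finding]:
    """Stand-in for a reviewer whose job is to argue against the change."""
    if "verify=False" in diff:
        return [Finding("adversarial", "block", "TLS verification disabled")]
    return []


def run_review(diff: str, stages: list[Stage]) -> list[Finding]:
    """Run the stages in order, cheapest first, and collect every finding."""
    findings: list[Finding] = []
    for stage in stages:
        findings.extend(stage(diff))
    return findings


def needs_rework(findings: list[Finding]) -> bool:
    """Any blocking finding sends the PR back before a human spends time on it."""
    return any(f.severity == "block" for f in findings)
```

Calling `run_review(diff, [lint_stage, adversarial_stage])` gives the human reviewer one consolidated list of findings, so the final pass is spent judging the agents' objections rather than rediscovering them.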

Measuring AI Impact

Every quarter we revisit our metrics to figure out if we're measuring the right stuff. We've evolved our metrics three times in two and a half quarters, and every single time we look at a metric, it's not interesting anymore.

Adoption percentage: we're now at 85–86%, and that metric is no longer moving much and doesn't tell us much. The percentage of code contributed by AI was interesting for a while; now it's starting to become less interesting. But we've gotten back to where we started: what are we really trying to accomplish? And that is increasing the velocity of business and customer results with the same level or better quality.

Right now we're looking at three basic measures: from first check-in to posting a PR; from posting a PR to merge and deployment; and once deployed, of the PRs that had a contribution by agentic workflows, which deployments are flawed, and can we trace that back to humans or to agentic code generation?

What we measure is basically:

How quickly we get from first check-in to PR
How quickly we get from PR to merge
How many of those deployments are solid
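The three measures above can be computed from a handful of timestamps per change. The following is a minimal sketch, assuming each change carries a first-commit time, a PR-opened time, a deploy time, and two flags. The `Change` record, its field names, and the choice of the median as the summary statistic are assumptions made for illustration, not the company's actual definitions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass(frozen=True)
class Change:
    first_commit: datetime
    pr_opened: datetime
    deployed: datetime
    agent_assisted: bool
    flawed: bool  # assumed meaning: rolled back or tied to an incident


def median_hours(deltas: list[timedelta]) -> float:
    """Median duration in hours; raises StatisticsError on an empty list."""
    return median(d.total_seconds() for d in deltas) / 3600


def summarize(changes: list[Change]) -> dict[str, float]:
    """Compute the two lead times and the flaw rate for agent-assisted changes."""
    assisted = [c for c in changes if c.agent_assisted]
    return {
        "commit_to_pr_hours": median_hours(
            [c.pr_opened - c.first_commit for c in changes]
        ),
        "pr_to_deploy_hours": median_hours(
            [c.deployed - c.pr_opened for c in changes]
        ),
        "assisted_flaw_rate": (
            sum(c.flawed for c in assisted) / len(assisted) if assisted else 0.0
        ),
    }
```

Splitting the flaw rate by `agent_assisted` is what allows the third question to be asked at all: whether a flawed deployment traces back to human or agentic code generation.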

Jira Ticket to PR, Fully Automated

We have a tiger team with one work stream focused on going from Jira ticket to PR, fully automated. We've been doing this for two and a half quarters — that's pretty aggressive. But if you don't try, you won't know. And in fact, the success of that process will be uncovering what we don't know. I consider failure to be the beginning of success, not the end.