
Shipping Code Without Human Verification

Jade Rubick, engineering leadership advisor and former VP of Engineering, talks about why traditional engineering roles are blurring, how automated verification could replace downstream QA, and his three predictions for 2027.
Hosted by
Ankit Jain
Co-founder at Aviator
Guest
Jade Rubick
Advisor and Coach

About Jade Rubick

Jade Rubick builds thriving engineering organizations. Jade is a former VP of Engineering at New Relic and Gremlin. He writes a newsletter on engineering leadership and is an advisor and fractional VPE at many startups. Jade's interest is in the intersection between effective and human leadership.

Engineering Teams Are Feeling a Squeeze

I'm seeing a lot of friction around some of our traditional role definitions. Product managers are starting to produce completed and very polished-looking engineering artifacts. This is causing some of the traditional ways that we've divided up these roles to feel kind of arbitrary. It's often putting a lot of stress on engineering teams because the incentives for them are terrible. You seem to be moving slowly if you're making sure all the quality and operational practices and code quality are good. And yet if you ignore those things, you're the one ultimately on the hook for all the failures that will come later.

A lot of engineering teams are feeling a squeeze where the expectations have risen considerably, but the role definition still hasn't really changed.

I'm also seeing some companies where non-engineers are committing code and where that is an explicit goal. I see designers wanting to be more involved in code. The role definition seems to be blurring quite a bit.

Leaders Should Code, But With Humility

There was a longstanding debate in engineering leadership circles about whether engineering managers should code. I'm finding that debate is disappearing. Engineering managers and engineering leaders are much more willing to engage in the code, and feel a need to. AI is seen as a route to doing that. That debate has been resolved in favor of yes, engineering managers should code. And that's a pretty big change.

It is appropriate for a director, for example, to be engaging in coding because familiarity with this new paradigm of agent development is essential for our whole field. VPs should be doing that, and really everybody should be. But you have to do that with a certain amount of humility. You're going to have a different experience of using these tools than people who are coding every day. You're probably going to be working on much simpler projects, and you will be a lot more successful at it because you're doing simpler things.

There's a bias there that is leading many leaders to have more confidence in these tools than the tools can sometimes justify. That confidence may be appropriate in terms of where things are going. But some people work in the corner cases where AI agents are really hard to use, or where it's harder to set up the context, so leaders may develop a feeling for things that is different from that of a lot of the people they're leading.

Harness Engineering vs. Product Engineering

There's likely going to be a set of engineering specialists that are very deep on that type of engineering. The term I've been using is "harness engineering." You're creating this harness, and you're measuring the errors, the quality, the evals, the security runs, and linters and code duplication scanners. Whenever there's something getting through that is not high quality, you're improving the system to automate that.

But in front of that, there's this whole sort of product engineering and product development. There's a separate skill set of talking to customers and a lot of the traditional product role things and product engineering things. Some of those roles seem to be getting blurred quite a bit, but I can imagine a team where some people are engaged in product engineering and some in harness engineering. I would expect to see both centralized and decentralized versions of that.

The Automated Verification Engineer

In the past, I had asked people what they thought about the role of QA in a startup environment. Pretty uniformly, what I got back was that the way QA and engineering worked together typically was not very effective. Most of them said, with some exceptions, that they would not use QA. I'm talking about QA as a downstream process from engineering. That sort of handoff.

The agents often do not produce the best results immediately. It's through an incremental process that they are effective. You can't give them all the heuristics they need to keep in mind and get a perfect result. It's much more like you give them a set of constraints that they bash their heads against until they come up with something that works really well.

I thought it might be a useful experiment to redefine the QA role around automated verification, redefining the way that you work with engineering to be much more in line with the teams.

The automated verification engineer is much more focused on creating fast feedback for agents and humans. Ideally, I could imagine an automated verification engineer involved much more in the management of the verification pipeline, making sure that the fastest tests execute most directly with the developer as they're working, and things that take a little bit longer are sequenced out. Really, measuring yourself by how quickly you can provide feedback.
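As a rough sketch of that sequencing idea, the snippet below sorts checks by how long they are expected to take, runs as many as fit in a short time budget right alongside the developer or agent, and defers the rest to a later stage. The check names, durations, and the 30-second budget are all hypothetical, chosen only for illustration; nothing here describes a specific tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    name: str
    expected_seconds: float  # hypothetical estimate, e.g. from past runs


def plan_stages(checks, fast_budget_seconds=30.0):
    """Split checks into an inline 'fast' stage and a deferred 'slow' stage.

    Checks are taken shortest-first until the fast budget is used up;
    everything that does not fit is sequenced into the slow stage.
    """
    ordered = sorted(checks, key=lambda c: c.expected_seconds)
    fast, slow = [], []
    elapsed = 0.0
    for check in ordered:
        if elapsed + check.expected_seconds <= fast_budget_seconds:
            fast.append(check)
            elapsed += check.expected_seconds
        else:
            slow.append(check)
    return fast, slow


# Hypothetical checks and durations (assumptions for this example only).
checks = [
    Check("integration_tests", 240.0),
    Check("lint", 3.0),
    Check("unit_tests", 20.0),
    Check("security_scan", 90.0),
]

fast, slow = plan_stages(checks)
print([c.name for c in fast])  # ['lint', 'unit_tests']
print([c.name for c in slow])  # ['security_scan', 'integration_tests']
```

The time it takes for the fast stage to report back is one candidate for the "how quickly can you provide feedback" measure described above.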

They may do more exploratory testing or the hardest things to test. They may have a lot of domain knowledge around how to do that. But ideally the whole team views that as their collective responsibility, with that expert being the person more focused on it than they are.

Shipping Without Human Verification

Engineering teams need to be thinking about how they can get out of verifying every line of code, because otherwise they're going to be completely buried in that and will absolutely be the bottleneck.

I think it's a good goal for an engineering team to ship a certain percent of its commits to production without human verification.

Maybe you start with one percent. What would it take to get some portion of your commits to go out to production without any human verification? It's analogous to CI/CD, where the thinking was: what is the set of practices we need to safely ship something into production as soon as it's approved?

We need a similar set of practices and tools around verifying code. There are going to be some things that always need to be verified, probably. And there will be some things that are likely very safe to ship to production. How do we allow-list those? Those sorts of things will be increasingly important.
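One way to picture an allow-list, as a minimal sketch: a commit may skip human verification only if automated checks passed and every file it touches matches a pattern the team has explicitly marked as low risk. The patterns and the function name below are hypothetical; which paths are actually safe is a judgment each team would have to make for itself.

```python
from fnmatch import fnmatch

# Hypothetical low-risk patterns (assumptions, not a recommendation).
ALLOW_LIST = ["docs/*", "*.md", "locales/*.json"]


def can_ship_unreviewed(changed_paths, checks_passed, allow_list=ALLOW_LIST):
    """Return True only if automated checks passed and every changed
    path matches at least one allow-listed pattern."""
    if not checks_passed or not changed_paths:
        return False
    return all(
        any(fnmatch(path, pattern) for pattern in allow_list)
        for path in changed_paths
    )


print(can_ship_unreviewed(["docs/setup.md", "README.md"], checks_passed=True))       # True
print(can_ship_unreviewed(["docs/setup.md", "src/billing.py"], checks_passed=True))  # False
print(can_ship_unreviewed(["README.md"], checks_passed=False))                       # False
```

The default is deliberately conservative: one file outside the allow-list, or one failed check, sends the whole commit back to a human.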

Measure Outcomes, Not Productivity

Measuring engineering productivity has always been a very fraught topic. My favorite way to measure productivity is to look at outcomes. What I like to look at is: how much value have we delivered over time? If our use of AI tooling is accelerating that, and we're delivering a lot more value over time, then it's working. And if not, then we may be pushing more PRs or whatever, but what really counts is how much value we're delivering.

The way I typically like to measure that is: everything we ship, we score by customer impact, and you add it up. What's nice about that is you can do it retroactively. You can look at your last year of delivery and then go forward. If you're delivering more value, then that's what you want. And if you're not, then all of your perceived productivity gains are maybe not actually doing as much as you think.
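The scoring itself can be very light. As an illustrative sketch, each shipped item gets an impact score and the scores are summed per period so that periods can be compared. The items, quarters, and scores below are made up for the example, and the scoring scale is an assumption.

```python
from collections import defaultdict

# Hypothetical delivery log: (period, what shipped, customer impact score).
shipped = [
    ("2024-Q3", "bulk export", 3),
    ("2024-Q3", "login bug fix", 1),
    ("2024-Q4", "new onboarding flow", 5),
    ("2024-Q4", "faster search", 3),
    ("2024-Q4", "dark mode", 1),
]


def value_by_period(items):
    """Add up customer impact scores for everything shipped in each period."""
    totals = defaultdict(int)
    for period, _name, impact in items:
        totals[period] += impact
    return dict(totals)


totals = value_by_period(shipped)
print(totals)  # {'2024-Q3': 4, '2024-Q4': 9}
```

Because the input is just a list of what shipped, the same tally can be run retroactively over last year's releases to get a baseline before comparing forward.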

Predictions for 2027 I’m willing to be wrong about

  • The cost of tokens is going up 10 times
    The cost of tokens for producing the same output is going to go up about ten times in the next 18 months. I put this one forward as a prediction I'm willing to be wrong about. There is a lot of competition in this space, so there's reason to think the competition and the amount of money invested will keep this going for a lot longer. But what is happening right now seems pretty unsustainable.
  • Written culture increases AI productivity gains
    Companies that have written cultures will see disproportionate gains over those that don't. The more a company is using written artifacts, the easier they will be able to set up a lot of business processes to be automated.
  • Large code bases + AI-generated code = problems
    I read a wonderful blog post by Jim Shore, and he was talking about some of the second-order effects of coding agents. His basic thesis was that if you're getting twice as productive, your maintenance costs have to be at least halved. I think we're going to see many organizations that are not keeping up with that.

    Historically in DevEx, what you're fighting is the complexity of a company. You're trying to bend the curve against the rising complexity of the company and keep your productivity from getting worse over time. I think we'll see this curve start with AI development in a way that we haven't in the past.


    We'll start hitting large code bases with tons of AI-generated code. And if you're not investing a certain amount of that productivity gain back in higher quality, you'll start having big problems.
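The arithmetic behind the "twice as productive means maintenance costs must halve" point can be sketched with a deliberately simple model. It assumes that every shipped feature carries a fixed monthly upkeep cost and that features ship at a steady rate. The numbers are hypothetical, and the model is an illustration rather than the one used in the blog post.

```python
def monthly_maintenance_hours(features_per_month, hours_per_feature, months):
    """Maintenance load per month after `months` of shipping at a steady rate,
    assuming each feature in production costs a fixed number of hours per month."""
    features_in_production = features_per_month * months
    return features_in_production * hours_per_feature


# Hypothetical numbers, one year of shipping in each case.
baseline = monthly_maintenance_hours(10, 2, 12)            # before the productivity gain
doubled_output = monthly_maintenance_hours(20, 2, 12)      # 2x output, same upkeep per feature
doubled_and_halved = monthly_maintenance_hours(20, 1, 12)  # 2x output, half the upkeep per feature

print(baseline, doubled_output, doubled_and_halved)  # 240 480 240
```

Under these assumptions, doubling output without touching upkeep doubles the maintenance load, and only halving the per-feature upkeep brings it back to where it started.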
