The Validation Gap Is Costing You More Than You Think

Our latest State of Software Delivery report analyzed more than 28 million CI workflows and found a pattern that should give engineering leaders pause. Average throughput grew 59% year over year. Main branch activity for the median team declined 7%. Teams are generating more code than ever before. Less of it is reaching production.

The cost of poor validation used to show up mostly in developer hours: debugging, blocked deployments, context switching. That cost hasn’t gone away. But there is a second bill now. Every failed build means agent retries. Every slow pipeline is compute burning while an agent waits. Main branch success rates have fallen to a five-year low of 70.8% against a 90% benchmark, and the AI spend attached to every failed cycle is climbing alongside it.
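To put the two success rates in terms of retries, here is a simplified back-of-the-envelope model. It is an illustration, not a figure from the report: it assumes each pipeline run succeeds independently with probability equal to the main branch success rate. Under that assumption, the number of runs until a green build follows a geometric distribution with mean 1 / p. The helper name `expected_runs_per_success` is hypothetical.

```python
def expected_runs_per_success(success_rate: float) -> float:
    """Mean number of pipeline runs until one succeeds.

    Simplifying assumption: each run succeeds independently with probability
    `success_rate`, so the run count is geometric with mean 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return 1 / success_rate


for label, rate in [("main branch (current)", 0.708), ("benchmark", 0.90)]:
    runs = expected_runs_per_success(rate)
    # runs - 1 is the expected number of failed runs before the green one.
    print(f"{label}: {runs:.2f} runs per green build, {runs - 1:.2f} failed")
```

Under this model, 70.8% works out to about 1.41 runs per green build (0.41 failed) versus about 1.11 (0.11 failed) at 90%, roughly 3.7 times as many failed cycles, each of which can carry agent retry and compute cost. Real failures are not independent (a deterministic break fails every retry until it is fixed), so treat this as a lower bound on the intuition rather than a forecast.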

The teams doing well are catching failures earlier and keeping their pipelines healthier. They are running the same tools as everyone else. What they have structured differently is where and when validation happens.

For most of our careers, the inner loop was managed and optimized by the developer. Once AI-generated code entered the picture, the inner loop as we knew it could not keep up with the volume. Today, the inner loop is agentic. It's where the agent is actively working: writing, iterating, checking before anything is committed or pushed. The outer loop is CI: shared infrastructure, integration, the final gate before shipping. Most teams have invested heavily in the outer loop. Until recently, the inner loop didn't need much attention. That has changed.

CI was designed around a human pace of development: one engineer, one branch, one push at a time. Agents generate changes in parallel, across multiple tasks, at a volume that makes the push-wait-fix cycle a serious drag on throughput. By the time CI returns feedback, the agent has moved on to the next task. Context is gone. Fixing the failure means starting a new cycle: reloading context, re-examining the change, potentially redoing work that was already completed.

Human review was always the backstop before code reached shared infrastructure. For most teams today, the volume of AI-generated change has simply outpaced what any reviewer can meaningfully assess before it hits CI. The code arrives faster than the review process was designed to handle. Most teams end up making a choice without fully realizing it: either throttle the agents to match available review capacity, or let the volume through and absorb the failures downstream.

Validation needs to happen earlier, while the agent is still working on the change. CI still matters. System-level validation, integration, packaging and deployment belong in the outer loop and always will. But by the time code reaches CI, it should have already passed basic scrutiny. The inner loop is where that confidence gets built, before anything touches shared infrastructure. Getting validation right at this stage is also how teams close the gap between a 70.8% success rate and the 90% benchmark the data points to.

The requirements for inner loop validation are specific to how agents work. Feedback has to arrive within the window the agent is still operating in. Tests need to be scoped to the relevant change, not the full suite. Failures should surface one at a time: an agent given a long list of problems fills its context window and stops being productive. These constraints are different from what CI was designed to satisfy, which is why existing tooling often doesn’t fit.
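The three requirements above (feedback inside a time window, checks scoped to the change, one failure at a time) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not CircleCI's implementation: the file-to-check mapping, the check names, the `run_inner_loop` helper and the 30-second budget are all hypothetical, and checks are modeled as plain Python callables rather than a real test runner.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical model: a check returns None on success or an error message on failure.
Check = Callable[[], Optional[str]]


@dataclass
class Failure:
    check_name: str
    message: str


def select_checks(changed_files: list[str],
                  checks_by_file: dict[str, list[tuple[str, Check]]]) -> list[tuple[str, Check]]:
    """Scope validation to the change: keep only checks mapped to touched files."""
    selected: dict[str, Check] = {}
    for path in changed_files:
        for name, check in checks_by_file.get(path, []):
            selected.setdefault(name, check)  # deduplicate, keeping first-seen order
    return list(selected.items())


def run_inner_loop(changed_files: list[str],
                   checks_by_file: dict[str, list[tuple[str, Check]]],
                   budget_seconds: float = 30.0) -> Optional[Failure]:
    """Run scoped checks in order and return only the first failure, or None.

    The budget is checked between checks, so a single slow check can still overrun it.
    """
    deadline = time.monotonic() + budget_seconds
    for name, check in select_checks(changed_files, checks_by_file):
        if time.monotonic() > deadline:
            return Failure(name, "skipped: time budget exhausted")
        error = check()
        if error is not None:
            # Stop at the first problem so the agent gets one item, not a long list.
            return Failure(name, error)
    return None


# Hypothetical usage: only the billing checks run, and only the first failure is reported.
checks = {
    "billing/invoice.py": [
        ("test_invoice_totals", lambda: None),
        ("test_invoice_rounding", lambda: "expected 10.00, got 9.99"),
    ],
    "auth/login.py": [("test_login", lambda: None)],
}
result = run_inner_loop(["billing/invoice.py"], checks)
print(result)
# Failure(check_name='test_invoice_rounding', message='expected 10.00, got 9.99')
```

Returning a single `Failure` instead of a list is the design choice that matters here: the agent can fix one thing, rerun the scoped checks and move on without the rest of its context window filling up with unrelated errors.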

When the two loops share context, the picture changes. When the inner loop draws on what CI has historically flagged in a codebase, it runs smarter checks before anything is pushed. When CI sees changes that have already been validated locally, it can focus on what actually needs system-level verification. The two stages inform each other and the system gets better at catching the right things over time.
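One way to picture the two loops informing each other is sketched below. It is an assumption-laden illustration, not how CircleCI does it: the function names, the list of past CI failures, the `unit`/`system` scope labels and the rule "skip a unit check in CI if it already passed locally on the same commit" are all hypothetical, and whether that skip is safe depends on how far a team trusts local results.

```python
from collections import Counter
from typing import Iterable


def prioritize_local_checks(candidates: list[str], ci_failures: Iterable[str]) -> list[str]:
    """Inner loop: run the checks CI has flagged most often in this codebase first.

    `ci_failures` is a hypothetical history of check names from past CI failures.
    sorted() stays stable with reverse=True, so ties keep their original order.
    """
    counts = Counter(ci_failures)
    return sorted(candidates, key=lambda name: counts[name], reverse=True)


def plan_ci_run(ci_checks: dict[str, str], local_passes: dict[str, str], commit_sha: str) -> list[str]:
    """Outer loop: decide which checks CI still needs to run.

    `ci_checks` maps check name -> scope ("unit" or "system").
    `local_passes` maps check name -> the commit SHA it last passed on locally.
    System-level checks always run; unit checks are skipped only if they
    already passed locally on this exact commit.
    """
    return [
        name
        for name, scope in ci_checks.items()
        if scope == "system" or local_passes.get(name) != commit_sha
    ]


# Hypothetical usage
history = ["test_rounding", "test_login", "test_rounding"]
print(prioritize_local_checks(["test_login", "test_totals", "test_rounding"], history))
# ['test_rounding', 'test_login', 'test_totals']

ci_checks = {"test_rounding": "unit", "test_login": "unit", "integration_suite": "system"}
local_passes = {"test_rounding": "abc123", "test_login": "old999"}
print(plan_ci_run(ci_checks, local_passes, "abc123"))
# ['test_login', 'integration_suite']
```

The first function feeds CI history into the inner loop; the second feeds local results back into CI so it can concentrate on system-level verification.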

By the time CI returns a failure, is the agent that introduced it still working on that change?

This is where CircleCI has been focusing, building validation that spans both stages and learns from how a codebase actually builds. Where and when validation runs across the two loops is an infrastructure question teams will need to answer if they want to improve their throughput and success rates.

Every model release increases agent velocity and the volume of code flowing into delivery pipelines. Teams that have solved the validation gap absorb that increase and get faster. Teams that haven’t will find the gap between them and the top performers widening with every upgrade.

from DevOps.com https://ift.tt/1jEATmg
