The Validation Gap Is Costing You More Than You Think

Our latest State of Software Delivery report analyzed more than 28 million CI workflows and found a pattern that should give engineering leaders pause. Average throughput grew 59% year over year. Main branch activity for the median team declined 7%. Teams are generating more code than ever before. Less of it is reaching production.

The cost of poor validation used to show up mostly in developer hours: debugging, blocked deployments, context switching. That cost hasn’t gone away. But there is a second bill now. Every failed build means agent retries. Every slow pipeline is compute burning while an agent waits. Main branch success rates have fallen to a five-year low of 70.8% against a 90% benchmark, and the AI spend attached to every failed cycle is climbing alongside it.

The teams doing well are catching failures earlier and keeping their pipelines healthier. They are running the same tools as everyone else. What they have structured differently is where and when validation happens.

For most of our careers, the inner loop was something the developer managed and optimized. Once AI-generated code entered the picture, the inner loop as we knew it could not handle the volume. Today, the inner loop is agentic. It’s where the agent is actively working: writing, iterating, and checking before anything is committed or pushed. The outer loop is CI: shared infrastructure, integration, the final gate before shipping. Most teams have invested heavily in the outer loop. Until recently, the inner loop didn’t need much attention. That has changed.

CI was designed around a human pace of development: one engineer, one branch, one push at a time. Agents generate changes in parallel, across multiple tasks, at a volume that makes the push-wait-fix cycle a serious drag on throughput. By the time CI returns feedback, the agent has moved on to the next task. Context is gone. Fixing the failure means starting a new cycle: reloading context, re-examining the change, potentially redoing work that was already completed.

Human review was always the backstop before code reached shared infrastructure. For most teams today, the volume of AI-generated change has simply outpaced what any reviewer can meaningfully assess before it hits CI. The code arrives faster than the review process was designed to handle. Most teams end up making a choice without fully realizing it: either throttle the agents to match available review capacity, or let the volume through and absorb the failures downstream.

Validation needs to happen earlier, while the agent is still working on the change. CI still matters. System-level validation, integration, packaging and deployment belong in the outer loop and always will. But by the time code reaches CI, it should have already passed basic scrutiny. The inner loop is where that confidence gets built, before anything touches shared infrastructure. Getting validation right at this stage is also how teams close the gap between a 70.8% success rate and the 90% benchmark the data points to.

The requirements for inner loop validation are specific to how agents work. Feedback has to arrive within the window the agent is still operating in. Tests need to be scoped to the relevant change, not the full suite. Failures should surface one at a time: an agent given a long list of problems fills its context window and stops being productive. These constraints are different from what CI was designed to satisfy, which is why existing tooling often doesn’t fit.
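To make those three requirements concrete, here is a minimal Python sketch of an inner-loop validation step. It is an illustration only, not any vendor's implementation: the `Check` type, the `covers` mapping from checks to source files, the `validate_change` function, and the 30-second default budget are all assumptions made up for this example.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Check:
    """A hypothetical unit of validation the agent can run locally."""
    name: str
    covers: set  # source files this check exercises (assumed mapping)
    run: Callable[[], Optional[str]]  # returns None on pass, else a failure message


def validate_change(changed_files, checks, budget_seconds=30.0):
    """Run only checks scoped to the change, within a time budget,
    and report at most one failure."""
    deadline = time.monotonic() + budget_seconds
    changed = set(changed_files)
    # Scope: keep only checks that touch at least one changed file.
    scoped = [c for c in checks if c.covers & changed]
    ran = []
    for check in scoped:
        # Budget: stop if feedback would arrive too late to be useful.
        if time.monotonic() >= deadline:
            return {"status": "budget_exceeded", "failure": None, "ran": ran}
        message = check.run()
        ran.append(check.name)
        # One at a time: surface the first failure and stop.
        if message is not None:
            return {"status": "failed",
                    "failure": f"{check.name}: {message}",
                    "ran": ran}
    return {"status": "passed", "failure": None, "ran": ran}


checks = [
    Check("test_billing_totals", {"billing.py"}, lambda: None),
    Check("test_billing_rounding", {"billing.py"},
          lambda: "expected 10.00, got 9.99"),
    Check("test_invoice_pdf", {"billing.py", "pdf.py"}, lambda: "font missing"),
    Check("test_login", {"auth.py"}, lambda: None),
]

result = validate_change(["billing.py"], checks)
print(result["status"], result["failure"])
# prints: failed test_billing_rounding: expected 10.00, got 9.99
```

The agent sees a single failure instead of the full list, and `test_login` never runs because it does not touch the changed file. Once the first failure is fixed, calling the function again surfaces the next one.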

When the two loops share context, the picture changes. When the inner loop draws on what CI has historically flagged in a codebase, it runs smarter checks before anything is pushed. When CI sees changes that have already been validated locally, it can focus on what actually needs system-level verification. The two stages inform each other and the system gets better at catching the right things over time.
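One way to picture the inner loop drawing on CI history is a simple ranking: checks that have failed most often in CI for the files being changed run first. The Python sketch below assumes a hypothetical record format of `(check_name, file_path)` pairs exported from past CI failures, and the function name `rank_checks_by_ci_history` is likewise invented for illustration. A real system would likely weigh recency, flakiness, and more.

```python
from collections import Counter


def rank_checks_by_ci_history(changed_files, ci_failures):
    """Order checks by how often they failed in CI on the changed files.

    ci_failures is an assumed format: an iterable of
    (check_name, file_path) pairs, one per historical CI failure.
    """
    changed = set(changed_files)
    counts = Counter(
        check for check, path in ci_failures if path in changed
    )
    # Most frequent failures first; ties broken alphabetically for stability.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ordered]


history = [
    ("test_auth_tokens", "auth.py"),
    ("test_auth_tokens", "auth.py"),
    ("test_api_contract", "auth.py"),
    ("test_ui_layout", "ui.py"),
]

print(rank_checks_by_ci_history(["auth.py"], history))
# prints: ['test_auth_tokens', 'test_api_contract']
```

Feeding this ordering into the inner loop means the check most likely to fail is also the first one the agent hears about.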

By the time CI returns a failure, is the agent that introduced it still working on that change?

This is where CircleCI has been focusing, building validation that spans both stages and learns from how a codebase actually builds. The infrastructure question is one teams will need to answer if they want to improve their throughput and success rates.

Every model release increases agent velocity and the volume of code flowing into delivery pipelines. Teams that have solved the validation gap absorb that increase and get faster. Teams that haven’t will find the gap between them and the top performers widening with every upgrade.



from DevOps.com https://ift.tt/1jEATmg