AI Is Changing How We Write Infrastructure, But It’s Not Solving How We Control It

Over the past year, AI has fundamentally changed how software is written. Infrastructure code is no exception. Tasks that once required deep familiarity with tools, syntax, and workflows can now be handled through natural language. Engineers are no longer starting from a blank file. In many cases, reviewing and modifying code generated for them has become the norm.

At a high level, this looks like progress, and in many ways, it is. Teams can move faster, the barrier to entry is lower, and experimentation is easier. But there is a growing gap that many organizations are only beginning to recognize: AI is accelerating how infrastructure is created, but it is not solving how infrastructure is understood, controlled, or governed.

The Shift from Writing to Generating

Traditionally, infrastructure as code assumed that the person writing the configuration understood what it would do. That assumption no longer holds.

Today, it is increasingly common for infrastructure definitions to be generated rather than authored. The learning curve for tools and languages has effectively collapsed. This opens the door for more engineers and even non-specialists to interact directly with infrastructure. That shift is powerful; however, it also introduces a new kind of risk.

It is possible to generate a correct-looking configuration without fully understanding its impact. In application development, this is often manageable. A faulty deployment can usually be rolled back with limited consequences. Infrastructure is different. A misconfigured resource, an unintended dependency, or an incorrect policy can have immediate and widespread impact. In some cases, it can affect the availability, security, or integrity of entire systems. Put differently: you can restart the database, but you probably cannot restart the data itself. Those are different problems.

The problem is not that AI makes mistakes. The problem is that it enables changes to be made faster than organizations can safely reason about them.

Let’s explore this.

The Growing Tension: Speed vs. Control

Platform teams are now operating under two opposing pressures. On one side, there is a clear expectation to move faster. AI has raised the baseline for productivity across engineering, and infrastructure teams are expected to keep pace.

On the other side, the requirements for infrastructure have not changed. Systems still need to be secure, compliant, and stable. In many industries, those requirements are becoming more stringent, not less.

This creates a tension that cannot be resolved by simply adding more automation. Moving faster without control increases risk. Enforcing control without adapting workflows slows teams down. Most organizations today are caught somewhere in between.

Why Existing Models Don’t Hold

Before AI became part of everyday workflows, infrastructure teams typically operated in one of two modes.

The first was informal and manual. Engineers would make changes directly in cloud consoles or through ad hoc scripts. This allowed for speed, but at the cost of visibility, consistency, and governance.

The second was highly structured. Infrastructure changes would go through a defined pipeline in which code was written, reviewed, validated against policy checks, and then deployed. This improved control, but introduced friction.

Both models are still present today. Neither fits the current reality. AI makes it easier to bypass structured workflows, because generating a change is no longer the hard part. At the same time, the cost of mistakes remains high. What used to be a tradeoff between speed and control is now a mismatch between how quickly changes can be made and how slowly they can be validated.

The Real Gap: Understanding and Decision-Making

One of the less discussed effects of AI-assisted workflows is the loss of context. When engineers write infrastructure manually, they build an implicit understanding of dependencies, constraints, and failure modes. That understanding is part of the safety model.

When infrastructure is generated, that context is no longer guaranteed. You can ask for a system to be created in precise terms. The system may be created correctly. But if you cannot fully interpret the result, you are operating with partial understanding. In infrastructure, partial understanding is often not enough.

This is where many current approaches fall short. They focus on improving generation: better prompts, better models, faster execution. But generation is not the limiting factor anymore. The limiting factor is decision-making.

Guardrails, Not Just Automation

If AI is going to play a meaningful role in infrastructure, it needs to operate within clearly defined boundaries. These boundaries cannot rely on probabilistic systems alone. They need to be deterministic, enforceable, and transparent. In practice, this means shifting focus from how infrastructure is generated to how it is governed.

Policies, access controls, and approval workflows are not new concepts. What is changing is their role. They are no longer just safeguards around human actions. They are becoming the primary mechanism for controlling both human and AI-driven changes.
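To make the idea of a deterministic guardrail concrete, here is a minimal sketch in Python. It is an illustration, not any particular tool's API: the `Change` shape, the two rules, and the resource names are all hypothetical. It shows the property described above: the same proposed changes always produce the same verdict, and the check does not care whether a human or an AI wrote them.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    """One proposed infrastructure change (hypothetical shape for illustration)."""
    resource_type: str                  # e.g. "storage_bucket" or "database"
    action: str                         # "create", "update", or "delete"
    attributes: dict = field(default_factory=dict)
    author: str = "human"               # "human" or "ai"; rules ignore it on purpose


def no_public_buckets(change):
    """Hypothetical rule: a bucket may not be created or updated as public."""
    if (change.resource_type == "storage_bucket"
            and change.action != "delete"
            and change.attributes.get("public", False)):
        return "storage buckets must not be public"
    return None


def no_database_deletes(change):
    """Hypothetical rule: deleting a database is never allowed automatically."""
    if change.resource_type == "database" and change.action == "delete":
        return "database deletion requires manual approval"
    return None


# The rule set is plain code: reviewable, versioned, and free of randomness.
POLICIES = (no_public_buckets, no_database_deletes)


def evaluate(changes):
    """Return a list of violation messages; an empty list means the plan passes."""
    violations = []
    for index, change in enumerate(changes):
        for policy in POLICIES:
            message = policy(change)
            if message is not None:
                violations.append(
                    f"change {index} ({change.resource_type}): {message}"
                )
    return violations
```

A pipeline built along these lines would refuse to apply any plan for which `evaluate` returns a non-empty list, and the messages give reviewers a transparent reason for the refusal.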

This is not fundamentally different from how organizations have managed risk in the past. Humans are also non-deterministic. We have always relied on guardrails to ensure consistency and safety. The same principle applies here.

Building Trust Incrementally

One of the more pragmatic approaches emerging in organizations today is incremental adoption. Rather than moving directly to full automation, teams start with visibility. They use AI to understand infrastructure state, identify issues, and suggest changes. Over time, as confidence grows, they allow more direct interaction with systems. This progression is important.
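The progression from visibility to direct interaction can be sketched as an explicit permission gate. The following Python is a hypothetical illustration only: the level names, the action names, and the approval flag are assumptions made for the example and are not taken from any specific product.

```python
from enum import IntEnum


class TrustLevel(IntEnum):
    """Hypothetical stages of adoption, ordered from least to most autonomy."""
    OBSERVE = 1               # read state and report issues
    SUGGEST = 2               # also propose changes for humans to apply
    APPLY_WITH_APPROVAL = 3   # also apply changes, once a human approves


# Minimum trust level each (hypothetical) action requires.
REQUIRED_LEVEL = {
    "read_state": TrustLevel.OBSERVE,
    "report_issue": TrustLevel.OBSERVE,
    "propose_change": TrustLevel.SUGGEST,
    "apply_change": TrustLevel.APPLY_WITH_APPROVAL,
}


def is_permitted(level, action, approved=False):
    """Decide whether an AI agent at `level` may perform `action`."""
    required = REQUIRED_LEVEL.get(action)
    if required is None:
        return False          # unknown actions are denied by default
    if level < required:
        return False          # the agent has not been granted this stage yet
    if action == "apply_change":
        return approved       # applying always needs an explicit human approval
    return True
```

Because the gate is a lookup table rather than a judgment call, raising an agent's level is a deliberate, auditable decision, which fits the emphasis on predictability over raw capability.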

Trust in infrastructure systems is not built through capability alone. It is built through predictability. Teams need to understand not just what a system can do, but how it behaves under constraints. AI can accelerate workflows. It cannot replace the need for that trust.

A Shift in Operating Model

The broader implication is that infrastructure is entering a new operating model. More people will interact with it. Changes will happen faster. The distinction between writing and operating infrastructure will continue to blur.

In this environment, the differentiator will not be how quickly organizations can generate infrastructure. It will be how effectively they can control it.

AI is already changing how infrastructure is written. That part is inevitable. The harder and more important question is how we ensure it remains safe, predictable, and aligned with the needs of the business.

That is not a problem AI will solve on its own.

from DevOps.com https://ift.tt/ZPOhrxI

