
Posts

Showing posts from March, 2026

Why Governance Determines Whether Agentic AI Accelerates or Stalls Engineering 

The incorporation of AI into engineering work — through code completion, test generation, refactoring assistance and documentation support — continues to drive rapid gains in team productivity. As organizations expand their use of AI, they expect the velocity of deliverables to accelerate as well. However, those early gains are offset by increased security reviews, unresolved compliance questions and growing code-review workloads that many teams fail to account for.

That slowdown points to how AI is being integrated into existing engineering processes, rather than to limitations in the tools themselves. Engineers use agentic AI tools to ship faster, but many organizations lack the governance and oversight necessary to manage how those tools are being used. Prompts sent through ungoverned agentic AI services lack consistent tracking, auditability and enforcement. This creates uncertainty and risk, leading leadership to worr...

When Customer-Facing Systems Fail: How Incident Response and Observability Reduce MTTR 

People expect digital services to respond instantly across locations, devices and systems. When something breaks down, it is usually obvious to the people using the system. The crucial element is how fast companies can recover, and the key metric for digital stability is mean time to recovery (MTTR).

See how companies can reduce it to protect revenue, maintain trust and ensure consistent business activity.

Outages Are Now Customer-Visible Events

Customer interfaces often signal problems before companies know what is wrong. When an e-commerce transaction stops or a video stream pauses, users notice immediately. Companies such as Netflix and Amazon, where service dependability is the core requirement, have set the standard against which users judge every disruption.

Online feedback, reviews and direct messages make these issues easier to spot. An issue, once narrowed to internal dealings...

Iceberg Won the Format War — Now Comes the Hard Part

Apache Iceberg has effectively won the open table format conversation. AWS, Google Cloud, Microsoft, Snowflake, Databricks — every major platform has thrown its weight behind it. If you work in data engineering or platform operations, the question is no longer whether Iceberg is the right foundation. It’s what it actually takes to run it day to day. That second question doesn’t get nearly enough airtime. And it’s the one that determines whether your Iceberg adoption goes well or becomes a slow-motion infrastructure project that nobody budgeted for.

The Gap Nobody Talks About

Here’s what Iceberg gives you: a table format with schema evolution, time travel, partition evolution, and engine independence. Here’s what Iceberg does not give you: a way to get data into those tables, a way to model and transform it once it’s there, a way to coordinate when things run, or a way to keep table health in check as data piles up. Put differently, Iceberg defines how tables behave, not how to op...

Lightrun Adds Ability to Dynamically Pull Telemetry Data from Live Apps

Lightrun has added to its namesake site reliability engineering (SRE) platform, which is based on artificial intelligence (AI), the ability to dynamically pull missing telemetry evidence from live application environments without deploying additional instrumentation. Company CEO Ilan Peleg said the Lightrun AI SRE platform includes a sandbox, deployed via a software development kit (SDK), that can now be integrated with a live application environment to collect new evidence, test hypotheses and validate outcomes against real execution behavior without deploying additional agents to collect telemetry data. The overall goal is to provide DevOps teams with much-needed additional context on demand to reduce the mean time to detecting the root cause of an incident, he added. That capability will soon prove crucial as the volume of applications being deployed in the age of AI begins to overwhelm the ability of DevOps teams to manage incidents, noted Peleg. At th...

Agentic Systems are Breaking Reliability Frameworks 

Security teams have spent years building detection and response capabilities around a failure mode they understood well enough to instrument for. Typically, a service misbehaves, an alert fires and an engineer investigates. This kind of model worked because the systems producing the failures were deterministic enough that misbehavior was visible, measurable and attributable to a cause that a runbook could address. However, what agentic systems have introduced into that environment is a category of failure that looks nothing like the one the detection infrastructure was ...

Tekton Kubernetes-Native CI/CD Project Reaches CNCF Incubation 

The CNCF Technical Oversight Committee (TOC) has voted to accept Tekton as a CNCF incubating project. But what is Tekton? Tekton is a flexible open source framework for creating continuous integration and delivery (CI/CD) systems. It enables developers to build, test and deploy across multiple cloud providers and on-premises systems by abstracting away the underlying implementation details. The TOC was no doubt attracted by Tekton’s Kubernetes-native DNA: Tekton is distinguished by its ability to operate entirely inside a Kubernetes cluster, treating pipelines (which, in this case, we can define as workflow-based collections of tasks arranged in a graph, in either sequential or parallel order) as standard Kubernetes resources. In short, we can say that Tekton serves as a general-purpose, security-minded, Kubernetes-native workflow engine. Where CI/CD tools (such as Jenkins, the widely popularized automation tool) may require a dedicated server, Tekton’s K8S pedigre...
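To make the “pipelines as standard Kubernetes resources” point concrete, here is a minimal sketch of a Tekton Task manifest. The task name and step contents are hypothetical, but the overall shape — a custom resource written in YAML and applied to the cluster like any other manifest — is how Tekton building blocks are defined.

```yaml
# A minimal Tekton Task (illustrative only). Because a Task is just a
# Kubernetes custom resource, it is declared in YAML and applied with
# kubectl; no dedicated CI/CD server is involved.
apiVersion: tekton.dev/v1
kind: Task
metadata:
  name: echo-hello            # hypothetical task name
spec:
  steps:
    - name: echo
      image: alpine:3         # any container image can serve as a step
      script: |
        #!/bin/sh
        echo "Hello from a Tekton step"
```

Applying this with `kubectl apply -f task.yaml` registers the Task in the cluster; Pipeline resources then reference such Tasks by name and arrange them into the sequential or parallel graph described above.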

Five Great DevOps Job Opportunities

DevOps.com is now providing a weekly DevOps jobs report that highlights opportunities for DevOps professionals as part of an effort to better serve our audience. Our goal in these challenging economic times is to make it just that much easier for DevOps professionals to advance their careers. Of course, the pool of available DevOps talent is still relatively constrained, so when one DevOps professional takes on a new role, it tends to create opportunities for others. The five job postings shared this week are selected based on the company looking to hire, the vertical industry segment and, naturally, the pay scale being offered. We’re also committed to providing additional insights into the state of the DevOps job market. In the meantime, for your consideration.

Indeed.com
Information Technology Senior Management Forum, McLean, VA
Senior Lead Software Engineer – DevOps
$209,000 to $262,400

LinkedIn
Bellota Labs, Redwood City, CA
Senior Principal DevOps En...

Sysdig Adds Runtime to Secure AI Coding Agents

Sysdig this week at the RSA Conference (RSAC) revealed it has created a runtime that makes it possible to securely deploy artificial intelligence (AI) coding tools. Jonas Rosland, director of the open source program for Sysdig, said the runtime makes it possible to monitor the activity of AI coding agents in real time, including potential credential risks. It also enables investigation of incidents involving AI agent activity, he added. Additionally, AI agents can be prevented from opening sensitive files or bypassing credential controls. Risky command-line arguments that weaken safeguards, such as allowing unrestricted file writes, are also blocked, as is dangerous activity within developer environments, including reverse shells, binary tampering and persistence mechanisms. As AI coding tools are made available to professional and citizen developers alike, the likelihood of a cybersecurity incident involving these tools continues to rise. DevSecOps teams...

Security as Code is Becoming the New Baseline: Continuous Compliance in DevOps 

There was a time when compliance meant a quarterly ritual. Someone from security would walk over with a spreadsheet, ask a few questions, tick a few boxes and disappear until the next audit cycle. The infrastructure team would scramble to prove that yes, encryption was enabled, and no, that S3 bucket was not public anymore. Everyone felt relieved, went back to shipping features and quietly hoped nothing would drift before the next review.

That model is dead; it just hasn’t been buried yet.

The problem is not that teams lack security awareness. Most engineering organizations today understand that vulnerabilities need catching early and that production environments need hardening. The problem is that compliance has historically lived outside the delivery pipeline — treated as a checkpoint rather than a continuous practice. In a world where teams deploy dozens of...

Embedded DevOps: Bridging the Gap Between Firmware and Modern Delivery 

Embedded software development has traditionally followed a different rhythm than mainstream software engineering. Hardware availability drives schedules. Validation cycles are longer. Releases are deliberate. Documentation is extensive. And for good reason: embedded systems often operate in safety-critical or highly regulated environments.

However, expectations around software delivery have shifted. Connected products, over-the-air updates, security mandates and shorter market windows are creating new pressures for embedded teams.

The result? Many organizations are exploring how DevOps principles can be applied — thoughtfully — to embedded environments.

Why Embedded Teams are Revisiting Their Delivery Model

Across industries such as automotive, medical devices, aerospace and industrial controls, a consistent pattern is emerging:

Integration happens later than teams would prefer.

Hardware ac...