
Posts

Why Governance Determines Whether Agentic AI Accelerates or Stalls Engineering 

The incorporation of AI into engineering work, through code completion, test generation, refactoring assistance and documentation support, continues to drive rapid gains in team productivity. As organizations expand their use of AI, they expect the velocity of deliverables to accelerate as well. However, those early gains are offset by increased security reviews, unresolved compliance questions and growing code-review workloads that many organizations don’t account for. That slowdown points to how AI is being integrated into existing engineering processes, rather than to limitations in the tools themselves. Engineers use agentic AI tools to ship faster, but many organizations lack the governance and oversight needed to manage how those tools are being used. Prompts sent through ungoverned agentic AI services lack consistent tracking, auditability and enforcement. This creates uncertainty and risk, leading leadership to worr...
Recent posts

When Customer-Facing Systems Fail: How Incident Response and Observability Reduce MTTR 

People are used to digital services responding immediately, across locations, devices and systems. Should something break down, it is usually obvious to those operating the system. What matters is how fast companies can recover, and the key metric for digital stability is mean time to recovery (MTTR). See how companies can reduce it to protect revenue, maintain trust and ensure consistent business activity.
Outages Are Now Customer-Visible Events
Customer interfaces often signal problems before companies know what is wrong. When an e-commerce transaction stops or a video stream pauses, users notice immediately. Companies such as Netflix and Amazon, where service dependability is the key requirement, have shaped how people judge such problems. Online feedback, reviews and direct messages make these issues easier to spot. An issue, once narrowed to internal dealings...

Iceberg Won the Format War — Now Comes the Hard Part

Apache Iceberg has effectively won the open table format conversation. AWS, Google Cloud, Microsoft, Snowflake, Databricks: every major platform has thrown its weight behind it. If you work in data engineering or platform operations, the question is no longer whether Iceberg is the right foundation. It’s what it actually takes to run it day to day. That second question doesn’t get nearly enough airtime. And it’s the one that determines whether your Iceberg adoption goes well or becomes a slow-motion infrastructure project that nobody budgeted for.
The Gap Nobody Talks About
Here’s what Iceberg gives you: a table format with schema evolution, time travel, partition evolution, and engine independence. Here’s what Iceberg does not give you: a way to get data into those tables, a way to model and transform it once it’s there, a way to coordinate when things run, or a way to keep table health in check as data piles up. Put differently, Iceberg defines how tables behave, not how to op...

Lightrun Adds Ability to Dynamically Pull Telemetry Data from Live Apps

Lightrun has added to its namesake site reliability engineering (SRE) platform, which is based on artificial intelligence (AI), the ability to dynamically pull missing telemetry evidence from live application environments without deploying additional instrumentation. Company CEO Ilan Peleg said the Lightrun AI SRE platform includes a sandbox, deployed via a software development kit (SDK), that can now be integrated with a live application environment to collect new evidence, test hypotheses, and validate outcomes against real execution behavior without deploying additional agents to collect telemetry data. The overall goal is to give DevOps teams much-needed additional context on demand to reduce mean time to detection of the root cause of an incident, he added. That capability will soon prove crucial as the volume of applications being deployed in the age of AI begins to overwhelm the ability of DevOps teams to manage incidents, noted Peleg. At th...

Agentic Systems are Breaking Reliability Frameworks 

Security teams have spent years building detection and response capabilities around a failure mode they understood well enough to instrument for. Typically, a service misbehaves, an alert fires and an engineer investigates. This model worked because the systems producing the failures were deterministic enough that misbehavior was visible, measurable and attributable to a cause that a runbook could address. Agentic systems, however, have introduced into that environment a category of failure that looks nothing like the one the detection infrastructure was ...

Tekton Kubernetes-Native CI/CD Project Reaches CNCF Incubation 

The CNCF Technical Oversight Committee (TOC) has voted to accept Tekton as a CNCF incubating project. But what is Tekton? Tekton is a flexible open source framework for creating continuous integration and delivery (CI/CD) systems. It enables developers to build, test, and deploy across multiple cloud providers and on-premises systems by abstracting away the underlying implementation details. The TOC was no doubt attracted by Tekton’s Kubernetes-native DNA: Tekton is distinguished by its ability to operate entirely inside a Kubernetes cluster. It treats pipelines (which, in this case, we can define as workflow-based collections of tasks arranged in a graph in either sequential or parallel order) as standard Kubernetes resources. In short, Tekton serves as a general-purpose, security-minded, Kubernetes-native workflow engine. Where CI/CD tools (such as Jenkins, the widely popularized automation tool) may require a dedicated server, Tekton’s K8S pedigre...

Five Great DevOps Job Opportunities

DevOps.com is now providing a weekly DevOps jobs report that highlights opportunities for DevOps professionals as part of an effort to better serve our audience. Our goal in these challenging economic times is to make it just that much easier for DevOps professionals to advance their careers. Of course, the pool of available DevOps talent is still relatively constrained, so when one DevOps professional takes on a new role, it tends to create opportunities for others. The five job postings shared this week are selected based on the company looking to hire, the vertical industry segment and, naturally, the pay scale being offered. We’re also committed to providing additional insights into the state of the DevOps job market. In the meantime, for your consideration:
Indeed.com: Information Technology Senior Management Forum, McLean, VA, Senior Lead Software Engineer – DevOps, $209,000 to $262,400
LinkedIn: Bellota Labs, Redwood City, CA, Senior Principal DevOps En...