
Co-Developing an AI Native Observability Platform  


As AI capabilities continue to evolve, AI is becoming central to managing the growing complexity of distributed, hybrid enterprise environments, enabling more effective analysis, correlation, and automation across interconnected systems.  

Traditional approaches to infrastructure monitoring, and network monitoring in particular, are often built around siloed tools and static thresholds, and they struggle to keep pace with the scale, velocity, and interdependencies of modern systems. The increasingly blurred boundaries between network, application, and infrastructure domains make it even harder to isolate root causes and maintain operational resilience. In this context, AIOps platforms have emerged as one response to the growing need for integrated observability, automation, and data-driven decision-making.

At AI Field Day, Selector AI presented its AIOps platform as a foundation for co-creating more adaptive, data-driven network operations. Rather than positioning the platform purely as a product choice, Selector takes a SaaS approach in which professional services are part of the offering alongside the product features, and it encourages customers to co-develop their own platform instance.

Demonstrations at AI Field Day highlighted full-stack observability with a data-centric approach, in which data becomes the core of the stack. Selector’s strength lies in this data-centric foundation: it ingests diverse, multi-domain sources (metrics, logs, configs, alerts, and topology) into a unified analytics layer. Unlike model-first tools, it prioritizes correlating raw telemetry with ML before layering AI on top, creating a “single source of truth” that is meant to cut alert fatigue and to support hybrid and cloud environments without siloed dashboards. A unified data approach, combined with co-development of the customer's platform instance and with causal analysis, provides a more deterministic way of visualizing a problem and can help identify root causes more efficiently.
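To make the idea of correlating raw telemetry concrete, here is a minimal sketch in Python. It is not Selector's method: the topology, the alerts, the 60-second window, and the `correlate` function are all hypothetical, and the grouping rule (alerts close in time on the same or directly connected devices belong together) is a deliberately simple stand-in for ML-based correlation.

```python
from collections import defaultdict

# Hypothetical topology: device -> directly connected neighbours.
TOPOLOGY = {
    "core-1": {"edge-1", "edge-2"},
    "edge-1": {"core-1"},
    "edge-2": {"core-1"},
    "edge-9": set(),
}

# Hypothetical alerts: (seconds since start, device, message).
ALERTS = [
    (0, "core-1", "interface down"),
    (5, "edge-1", "BGP session lost"),
    (8, "edge-2", "packet loss"),
    (400, "edge-9", "high CPU"),
]


def correlate(alerts, topology, window_s=60):
    """Group alerts that are close in time and on the same or adjacent devices."""
    parent = list(range(len(alerts)))  # union-find forest over alert indices

    def find(i):
        # Follow parent links to the group root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (t1, d1, _) in enumerate(alerts):
        for j in range(i + 1, len(alerts)):
            t2, d2, _ = alerts[j]
            related = (
                d1 == d2
                or d2 in topology.get(d1, set())
                or d1 in topology.get(d2, set())
            )
            if related and abs(t1 - t2) <= window_s:
                parent[find(j)] = find(i)  # merge the two groups

    groups = defaultdict(list)
    for i, alert in enumerate(alerts):
        groups[find(i)].append(alert)
    return list(groups.values())


incidents = correlate(ALERTS, TOPOLOGY)
print(f"{len(ALERTS)} alerts -> {len(incidents)} incidents")
```

With this sample data, the three alerts around core-1 collapse into one incident and the unrelated edge-9 alert stays separate, which is the kind of alert reduction a unified data layer is meant to enable.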

Another strategic feature of the platform is the Network Language Model, a model fine-tuned on large volumes of networking telemetry data that maps natural-language queries to complex operational tasks. It understands domain-specific terms (e.g., interface connectivity, routing paths) and powers chat interactions in Slack and Teams. This capability lets Selector AI move beyond basic observability toward AI agent workflows that, through its agent framework, enable autonomous, explainable network operations. These agents use Retrieval-Augmented Generation (RAG) with the Network Language Model to process unified telemetry and then trigger actions.
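The retrieval half of RAG mentioned above can be illustrated with a small Python sketch. Everything here is an assumption for illustration: the telemetry snippets, the `retrieve` and `build_prompt` helpers, and the token-overlap scoring, which stands in for the embedding-based search a production system would more likely use. No language model is called; the sketch only shows how retrieved telemetry would be placed into a prompt.

```python
import re

# Hypothetical telemetry snippets standing in for a unified data layer.
DOCS = [
    "edge-1 interface Gi0/1 went down at 10:02",
    "core-1 BGP session to edge-1 flapped 3 times",
    "db-7 disk usage at 91 percent",
]


def tokens(text):
    """Lowercase the text and split it into a set of simple tokens."""
    return set(re.findall(r"[a-z0-9/-]+", text.lower()))


def retrieve(query, docs, k=2):
    """Rank docs by token overlap with the query and return the top k."""
    q = tokens(query)
    ranked = sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, docs):
    """Assemble a prompt from the retrieved context and the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"


prompt = build_prompt("why is edge-1 interface down", DOCS)
print(prompt)
```

The design point is that the model answers from retrieved operational data instead of from its training data alone, which is what makes the resulting answers traceable to specific telemetry.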

Building an in-house platform offers a striking comparison. It allows customization but typically requires significant investment, often on the order of 18 to 24 months, multimillion-dollar budgets, and dedicated engineering teams. In addition, the ongoing maintenance burden can increase as telemetry volumes grow and machine learning models require continuous tuning. Over time, internally developed systems may accumulate technical debt, particularly if they struggle to keep pace with evolving data and operational complexity. In contrast, purchasing a platform such as Selector and engaging in a co-development approach may reduce initial development effort and accelerate deployment, with integrated capabilities such as cross-domain correlation, incident summarization, and extensibility through a partner ecosystem. Another highlight is that each customer gets its own Selector instance, which means that there is no need for additional overhead.

The role of AI in operations is also evolving. Rather than optimizing individual capabilities such as site reliability, the approach tends to shift the focus toward higher-level validation and decision-making. This creates a collaborative model in which human expertise, operational data, and machine intelligence reinforce each other.

Adoption of such platforms often benefits from a phased approach. Initial efforts may focus on a limited proof of value, targeting a small number of critical services to measure improvements in alert reduction and incident response times. Subsequent phases can expand telemetry ingestion, introduce agentic workflows, and automate routine operational tasks, supported by cross-functional governance structures. Over time, organizations may extend capabilities toward predictive operations, capacity planning, and broader automation, while continuously evaluating outcomes against defined performance and cost metrics. 
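A proof of value of the kind described above comes down to comparing a baseline against the pilot period. The following Python sketch shows the arithmetic; the metric names and all numbers are hypothetical, chosen only to illustrate the calculation.

```python
def pct_reduction(before, after):
    """Return the percentage drop from a baseline value to a pilot value."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (before - after) / before


# Hypothetical measurements for a small set of critical services.
baseline = {"alerts_per_week": 1200, "mttr_minutes": 90}
pilot = {"alerts_per_week": 300, "mttr_minutes": 60}

results = {name: pct_reduction(baseline[name], pilot[name]) for name in baseline}

for name, value in results.items():
    print(f"{name}: {value:.1f}% reduction")
```

Agreeing on these metrics and their baselines before the pilot starts makes it easier to evaluate later phases against the same performance and cost measures.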

Lastly, co-creation allows AI models and analytics to be customized to fit unique customer needs. According to Selector’s blog, this “mass customization” enables teams to create specific, actionable insights rather than relying on generic, “one-size-fits-all” heuristics. Together, these elements point toward agentic, self-healing networks, aligning with the AI Field Day 8 themes of production-scale inference and infrastructure evolution.

From a strategic perspective, platforms like Selector can be viewed less as standalone products and more as enablers of operational evolution. Their long-term value depends on how effectively organizations integrate them into their workflows, align them with business objectives, and build internal capabilities around them.



from DevOps.com https://ift.tt/ngtr8b9
