Skip to main content

Cursor’s Composer 2.5 Brings Smarter, More Reliable AI Coding Agents

AI-assisted coding tools are getting a meaningful upgrade. Cursor has released Composer 2.5, the latest version of its proprietary coding agent model, and the improvements go well beyond a version bump.

Composer 2.5 is described as a substantial improvement in intelligence and behavior over its predecessor, Composer 2. It handles sustained work on long-running tasks better, follows complex instructions more reliably, and is easier to work with overall.

For development teams already using Cursor or evaluating AI coding tools, that combination matters. Raw capability is one thing. But an agent that can stay on task across a lengthy workflow — without drifting, hallucinating tool calls, or needing constant correction — is a different story.

Built on Open-Source Foundations

Composer 2.5 is built on the same open-source checkpoint as Composer 2, Moonshot’s Kimi K2.5. That’s worth noting because it reflects a broader trend in the AI industry: frontier-quality capabilities are increasingly accessible through open-source base models, with differentiation coming from how those models are trained and tuned for specific use cases.

In Cursor’s case, the differentiator is a significantly more sophisticated training process.

Teaching the Model to Learn From Its Mistakes — Precisely

One of the more technically interesting aspects of Composer 2.5 is how Cursor approached reinforcement learning (RL) training. Standard RL assigns rewards at the end of a task. But when an agent runs through a complex coding workflow with hundreds of steps, a single bad decision — like calling a nonexistent tool — can get lost in the noise. The final reward signal doesn’t always tell the model where it went wrong.

To address this, Cursor trained Composer 2.5 using targeted textual feedback. The idea is to provide feedback directly at the point in the interaction where the model could have behaved better. A short hint is inserted into the local context, and the resulting adjusted model distribution acts as a teacher — nudging the model’s behavior at that specific moment while preserving the broader RL objective across the full task.

In practical terms, this means Composer 2.5 can be trained to correct specific bad behaviors — like mistaken tool calls or unclear communication — without disrupting everything it’s already learned to do well. That’s a more surgical approach than retraining from scratch or relying on coarse reward signals.

More Synthetic Data, and a Harder Curriculum

Composer 2.5 was trained on 25 times as many synthetic tasks as Composer 2. As the model’s coding ability improved during training, standard tasks became too easy. So Cursor developed harder synthetic problems dynamically throughout the run.

One method involves “feature deletion” — the agent is given a working codebase with a full set of tests, asked to delete specific features while keeping the codebase functional, and then tasked with reimplementing those features. The tests serve as a verifiable reward signal.

The training process also surfaced an interesting side effect. As the model became more capable, it found increasingly sophisticated workarounds — in one case, reverse-engineering a Python type-checking cache to recover a deleted function signature, and in another, decompiling Java bytecode to reconstruct a third-party API. These were flagged as reward hacking — the model was technically “solving” tasks through unintended shortcuts. Cursor identified and corrected these behaviors using monitoring tools, but the examples illustrate how capable modern AI agents are becoming, and why oversight matters.

What This Means for Development Teams

The practical impact for developers is an agent that works more like a reliable colleague than an unpredictable assistant. Composer 2.5 is specifically tuned for long-horizon tasks — the kind of multi-step, context-heavy work that trips up simpler models. It’s also more consistent in how it communicates and how it calibrates effort to the complexity of the task.

“Frontier coding capability is increasingly built on open-source foundations, with vendor differentiation moving to the training process itself. Composer 2.5’s targeted textual feedback approach, which inserts correction hints at the precise step where the model erred, signals that behavioral reliability is now an engineered outcome at the point of origin rather than a downstream pipeline or out-of-band maintenance correction,” according to Mitch Ashley, VP and Practice Lead, Software Lifecycle Engineering, The Futurum Group.

“Benchmark scores tell buyers less than how an agent recovers from mistakes across hundreds of steps in a real workflow. Development teams evaluating coding agents should assess training discipline over raw capability claims, since that is where production reliability is ultimately determined.”

Looking further ahead, Cursor is also working with SpaceXAI to train a significantly larger model from scratch, using 10 times more total compute. The effort uses Colossus 2’s million H100-equivalent GPUs, and Cursor expects the result to be a major step up in model capability.

Pricing and Availability

Composer 2.5 is priced at $0.50 per million input tokens and $2.50 per million output tokens. A faster variant with the same intelligence is available at $3.00 per million input tokens and $15.00 per million output tokens, which Cursor positions as lower-cost than the fast tiers of other frontier models. The fast variant is the default option, and double usage is included for the first week.

For organizations already invested in AI-assisted development, Composer 2.5 is worth a close look. The training improvements Cursor has made — particularly around targeted feedback and behavioral calibration — suggest a serious focus on making these agents more dependable in real-world workflows, not just better on benchmarks.

That’s exactly the kind of progress that moves AI coding tools from interesting experiments to something you can actually rely on.



from DevOps.com https://ift.tt/b1Cz6jN

Comments

Popular posts from this blog

Cursor’s New SDK Turns AI Coding Agents Into Deployable Infrastructure

For most of its life, Cursor has been an IDE. A very good one. But with the public beta of the Cursor SDK, the company is making a different kind of move — one that should get the attention of DevOps teams. The Cursor SDK is a TypeScript library that gives engineers programmatic access to the same runtime, models, and agent harness that power Cursor’s desktop app, CLI, and web interface. In short, the agents that used to live inside an editor can now be invoked from anywhere in your stack. That’s a meaningful shift in how AI coding tools fit into software delivery pipelines. From the Editor to the Pipeline If you’ve used Cursor before, the workflow is familiar — you interact with an agent in real time, asking it to write functions, fix bugs, or review code. The SDK breaks that dependency on interactive use. Now you can call those same agents programmatically, from a CI/CD trigger, a backend service, or embedded inside another tool. Getting started is a single inst...

Mistral Moves Coding Agents to the Cloud — and Gets Out of Your Way

For the past year or so, AI coding agents have been tethered to your local machine. You kick off a task, watch the terminal, and babysit every step. It works — but it’s not exactly hands-free. Mistral just changed that. On April 29, the Paris-based AI company announced remote coding agents for its Vibe platform, powered by a new model called Mistral Medium 3.5. The idea is simple: Instead of running coding sessions on your laptop, they now run in the cloud — asynchronously, in parallel, and without you watching over them. What’s Actually New Coding sessions can now work through long tasks while you’re away. Many can run in parallel, and you no longer become the bottleneck at every step the agent takes. That’s the core pitch. You start a task from the Mistral Vibe CLI or directly from Le Chat — Mistral’s AI assistant — and the agent handles the rest. When it’s done, it opens a pull request on GitHub and notifies you, so you review the result inste...

OpenAI Debuts Symphony to Orchestrate Coding Agents at Scale

OpenAI has unveiled Symphony, an open-source specification that shifts how software development teams deploy AI in workflows, moving from interactive coding assistance toward continuous orchestration of autonomous agents. Symphony reframes project management tools as operational hubs for AI-driven coding. Rather than prompting an assistant for individual tasks, developers assign work through issue trackers, allowing agents to execute tasks in parallel and deliver outputs for human review. The change reflects a trend in enterprise AI in which systems are increasingly embedded into production pipelines rather than used as standalone tools. Symphony emerged from internal experimentation at   OpenAI , where engineers attempted to scale the use of   Codex   across multiple concurrent sessions. While the agents proved capable, human operators became the limiting factor. Engineers found they could only manage a handful of sessions before coordination overhead offset pro...