
Claude’s Code Quality Conundrum Continues

A lot is going on at Anthropic. Access to the almost-fabled Mythos model remains restricted (despite some reports of unauthorized access), and nobody quite knows when or how its final rollout will happen.

Developers, meanwhile, are left with their own challenges; last week’s “upgrade” to Opus 4.7 has left some software engineers already longing for a return to 4.6 with its less literal instruction interpretation and its perhaps less cautious use of safeguards and controls.

Then there’s the Claude quality conundrum in and of itself.

Root of the Problem?

Anthropic acknowledges that users have reported getting “worsened responses” over the past month. In response, the organization says it has traced these reports to three separate changes that affected Claude Code, the Claude Agent SDK, and Claude Cowork.

The Claude API and the inference layer were not impacted.

All three issues have now been resolved as of April 20 (version 2.1.116), confirms Anthropic.

Promising to move ahead “differently,” the Claude team has gone to pains to explain how they will ensure similar issues are much less likely to happen again.

Many developers have reported a degradation in model performance at Anthropic (try Googling “claude code developers unhappy” and look to Reddit and HackerNoon for prime examples of what people are saying; add “Opus 4.7” to that search if you want the real nitty-gritty). In answer to those reports, the company has stated in a company blog that it “never intentionally degrades our models.”

“On March 4, we changed Claude Code’s default reasoning effort from high to medium to reduce the very long latency – enough to make the UI appear frozen – some users were seeing in high mode. This was the wrong tradeoff. We reverted this change on April 7 after users told us they’d prefer to default to higher intelligence and opt into lower effort for simple tasks. This impacted Sonnet 4.6 and Opus 4.6,” stated Anthropic.

Syncing Out Older Thinking

After this event, on March 26, the team shipped a change to clear Claude’s “older thinking” from sessions that had been idle for over an hour, to reduce latency when users resumed those sessions.

A bug caused this to keep happening every turn for the rest of the session instead of just once, which made Claude seem forgetful and repetitive. The team fixed it on April 10. This affected Sonnet 4.6 and Opus 4.6.
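The difference between clearing once and clearing on every turn can be sketched in a few lines. This is a hypothetical illustration only: Anthropic has not described its implementation here, and the `Session` class, the `was_idle` flag, and both function names are invented for the example. The only detail taken from the report is the one-hour idle threshold.

```python
from dataclasses import dataclass, field

IDLE_THRESHOLD_S = 3600  # "idle for over an hour", per the report


@dataclass
class Session:
    """Hypothetical session state; not Anthropic's actual data model."""
    thinking: list[str] = field(default_factory=list)
    was_idle: bool = False  # invented flag used to model the bug


def take_turn_buggy(session: Session, idle_seconds: int, thought: str) -> None:
    # Assumed failure mode: the flag is set on resume but never reset,
    # so older thinking is cleared on every later turn as well.
    if idle_seconds > IDLE_THRESHOLD_S:
        session.was_idle = True
    if session.was_idle:
        session.thinking.clear()
    session.thinking.append(thought)


def take_turn_fixed(session: Session, idle_seconds: int, thought: str) -> None:
    # Intended behavior: clear once, only on the turn that resumes an idle session.
    if idle_seconds > IDLE_THRESHOLD_S:
        session.thinking.clear()
    session.thinking.append(thought)


def replay(step, turns):
    """Run a list of (idle_seconds, thought) turns and return what is retained."""
    session = Session()
    for idle_seconds, thought in turns:
        step(session, idle_seconds, thought)
    return session.thinking


# One long idle gap before the second turn, then two quick follow-ups.
TURNS = [(0, "a"), (4000, "b"), (5, "c"), (5, "d")]
```

Replaying `TURNS` through the buggy step leaves only the latest thought in the session, while the fixed step keeps everything since the resume. That matches the "forgetful and repetitive" behavior users described.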

“On April 16, we added a system prompt instruction to reduce verbosity. In combination with other prompt changes, it hurt coding quality and was reverted on April 20. This impacted Sonnet 4.6, Opus 4.6, and Opus 4.7,” stated Anthropic.

Aggregate Aggression

A lot is happening here, and it’s happening to a lot of moving parts concurrently. Each change affected a different slice of traffic on a different schedule, which means the aggregate effect looked like broad, inconsistent degradation.
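To see why the picture looked so inconsistent, it helps to lay the three changes out as date windows. The sketch below uses only the dates and model lists quoted earlier in this article. The `Change` type and `active_changes` helper are hypothetical names, and two simplifications are assumptions: the fix date is treated as the first unaffected day, and the idle-session change is treated as applying to all sessions rather than only to those resumed after an hour.

```python
from typing import NamedTuple


class Change(NamedTuple):
    name: str
    start: tuple[int, int]  # (month, day) the change shipped
    end: tuple[int, int]    # (month, day) it was reverted or fixed
    models: frozenset[str]


CHANGES = [
    Change("default effort high -> medium", (3, 4), (4, 7),
           frozenset({"Sonnet 4.6", "Opus 4.6"})),
    Change("older thinking cleared every turn", (3, 26), (4, 10),
           frozenset({"Sonnet 4.6", "Opus 4.6"})),
    Change("verbosity system prompt", (4, 16), (4, 20),
           frozenset({"Sonnet 4.6", "Opus 4.6", "Opus 4.7"})),
]


def active_changes(day: tuple[int, int], model: str) -> list[str]:
    """Names of the changes in effect for a model on a given (month, day)."""
    return [
        c.name
        for c in CHANGES
        if c.start <= day < c.end and model in c.models
    ]
```

Under these assumptions, an Opus 4.6 user on March 30 was exposed to two changes at once, the same user on April 12 to none, and an Opus 4.7 user only to the verbosity prompt. Two developers comparing notes on different days or models could reasonably have reported quite different symptoms.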

Anthropic says early reports were hard to distinguish from normal variation in user feedback, and that neither its internal usage nor its subsequent evaluations initially reproduced the issues identified.

Because this clearly isn’t the experience users should expect from Claude Code, as of yesterday at the time of writing (April 23), the company has reset usage limits for all subscribers.

One of the challenges here is that (perhaps obviously) the longer the model thinks, the better the output tends to be. Effort levels are how Claude Code lets users set that tradeoff: more thinking versus lower latency and fewer usage limit hits.

The Test-Time-Compute Curve

“As we calibrate effort levels for our models, we take this tradeoff into account in order to pick points along the test-time-compute curve that give people the best range of options. In the product layer, we then choose which point along this curve we set as our default, and that is the value we send to the Messages API as the effort parameter; we then make the other options available via /effort,” confirms the team.

Looking to the future, we can expect Anthropic to get it in the neck on a fairly regular basis, largely because Claude Code is so widely used by the software application development cognoscenti. It won’t be hard to find the naysayers and anti-platform protesters saying bad things (Trump’s tech and cyber czar @DavidSacks on X is a fairly vitriolic stream if you’re sitting comfortably enough), and with an arguably less than effective ex-prime minister of Britain in the shape of Rishi Sunak on its advisory board, Anthropic might do well to ask Claude itself what the current sentiment among its user base is.



from DevOps.com https://ift.tt/v3iu47f
