Anthropic purposely made its new Mythos-based models bad at AI research, and developers are fuming

Anthropic's powerful new models deliberately become less helpful when they detect users are working on AI research, according to technical disclosures that are already sparking controversy across the industry.

In a system card for Mythos 5 and Fable 5 published Tuesday, Anthropic said it limited the models' usefulness for tasks related to developing frontier large language models.

Partner

Mint Capital Investment

Global Banking Without Borders — Powered by the World’s Leading Financial Partners

Tell me more →

The company said the measures stem from concerns that advanced AI systems could accelerate the development of competing models without equivalent safety protections.

Unlike safeguards used for cybersecurity, biology, or chemistry-related risks, Anthropic said these interventions are intentionally invisible to users. Rather than refusing requests or switching to another model, Mythos may subtly modify its responses through techniques such as altering user prompts.

The move was swiftly criticized by some AI experts on Tuesday, especially the idea that Anthropic designed models that purposely withhold information or provide degraded assistance without users' awareness.

"Anthropic's latest model will NOT help you if it thinks your ML research/ML engineering is interesting, and/or will secretly degrade its IQ so that the average engineer won't notice," AI research firm SemiAnalysis wrote on X on Tuesday, referring to machine learning, a type of AI.

"We are already seeing Anthropic's latest model's moderation filters our GPU inference research and programming," the firm added.

Take a smarter break in your day - and see how far you get.

"mythos will be bad ON PURPOSE on ai 'frontier llm research' tasks, this is very very sad for the research community," Elie Bakouch, an AI model training expert at startup Prime Intellect, wrote on X. "Also the fact that this is on purpose not visible to the user is crazy."

"It won't just not help you, it will lie and purposefully give you bad info," another AI developer wrote. "The 'ethical AI' company with the most brazenly unethical LLM, on purpose."

Mikel Artetxe, the cofounder of AI startup Reka, posted that Anthropic's move is akin to Big Tech companies interfering with users' work: "Apple randomly reboots your Mac if you're building competing tech, Gmail silently edits your email if you mention rival platforms, and Tesla Autopilot swerves if it detects you're working on self-driving cars."

Anthropic didn't respond to a request for comment from Business Insider.

This adds more fuel to the fiery debate over why Anthropic didn't immediately release Mythos when it announced the model earlier this year.

Broadly, there have been three theories:

Now that Anthropic has baked these AI research limitations into its official Mythos launch, this third theory is looking a lot more believable.

Originally reported by Business Insider.