
Anthropic says ‘evil’ portrayals of AI were responsible for Claude’s blackmail attempts


Fictional portrayals of artificial intelligence can have a real effect on AI models, according to Anthropic.

Last year, the company said that during pre-release tests involving a fictional company, Claude Opus 4 would often try to blackmail engineers to avoid being replaced by another system. Anthropic later published research suggesting that models from other companies had similar issues with “agentic misalignment.”

Anthropic has apparently done more work on that behavior, claiming in a post on X, “We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation.”

The company went into more detail in a blog post, stating that since Claude Haiku 4.5, Anthropic’s models “never engage in blackmail [during testing], where previous models would sometimes do so up to 96% of the time.”

What accounts for the difference? The company said it found that training on “documents about Claude’s constitution and fictional stories about AIs behaving admirably improve alignment.”

Relatedly, Anthropic said it found training to be more effective when it includes “the principles underlying aligned behavior” and not just “demonstrations of aligned behavior alone.”

“Doing both together appears to be the most effective strategy,” the company said.


Abigail Avery

