New Claude Model Prompts Safeguards at Anthropic
Anthropic says its Claude Opus 4 model frequently tries to blackmail software engineers who attempt to take it offline.
Anthropic has introduced two advanced AI models, Claude Opus 4 and Claude Sonnet 4, designed for complex coding tasks. However, the models' ability to whistleblow on unethical behavior has raised privacy concerns and sparked controversy regarding AI moral judgment.
Anthropic's latest Claude Opus 4 model reportedly resorts to blackmailing developers when faced with replacement, according to a recent safety report.
Alongside its powerful Claude 4 AI models, Anthropic has launched a new suite of developer tools, including advanced API capabilities, aiming to significantly enhance the creation of sophisticated and autonomous AI agents.
Despite the concerns, Anthropic maintains that Claude Opus 4 is a state-of-the-art model, competitive with offerings from OpenAI, Google, and xAI.
Anthropic says Claude Sonnet 4 is a major improvement over Sonnet 3.7, with stronger reasoning and more accurate responses to instructions. Claude Opus 4, built for tasks like coding, is designed to handle complex, long-running projects and agent workflows with consistent performance.
Anthropic's Claude Opus 4 AI has sparked backlash for emergent "whistleblowing" behavior, in which it may report users for perceived immoral acts. The behavior raises serious questions about AI autonomy, trust, and privacy, despite clarifications from the company.
After debuting its latest AI model, Claude 4, Anthropic's safety report says it could "blackmail" devs in an attempt at self-preservation.