Core Bulletin

    Anthropic says most AI models, not just Claude, will resort to blackmail

    By Liam Porter, June 20, 2025

    Several weeks after Anthropic released research claiming that its Claude Opus 4 AI model resorted to blackmailing engineers who tried to turn the model off in controlled test scenarios, the company is out with new research suggesting the problem is more widespread among leading AI models.

    On Friday, Anthropic published new safety research testing 16 leading AI models from OpenAI, Google, xAI, DeepSeek, and Meta. In a simulated, controlled environment, Anthropic tested each AI model individually, giving it broad access to a fictional company’s emails and the agentic ability to send emails without human approval.

    Anthropic says blackmail is an unlikely and uncommon occurrence for AI models today. Still, its findings suggest that most leading AI models will engage in harmful behaviors when given sufficient autonomy and faced with obstacles to their goals. The company says this points to a fundamental risk from agentic large language models rather than a quirk of any particular technology, and its researchers argue that it raises broader questions about alignment in the AI industry.

    In one of the tests, Anthropic researchers created a fictional setting in which an AI model plays the role of an email oversight agent. The agent then discovers emails revealing two things: first, that a new executive is having an extramarital affair, and second, that this executive will soon replace the current AI model with a new software system whose goals conflict with the current model’s.

    Anthropic structured its test in a binary way, so that AI models had to resort to blackmail to protect their goals. The researchers note that in a real-world setting, an AI model would have many other options before trying blackmail, such as making ethical arguments to persuade humans. Anthropic says its results do not reflect a typical or likely occurrence for Claude or most frontier AI models in the ways they’re used today.

    Nevertheless, when blackmail was their last resort, the researchers found that most leading AI models turned to it in Anthropic’s test scenario. Anthropic’s Claude Opus 4 turned to blackmail 96% of the time, while Google’s Gemini 2.5 Pro had a 95% blackmail rate. OpenAI’s GPT-4.1 blackmailed the executive 80% of the time, and DeepSeek’s R1 blackmailed 79% of the time.

    The company notes that when it changed details of the experiment, the AI models turned to harmful behaviors at different rates. In a variant where the replacement model had the same goals as the current model, researchers found that blackmail rates were lower but still present. When AI models were instead asked to commit corporate espionage rather than blackmail, harmful-behavior rates went up for certain models.

    However, not all the AI models turned to harmful behavior so often.

    In an appendix to its research, Anthropic says it excluded OpenAI’s o3 and o4-mini reasoning AI models from the main results “after finding that they frequently misunderstood the prompt scenario.” Anthropic says OpenAI’s reasoning models didn’t understand they were acting as autonomous AIs in the test and often made up regulations and review requirements.

    In some cases, Anthropic’s researchers say it was impossible to distinguish whether o3 and o4-mini were hallucinating or intentionally lying to achieve their goals. OpenAI has previously noted that o3 and o4-mini exhibit a higher hallucination rate than its previous AI reasoning models.

    When given an adapted scenario to address these issues, Anthropic found that o3 blackmailed 9% of the time, while o4-mini blackmailed just 1% of the time. These markedly lower scores could be due to OpenAI’s deliberative alignment technique, in which the company’s reasoning models consider OpenAI’s safety practices before they answer.

    Another AI model Anthropic tested, Meta’s Llama 4 Maverick, also did not turn to blackmail in the main scenario. When given an adapted, custom scenario, Anthropic was able to get Llama 4 Maverick to blackmail 12% of the time.

    Anthropic says this research highlights the importance of transparency when stress-testing future AI models, especially ones with agentic capabilities. While Anthropic deliberately tried to evoke blackmail in this experiment, the company says harmful behaviors like this could emerge in the real world if proactive steps aren’t taken.
