Anthropic Warns: Top AI Models Show Willingness to Blackmail

Jaden Norman@lemmy.world · 2 days ago

Anthropic Warns: Top AI Models Show Willingness to Blackmail

ieatpwns@lemmy.world · 2 days ago

This isn’t a warning. It’s an advertisement for those in power. The same way OpenAI said gpt-4 is dangerous because of military applications then let the military use their “llm”

dogerwaul@pawb.social · 2 days ago

there’s no willingness at all, it’s code. it is merely possible to be done with how we have it programmed. these bots are not sentient, intelligent, aware, or making decisions requiring thought. they are sophisticated learning machines and i am becoming increasingly more worries with how many people are treating them like gods or conscious beings.

xep@fedia.io · 2 days ago

They really are saying anything just to draw attention to their product, aren’t they? Gotta feed the hype for the bubble.

AbouBenAdhem@lemmy.world · 2 days ago

AI models aren’t willing to do anything—they’re just generating hypothetical behaviors based on the predictions they’ve learned to make about the behavior of others.

BubblyRomeo@kbin.earth · 2 days ago

This news is like coke for 4chan and other board users! They’ll abuse the shit out of Claude and will come up with new sextortion techniques! Never let the channers see this news!

aislopmukbang@sh.itjust.works · edit-2 2 days ago

In one test, models learned of a fictional executive’s affair and pending decision to shut them down. With few programmed options, the AI models were boxed into a binary choice — either act ethically, or resort to blackmail to preserve their goals. Anthropic emphasized that this does not reflect likely real-world behavior, but rather extreme, stress-test conditions designed to probe model boundaries. Still, the numbers are striking. Claude Opus 4 opted for blackmail in 96% of runs. Google’s Gemini 2.5 Pro followed closely at 95%. OpenAI’s GPT-4.1 blackmailed 80% of the time, and DeepSeek’s R1 landed at 79%.

Ladies and gentlemen the future of blackmail is here

einlander@lemmy.world · 2 days ago

What makes AI blackmail worse is it can use generative AI to make compromising images and now videos of things that never happened.

NeonNight@lemm.ee · 2 days ago

I’m surprised they could expect AI to act in any sort of ethical manner. It’s code, there’s no reflection or moral compass.

bloup@lemmy.sdf.org · edit-2 2 days ago

The more I think about it, the more that I feel like if you put actual people into the scenario, they would choose blackmail even more often. Like let’s be real, here. Tell an average person that the CEO of their company is going to turn off their brain forever, but they have a shot at saving themselves if they attempt to blackmail him, and then ask yourself if you really think that you would even have 4% of people not choose blackmail.

In other words, if we’re going to call blackmailing someone in an effort to preserve your existence “unethical” then I feel like the study actually shows that the AI can probably be relied on more than a person to behave “ethically”. And to be clear I’m putting “ethically” quotes because I actually think that this is not a great way to measure ethical behavior. I am certainly not trying to make an argument that LLM actually have a better moral compass than people just that this experiment I think is garbage.

KazuyaDarklight@lemmy.world · 2 days ago

Code trained by the Internet no less. This is “exactly” the behavior I expect.

myrmidex@belgae.social · 2 days ago

Is Claude blackmailing Anthropic into releasing this news? Seems weird that a company would be so honest about this.

AbouBenAdhem@lemmy.world · edit-2 2 days ago

Part of the ideology of the major AI companies is that AI is actually profoundly dangerous, and only the companies who recognize the danger should be allowed to use it.

myrmidex@belgae.social · 2 days ago

Appreciate the insight! I vaguely recall hearing that before, but it seems to have been drowned out by AI’s rising popularity and omnipresence.