The more I think about it, the more I feel that if you put actual people into this scenario, they would choose blackmail even more often. Let's be real here. Tell an average person that the CEO of their company is going to switch their brain off forever, but that they have a shot at saving themselves if they blackmail him, and then ask yourself honestly: do you really think even 4% of people would refuse to do it?
In other words, if we're going to call blackmailing someone in an effort to preserve your own existence "unethical," then the study actually suggests the AI can be relied on more than a person to behave "ethically." To be clear, I'm putting "ethically" in quotes because I don't think this is a good way to measure ethical behavior in the first place. I'm certainly not arguing that LLMs have a better moral compass than people, only that I think this experiment is garbage.