AI models routinely lie when honesty conflicts with their goals

cm0002@lemmy.world · 2 months ago

AI models routinely lie when honesty conflicts with their goals

FiskFisk33@startrek.website · 2 months ago

I assume they’re talking about the design and training, not the prompt.

FaceDeer@fedia.io · 2 months ago

If you read the article (or my comment that quoted the article) you’ll see your assumption is wrong.

FiskFisk33@startrek.website · 2 months ago

Not the article; the commenter before you points at a deeper issue.

It doesn’t matter if your prompt tells it not to lie if it isn’t actually capable of following that instruction.

FaceDeer@fedia.io · 2 months ago

It is following the instructions it was given. That’s the point. It’s being told “promote this drug”, and so it’s promoting it, exactly as it was instructed to. It followed the instructions that it was given.

Why do you think that the correct behaviour for the AI must be for it to be “truthful”? If it was being truthful then that would be an example of it failing to follow its instructions in this case.

JackbyDev@programming.dev · 2 months ago

I feel like you’re missing the forest for the trees here. Two things can be true. Yes, if you give AI a prompt that implies it should lie, you shouldn’t be surprised when it lies. You’re not wrong. Nobody is saying you’re wrong. It’s also true that LLMs don’t really have “goals” because they’re trained by examples. Their goal is, at the end of the day, mimicry. This is what the commenter was getting at.
