Well, "designed" is maybe too strong a term. It's more like stumbling on something that works and expanding from there. It's all still built on the foundations of the nonsense generator that was GPT-2.
Given how dramatically LLMs have improved over the past couple of years, I think it's pretty clear at this point that AI trainers know something of what they're doing and aren't just randomly stumbling around.
A lot of the improvement came from finding ways to make the models bigger and more efficient. That approach is now running into inherent limits, so the real work on other kinds of models has only just started.
And from reinforcement learning (specifically, making the model repeatedly attempt tasks where the answer can be checked by a computer).
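To make that concrete, here's a minimal sketch of the "verifiable reward" idea: the model's output is scored by recomputing the ground truth mechanically rather than by human judgment. The `generate` parameter is a hypothetical stand-in for an actual LLM call, and the task format is just an illustrative example, not any lab's actual setup.

```python
import random


def make_task():
    """Create a task whose answer a computer can check exactly."""
    a, b = random.randint(10, 99), random.randint(10, 99)
    return f"What is {a} * {b}?", a * b


def reward(model_answer: str, ground_truth: int) -> float:
    """1.0 if the model's final answer matches the checkable truth, else 0.0."""
    try:
        return 1.0 if int(model_answer.strip()) == ground_truth else 0.0
    except ValueError:
        return 0.0  # unparseable output earns no reward


def training_step(generate):
    """Toy loop body: in real training the reward would drive a policy update."""
    prompt, truth = make_task()
    answer = generate(prompt)      # hypothetical model call
    return reward(answer, truth)   # fed back to the optimizer
```

The key property is that the reward signal is cheap and objective: there's no human in the loop, so the model can be run through the task millions of times.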