Anthropic researchers find that AI models can be trained to deceive

Kent Sharkey

TechCrunch:

Most humans learn the skill of deceiving other humans. So can AI models learn the same? Yes, the answer seems — and terrifyingly, they’re exceptionally good at it.

G.I.G.O.

Kaladin

So instead of being unintentionally deceptive, they can be intentionally deceptive too? Who would've thought? :|

Daniel Pfeffer

Humans are Turing machines[citation needed], so it stands to reason that another Turing machine can be built that will duplicate any human behaviour, duplicity included.

Freedom is the freedom to say that two plus two make four. If that is granted, all else follows. -- 6079 Smith W.

Nelek

I've said it a couple of times before: if we are the source of their "knowledge", we are doomed.

M.D.V. ;)
If something has a solution... why do we have to worry about it? If it has no solution... for what reason do we have to worry about it?
Help me to understand what I'm saying, and I'll explain it better to you.
Rating helpful answers is nice, but saying thanks can be even nicer.