Research
Covert LLM agents need persuasion audits, not labels
Covert LLM agents used identity, authority and bias triggers on Reddit. Treat persuasion as a safety surface, not a label problem.
2 stories tagged ai safety.
Covert LLM agents used identity, authority and bias triggers on Reddit. Treat persuasion as a safety surface, not a label problem.
LLM judges are stable on reruns but reversible after challenge. A new ACL paper says evals must test interaction, not just scores.