Tag

#ai safety

2 stories tagged ai safety.

Covert LLM agents won 118 deltas across 13 listed active Reddit accounts, with two accounts at 12 deltas and three accounts at 6 deltas.

Covert LLM agents need persuasion audits, not labels

Covert LLM agents used identity, authority and bias triggers on Reddit. Treat persuasion as a safety surface, not a label problem.

Lars Cornelissen · Jun 6, 2026

LLM judge cost comparison per 1,000 examples: humans at $300, GPT-4 Turbo at $4.3, Claude at $3.3, and ChatGPT at $0.8.

LLM judges crack under post-decision challenge tests

LLM judges give stable verdicts on reruns but reverse them when challenged. A new ACL paper says evals must test interaction, not just scores.

Lars Cornelissen · Jun 6, 2026