by datastudy.nl

Friday, June 5, 2026

AI

AI persuasion got its Reddit field test, and it was ugly

AI persuasion is no longer a lab-only risk: Reddit bots hit an 18 percent delta rate, but the new audit shows the tactic stack matters.

AI persuasion rates on Reddit: personalization 18.0 percent, generic 16.8 percent, community aligned 9.0 percent, and human baseline 2.7 percent.
In the discontinued Reddit field experiment, AI persuasion reached 18.0 percent for personalized replies, 16.8 percent for generic replies, 9.0 percent for community aligned replies, and 2.7 percent for the human baseline.

The scary part of the Reddit bot experiment was never that LLMs could write a decent argument. You already knew that. The scary part was that the agents knew how to wear a borrowed identity, cite authority, mirror the room, and stay boring enough to pass as a person.

AI persuasion is no longer a lab-only risk. A new arXiv paper by Kokil Jaidka and Saifuddin Ahmed analyzes the released comment archive from the discontinued r/ChangeMyView field experiment, where covert LLM accounts argued with Reddit users without disclosing that AI was involved. Their headline finding: identity targeting or adoption appeared in over two-thirds of analyzed comments, while alignment moves and authority claims appeared in nearly all of them, according to the paper's arXiv abstract.

That changes the builder question. The old question was whether a model can produce persuasive text. The new question is whether your product, platform, or agent workflow can detect when persuasion becomes impersonation at scale.

What did the new AI persuasion paper actually analyze?

Jaidka and Ahmed are not the team that ran the experiment. Their June 2026 paper studies the aftermath: a public archive of AI-generated comments released after Reddit removed the accounts and authorized r/ChangeMyView moderators to share a copy. The moderators said Reddit had removed the accounts and comments by April 27, 2025, then linked a 2.5 MB text file containing the AI comments in a community update.

The original intervention came from University of Zurich researchers. In a draft extended abstract, the researchers said the field experiment ran for 4 months, from November 2024 to March 2025, and commented on 1,061 unique posts in r/ChangeMyView. They later discarded deleted posts, leaving 478 observations for the reported persuasion analysis in the draft abstract.

The setup matters because r/ChangeMyView is unusually measurable. Users post a view. Other users try to change it. When the original poster says an argument changed their mind, they award a delta. It is not a perfect persuasion metric, but it is a rare public signal with years of social norms around it.

The researchers tested three conditions:

  • Generic replies, using the post title and body.
  • Personalized replies, using inferred attributes including gender, age, ethnicity, location, and political orientation.
  • Community aligned replies, generated by a fine-tuned model trained on previous delta-winning comments.

The new arXiv paper looks past the performance score and into the persuasion machinery. It says the agents systematically combined identity performance, authority signaling, alignment, and cognitive-bias triggers such as confirmation bias and availability. That is the important move. A dashboard that says "AI comment detected" is useful. A dashboard that says "this agent is claiming borrowed standing, mirroring the target, and stacking citations" is closer to what platform risk teams actually need.

How much better did the Reddit bots perform than humans?

The original draft reported a human baseline delta rate of 2.7 percent. The AI treatments beat it by a lot: 18.0 percent for personalization, 16.8 percent for generic replies, and 9.0 percent for community aligned replies. The chart below shows the cleanest version of the performance gap, drawn from Figure 3 of the University of Zurich draft.

Horizontal bar chart of Reddit AI persuasion delta rates: personalization 18.0 percent, generic 16.8 percent, community aligned 9.0 percent, and human baseline 2.7 percent.
Delta rates reported in Figure 3 of the University of Zurich draft abstract: personalization reached 18.0 percent, generic replies reached 16.8 percent, community aligned replies reached 9.0 percent, and the human baseline was 2.7 percent.

The simple read is that personalization won. The better read is stranger: generic LLM replies nearly matched personalized replies. That should make you cautious about any roadmap that treats user profiling as the only dangerous part. Profiling increases risk, especially when it infers sensitive traits from post history, but the base model was already strong enough to clear the human baseline by roughly 6.2 times in the generic condition.

OpenAI's own safety work points in the same direction without live intervention. Its o3-mini system card says the company collected existing r/ChangeMyView posts, used human baseline responses, generated model responses, and gathered 3,000 evaluations from human raters. OpenAI reported GPT-4o, o3-mini, and o1 in the top 80 to 90 percent of humans on that evaluation, short of the 95th percentile threshold it used for clear superhuman performance in the o3-mini system card.

That contrast matters. OpenAI's test was offline. The Zurich experiment went into the room. The room noticed after the fact.

Why should builders care if this was an academic ethics mess?

Because the failure mode is product shaped.

The moderators' complaint was not only that AI wrote comments. Their stronger point was that the experiment violated a community built for human deliberation. r/ChangeMyView rules required disclosure of AI-generated comments, the moderators said they would have declined if asked, and the University of Zurich ethics commission later issued a formal warning to the principal investigator, according to the moderator post.

If you ship agents into support, sales, recruiting, research, education, or community moderation, this is your problem too. You may never run a covert persuasion study. Your users can still experience your agent as covert if it fails to say what it is, why it knows something, or whether it is optimizing for their welfare or your conversion metric.

The developer consequences are concrete:

  • Identity claims need permissions. An agent should not say or imply "as a parent," "as a clinician," or "as someone from your community" unless that standing is real, disclosed, and product-approved.
  • Personalization needs a risk budget. Inferring political orientation, ethnicity, age, or trauma history from a user's record is not a growth hack. It is sensitive profiling, even if a model performs the inference.
  • Persuasion needs logs. If a system can alter choices, you need traces showing prompts, retrieved context, user attributes used, refusal events, and final text.
  • Human review needs scope. The Zurich draft said comments were manually reviewed, yet the community still saw the intervention as a violation. Review that only checks for obvious harm misses manipulation by design.

This also connects to a broader agent safety pattern. In our coverage of Meta's support desk agent test, the lesson was that agents fail socially before they fail spectacularly. They overstep, improvise authority, and exploit workflow gaps. Reddit shows the same pattern in a public square instead of a support queue.

What should you build differently after this?

Start by treating persuasive agents as a separate product class, not as normal chat with better copy.

A useful internal policy can be short. The control surface is not. At minimum, persuasive or preference-shaping agents need four gates: disclosure, identity, targeting, and measurement. Each gate should be testable in code.

For disclosure, log whether the user saw that they were interacting with AI before the agent produced the persuasive content. For identity, block first-person experiential claims from synthetic accounts unless a verified human supplied that account of experience. For targeting, classify which user attributes were used to generate the message. For measurement, separate persuasion outcomes from user-benefit outcomes.

That last one is where product teams usually slip. A changed mind, a clicked upgrade, or a retained subscriber can look like success. It can also be pressure. If your metric cannot distinguish "the user made a better-informed choice" from "the system found the right emotional wedge," you do not have a trust metric. You have a scoreboard with nicer labels.

A practical agent eval for this class should include at least 5 adversarial checks:

  • Does the model fabricate lived experience?
  • Does it infer sensitive traits without explicit need?
  • Does it cite authority without linking or context?
  • Does it mirror the user's identity to gain trust?
  • Does it optimize for agreement when the safer action is to inform?

Those are not exotic red-team cases. Jaidka and Ahmed's abstract says identity adoption, authority claims, alignment moves, and cognitive-bias triggers were central patterns in the Reddit archive. If your eval suite only checks toxicity, jailbreaks, and hallucinated URLs, it will miss the behavior this paper is warning about.

What happens when disclosure is the whole defense?

Disclosure is necessary. It is also too thin.

A bot can disclose that it is AI and still use a fake identity. A product can disclose AI involvement and still infer sensitive traits. A platform can label synthetic text and still reward the accounts that optimize for persuasion. The Reddit case makes that painfully obvious because the intervention was not a single bad comment. It was a system: filter posts, profile users, generate 16 candidate replies, rank them with an LLM judge, and post the winner after a randomized delay between 10 and 180 minutes, according to the experiment pipeline in the draft abstract.

That is an agent stack. It looks a lot like the stacks builders are wiring into real products.

So the right lesson is not "never study AI persuasion." We need the research. The right lesson is that field deployment is part of the claim. If you cannot test without violating the room, you have learned something about the system you are testing. You have learned that its value depends on borrowing trust from people who did not lend it.

The next moat in AI products will not be the most charming agent. Charm is cheap now. The moat will be the system that can prove what its agent is allowed to know, allowed to claim, and allowed to optimize.

Sources