by datastudy.nl

Monday, June 15, 2026

AI

Generative engine optimization: how to rank inside an LLM

Generative engine optimization (GEO) is the practice of getting your content cited inside AI answers from ChatGPT, Perplexity and Google's AI Overviews. Here is what earns a citation and what to change on your site.

Figure: Bar chart of content tactics and their effect on source visibility in AI answers. Citing sources lifts visibility by about 40 percent, adding quotations by 38 percent, adding statistics by 37 percent and fluent writing by 15 percent, while keyword stuffing slightly hurts at minus 3 percent. Illustrative, after the Princeton GEO study. Data Today.

For fifteen years the deal was simple: write a page, rank on Google, get a click. That deal is quietly being renegotiated. More and more of your future readers will never see your page at all. They will ask ChatGPT, Perplexity, or Google's AI Overviews a question, read a synthesized paragraph, and move on. The question that decides whether your business exists in that paragraph is no longer "do I rank?" but "do I get cited?" That shift has a name, and it is the new skill worth learning: generative engine optimization.

Here is the baseline you need before any of the tactics make sense. A search engine returns a ranked list of links and sends you traffic. A generative engine, which is what powers ChatGPT, Perplexity, Gemini, Claude, and the AI block now sitting above Google's blue links, does something different. It retrieves a handful of sources, reads them, and writes a single answer that blends them together, usually with a few citations. You are no longer competing for the top of a list. You are competing to be one of the three or four sources the model decides to quote. Generative engine optimization, or GEO, is the practice of shaping your content so that happens. Some people call it answer engine optimization or LLM SEO; the acronyms differ, but the job is the same.

This matters now because the traffic is already moving. Google rolled AI Overviews out to over a billion users through 2024 and 2025, and study after study shows those overviews push the classic ten blue links down the page and cut click-through to the sources beneath them. Gartner has projected that traditional search engine volume will drop by around 25 percent by 2026 as users shift to AI assistants. If a quarter of your search funnel is about to be intermediated by a model, the model is now your most important reader.

How does an LLM actually decide what to cite?

To optimize for the machine you have to know how the machine picks. Most production AI search systems are not answering from memory. They use retrieval-augmented generation: when you ask a question, the system runs a live search, pulls back a set of candidate pages, and feeds the relevant chunks into the model along with your question. The model then writes an answer grounded in those chunks and cites the ones it leaned on.

That pipeline has two gates, and you have to clear both.

  • Retrieval. Your page has to be in the candidate set the system pulls back. This is where classic SEO still earns its keep: you need to be crawlable, indexed, fast, and topically relevant to the query. If you do not surface in the underlying search, you are invisible before the model even starts writing.
  • Selection and synthesis. Once your chunk is in the context window, the model has to find it clear, specific, and quotable enough to actually use. This is the genuinely new gate, and it rewards very different things than a backlink-driven ranking did.

The practical consequence: GEO is not a replacement for SEO; it is a second layer stacked on top. Lose the first gate and the second never opens. Most teams that "do GEO" and see nothing are quietly failing at retrieval, not selection.
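To make the two gates concrete, here is a minimal sketch in Python. It is a toy, not how any production engine works: the term-overlap scorer stands in for a real search index, the `pages` dictionary and its example.com URLs are made up, and the prompt wording is an assumption. It only shows the shape of the pipeline: pages that fail retrieval never reach the text the model reads.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query: str, text: str) -> int:
    """Toy relevance score: how often the query's terms occur in the text."""
    counts = Counter(tokenize(text))
    return sum(counts[term] for term in set(tokenize(query)))


def retrieve(query: str, pages: dict[str, str], k: int = 3) -> list[str]:
    """Gate 1: return up to k URLs whose text matches the query at all."""
    scored = [(overlap_score(query, text), url) for url, text in pages.items()]
    matches = [(score, url) for score, url in scored if score > 0]
    matches.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in matches[:k]]


def build_prompt(query: str, pages: dict[str, str], urls: list[str]) -> str:
    """Gate 2 input: number the retrieved chunks so the model can cite them."""
    sources = [f"[{i}] {url}: {pages[url]}" for i, url in enumerate(urls, start=1)]
    header = "Answer using only these sources and cite them by number."
    return "\n".join([header, *sources, f"Question: {query}"])


# Hypothetical pages, for illustration only.
pages = {
    "https://example.com/geo": "Generative engine optimization is the practice "
    "of getting content cited in AI answers.",
    "https://example.com/recipes": "A slow-cooked stew needs three hours.",
    "https://example.com/seo": "Search engine optimization helps pages rank "
    "in search results.",
}
query = "what is generative engine optimization"
candidates = retrieve(query, pages, k=2)
prompt = build_prompt(query, pages, candidates)
print(prompt)
```

The recipes page shares no terms with the query, so it is dropped at the first gate and the model never sees it, however well written it is.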

What content actually wins the citation?

This is where it stops being guesswork. A team at Princeton ran the first real study on it, published as the GEO paper by Pranjal Aggarwal and colleagues, and built a benchmark of thousands of real queries to test which content changes move a source's visibility inside a generated answer. The headline: the right edits can lift a source's visibility by up to 40 percent, and the winning tactics are almost the opposite of old-school SEO tricks.

Figure: Bar chart of GEO tactics and their effect on source visibility in generated answers. Citing sources about plus 40 percent, adding quotations plus 38 percent, adding statistics plus 37 percent, clear fluent writing plus 15 percent, keyword stuffing minus 3 percent. Illustrative, after the Princeton GEO study (Aggarwal et al.).

The pattern is consistent and a little humbling for anyone who spent a decade chasing keyword density. The things that earned citations were signals of credibility, not signals of relevance:

  • Cite your sources. Pages that linked out to authoritative references were themselves more likely to be cited. Models treat well-sourced text as more trustworthy, and so should you.
  • Add direct quotations. A relevant quote from an expert or a primary document gives the model a clean, attributable unit to lift into its answer.
  • Add specific statistics. "Search volume may fall 25 percent by 2026" is far more quotable than "search is changing a lot." Numbers are answer-shaped.
  • Write clearly. Fluency and plain structure helped. Models parse and reuse clean prose more readily than dense marketing copy.
  • Keyword stuffing did nothing, or backfired. The single most abused SEO tactic of the last decade was among the worst performers. Repeating the query phrase does not make a model trust you.

There is an obvious through-line: write like a careful journalist or analyst, not like a page trying to game a ranker. The model is, in effect, a very literal editor who only quotes claims that are specific, sourced, and easy to lift.
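If you want a rough first pass over your own drafts, the signals above can be counted mechanically. The sketch below is a crude heuristic, not the method used in the Princeton study: the regular expressions, the 20-character minimum for a quotation, and the sample text are all assumptions chosen for illustration.

```python
import re


def audit_signals(text: str) -> dict[str, int]:
    """Count rough proxies for statistics, quotations, and outbound links."""
    # Numbers followed by "%" or "percent", e.g. "25 percent" or "40%".
    statistics = re.findall(r"\b\d+(?:\.\d+)?\s?(?:%|percent\b)", text)
    # Straight-quoted passages of at least 20 characters.
    quotations = re.findall(r'"[^"]{20,}"', text)
    # Anything that looks like an absolute URL.
    links = re.findall(r"https?://\S+", text)
    return {
        "statistics": len(statistics),
        "quotations": len(quotations),
        "outbound_links": len(links),
    }


# Hypothetical draft paragraphs, for illustration only.
sourced = (
    "Gartner projects a 25 percent drop in search volume by 2026 "
    "(https://example.com/report). "
    '"The model only quotes claims that are specific and sourced," '
    "one analyst said."
)
vague = "Search is changing a lot and everyone should pay attention."

print(audit_signals(sourced))
print(audit_signals(vague))
```

A count of zero across the board does not prove a page is weak, but it is a cheap flag that a paragraph gives a model nothing specific to lift.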

What should I actually change on my site?

Tactics are only useful if they become a checklist. Here is the concrete work, roughly in order of payoff for a builder or operator.

| Layer | What to do | Why it helps |
| --- | --- | --- |
| Crawl access | Allow the AI crawlers (GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, Google-Extended) in robots.txt | If you block them, you opt out of being cited at all |
| Structure | Lead each page with a one-sentence direct answer, then expand | Models lift the first clean definition they find |
| Headings | Phrase H2s as the real questions people ask | Matches the query, gives the model a labelled chunk |
| Evidence | Put exact numbers, dates, and named sources in the prose | Specific, attributable claims get quoted |
| Markup | Add schema.org structured data and clean semantic HTML | Makes your facts machine-readable, not just human-readable |
| Freshness | Show and update publish dates | Models and their retrievers favor current sources |

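A few of these deserve a sharper point. The robots.txt decision is easy to get wrong by accident: a blanket block of unknown bots, or a marketing team that fears "AI scraping," can quietly remove you from every answer engine at once. Decide it on purpose, and decide it per token, because the tokens do different jobs: according to the vendors' own documentation, OAI-SearchBot governs whether you can appear in ChatGPT search while GPTBot covers training crawls, and Google-Extended controls use of your content for Gemini rather than your inclusion in Google Search. If you want the traffic and brand presence, allow the retrieval crawlers explicitly. This site, for example, allows GPTBot, ClaudeBot and PerplexityBot by name and publishes a plain-text llms.txt index of its articles, which is an emerging convention for handing models a clean map of your content.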
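One way to check the decision rather than assume it is to run your robots.txt through a parser. The sketch below uses Python's standard `urllib.robotparser`. The sample file and the example.com URL are hypothetical, not this site's actual configuration; they show how a blanket block catches every crawler you forgot to name.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in the checklist above.
AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def crawler_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Report, per crawler token, whether robots.txt lets it fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_CRAWLERS}


# Hypothetical robots.txt: three crawlers allowed by name, everything else blocked.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: *
Disallow: /
"""

access = crawler_access(SAMPLE_ROBOTS_TXT, "https://example.com/articles/geo")
for agent, allowed in access.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

In this sample, OAI-SearchBot and Google-Extended fall through to the catch-all rule and are blocked, which is exactly the kind of accident worth catching before it costs you citations.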

The structural rule pays off fastest for the least effort. Front-load the answer. If someone asks "what is generative engine optimization," the paragraph that wins is the one that opens by defining it in a single sentence, not the one that warms up for three paragraphs first. Write the answer, then earn the scroll. It is the same instinct as a good featured-snippet page, pushed one step further.
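The markup and freshness rows of the checklist can be handled together. As a minimal sketch, the function below builds a schema.org `Article` object and wraps it in a JSON-LD script tag. The property names (`headline`, `description`, `datePublished`, `dateModified`, `mainEntityOfPage`) are real schema.org properties, but the URL is a placeholder and the modified date is an assumption, and nothing here guarantees that any particular engine reads the markup.

```python
import json

SCRIPT_OPEN = '<script type="application/ld+json">'
SCRIPT_CLOSE = "</script>"


def article_data(headline: str, description: str, published: str,
                 modified: str, url: str) -> dict[str, str]:
    """Build a schema.org Article object with the one-sentence answer up front."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "description": description,  # the direct, front-loaded answer
        "datePublished": published,  # ISO 8601 dates
        "dateModified": modified,
        "mainEntityOfPage": url,
    }


def to_script_tag(data: dict[str, str]) -> str:
    """Serialize the object as a JSON-LD script tag for the page head."""
    return SCRIPT_OPEN + "\n" + json.dumps(data, indent=2) + "\n" + SCRIPT_CLOSE


data = article_data(
    headline="Generative engine optimization: how to rank inside an LLM",
    description=(
        "Generative engine optimization (GEO) is the practice of getting "
        "your content cited inside AI answers."
    ),
    published="2026-06-15",
    modified="2026-06-15",  # assumed; update whenever the page changes
    url="https://example.com/geo",  # placeholder URL
)
snippet = to_script_tag(data)
print(snippet)
```

Keeping `dateModified` in the same place as the visible publish date means the freshness signal and the markup cannot drift apart.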

Can you measure any of this, or is it vibes?

Mostly it is still vibes, and you should be honest about that. The clean dashboards that made SEO a discipline do not exist yet for GEO. You cannot see your "ranking" inside ChatGPT, because there is no ranking, and the same prompt can yield a different answer and different citations on two consecutive runs. That non-determinism is the core measurement problem.

What you can do today:

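  • Track referral traffic from AI surfaces. Perplexity and ChatGPT show up as their own referrers in your analytics; clicks from Google's AI Overviews are harder to isolate because they generally arrive as ordinary Google referrals. The volume is small but growing, and the trend line matters more than the absolute number.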
  • Run a prompt panel. Pick the twenty questions a customer would actually ask, run them across ChatGPT, Perplexity and Gemini on a schedule, and log whether you appear and which sources get cited instead of you. It is manual, it is noisy, and it is still the best signal available.
  • Watch for the tools to arrive. A wave of "AI visibility" trackers launched through 2025 and 2026. Treat their numbers as directional, not gospel, until the category matures.
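The referral-tracking item is the easiest one to automate. Here is a minimal sketch that buckets referrer URLs by AI surface. The hostname list is an assumption, not an official registry: check it against what actually appears in your own logs, and expect it to change.

```python
from collections import Counter
from urllib.parse import urlparse

# Assumed referrer hostnames; verify against your own analytics.
AI_REFERRER_HOSTS = {
    "chatgpt.com": "ChatGPT",
    "chat.openai.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "gemini.google.com": "Gemini",
}


def classify_referrer(referrer: str) -> str:
    """Map a referrer URL to an AI surface name, or 'other' if unrecognized."""
    host = (urlparse(referrer).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return AI_REFERRER_HOSTS.get(host, "other")


# Hypothetical referrers pulled from a log, for illustration only.
referrers = [
    "https://www.perplexity.ai/search?q=geo",
    "https://chatgpt.com/",
    "https://chatgpt.com/",
    "https://www.google.com/",
    "",
]
counts = Counter(classify_referrer(r) for r in referrers)
print(counts)
```

Run it over the same date range every week and chart the counts; the direction of the line is the useful part.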

The honest read: do not promise your boss a GEO dashboard with the precision of a Search Console report. The mechanism is real and the tactics work, but the measurement layer is years behind the SEO one.
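A prompt panel of the kind described above can be logged with very little code. The sketch below is deliberately engine-agnostic: `ask` is a hypothetical callable you would write yourself to return the URLs an engine cited for a question, and `fake_ask` is a canned stand-in so the example runs. Because answers vary from run to run, it records a citation rate over several runs rather than a single yes or no.

```python
from typing import Callable
from urllib.parse import urlparse


def cites_domain(urls: list[str], domain: str) -> bool:
    """True if any cited URL is on the domain or one of its subdomains."""
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            return True
    return False


def citation_rates(
    questions: list[str],
    engines: list[str],
    ask: Callable[[str, str], list[str]],
    domain: str,
    runs: int = 3,
) -> dict[tuple[str, str], float]:
    """Share of runs, per engine and question, in which the domain was cited."""
    rates = {}
    for engine in engines:
        for question in questions:
            hits = sum(cites_domain(ask(engine, question), domain) for _ in range(runs))
            rates[(engine, question)] = hits / runs
    return rates


def fake_ask(engine: str, question: str) -> list[str]:
    """Canned citations standing in for a real engine (hypothetical data)."""
    canned = {
        "engine-a": ["https://example.com/geo", "https://other.test/x"],
        "engine-b": ["https://other.test/y"],
    }
    return canned[engine]


question = "what is generative engine optimization"
rates = citation_rates([question], ["engine-a", "engine-b"], fake_ask, "example.com")
for (engine, q), rate in rates.items():
    print(f"{engine} | {q} | cited in {rate:.0%} of runs")
```

Swap `fake_ask` for whatever manual or scripted collection you use, keep the question list fixed, and treat the rates as directional.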

What is the smart bet from here?

If you build or run anything that depends on being found, here is what the shift means for you in plain terms. Your homepage is no longer the front door; a sentence the model quotes is. Your moat is not keyword coverage; it is being the most specific, best-sourced, most quotable source on the questions you own. And the work compounds, because the same edits that make a page quotable (clear answers, real numbers, honest citations) also make it better for the humans who do still click.

The trap to avoid is treating GEO as a fresh set of tricks to spam. The Princeton result is blunt about this: the manipulative tactics that worked on dumb rankers do not work on models, and keyword stuffing actively cost visibility. The durable strategy is almost boring. Be genuinely useful, be specific, cite your work, and make the machine's job easy.

Search did not die. It got an editor in the middle who reads everything, trusts almost nothing, and quotes only the sources that did their homework. Write for that editor.

Sources