by datastudy.nl

Friday, June 5, 2026

Data

What an AI prompt really costs in energy, water and CO2

A typical AI text prompt uses roughly 0.24 to 0.34 watt-hours and a fraction of a milliliter of water, not a bottle per email. Training is huge but rare, and inference is what adds up.

Donut chart of where AI compute energy goes, with inference at about 85 percent in red and training at about 15 percent in blue.
Where AI compute energy goes: inference is roughly 85 percent and training roughly 15 percent, because models are served billions of times but trained once. Sources: Amazon Web Services and Google. Data Today benchmark.

Dataset: the per-model energy figures here are measured and published openly on the University of Michigan ML.Energy leaderboard, which anyone can download and re-run.

Run a small open model on a laptop, generate a full email, and the battery barely moves. So when a headline says that same email "drinks a bottle of water," the skepticism is earned. The measured numbers mostly back it up.

The distinction the scary headlines usually blur is the one that matters. Training is the one-time, brutally expensive process of building a model. Inference is what happens every time someone uses it: a prompt goes in, the model answers. The two have completely different cost profiles, and almost every viral "AI is boiling the oceans" statistic quietly mixes them up. The number worth remembering is that a median Gemini text prompt now uses about 0.24 watt-hours of electricity and 0.26 milliliters of water, roughly five drops.

How much does one text prompt actually cost?

Three independent sources now put a single text prompt in the same small range.

Google published a measurement methodology in August 2025 and put the median Gemini text prompt at 0.24 watt-hours of energy, 0.03 grams of CO2 equivalent, and 0.26 milliliters of water. That figure is the comprehensive one: it includes idle backup machines, host CPUs and RAM, and data center cooling overhead, not just the chip doing the math. Stripped down to active-chip-only, the way many alarming estimates are calculated, Google's number falls to 0.10 watt-hours. The company framed the full figure as the energy of watching television for under nine seconds.

OpenAI's Sam Altman, writing in mid-2025, said the average ChatGPT query uses about 0.34 watt-hours and roughly 0.000085 gallons of water, which is about 0.32 milliliters, or one fifteenth of a teaspoon. The research group Epoch AI, working bottom-up from chip specifications, landed at about 0.3 watt-hours for a typical GPT-4o query, explicitly ten times lower than the widely repeated 3 watt-hour estimate first floated by Alex de Vries in 2023.

Bar chart of energy per AI request: small text model 0.03 watt-hours, Gemini prompt 0.24, ChatGPT query 0.34, one image 0.63, and a large text model 1.86 watt-hours.
Energy per AI request. A small text model is 0.03 Wh, a Gemini prompt 0.24 Wh, a ChatGPT query 0.34 Wh, one image about 0.63 Wh, and a 405B text model 1.86 Wh. A five-second AI video is left off the chart because, at about 944 Wh, it is hundreds of times taller than every bar shown. Sources: Google, OpenAI, MIT Technology Review and ML.Energy.

Model size drives most of the spread. Working with the University of Michigan's ML.Energy team, MIT Technology Review measured Meta's open Llama models on a range of prompts. The small Llama 3.1 8B needed about 114 joules per answer, the energy of running a microwave for a tenth of a second. The fifty-times-larger Llama 3.1 405B needed about 6,706 joules, or roughly eight seconds of microwave time. Same task, very different bill, depending on which model serves it.

So where did "a bottle of water per email" come from?

That line is not invented, but it is a worst case wearing a headline. It traces to a September 2024 Washington Post collaboration with the University of California, Riverside, which estimated that GPT-4 writing one 100-word email could consume up to 519 milliliters of water, about a 16-ounce bottle, in a hot-climate Texas data center, and as little as 235 milliliters elsewhere.

Bar chart of water per text prompt in milliliters: Google Gemini measured at 0.26 mL, ChatGPT at 0.32 mL, and the viral bottle-per-email claim at 519 mL.
Water per text prompt. Google's measured Gemini figure is 0.26 mL and OpenAI's ChatGPT figure is 0.32 mL, against the viral 519 mL "bottle per email" worst case. Sources: Google, OpenAI, Washington Post with UC Riverside.

Next to Google's measured 0.26 milliliters, the gap is roughly two thousand times. Both figures can be defended, and the difference is instructive. The bottle figure stacks several pessimistic choices at once: a long answer, an older and larger model, a thirsty region, and, importantly, both kinds of water. Data centers consume water two ways: directly, by evaporating it in cooling towers, and indirectly, because the power plant feeding them also consumes water to make electricity. The team behind the bottle estimate spelled out that off-grid water cost in its paper Making AI Less Thirsty. Add a region where the grid and the cooling are both water-hungry and the worst case balloons. Google's number is a fleet-wide median with best-in-class cooling efficiency behind it. The truth for a normal prompt sits far closer to a few drops than to a bottle.

Does generating an image or a video cost more?

Yes, and this is where the laptop intuition starts to bend. Image models use a different, diffusion-based architecture, and the energy does not depend on the prompt's content. A skier on sand dunes and an astronaut farming on Mars cost the same; what matters is resolution and the number of diffusion steps.

MIT Technology Review measured a standard 1024-pixel image from Stable Diffusion 3 Medium at about 2,282 joules, or 0.63 watt-hours, and roughly double that at higher quality. That is more than a small text answer but still less than the largest text model. Hugging Face's wider study, Power Hungry Processing, found image generation averaging 2.9 watt-hours per image across many models, with the most carbon-intensive emitting about 1.6 grams of CO2 each, the equivalent of driving 4.1 miles for every 1,000 images.

Video is the real outlier. A single five-second clip from the open CogVideoX model took about 3.4 million joules, near 944 watt-hours, more than 700 times a high-quality image and enough to run a microwave for over an hour. The cost ladder is consistent: classifying text is nearly free, generating text costs a little, generating pixels costs more, and generating moving pixels costs a lot.

AI action Energy Everyday equivalent
Small text model answer ~0.03 Wh Microwave for 0.1 second
Gemini text prompt 0.24 Wh Watching TV for under 9 seconds
ChatGPT query 0.34 Wh An oven running for about 1 second
One standard image ~0.63 Wh Microwave for about 3 seconds
Large text model answer ~1.86 Wh Microwave for 8 seconds
Five-second AI video ~944 Wh Microwave for over an hour

Is training really the expensive part?

The common intuition is half right. Training is genuinely enormous. Training GPT-3 is estimated to have used about 1,287 megawatt-hours of electricity, emitted roughly 552 tons of CO2, and evaporated around 700,000 liters of freshwater, according to the UC Riverside team that coined the "thirsty AI" framing and the Google and Berkeley analysis behind those energy numbers. GPT-4 reportedly cost over $100 million and burned through about 50 gigawatt-hours, enough to power San Francisco for three days.

But the next step breaks the "so inference must be near zero" logic. That training cost is paid once, then smeared across every prompt the model ever answers. Spread 1,287 megawatt-hours over the billions of queries a popular model serves, and the training share of any single prompt rounds to nothing. Epoch AI notes that a frontier training run draws about 20 to 25 megawatts for three months, comparable to the continuous power that ChatGPT inference already pulls every single day, except inference never stops.

Donut chart showing where AI compute energy goes, with inference at 85 percent and training at 15 percent.
Industry estimates put inference at roughly 80 to 90 percent of AI compute and training at the remainder. Amazon Web Services and Google have both reported inference as the larger share. Source: AWS and Google.

So the real answer is neither. Per use, inference is tiny and training is invisible. In aggregate, inference is the bigger number, somewhere around 80 to 90 percent of AI compute by industry estimates from AWS, precisely because it happens billions of times a day while training happens once. The cost is not hiding in a single expensive step. It is the multiplication. This is the same demand curve driving the gigawatt buildout behind AI infrastructure: not one giant training run, but relentless, always-on serving.

What about running models locally?

A local model is the cleanest case to reason about, because three things are true at once. The per-query energy really is tiny: a small model generating an email sits at the 0.03 watt-hour end of the chart. The direct water use really is near zero, because a laptop has a fan, not an evaporative cooling tower, and consumes no water on site; the only water involved is whatever the local power plant used to make the few watt-hours pulled from the wall. And local is not automatically greener per token: a laptop runs one request at a time on consumer silicon, while a hosted data center batches thousands of requests across purpose-built accelerators at a fleet power-usage efficiency near 1.09, meaning almost all the power reaches the chips.

  • Per-query energy is tiny. A small local model answering an email is the cheapest bar on the chart.
  • On-site water is effectively zero. No cooling tower means no evaporation; only the grid carries embedded water.
  • Hosted is often more efficient per token. Batching and purpose-built accelerators beat a single laptop running unbatched.

The catch is that it is not the same model. A laptop runs an 8-billion-parameter model; ChatGPT may route a query to something a hundred times larger. The honest read: hosted AI is not secretly wasteful per prompt, and local AI is not magically free. Local is cheaper mostly because the model is smaller and the work is simpler, not because the cloud hides a swimming pool behind every answer.

What should builders and operators take from this?

For anyone shipping AI features, the footprint follows the same levers as the bill.

  • Right-size the model. A 50x-bigger model can cost 50x the energy for a marginally better answer. Matching model to task cuts cost and carbon together.
  • Mind reasoning and long context. Reasoning models can use up to 43 times more energy on a simple problem, and a 100,000-token input can push a single query toward 40 watt-hours. A chain-of-thought model pointed at a task a classifier would finish is pure waste.
  • Location decides the carbon, not just the kilowatt-hours. The same 2.9 kilowatt-hours of AI work emits about 650 grams of CO2 on California's grid and over 1,150 grams in coal-heavy West Virginia.
  • Watch the aggregate, not the prompt. A single call is harmless. Ten million users calling it hourly inside an always-on agent is a power plant.

The individual-guilt framing was always the wrong lens. One prompt costs a few drops of water and less electricity than a household fridge light burning for a minute. The thing actually reshaping grids is not a single email. It is a billion emails, plus images, plus video, plus agents that never sleep, all multiplied at a scale no one is fully disclosing yet.

Sources