AI energy use: useful facts without moral panic
AI energy consumption generates strong opinions and weak data. Most claims about “AI using as much energy as a small country” conflate training, inference, datacentre overhead, hardware manufacturing, and future projections — and often skip the comparison to what those workloads replace.
Quick answer
AI energy use is real and growing, but alarmist framing often obscures useful distinctions. Training a large frontier model consumes roughly 10–50 GWh of electricity. On a typical grid, the resulting emissions are comparable to the lifetime emissions of a few hundred cars, not the footprint of a small country. Inference energy per query is typically 0.1–3 Wh, many orders of magnitude less than a training run, but the total grows quickly as usage scales. AI workloads account for a fraction of total datacentre energy (roughly 10–15% in 2025), though that share is growing.
The useful question is not “is AI energy use bad?” but “what is the energy per useful task, and how does it compare to the alternative?”
Training energy: what the numbers actually mean
Training a large language model involves running millions of GPU-hours across thousands of accelerators. Published figures include:
- GPT-4 estimated training: ~50 GWh (OpenAI has not confirmed exact figures).
- Llama 3.1 405B training: ~30 GWh (estimated from the GPU-hours and per-GPU power Meta disclosed, not a directly reported energy figure).
- Smaller models (7–70B): 0.1–5 GWh depending on scale and training duration.
For context, 50 GWh is roughly the annual electricity use of 18,000 UK households, assuming typical consumption of about 2,700 kWh per household per year. It is significant but not civilisation-scale. The carbon impact depends entirely on grid carbon intensity at the training location: the same training run could have 10x different emissions depending on whether it runs on a coal-heavy or renewable-heavy grid.
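The conversions in this section are simple enough to check by hand. The sketch below is a minimal illustration, not a measurement tool: the function name, the per-household consumption, and both grid carbon intensities are assumptions chosen for the example, and the household equivalent in particular is very sensitive to the per-household figure you plug in.

```python
def training_footprint(energy_gwh, household_kwh_per_year, grid_kg_co2_per_kwh):
    """Return (household-years of electricity, tonnes of CO2) for a training run."""
    energy_kwh = energy_gwh * 1_000_000  # 1 GWh = 1,000,000 kWh
    household_years = energy_kwh / household_kwh_per_year
    tonnes_co2 = energy_kwh * grid_kg_co2_per_kwh / 1_000  # kg -> tonnes
    return household_years, tonnes_co2


# All inputs below are illustrative assumptions, not measurements.
HOUSEHOLD_KWH = 2_700   # assumed annual electricity use of one household
CLEAN_GRID = 0.05       # assumed kg CO2 per kWh on a renewable-heavy grid
DIRTY_GRID = 0.50       # assumed kg CO2 per kWh on a coal-heavy grid

households, clean_tonnes = training_footprint(50, HOUSEHOLD_KWH, CLEAN_GRID)
_, dirty_tonnes = training_footprint(50, HOUSEHOLD_KWH, DIRTY_GRID)

print(f"household-years of electricity: {households:,.0f}")
print(f"emissions: {clean_tonnes:,.0f} t vs {dirty_tonnes:,.0f} t CO2")
```

The energy figure is identical in both calls; only the assumed grid intensity changes, which is why the emissions differ by the same factor as the two intensities.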
Inference energy: the larger long-term concern
Training is a one-off cost. Inference is ongoing and scales with usage. Depending on model size, each query uses roughly:
- Small model (7B, quantised): 0.05–0.2 Wh per query
- Medium model (70B): 0.5–2 Wh per query
- Large frontier model: 1–10 Wh per query
At scale, inference dominates energy use. A popular AI product serving hundreds of millions of daily queries can consume more energy per month than the training run that created the model: at 2 Wh per query, 300 million queries a day is about 18 GWh a month. Most provider emissions disclosures (when they exist) confirm this pattern.
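To see when recurring inference overtakes the one-off training cost, it helps to compute a break-even point. The sketch below is a rough illustration under stated assumptions: the function name is hypothetical, and the training energy, query volume, and per-query energy are example inputs picked from the ranges in this guide rather than figures for any real product.

```python
def days_until_inference_matches_training(training_gwh, queries_per_day, wh_per_query):
    """Days of serving after which cumulative inference energy equals training energy."""
    daily_inference_gwh = queries_per_day * wh_per_query / 1e9  # Wh -> GWh
    return training_gwh / daily_inference_gwh


# Illustrative assumptions: a 30 GWh training run, 300 million queries per day,
# and 2 Wh per query.
days = days_until_inference_matches_training(
    training_gwh=30, queries_per_day=300_000_000, wh_per_query=2.0
)
print(f"inference matches training energy after about {days:.0f} days")
```

With these inputs the break-even arrives in under two months. At a few million queries per day the same calculation gives a break-even measured in years, so the query volume matters as much as the per-query figure.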
The comparison that matters is not “AI uses energy” but “does this AI task use less energy than the human task it replaces?” A summary generated by AI may use 1 Wh. Flight booking research that previously required multiple site visits and phone calls may have used more.
Datacentre energy: putting AI in context
Total datacentre electricity use was roughly 1–1.5% of global electricity in 2025. Within that, AI workloads accounted for roughly 10–15%. The much-publicised “AI datacentre growth” projections are often conflated with general datacentre growth.
Cloud providers are also among the largest corporate buyers of renewable energy. This does not make AI energy-neutral, but it means marginal AI demand often accelerates renewable deployment rather than purely adding fossil-fuel load.
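Multiplying the two percentages quoted in this section gives AI's approximate share of global electricity. The snippet below is a back-of-envelope sketch that uses only the ranges stated above; the function name is hypothetical, and pairing low with low and high with high is a simplifying assumption.

```python
def combined_share(datacentre_share, ai_share_of_datacentre):
    """Multiply two (low, high) fractional ranges to get AI's share of global electricity."""
    low = datacentre_share[0] * ai_share_of_datacentre[0]
    high = datacentre_share[1] * ai_share_of_datacentre[1]
    return low, high


# Ranges taken from the text: datacentres at 1-1.5% of global electricity,
# AI at 10-15% of datacentre electricity.
low, high = combined_share((0.010, 0.015), (0.10, 0.15))
print(f"AI share of global electricity: {low:.2%} to {high:.2%}")
```

On these figures, AI workloads come out at roughly 0.1% to 0.2% of global electricity in 2025.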
Hardware manufacturing: the overlooked factor
Manufacturing GPUs and AI accelerators is energy-intensive. A single high-end GPU requires roughly 3–5 MWh for fabrication. The embodied energy of the hardware fleet supporting a large model can be comparable to months of inference energy.
This is rarely included in AI energy comparisons, which means reported figures typically understate total lifecycle energy use.
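One way to put embodied energy on the same footing as operational energy is to express it as months of running the same hardware. The sketch below is a minimal illustration: the function name is hypothetical and the 0.5 kW average draw per GPU is an assumed figure, while the 3–5 MWh fabrication range comes from the text above.

```python
def embodied_energy_in_months(embodied_mwh, avg_power_kw):
    """Express fabrication energy as months of operation at a given average power draw."""
    hours_per_month = 24 * 365 / 12
    monthly_operational_mwh = avg_power_kw * hours_per_month / 1_000  # kWh -> MWh
    return embodied_mwh / monthly_operational_mwh


AVG_POWER_KW = 0.5  # assumed average draw per GPU, including idle time

for embodied_mwh in (3, 5):  # fabrication range quoted in the text
    months = embodied_energy_in_months(embodied_mwh, AVG_POWER_KW)
    print(f"{embodied_mwh} MWh embodied is about {months:.1f} months of operation")
```

Under this assumption the embodied energy of one GPU equals somewhere between several months and a little over a year of its own operation, which is why leaving it out understates lifecycle totals.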
What teams get wrong
- Comparing total AI energy to energy of a single alternative. AI may replace multiple less-efficient alternatives simultaneously. A chatbot that replaces phone support, email support, and a knowledge base may save energy per resolved issue even if the total is large.
- Assuming all AI energy is equal. A 0.5 Wh inference call is very different from a 50 GWh training run. Lump-sum numbers hide whether the energy is one-off or recurring.
- Ignoring efficiency improvements. Hardware and software efficiency for AI inference has been improving roughly 2x per year through better chips, quantisation, pruning, and architecture improvements.
- Focusing only on direct energy use. The energy to manufacture hardware and cool datacentres can be a substantial fraction of total energy use.
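The efficiency point in the list above interacts with usage growth, and a short projection makes the interaction visible. The sketch below is illustrative only: the function name is hypothetical, the 2x yearly efficiency gain is the rough figure quoted above, and the usage growth rates are assumptions invented for the example.

```python
def projected_energy(base_energy, usage_growth_per_year, efficiency_gain_per_year, years):
    """Project total inference energy when usage and efficiency both compound yearly."""
    yearly_factor = usage_growth_per_year / efficiency_gain_per_year
    return base_energy * yearly_factor ** years


BASE = 1.0  # arbitrary starting energy, in any unit

# Assumed scenarios: usage doubling or tripling each year against a 2x efficiency gain.
flat = projected_energy(BASE, usage_growth_per_year=2.0, efficiency_gain_per_year=2.0, years=2)
rising = projected_energy(BASE, usage_growth_per_year=3.0, efficiency_gain_per_year=2.0, years=2)
print(f"usage 2x/year: {flat:.2f}x the starting energy after two years")
print(f"usage 3x/year: {rising:.2f}x the starting energy after two years")
```

Total energy stays flat only when usage grows no faster than efficiency improves. Any projection that holds per-query energy constant will overstate future demand, and one that ignores usage growth will understate it.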
Practical decision check
- For a product decision: compare energy per task (not per model or per query) against the alternative process.
- For a procurement decision: ask providers for per-query energy estimates and datacentre renewable-energy mix.
- For internal use: smaller models and shorter outputs save energy. Batch processing and caching reduce per-task energy significantly.
- For public claims: separate training, inference, manufacturing, and projections. If a number does not say which it covers, treat it as unreliable.
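The per-task comparison in the first and third checks can be sketched in a few lines. This is a hedged illustration, not a benchmark: the function name and every number are hypothetical, and the model assumes a cache hit costs negligible energy, which is a simplification.

```python
def energy_per_task_wh(queries_per_task, wh_per_query, cache_hit_rate=0.0):
    """Estimate energy per completed task, treating cached responses as free."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return queries_per_task * wh_per_query * (1.0 - cache_hit_rate)


# Hypothetical inputs: a task that needs 4 model calls at 1.5 Wh each,
# with a quarter of calls served from cache.
ai_task_wh = energy_per_task_wh(queries_per_task=4, wh_per_query=1.5, cache_hit_rate=0.25)

# Hypothetical energy of the process the AI task replaces.
alternative_task_wh = 6.0

print(f"AI task: {ai_task_wh:.1f} Wh, alternative: {alternative_task_wh:.1f} Wh")
print("AI uses less energy per task" if ai_task_wh < alternative_task_wh
      else "alternative uses less energy per task")
```

Counting queries per task rather than energy per query is the key design choice: a model with a low per-query figure can still lose the comparison if it needs many retries or follow-ups to finish the job.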
Methodology and sources
Check date: 2026-05-25
What was checked: Published training-energy estimates from Meta (Llama 3), estimated figures from academic papers (Patterson et al. 2021, Luccioni et al. 2022), IEA datacentre energy reports, cloud provider sustainability disclosures, and chip manufacturing energy estimates from semiconductor industry sources.
Assumptions and limits: Training energy figures for proprietary models are estimates, not verified measurements. Inference energy varies drastically by hardware, model size, quantisation, batch size, and request length. Manufacturing energy is approximate and varies by fab, node, and yield.
Source list
- Meta Llama 3 sustainability disclosure — https://ai.meta.com/blog/meta-llama-3-environmental-impact/
- Carbon Emissions and Large Neural Network Training (Patterson et al. 2021) — https://arxiv.org/abs/2104.10350
- Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model (Luccioni et al. 2022) — https://arxiv.org/abs/2211.02001
- IEA Data Centres and Data Transmission Networks — https://www.iea.org/energy-system/buildings/data-centres-and-data-transmission-networks
- Google environmental report — https://sustainability.google/reports/
Related guides
- Hardware supply and inference economics: why chips shape AI products
- Hosted API vs self-hosted open model: the real cost comparison
- GPU rental for LLM inference: what an operator needs to know
Change Log
- 2026-05-27: Added direct source URLs to all named providers and services; added Change Log section. Content unchanged.