theLLMs

100-page programme

All 100 briefs, visible and accountable

The full 100-page programme is complete. All 100 briefs have been drafted, editorially reviewed, and integrated into the live prototype. This page is the accountable record of every brief in the programme; use it to verify coverage, not as a working queue.

100 total briefs
48 Cache
25 Run
21 Diff

48 briefs

Cache

Concepts, ideas, explainers, comparisons and knowledge snippets.

#1 · LLM basics for operators

What is a token, and why does it affect AI cost?

Understand tokenisation and pricing basics before using an API.

Target reader: Founder, product owner or non-specialist technical lead.

#2 · LLM basics for operators

Context windows explained: why bigger is not always better

Decide whether long-context models solve a product problem.

Target reader: Product manager or builder planning document workflows.

#3 · LLM basics for operators

Temperature, top-p and deterministic outputs: what the settings actually do

Configure generation settings for consistency or creativity.

Target reader: Developer integrating text generation.

#4 · LLM basics for operators

Embeddings explained for business search and RAG

Understand embeddings before buying a vector database.

Target reader: Small-business operator, data lead or web developer.

#5 · LLM basics for operators

Latency in LLM apps: first token, total time and user experience

Diagnose why an AI feature feels slow.

Target reader: SaaS founder, product engineer or UX lead.

#6 · LLM basics for operators

Rate limits explained: requests, tokens, tiers and hidden launch risks

Plan capacity before launching an AI feature.

Target reader: Startup operator or engineering manager.

#7 · LLM basics for operators

Prompt length, output length and why AI bills surprise teams

Explain unexpected token usage and cost spikes.

Target reader: Product owner or finance-aware technical lead.

#8 · LLM basics for operators

System prompts, developer prompts and user prompts: who controls what?

Understand prompt hierarchy and instruction conflicts.

Target reader: Developer or AI product designer.

#9 · LLM basics for operators

JSON mode and structured outputs: what reliability does and does not mean

Get valid machine-readable outputs from LLMs.

Target reader: Application developer.

#10 · LLM basics for operators

Multimodal models explained: text, images, audio and video in practical products

Assess whether multimodal AI fits a workflow.

Target reader: Product manager or automation consultant.

#11 · LLM basics for operators

Inference vs training vs fine-tuning: three terms operators confuse

Understand what kind of AI work a project actually needs.

Target reader: Buyer, founder or non-ML technical stakeholder.

#13 · LLM basics for operators

Model parameters and sizes: why 7B, 70B and MoE labels can mislead

Interpret model-size claims and open-model labels.

Target reader: Technical buyer or local-LLM hobbyist moving toward production.

#16 · Evaluation and harnesses

lm-eval-harness explained for non-researchers

Learn what lm-eval-harness can and cannot test.

Target reader: Developer, analyst or editor planning reproducible tests.

#17 · Evaluation and harnesses

HELM-style evaluation: why transparency matters as much as scores

Understand holistic model evaluation.

Target reader: Policy-aware technologist or enterprise buyer.

#18 · Evaluation and harnesses

Promptfoo vs lm-eval-harness: when each is useful

Choose an evaluation tool for prompts or models.

Target reader: Developer or QA lead.

#19 · Evaluation and harnesses

RAG evaluation: checking retrieval before blaming the model

Diagnose poor RAG answers.

Target reader: Developer building document QA.

#20 · Evaluation and harnesses

Synthetic eval datasets: useful shortcut or false confidence?

Decide whether to generate test cases with LLMs.

Target reader: AI engineer, QA manager or product owner.

#22 · Evaluation and harnesses

Human evaluation for LLMs: rubrics that editors and SMEs can actually use

Design a review process for generated outputs.

Target reader: Editorial lead, domain expert or product manager.

#23 · Evaluation and harnesses

LLM-as-a-judge: when automated grading helps and when it lies

Use another model to grade model outputs.

Target reader: AI engineer or evaluator.

#24 · Evaluation and harnesses

Coding benchmarks explained: HumanEval, MBPP, SWE-bench and real developer work

Interpret coding model claims.

Target reader: Engineering manager or developer-tool buyer.

#25 · Evaluation and harnesses

Function-calling benchmarks: why tool-use scores do not guarantee agents work

Understand tool-use and agent benchmark claims.

Target reader: Developer or product lead.

#26 · Evaluation and harnesses

Long-context benchmarks: needle tests, document QA and real recall

Evaluate long-context model claims.

Target reader: Legal-tech, research or document-workflow builder.

#27 · Evaluation and harnesses

Contamination and leakage: why benchmark scores can be too good

Understand benchmark trust issues.

Target reader: Editor, analyst or technical buyer.

#29 · Evaluation and harnesses

Benchmark leaderboards for busy buyers: Chatbot Arena, LiveBench and what to ignore

Use public leaderboards without overtrusting them.

Target reader: Buyer, founder or journalist.

#30 · Evaluation and harnesses

Creating a model scorecard for your own workload

Compare models for a specific business use case.

Target reader: Founder, product manager or AI engineer.

#31 · Cost and economics

API model pricing: input, output, cache and batch costs

Understand how AI API bills are calculated.

Target reader: Founder, finance lead or developer.

#32 · Cost and economics

Prompt caching explained: when repeated context becomes cheaper

Reduce repeated prompt costs.

Target reader: Developer with high-volume API usage.

#33 · Cost and economics

Batch APIs for LLMs: cheaper, slower and often underused

Process non-urgent AI jobs cheaply.

Target reader: Data engineer, content ops lead or SaaS operator.

#34 · Cost and economics

Hosted API vs self-hosted open model: the real cost comparison

Decide whether local/self-hosted inference saves money.

Target reader: CTO, founder or infrastructure lead.

#35 · Cost and economics

GPU rental for LLM inference: what an operator needs to know

Understand GPU hosting options for open models.

Target reader: Developer or startup operator.

#36 · Cost and economics

Fine-tuning economics: when training a custom model pays back

Compare fine-tuning with prompting/RAG from a cost perspective.

Target reader: Product owner or ML lead.

#37 · Cost and economics

RAG costs: vector database, embeddings, reranking and generation

Estimate total cost of a document QA system.

Target reader: Builder or technical buyer.

#38 · Cost and economics

AI feature unit economics: cost per user, task and successful answer

Model AI feature profitability.

Target reader: SaaS founder, product manager or finance lead.

#39 · Cost and economics

Output tokens are expensive: designing shorter AI answers without hurting usefulness

Cut costs from verbose model outputs.

Target reader: Product designer or developer.

#40 · Cost and economics

Model routing: using cheap models first without breaking quality

Save money by routing tasks across models.

Target reader: AI platform engineer or SaaS CTO.

#41 · Cost and economics

Caching AI answers: when it is safe, risky or pointless

Reduce repeated AI calls.

Target reader: Developer building support or content systems.

#42 · Cost and economics

The hidden cost of retries, fallbacks and validation loops

Understand why real costs exceed estimates.

Target reader: Engineering manager or founder.

#43 · Cost and economics

LLM observability cost: logs, traces and evaluation storage

Budget for monitoring AI systems.

Target reader: Engineering lead or platform owner.

#44 · Cost and economics

A simple LLM cost calculator editors can maintain

Estimate AI costs from token volumes.

Target reader: Site editor, founder or reader needing a practical tool.

#45 · Model/provider landscape

Open weights vs hosted APIs: practical trade-offs

Choose between open models and managed APIs.

Target reader: Founder, CTO or technical buyer.

#56 · Model/provider landscape

Cloud AI platforms vs direct model APIs: Bedrock, Vertex and Azure trade-offs

Choose between cloud marketplace AI and direct provider integration.

Target reader: Enterprise architect or CTO.

#60 · Reliability, safety and security

Prompt injection explained for business users

Understand why retrieved text or user input can hijack an AI app.

Target reader: Business owner, developer or security stakeholder.

#62 · Reliability, safety and security

Jailbreaks vs product safety: what operators can realistically control

Understand jailbreak risk without panic.

Target reader: AI product owner or safety reviewer.

#63 · Reliability, safety and security

Tool-use safety: stopping agents from taking dangerous actions

Design guardrails for AI agents with tools.

Target reader: Developer or operations lead.

#65 · Reliability, safety and security

PII handling for LLM apps: minimisation before redaction

Handle personal data safely in AI features.

Target reader: Developer, privacy lead or founder.

#66 · Reliability, safety and security

AI output monitoring: what to log, sample and review

Monitor production LLM quality and incidents.

Target reader: Engineering manager or AI ops lead.

#67 · Reliability, safety and security

Citation quality in AI answers: source-grounded does not mean source-faithful

Improve citations in AI answers.

Target reader: Editor, RAG builder or knowledge-base owner.

#69 · Reliability, safety and security

AI incident response: what to do when a model gives harmful or wrong advice

Prepare for AI-related production incidents.

Target reader: Product lead, support manager or founder.

25 briefs

Run

Workflows with inputs, outputs, checks, failure modes and a stopping point.

#12 · LLM basics for operators

Chat history is not memory: how LLM apps remember users

Understand AI memory features and their privacy implications.

Target reader: SaaS operator, designer or privacy-conscious user.

#14 · LLM basics for operators

AI agents vs workflows: a plain-English difference for teams

Decide whether a task needs an “agent” or deterministic automation.

Target reader: Operations lead, founder or automation builder.

#15 · Evaluation and harnesses

How LLM benchmarks work, and what they miss

Understand benchmark claims in model launches.

Target reader: Technical reader, buyer or editor.

#21 · Evaluation and harnesses

Golden datasets for LLM products: how small regression sets prevent regressions

Build a practical evaluation set for an AI feature.

Target reader: Product engineer or QA lead.

#28 · Evaluation and harnesses

Eval CI for AI apps: testing prompts before every release

Put AI regression tests into a software pipeline.

Target reader: Engineering team lead.

#47 · Model/provider landscape

The model release treadmill: how to avoid rebuilding every month

Reduce disruption from frequent model releases/deprecations.

Target reader: Product and engineering leads.

#59 · Reliability, safety and security

Hallucination testing: how to build a small regression set

Reduce factual errors in LLM outputs.

Target reader: Product owner, editor or developer.

#64 · Reliability, safety and security

Refusals and over-refusals: testing whether safety blocks useful work

Diagnose unwanted model refusals.

Target reader: Product manager, support lead or developer.

#68 · Reliability, safety and security

Eval gaming: when models optimise for the test rather than the task

Understand why high benchmark scores may not translate.

Target reader: Buyer, editor or evaluator.

#74 · Reliability, safety and security

Red teaming an LLM feature: a practical first-week checklist

Test an AI feature before launch.

Target reader: Product team, QA lead or security reviewer.

#75 · Practical implementation

Fine-tuning vs prompting vs RAG: decision checklist

Choose an adaptation strategy for an AI product.

Target reader: Founder, PM or developer.

#76 · Practical implementation

Function calling and tool use: where agents actually fail

Build reliable tool-using AI workflows.

Target reader: Developer or automation lead.

#77 · Practical implementation

MCP explained: tools, resources, prompts and the current hype gap

Understand Model Context Protocol and when it helps.

Target reader: Developer, technical buyer or editor.

#78 · Practical implementation

Vector databases: when semantic search is enough and when it is not

Decide whether to add a vector database.

Target reader: Developer or technical buyer.

#79 · Practical implementation

Chunking documents for RAG: size, overlap and metadata choices

Improve retrieval quality in document AI.

Target reader: RAG builder or knowledge-base owner.

#80 · Practical implementation

Rerankers explained: the quiet quality layer in RAG systems

Understand reranking after vector search.

Target reader: Developer or AI product owner.

#81 · Practical implementation

Building a minimum viable RAG system without overengineering

Plan a simple document-QA prototype.

Target reader: Startup developer or internal tools builder.

#82 · Practical implementation

AI coding agents: what to measure before trusting them

Evaluate coding agents for real engineering work.

Target reader: Engineering manager or developer.

#83 · Practical implementation

Schema-first AI extraction: making LLMs useful for messy documents

Extract structured data from unstructured text.

Target reader: Developer, ops lead or analyst.

#84 · Practical implementation

LLM observability basics: traces, prompts, evals and feedback loops

Monitor an LLM app in production.

Target reader: Engineering lead or AI platform owner.

#85 · Practical implementation

Fallback design: what happens when the AI call fails?

Design resilient AI features.

Target reader: Product manager or engineer.

#86 · Practical implementation

Human-in-the-loop AI: approval queues that do not become bottlenecks

Add human review to AI workflows.

Target reader: Operations lead, editor or product manager.

#87 · Practical implementation

Prompt versioning: treating prompts like production code

Manage changes to prompts in teams.

Target reader: Developer, QA lead or AI product owner.

#88 · Practical implementation

Building an internal AI policy bot: safe pattern or risky shortcut?

Use AI over internal policies and procedures.

Target reader: HR/ops lead, internal tools developer or compliance stakeholder.

#100 · Industry, regulation and procurement

The evidence-led AI website manifesto: how theLLMs will review claims

Understand the site’s editorial trust model.

Target reader: Reader, editor, source or prospective contributor.

21 briefs

Diff

Dated context pieces: what changed, who should care, and what remains unproved.

#46 · Model/provider landscape

What model cards tell you — and what they do not

Read model cards critically.

Target reader: Editor, buyer or technical practitioner.

#48 · Model/provider landscape

OpenAI, Anthropic, Google and Mistral APIs: what comparison pages should measure

Compare major hosted providers without hype.

Target reader: Buyer or technical evaluator.

#49 · Model/provider landscape

Meta Llama and open model licensing: what builders must check

Understand whether an open model can be used commercially.

Target reader: Startup founder, developer or procurement lead.

#50 · Model/provider landscape

Small language models: when smaller is better

Decide whether a compact model fits a task.

Target reader: Edge-app developer or cost-conscious operator.

#51 · Model/provider landscape

Reasoning models: what “thinking” modes change for cost and latency

Understand reasoning-model trade-offs.

Target reader: Developer, PM or buyer.

#52 · Model/provider landscape

Mixture-of-experts models: why active parameters matter

Interpret MoE claims in model launches.

Target reader: Technical reader or local-model user.

#53 · Model/provider landscape

Local LLM runtimes: Ollama, llama.cpp, vLLM and TGI in plain English

Choose a runtime for running open models.

Target reader: Developer experimenting with local or hosted inference.

#54 · Model/provider landscape

Quantisation explained: why model files have Q4, Q5 and GGUF labels

Understand local model file choices.

Target reader: Local-LLM user or developer.

#55 · Model/provider landscape

Provider data retention policies: what API users should compare

Assess privacy and data-use differences between providers.

Target reader: Enterprise buyer, privacy lead or founder.

#57 · Model/provider landscape

Model gateways and routers: OpenRouter, LiteLLM and build-vs-buy questions

Add provider flexibility without rewriting apps.

Target reader: AI platform engineer or startup CTO.

#58 · Model/provider landscape

Changelog watching for AI teams: deprecations, pricing and model aliases

Keep AI apps stable amid provider changes.

Target reader: Engineering manager or technical editor.

#61 · Reliability, safety and security

Data leakage in LLM apps: logs, prompts, files and vendor retention

Identify places sensitive data can escape.

Target reader: Founder, privacy lead or engineering manager.

#70 · Reliability, safety and security

Guardrails compared: policy prompts, classifiers, validators and permissions

Choose safety controls for an AI app.

Target reader: Developer or product owner.

#89 · Industry, regulation and procurement

Enterprise AI procurement: questions before buying a platform

Evaluate AI vendors and platforms.

Target reader: Procurement lead, CTO, COO or board adviser.

#93 · Industry, regulation and procurement

Copyright and training data: what AI product teams can responsibly say

Understand the controversy around training data and outputs.

Target reader: Founder, editor, marketer or product lead.

#94 · Industry, regulation and procurement

AI energy use: useful facts without moral panic

Understand AI energy and infrastructure claims.

Target reader: General technical reader, editor or buyer.

#95 · Industry, regulation and procurement

Hardware supply and inference economics: why chips shape AI products

Understand why GPUs and accelerators affect AI availability and cost.

Target reader: Business/industry reader or technical buyer.

#96 · Industry, regulation and procurement

AI vendor lock-in: model APIs, embeddings, vector stores and eval data

Reduce switching risk in AI systems.

Target reader: CTO, architect or procurement lead.

#97 · Industry, regulation and procurement

AI SLAs and status pages: what reliability evidence vendors publish

Evaluate provider reliability claims.

Target reader: Enterprise buyer or engineering manager.

#98 · Industry, regulation and procurement

Responsible AI policies that builders can actually operationalise

Turn abstract AI principles into processes.

Target reader: Startup founder, product lead or governance owner.

#99 · Industry, regulation and procurement

AI adoption in small businesses: where LLMs help first

Identify practical AI use cases for small firms.

Target reader: Small-business owner or consultant.