theLLMs

Run

Workflows you can actually test

The Run lane is for doing the work: choosing models, setting up evals, pricing a feature, testing retrieval, running agents, and explaining trade-offs without turning the meeting into acronym soup.

24 published guides
25 guide briefs
48 Cache briefs to link from
21 diff/context briefs

Published now

Live guides in this prototype

Editorial rule

Every guide needs a stopping point

Briefed pipeline

Run briefs queued for drafting, editing and promotion

Run #12 · LLM basics for operators

Chat history is not memory: how LLM apps remember users

Understand AI memory features and their privacy implications.

Run #14 · LLM basics for operators

AI agents vs workflows: a plain-English difference for teams

Decide whether a task needs an “agent” or deterministic automation.

Run #15 · Evaluation and harnesses

How LLM benchmarks work, and what they miss

Understand benchmark claims in model launches.

Run #21 · Evaluation and harnesses

Golden datasets for LLM products: how small, curated test sets catch regressions

Build a practical evaluation set for an AI feature.

Run #28 · Evaluation and harnesses

Eval CI for AI apps: testing prompts before every release

Put AI regression tests into a software pipeline.

Run #47 · Model/provider landscape

The model release treadmill: how to avoid rebuilding every month

Reduce disruption from frequent model releases and deprecations.

Run #59 · Reliability, safety and security

Hallucination testing: how to build a small regression set

Reduce factual errors in LLM outputs.

Run #64 · Reliability, safety and security

Refusals and over-refusals: testing whether safety blocks useful work

Diagnose unwanted model refusals.

Run #68 · Reliability, safety and security

Eval gaming: when models optimise for the test rather than the task

Understand why high benchmark scores may not translate into real-world performance.

Run #74 · Reliability, safety and security

Red teaming an LLM feature: a practical first-week checklist

Test an AI feature before launch.

Run #75 · Practical implementation

Fine-tuning vs prompting vs RAG: decision checklist

Choose an adaptation strategy for an AI product.

Run #76 · Practical implementation

Function calling and tool use: where agents actually fail

Build reliable tool-using AI workflows.

Run #77 · Practical implementation

MCP explained: tools, resources, prompts and the current hype gap

Understand the Model Context Protocol (MCP) and when it helps.

Run #78 · Practical implementation

Vector databases: when semantic search is enough and when it is not

Decide whether to add a vector database.

Run #79 · Practical implementation

Chunking documents for RAG: size, overlap and metadata choices

Improve retrieval quality in document AI.

Run #80 · Practical implementation

Rerankers explained: the quiet quality layer in RAG systems

Understand reranking after vector search.

Run #81 · Practical implementation

Building a minimum viable RAG system without overengineering

Plan a simple document-QA prototype.

Run #82 · Practical implementation

AI coding agents: what to measure before trusting them

Evaluate coding agents for real engineering work.

Run #83 · Practical implementation

Schema-first AI extraction: making LLMs useful for messy documents

Extract structured data from unstructured text.

Run #84 · Practical implementation

LLM observability basics: traces, prompts, evals and feedback loops

Monitor an LLM app in production.

Run #85 · Practical implementation

Fallback design: what happens when the AI call fails?

Design resilient AI features.

Run #86 · Practical implementation

Human-in-the-loop AI: approval queues that do not become bottlenecks

Add human review to AI workflows.

Run #87 · Practical implementation

Prompt versioning: treating prompts like production code

Manage changes to prompts in teams.

Run #88 · Practical implementation

Building an internal AI policy bot: safe pattern or risky shortcut?

Use AI over internal policies and procedures.

Run #100 · Industry, regulation and procurement

The evidence-led AI website manifesto: how theLLMs will review claims

Understand the site’s editorial trust model.