theLLMs

Cache

Concepts, ideas, and knowledge snippets worth keeping loaded

This is the playful replacement for "guides": not a dusty manual shelf, more like working memory for AI decisions. Tokens, context windows, evals, model trade-offs, RAG, agents, pricing, benchmarks, and all the tiny concepts that stop projects turning into expensive fog.

28 published Cache pages
48 Cache briefs
25 how-to briefs
21 news/context briefs

Published now

Live Cache in this prototype

lm-eval-harness explained for non-researchers

A plain-English guide to what lm-eval-harness does, why teams use it, and why a benchmark runner is not the same thing as proof of real-world usefulness.

Published prototype · Cache

Briefed pipeline

Cache queued for the 100-page programme

Cache #1 · LLM basics for operators

What is a token, and why does it affect AI cost?

Understand tokenisation and pricing basics before using an API.

Cache #2 · LLM basics for operators

Context windows explained: why bigger is not always better

Decide whether long-context models solve a product problem.

Cache #3 · LLM basics for operators

Temperature, top-p and deterministic outputs: what the settings actually do

Configure generation settings for consistency or creativity.

Cache #4 · LLM basics for operators

Embeddings explained for business search and RAG

Understand embeddings before buying a vector database.

Cache #5 · LLM basics for operators

Latency in LLM apps: first token, total time and user experience

Diagnose why an AI feature feels slow.

Cache #6 · LLM basics for operators

Rate limits explained: requests, tokens, tiers and hidden launch risks

Plan capacity before launching an AI feature.

Cache #7 · LLM basics for operators

Prompt length, output length and why AI bills surprise teams

Explain unexpected token usage and cost spikes.

Cache #8 · LLM basics for operators

System prompts, developer prompts and user prompts: who controls what?

Understand prompt hierarchy and instruction conflicts.

Cache #9 · LLM basics for operators

JSON mode and structured outputs: what reliability does and does not mean

Get valid machine-readable outputs from LLMs.

Cache #10 · LLM basics for operators

Multimodal models explained: text, images, audio and video in practical products

Assess whether multimodal AI fits a workflow.

Cache #11 · LLM basics for operators

Inference vs training vs fine-tuning: three terms operators confuse

Understand what kind of AI work a project actually needs.

Cache #13 · LLM basics for operators

Model parameters and sizes: why 7B, 70B and MoE labels can mislead

Interpret model-size claims and open-model labels.

Cache #16 · Evaluation and harnesses

lm-eval-harness explained for non-researchers

Learn what lm-eval-harness can and cannot test.

Cache #17 · Evaluation and harnesses

HELM-style evaluation: why transparency matters as much as scores

Understand holistic model evaluation.

Cache #18 · Evaluation and harnesses

Promptfoo vs lm-eval-harness: when each is useful

Choose an evaluation tool for prompts or models.

Cache #19 · Evaluation and harnesses

RAG evaluation: checking retrieval before blaming the model

Diagnose poor RAG answers.

Cache #20 · Evaluation and harnesses

Synthetic eval datasets: useful shortcut or false confidence?

Decide whether to generate test cases with LLMs.

Cache #22 · Evaluation and harnesses

Human evaluation for LLMs: rubrics that editors and SMEs can actually use

Design a review process for generated outputs.

Cache #23 · Evaluation and harnesses

LLM-as-a-judge: when automated grading helps and when it lies

Use another model to grade model outputs.

Cache #24 · Evaluation and harnesses

Coding benchmarks explained: HumanEval, MBPP, SWE-bench and real developer work

Interpret coding model claims.

Cache #25 · Evaluation and harnesses

Function-calling benchmarks: why tool-use scores do not guarantee agents work

Understand tool-use and agent benchmark claims.

Cache #26 · Evaluation and harnesses

Long-context benchmarks: needle tests, document QA and real recall

Evaluate long-context model claims.

Cache #27 · Evaluation and harnesses

Contamination and leakage: why benchmark scores can be too good

Understand benchmark trust issues.

Cache #29 · Evaluation and harnesses

Benchmark leaderboards for busy buyers: Chatbot Arena, LiveBench and what to ignore

Use public leaderboards without overtrusting them.

Cache #30 · Evaluation and harnesses

Creating a model scorecard for your own workload

Compare models for a specific business use case.

Cache #31 · Cost and economics

API model pricing: input, output, cache and batch costs

Understand how AI API bills are calculated.

Cache #32 · Cost and economics

Prompt caching explained: when repeated context becomes cheaper

Reduce repeated prompt costs.

Cache #33 · Cost and economics

Batch APIs for LLMs: cheaper, slower and often underused

Process non-urgent AI jobs cheaply.

Cache #34 · Cost and economics

Hosted API vs self-hosted open model: the real cost comparison

Decide whether local/self-hosted inference saves money.

Cache #35 · Cost and economics

GPU rental for LLM inference: what an operator needs to know

Understand GPU hosting options for open models.

Cache #36 · Cost and economics

Fine-tuning economics: when training a custom model pays back

Compare fine-tuning with prompting/RAG from a cost perspective.

Cache #37 · Cost and economics

RAG costs: vector database, embeddings, reranking and generation

Estimate total cost of a document QA system.

Cache #38 · Cost and economics

AI feature unit economics: cost per user, task and successful answer

Model AI feature profitability.

Cache #39 · Cost and economics

Output tokens are expensive: designing shorter AI answers without hurting usefulness

Cut costs from verbose model outputs.

Cache #40 · Cost and economics

Model routing: using cheap models first without breaking quality

Save money by routing tasks across models.

Cache #41 · Cost and economics

Caching AI answers: when it is safe, risky or pointless

Reduce repeated AI calls.

Cache #42 · Cost and economics

The hidden cost of retries, fallbacks and validation loops

Understand why real costs exceed estimates.

Cache #43 · Cost and economics

LLM observability cost: logs, traces and evaluation storage

Budget for monitoring AI systems.

Cache #44 · Cost and economics

A simple LLM cost calculator editors can maintain

Estimate AI costs from token volumes.

Cache #45 · Model/provider landscape

Open weights vs hosted APIs: practical trade-offs

Choose between open models and managed APIs.

Cache #56 · Model/provider landscape

Cloud AI platforms vs direct model APIs: Bedrock, Vertex and Azure trade-offs

Choose between cloud marketplace AI and direct provider integration.

Cache #60 · Reliability, safety and security

Prompt injection explained for business users

Understand why retrieved text or user input can hijack an AI app.

Cache #62 · Reliability, safety and security

Jailbreaks vs product safety: what operators can realistically control

Understand jailbreak risk without panic.

Cache #63 · Reliability, safety and security

Tool-use safety: stopping agents from taking dangerous actions

Design guardrails for AI agents with tools.

Cache #65 · Reliability, safety and security

PII handling for LLM apps: minimisation before redaction

Handle personal data safely in AI features.

Cache #66 · Reliability, safety and security

AI output monitoring: what to log, sample and review

Monitor production LLM quality and incidents.

Cache #67 · Reliability, safety and security

Citation quality in AI answers: source-grounded does not mean source-faithful

Improve citations in AI answers.

Cache #69 · Reliability, safety and security

AI incident response: what to do when a model gives harmful or wrong advice

Prepare for AI-related production incidents.