theLLMs Diff

56 pages

Published and in review

Diff pages in this prototype

OptiLLM: The Drop-In Proxy That Makes Any LLM Reason Better — No Retraining Required

An open-source, OpenAI API-compatible proxy that wraps LLM calls with 20+ inference-time optimization techniques — chain

Diff · 2026-07-09

GPT-5.6 Public Launch Today: Sol, Terra, Luna Family Revealed

OpenAI releases GPT-5.6 in three tiers — Sol, Terra, Luna — with benchmark scores up to 91.9%, pricing from $1/M tokens,

Diff · 2026-07-09

Anthropic Raises $65 Billion in Series H at $965 Billion Valuation — Largest AI Funding Round, Surpasses OpenAI

On May 28, 2026, Anthropic closed a $65 billion Series H financing round, achieving a $965 billion post-money valuation

Diff · 2026-07-09

Claude Sonnet 5 vs GPT-5.5: Benchmarks, Pricing, and the Mid-Tier Sweep

Claude Sonnet 5 beats GPT-5.5 across every directly comparable benchmark while costing 40–50% less. Here's the full brea

Diff · 2026-07-09

Gemini 3.5 Pro Cleared for July Launch While Fable 5 and GPT-5.6 Remain Restricted

Google's Gemini 3.5 Pro ships freely while competitors face regulatory blocks — the first real-world case of AI regulato

Diff · 2026-07-09

Deploying Distributed AI Inference: Blueprints & Troubleshooting

Seven practical deployment blueprints for distributed AI inference on vLLM, llm-d, and Red Hat OpenShift AI — matched to

Diff · 2026-07-09

Llama 4 vs Qwen 3.5 vs Mistral Large 3: The Best Open-Weight LLM in July 2026

A head-to-head comparison of the three strongest open-weight LLM families — Llama 4 Scout, Qwen 3.5/3.6, and Mistral Lar

Diff · 2026-07-09

NVMe KV Cache Offloading for LLM Inference: Serve 10x More Users on One H100

Turn fast local storage into a third tier alongside GPU memory — serving 10x more concurrent users with 2x throughput an

Diff · 2026-07-08

Mistral AI Unveils New 'Fat' Open-Weight Model Family — June Tease Becomes July Early Access

Mistral AI's new fat open-weight model family expands on Large 3's MoE design, targeting Anthropic's Sonnet 5 with more

Diff · 2026-07-07

Claude Sonnet 5 Is Now the Default for Free and Pro — What the Model Shift Means

Claude Sonnet 5 is Anthropic's default for Free and Pro users — near-Opus performance, agentic free-tier access, and a n

Diff · 2026-07-07

Claude Sonnet 5 vs Opus 4.8: Agentic Pricing and Cost-Performance Tradeoffs

Claude Sonnet 5 delivers 80–90% of Opus 4.8's agentic coding quality at 20–40% of the cost. Here's how to choose, combin

Diff · 2026-07-07

OpenAI Retires GPT-4.5 and o3: What Developers Need to Know

OpenAI is retiring GPT-4.5 and o3 from ChatGPT and the API, accelerating its model lifecycle from 18 months to 6. Here's

Diff · 2026-07-07

Claude Sonnet 5 Tokenizer Benchmark: Token Count, Cost, and Performance Impact

Sonnet 5's new tokenizer produces 30–42% more tokens at the same per-token price — here's what the benchmarks, cost audi

Diff · 2026-07-07

OpenAI Retires Version Numbers for Sol, Terra, and Luna Tier Naming System

OpenAI abandons traditional model versioning in favor of durable capability tiers — Sol, Terra, and Luna — under GPT-5.6

Diff · 2026-07-07

Google Unveils Open Knowledge Format v0.1 — The Missing Standard for AI Agent Knowledge

Google Cloud published OKF v0.1, an open, vendor-neutral specification for representing knowledge that AI agents can rea

Diff · 2026-06-29

Anthropic Gets Partial Mythos 5 Access After $20M Compliance Overhaul — The Selective Restoration That Redefines Frontier AI Access

Fifteen days after the US Commerce Department ordered Anthropic to suspend Claude Mythos 5 and Fable 5 globally, the gov

Diff · 2026-06-29

OpenAI's Confidential IPO S-1: $25B Revenue, But Losing $1.22 Per Dollar Earned

OpenAI confidentially filed its S-1 with the SEC on May 22, 2026, targeting a September listing above $1 trillion. The f

Diff · 2026-06-29

OpenAI Shifts to Three-Tier Pricing with GPT-5.6 — Named Tiers, Government-Only Limited Preview, and Ultra Subagent Mode

On June 26, 2026, OpenAI previewed GPT-5.6 in a bold new strategy: instead of a single model with a 'reasoning strength

Diff · 2026-06-29

Microsoft Launches Seven MAI Models at Build 2026 — Including First Proprietary Reasoning Model

At Microsoft Build 2026, Microsoft AI unveiled its first in-house MAI model family spanning seven new models across reas

Diff · 2026-06-29

US Bans Claude Fable 5 Worldwide — What the Export Control Precedent Means for Every AI Startup

On June 12, 2026, the US Department of Commerce issued an emergency export control directive ordering Anthropic to immed

Diff · 2026-06-29

Google DeepMind's Exodus: Nobel Laureate Jumper, Gemini Co-Lead Shazeer Join Rivals in Mass Departure

Google DeepMind is facing its worst talent drain in recent history. Over a single week ending June 19, four senior resea

Diff · 2026-06-29

OpenWeave 7B Releases With Native Multimodal Reasoning and Full Apache 2.0 Licensing

The OpenWeave consortium released its first production-ready open-weight multimodal model featuring native vision-langua

Diff · 2026-06-29

Micron Confirms HBM Memory Shortage Will Outlast 2027 — The Hidden Bottleneck in AI Compute

Micron's latest earnings confirm HBM shortages won't resolve by end of 2027 despite massive capex increases — the bottle

Diff · 2026-06-28

EU AI Office Mandates Machine-Readable Watermarking for All Commercial LLM Outputs

The EU's Phase II mandate forces every commercial LLM provider to embed cryptographically verifiable watermarks or face

Diff · 2026-06-28

Building an agent harness for local LLM evaluation

A practical guide to designing, running, and scoring local LLM agent evaluations offline using Ollama, LangChain, and cu

Diff · 2026-06-26

Edge AI inference: Running LLMs on Raspberry Pi 5 and Jetson with quantization

A practical guide to deploying quantized LLMs on Raspberry Pi 5 and NVIDIA Jetson, covering benchmarks, runtime tools, a

Diff · 2026-06-26

Multi-modal LLMs for production: when to use image/audio/video capabilities

Learn how to decide between text-only and multi-modal LLM workflows by evaluating latency, cost, and accuracy requiremen

Diff · 2026-06-25

GLM-5.2: Open-weights model beats GPT-5.5 for 1/6th the cost

Z.ai's GLM-5.2 outperforms GPT-5.5 on SWE-bench Pro while offering a massive 1M context window and significantly lower i

Diff · 2026-06-20

Anthropic restricts access to Claude Fable 5 and Mythos 5 for foreign nationals following US export order

Learn how a US Commerce Department directive under the Defense Production Act forced Anthropic to disable its most advan

Diff · 2026-06-20

LLMs for legal and compliance teams: contract analysis, e-discovery, and regulatory monitoring

Practical guide for using LLMs in legal and compliance workflows — contract clause extraction, privilege-aware e-discove

Diff · 2026-06-12

LLMs in regulated industries: compliance infrastructure and deployment patterns

Practical guide to deploying LLMs under HIPAA, SOC2, FCA and GDPR — including audit infrastructure, guardrails, output l

Diff · 2026-05-30

Hosting LLMs at scale: vLLM, TGI, SGLang and TensorRT-LLM compared

A performance-focused comparison of production inference engines: vLLM, SGLang, TensorRT-LLM, and TGI. Covers throughput

Diff · 2026-05-30

GPU rental vs API pricing: when to self-host

Break-even framework for GPU rental vs API pricing: when does renting a GPU beat paying per token?

Diff · 2026-05-30

LLM agent frameworks compared: LangChain, CrewAI, AutoGen, OpenAI Agents SDK

Comparing LangChain, CrewAI, AutoGen, and OpenAI Agents SDK on architecture, production readiness, observability, and wh

Diff · 2026-05-29

Quantisation explained: why model files have Q4, Q5 and GGUF labels

A plain-English guide to LLM quantisation: what Q4, Q5, Q8 and GGUF labels mean, how quantisation affects quality and sp

Diff · 2026-05-28

Changelog watching for AI teams: deprecations, pricing and model aliases

A monthly review checklist for keeping AI applications stable amid provider changes: model deprecations, pricing updates

Diff · 2026-05-28

What model cards tell you — and what they do not

How to read an LLM model card critically: what the claims mean, what is often missing, and how to spot the important gap

Diff · 2026-05-28

UK AI governance sources: ICO, NCSC, CMA and DSIT in one map

A practical guide to UK AI regulation sources — which UK regulator handles what, and where to find the official guidance

Diff · 2026-05-28

Small language models: when smaller is better

A practical guide to choosing small language models for latency-sensitive, cost-constrained or on-device AI tasks, with

Diff · 2026-05-28

Responsible AI policies that builders can actually operationalise

A practical guide to turning abstract responsible AI principles into release gates, evaluation checklists, incident revi

Diff · 2026-05-28

Reasoning models: what thinking modes change for cost and latency

A plain-English explanation of reasoning models, extended-thinking modes, and when the extra cost and latency are worth

Diff · 2026-05-28

Provider data retention policies: what API users should compare

A neutral checklist for comparing how AI API providers handle your data: training use, retention periods, abuse monitori

Diff · 2026-05-28

OpenAI, Anthropic, Google and Mistral APIs: what comparison pages should measure

A neutral comparison rubric for evaluating hosted LLM providers: what to compare, what to ignore, and how to avoid misle

Diff · 2026-05-28

NIST AI RMF and GenAI guidance: practical use for small teams

How to apply NIST's AI Risk Management Framework and GenAI profile as a lightweight launch checklist — without enterpris

Diff · 2026-05-28

Mixture-of-experts models: why active parameters matter

A plain-English explanation of mixture-of-experts architecture: why model labels list total and active parameters separa

Diff · 2026-05-28

Meta Llama and open model licensing: what builders must check

A practical guide to open-model licensing for builders: what Llama, Apache 2.0, and other open-weight licenses actually

Diff · 2026-05-28

Local LLM runtimes: Ollama, llama.cpp, vLLM and TGI in plain English

A practical comparison of the four main open-source LLM runtimes: when to use Ollama for prototyping, llama.cpp for sing

Diff · 2026-05-28

Hardware supply and inference economics: why chips shape AI products

A practical guide to how GPU and accelerator supply affects AI product cost, availability, and design decisions from tra

Diff · 2026-05-28

Guardrails compared: policy prompts, classifiers, validators and permissions

A practical comparison of LLM safety guardrails: where policy prompts, input and output classifiers, validators, and per

Diff · 2026-05-28

EU AI Act for LLM buyers: what to track without overclaiming

A practical map of EU AI Act obligations for teams buying and using LLM APIs — separating GPAI provider duties from down

Diff · 2026-05-28

Enterprise AI procurement: questions before buying a platform

A practical procurement checklist for comparing AI platforms, checking data control, security, model choice, cost and ex

Diff · 2026-05-28

Copyright and training data: what AI product teams can responsibly say

A plain-English guide to the three-layer copyright question in AI: training data use, output similarity, and what produc

Diff · 2026-05-28

AI vendor lock-in: model APIs, embeddings, vector stores and eval data

A practical guide to understanding and mitigating AI vendor lock-in across model APIs, embeddings, vector stores, prompt

Diff · 2026-05-28

AI energy use: useful facts without moral panic

A balanced guide to AI energy consumption: separating training, inference, datacentre, and hardware manufacturing energy

Diff · 2026-05-25

AI adoption in small businesses: where LLMs help first

A practical ranking of LLM use cases for small businesses: where the return on AI investment is clearest, where the risk

Diff · 2026-05-25

AI SLAs and status pages: what reliability evidence vendors publish

How to evaluate AI API reliability claims: what SLAs actually cover, how to read status pages critically, and what relia

Diff · 2026-05-24

What counts

A useful Diff page answers "so what?"

What changed: model, API, pricing, policy, benchmark, availability, tooling, or risk.
Who should care: builders, buyers, researchers, operators, regulators, or users.
What remains unproved: benchmarks, reliability, pricing edge cases, uptime, safety, or real workload fit.
What to do next: ignore, monitor, test, migrate, budget, update policy, or brief stakeholders.

Diff with the breathless bits removed
