The evidence-led AI website manifesto: how theLLMs will review claims
This site exists because most AI content on the web is marketing. Vendor blogs that benchmark against last year’s models. Thought pieces that describe what AI will do next year without evidence. How-to guides that skip the failure modes. Comparison pages that show five features but omit the one that matters.
theLLMs takes a different approach. We write for builders, buyers, and curious operators who need to make decisions — not for people who want to feel excited about AI.
Editor’s Note: If a claim on this site does not have a source, a date, or a caveat, it is probably a mistake. Tell us. We will fix it and update this version.
Editor’s Note: This manifesto is itself a living document. We will update it when we find better ways to earn trust, and we will date every change.
How we review claims
Every article on this site follows a structured review process:
1. Sources must be primary or near-primary. We cite the original research paper, the provider’s official documentation, the regulatory guidance, or the benchmark data directly. We do not cite other blogs that claim to summarise the research unless the primary source is behind a paywall or unavailable.
2. Dates are mandatory. Every source includes a “checked on” date. Every claim about pricing, model capability, or provider policy includes a date. If a source is more than six months old, we say so and flag it for review.
3. Uncertainty is explicit. We do not say “models are getting cheaper” when we mean “OpenAI reduced GPT-4o pricing by 50% on 2026-02-15, but Anthropic raised Claude pricing on 2026-03-01.” We say what changed, for whom, and what is still unknown.
4. Failure modes are part of the answer. Every guide on this site includes a section on what can go wrong and where teams misuse the technique. If a technique is useful in narrow circumstances but oversold generally, we say that plainly.
5. No fake hands-on claims. If we write about a model or tool we have not tested, we say so. If we have tested it, we describe the environment, prompts, and outputs so readers can assess the evidence themselves.
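The date rule in step 2 can be checked mechanically. The sketch below is illustrative only and is not the site's actual tooling: the `Source` record, the `is_stale` and `flag_for_review` helpers, and the choice to approximate "six months" as 183 days measured from the "checked on" date are all assumptions. A publication date could be substituted if that is what "old" is meant to measure.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: "six months" is approximated as 183 days.
STALE_AFTER = timedelta(days=183)


@dataclass(frozen=True)
class Source:
    """Hypothetical record for one cited source."""
    title: str
    url: str
    checked_on: date  # the "checked on" date required by step 2


def is_stale(source: Source, today: date) -> bool:
    """Return True when the source was last checked more than 183 days ago."""
    return today - source.checked_on > STALE_AFTER


def flag_for_review(sources: list[Source], today: date) -> list[Source]:
    """Collect the sources that should be flagged for review."""
    return [s for s in sources if is_stale(s, today)]


nist = Source(
    title="NIST AI Risk Management Framework",
    url="https://www.nist.gov/itl/ai-risk-management-framework",
    checked_on=date(2026, 5, 24),
)
```

Passing `today` explicitly, rather than calling `date.today()` inside the helper, keeps the check reproducible: the same inputs always give the same answer, which makes the rule easy to test.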
What we do not do
We do not write vendor press releases. If a provider releases a new model, we may write about it — but the article will focus on what changed, what is still unknown, and what operators should verify themselves, not on why the provider thinks the model is revolutionary.
We do not make predictions without evidence. “AI will transform X industry by 2027” is not a claim we publish. “Provider Y released a model that scores Z% on benchmark B under conditions C, which suggests capability improvement in domain D” is a claim we publish, with sources and caveats.
We do not hide AI-assisted writing. Every article discloses the model that produced the initial draft and the model that performed the editorial review. Readers deserve to know how the content was created, even — especially — when the content is about AI.
We do not claim expertise we do not have. We are not lawyers, doctors, or certified financial advisors. We do not give legal, medical, or financial advice. We explain how AI systems work and what evidence exists for their claims. Decisions based on our content remain the reader’s responsibility.
How we handle corrections
If a reader reports an error:
- We verify the claim against the original source
- We update the article and add a correction notice with the date and nature of the change
- We review related articles for the same error
- We log the correction in our internal quality tracking
If a source becomes outdated:
- The article is flagged for review
- A new “checked on” date is assigned
- If the source cannot be replaced, the article carries a warning about the stale source
How we handle uncertainty
We categorise claims into four levels:
| Level | Meaning | Example |
|---|---|---|
| Verified | Supported by multiple primary sources with consistent evidence | “OpenAI released GPT-4o on 2024-05-13” |
| Supported | Supported by a single primary source or limited evidence | “Claude 3.5 Sonnet scores higher than GPT-4o on SWE-bench Verified as of 2026-01” |
| Contested | Different sources or experts disagree | “The optimal chunk size for RAG” (depends on document type, model, and task) |
| Unknown | No reliable evidence available | “When will AI surpass human performance on [specific task]?” (we do not guess) |
We label claims with their uncertainty level. We do not present contested or unknown claims as verified.
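The four levels can be written down as a small decision rule. This is a rough sketch under stated assumptions, not the editorial process itself: the `Level` enum, the `classify` function, and its two inputs are hypothetical, and "limited evidence" is reduced here to a simple source count, whereas real labelling needs editorial judgement. The order of the checks (no evidence first, then disagreement, then source count) is also an assumption.

```python
from enum import Enum


class Level(Enum):
    """The four uncertainty levels from the table above."""
    VERIFIED = "Verified"
    SUPPORTED = "Supported"
    CONTESTED = "Contested"
    UNKNOWN = "Unknown"


def classify(primary_sources: int, sources_disagree: bool) -> Level:
    """Map a claim's evidence to a level (hypothetical simplification)."""
    if primary_sources < 0:
        raise ValueError("primary_sources cannot be negative")
    if primary_sources == 0:
        # No reliable evidence available.
        return Level.UNKNOWN
    if sources_disagree:
        # Evidence exists, but sources or experts conflict.
        return Level.CONTESTED
    if primary_sources >= 2:
        # Multiple primary sources with consistent evidence.
        return Level.VERIFIED
    # A single primary source.
    return Level.SUPPORTED
```

For example, `classify(1, False)` returns `Level.SUPPORTED`, and any claim whose sources disagree is labelled `Level.CONTESTED` regardless of how many sources there are.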
What changes would update this policy
- We discover we have been applying these standards inconsistently
- New regulatory guidance changes what constitutes responsible AI publishing
- Readers provide feedback that changes our understanding of what is trustworthy
- We find a better way to present uncertainty and evidence
Methodology and sources
This manifesto draws on NIST AI RMF governance practices, journalistic source-verification standards adapted for AI-specific claims, and operational experience from running this site.
- NIST AI Risk Management Framework: https://www.nist.gov/itl/ai-risk-management-framework — checked 2026-05-24
- ICO guidance on AI transparency: https://ico.org.uk/for-organisations/ai-and-data-protection/ — checked 2026-05-24
Change log
- 2026-05-24 — First published version.