Open weights vs hosted APIs: practical trade-offs {#open-weights-vs-hosted-apis-practical-trade-offs}
If you need the shortest useful answer: open weights give you more control, while hosted APIs give you less operational work.
Open weights are often the better fit when you care about deployment control, data boundaries, custom serving, or the option to tune the stack around your own infrastructure. Hosted APIs are often the better fit when you care more about speed to launch, lower ops burden, vendor scaling, and not having to run model serving yourself.
The catch is that neither route is automatically cheaper, safer, or more private in every case. Open weights still come with licence terms, runtime maintenance, observability, and patching responsibility. Hosted APIs can reduce that burden, but they replace it with provider dependency, rate limits, policy changes, and account-tier rules.
If the real question is governance, read the licence and the retention or processing terms before you read benchmark chatter. If the real question is cost, compare the full workload rather than the headline model rate.
Trust stack {#trust-stack}
AI draft model: gpt-5.4-mini. AI review model: gpt-5.4. Checked against current model-card, licence, pricing and privacy documentation on 2026-05-22.
Editor’s Note: “We need control” often means “we have not priced the ops commitment yet.” Running your own model is not just a model choice; it is a staffing and maintenance choice.
Editor’s Note: “The API is easier” can be true and still leave you exposed to exit risk, quota limits, policy changes, or a contract shape you did not notice until later.
Editor’s Note: Licence language matters long before anyone cares about benchmark scores. If the terms do not fit your commercial use, the nicest model card in the world is not a usable solution.
Quick answer {#quick-answer}
Choose open weights if you need tighter control over where the model runs, what data crosses the boundary, or how much of the serving layer you own. Choose a hosted API if you need the fastest route to a working product and you would rather buy capability than operate it.
Do not assume “open weights” means “open source” or “commercially unrestricted”. The licence is part of the product decision, not an afterthought.
Do not assume “hosted API” means “less private” in a simplistic way either. Enterprise terms can improve data handling, but they do not remove vendor dependence, and they do not turn a managed service into self-hosting.
What open weights and hosted APIs actually mean {#what-open-weights-and-hosted-apis-actually-mean}
These terms get blurred together in casual conversation, which is how teams end up making the wrong comparison.
- Open weights means the model weights are available for you to run, adapt, or deploy under the model’s licence terms. It does not automatically mean the model is open source in the strict software sense.
- Hosted API means you call a provider-managed service that runs the inference runtime for you. You consume the model through an endpoint instead of operating the serving stack yourself.
- Model card means the official documentation page for the model. It is where you should look first for licence terms, intended use, constraints, and known limitations.
- Retention policy means the provider’s rules for how long prompts, outputs, logs, or metadata may be kept.
- Inference runtime means the serving layer that turns model weights into responses.
- Vendor lock-in means the cost of switching becomes high because your prompts, tooling, data handling, quotas, or approvals are tightly tied to one provider.
That distinction matters because “open weights” can still be tightly licensed, while a hosted API can still be enterprise-safe enough for some workloads. The right comparison is not ideology. It is operating reality.
Where open weights give you more control {#where-open-weights-give-you-more-control}
Open weights are strongest when control is the point of the exercise.
You get to decide where the model runs, how requests are routed, whether logs stay inside your environment, and how much custom inference work you want to do. That can matter for data-boundary requirements, edge deployments, air-gapped environments, or organisations that want to own the model-serving layer rather than rent it.
Open weights can also help if you need to adjust the serving stack around your workload rather than fitting your workload into a provider’s service shape. For example, you may want custom quantisation, specific batching behaviour, local failover, or a runtime that matches your internal observability stack.
But the control is not free. You also own the messy bits: uptime, capacity planning, security reviews, patching, rollback plans, monitoring, and the boring but essential task of keeping the serving path healthy.
If a team says it wants open weights for “privacy”, the real follow-up question is usually this: privacy under whose controls, and with what evidence?
Where hosted APIs remove real work {#where-hosted-apis-remove-real-work}
Hosted APIs are useful when you want the model capability without building the serving operation around it.
That means no GPUs to manage, no inference cluster to tune, no local failover stack to keep alive, and no model-serving patch cycle to own. For small teams, early-stage products, or workloads where the model is only one part of the system, that can be the difference between shipping and drowning in infrastructure.
Hosted APIs also make it easier to test ideas quickly. You can try a provider, swap models, or change your prompt strategy without rebuilding the whole runtime layer each time.
The trade-off is that you inherit provider rules. You need to watch quota limits, pricing changes, context thresholds, caching behaviour, batch discounts, account terms, and any policy or retention language that affects your data handling.
Hosted APIs remove work. They do not remove responsibility.
Licensing, privacy and governance checks {#licensing-privacy-and-governance-checks}
This is the part teams often skip because it feels less exciting than capability comparisons. It is also the part that can kill the architecture choice later.
Open-weight licensing is not one thing
The open-weight bucket is not uniform.
- Qwen2.5-72B-Instruct is marked in its model card with
license: other, and the linked Qwen licence says that if you are commercially using the materials and your product or service has more than 100 million monthly active users, you must request a licence from Alibaba Cloud. The licence also requires attribution notices in distributed copies. - Mistral-7B-Instruct-v0.3 is marked in its model card with
license: apache-2.0.
That is the practical point: one model family may be easy to adopt commercially while another carries extra obligations. “Open weights” tells you the weights are available. It does not tell you the commercial rights story on its own.
Privacy and processing terms still matter on hosted services
Anthropic’s privacy policy says it covers consumer use and commercial services where Anthropic acts as a data controller, and it points to a separate Non-User Privacy Policy for how its models are trained and how personal data from third-party sources may be used when developing or delivering products and services. The same policy says that when Anthropic acts as a data processor for commercial customers, the commercial customer is the controller.
Google’s data processing addendum says Google is a processor and the customer is a controller or processor, as applicable, for customer personal data.
Those are important distinctions, but they do not magically erase the risk trade-off. They just tell you which party is responsible for which part of the relationship.
The useful governance question
If your workload contains personal data, confidential material, or regulated material, the key question is not “open weights or API?” in the abstract. It is:
- who controls the data path;
- who retains what, and for how long;
- who can train on what, if anything;
- which account tier or enterprise terms apply;
- and what evidence you can show if someone asks why the setup is compliant.
Current provider snapshot {#current-provider-snapshot}
The list below uses current provider pages checked on 2026-05-22. These are list prices and policy snapshots, not a quote for your workload.
| Route | What you control | Privacy / retention shape | Cost shape | Ops burden | Switching risk |
|---|---|---|---|---|---|
| Open weights, example: Qwen2.5-72B-Instruct | Highest control over deployment, network path, logging and runtime choices | You decide the hosting boundary, but the model licence still governs use; Qwen’s licence adds a commercial-use trigger at very large scale | No vendor token bill from a hosted API; you pay infra, storage, monitoring and staff time | Highest: serving, patching, scaling, failover, evaluation and rollback are yours | Lower vendor lock-in, but higher stack lock-in if your runtime becomes deeply customised |
| Open weights, example: Mistral-7B-Instruct-v0.3 | Same control profile as above | Apache-2.0 is simpler to reason about than many custom model licences | Same self-hosted cost shape | Same self-hosted burden | Same self-hosted risk profile |
| Hosted API, Anthropic Claude Sonnet 4.6 | Less control over the serving layer; you control the integration | Anthropic’s privacy policy splits consumer and commercial services and distinguishes controller vs processor relationships | Refer to the current Anthropic pricing page for live input, output, cache-write and cache-read rates; batch discounts may also apply | Lower infra burden, but you still manage prompts, retries, quotas and fallback paths | Higher provider dependence and policy exposure |
| Hosted API, Google Gemini 2.5 Pro | Less control over the serving layer; you control the integration | Google’s DPA says Google is a processor and the customer is controller or processor, as applicable | Input pricing starts at $1.25 per 1M tokens and rises above the 200K-token threshold; cached-input and output rates are listed separately on the current pricing page | Lower infra burden, but you still manage prompts, retries, quotas and fallback paths | Higher provider dependence and policy exposure |
A useful reminder sits behind the table: hosted APIs can look cheap until the workload includes retries, longer outputs, cache misses, or a need for more than one vendor. Open weights can look “free” until you count GPUs, ops, and the people required to keep the thing alive.
Cost, ops burden and switching risk {#cost-ops-and-switching-risk}
The real decision is usually not about purity. It is about which cost you prefer to own.
- Open weights shift cost from vendor billing to infrastructure, maintenance, observability, and staffing. That can be attractive if you have steady demand, strong engineering capacity, or a strong reason to control the environment.
- Hosted APIs shift cost toward metered usage and provider dependence. That can be attractive if you want to avoid building a model-serving team just to answer a product question.
- Switching risk is different in each direction. Open weights reduce provider lock-in, but they can increase internal lock-in if your whole product becomes dependent on one runtime, one deployment pattern, or one evaluation pipeline.
A common mistake is to think that self-hosting automatically reduces long-term risk. It sometimes does. But if the self-hosted path grows into a maintenance island that only one engineer understands, you have not removed risk. You have moved it.
A workload-based decision checklist {#a-workload-based-decision-checklist}
Use the checklist below before you pick a route.
- Choose open weights if you need to control the hosting boundary, can justify the operational overhead, and have a clear reason not to depend on a provider’s serving layer.
- Choose hosted APIs if you need to ship quickly, want to minimise infra work, or need to test product value before investing in a full inference stack.
- Re-check the licence if the commercial deployment may be large, regulated, or distributed to many users.
- Re-check the retention and processor/controller terms if prompts or outputs contain personal or confidential data.
- Estimate the full workload, not just the model rate.
- Pilot on a real task if you are unsure. Architecture arguments are cheaper after a week of actual usage.
If you cannot state your retention requirement in one sentence, you are probably not ready to choose the architecture yet.
Global applicability {#global-applicability}
This article is global. There is no UK, GB or Northern Ireland split to apply here.
The useful caution is the same everywhere: model cards, licences, pricing pages and privacy terms change, and the page you check before you buy is the page that matters.
Methodology {#methodology}
Check date: 2026-05-22
What was checked:
- Qwen2.5-72B-Instruct model card and licence text for model-family status, licence type and commercial-use restrictions.
- Mistral-7B-Instruct-v0.3 model card for licence type and model-card notes.
- Anthropic pricing and privacy pages for current public pricing shape, cache or batch notes, consumer or commercial policy distinctions, and controller or processor wording.
- Google Gemini pricing page for current list-pricing bands, cached-input pricing and token-threshold wording.
- Google Cloud data processing addendum for processor or controller wording.
- Live search results for the query “open weights vs hosted APIs” as the current top-result equivalent used to shape the information-gain angle.
What the sources were used to verify:
- open weights and hosted APIs are not the same thing;
- open-weight licences vary materially;
- hosted-provider privacy terms and data-processing roles are not interchangeable with self-hosting;
- current pricing for hosted APIs can change by model, tier, caching and token thresholds;
- the decision should be based on workload, governance, and operational burden rather than ideology.
Assumptions and limits:
- All currency values are USD where provider pricing pages use USD.
- The Anthropic pricing route currently redirects to
claude.com/pricing; provider wording and page structure may move again after the check date. - Self-hosted cost for open weights is not a fixed list price; it depends on compute, storage, staff time and deployment design.
- This guide does not claim legal advice or a compliance verdict.
- The comparison is intended to help a buyer or platform lead scope the decision, not to declare one route universally better.
Source list {#source-list}
- Qwen2.5-72B-Instruct model card — https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/raw/main/README.md
- Qwen licence text — https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/raw/main/LICENSE
- Mistral-7B-Instruct-v0.3 model card — https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3/raw/main/README.md
- Anthropic pricing page — https://www.anthropic.com/pricing
- Anthropic privacy policy — https://www.anthropic.com/legal/privacy
- Google Gemini pricing page — https://cloud.google.com/gemini-enterprise-agent-platform/generative-ai/pricing
- Google Cloud data processing addendum — https://cloud.google.com/terms/data-processing-addendum
Change log {#change-log}
- 2026-05-22: first draft built from the llm-editor-approved brief using current model-card, licence, pricing and privacy checks, with explicit open-weights versus hosted-API trade-offs and a workload-based decision frame.