AI vendor lock-in: model APIs, embeddings, vector stores and eval data

When people talk about AI vendor lock-in, they usually mean the model API. If you switch from one provider to another, you change the model. That is the obvious layer.

The deeper problem is that lock-in happens at every layer of an AI system: the embedding provider, the vector database, the prompt formats, the evaluation data, the log schemas, and the observability setup. Each layer adds switching cost, and the sum of those costs can make leaving impractical even when the model provider is no longer the best option.

TL;DR

Model API lock-in is real but manageable. The harder lock-in comes from embeddings (which tie you to a provider’s vector space), evaluation data (which is specific to one model’s behaviour), and observability schemas (which do not export cleanly). Mitigate by using abstraction layers, portable data formats, and eval frameworks that work across providers. Test against at least two providers from the start so the integration path exists before you need it.

The lock-in layers

Model API

The most visible layer. Each provider’s API has different endpoint formats, parameter names, streaming implementations, and error structures. Switching means rewriting integration code.

How to mitigate: Use a gateway like LiteLLM or OpenRouter that provides a unified API across providers. Test against at least two providers from the start, even if you only use one in production.

Embeddings

Embeddings from different providers sit in different vector spaces. An embedding generated by OpenAI’s text-embedding-3-small cannot be compared directly with one from Google’s text-embedding-004 or a local model like gte-large. If you have already embedded your document corpus with one provider, switching means re-embedding everything.

How to mitigate: Use a local or open-source embedding model from the start so you can run it yourself. If you must use a provider embedding, design for re-embedding — store the original text alongside the embedding vector.

Vector database

Different vector databases support different index types, query syntaxes, and metadata filters. Migrating from Pinecone to Weaviate, or from Chroma to PostgreSQL+pgvector, requires schema redesign and data migration.

How to mitigate: Use a standard index format (HNSW is widely supported) and a metadata schema that maps easily between providers. Avoid vendor-specific query features unless the performance gain is clearly worth the switching cost.

Prompt formats and system instructions

Prompts written for one provider may not work the same way with another. System prompt roles, message formatting, tool-calling schemas, and structured output specifications vary.

How to mitigate: Use a prompt abstraction format like the OpenAI chat-completion standard and test key prompts with alternative providers during development. Structure prompts in a way that separates content from formatting.

Evaluation data

Eval datasets are often built around one model’s behaviour. Your golden dataset may include examples that a different model answers differently. If you cannot regenerate or re-label your eval data for a new model, you cannot reliably compare them.

How to mitigate: Build eval data around tasks and expected outcomes, not specific model behaviour. Include clear rubrics that a human evaluator or a different model-as-judge could apply consistently.

Observability and logging

Log schemas, trace formats, and monitoring dashboards are often built for one provider’s response structure. Exporting history, running historical evals, or replaying past requests after switching providers may require schema transformation.

How to mitigate: Store raw prompts and outputs in a portable format (JSON with consistent fields). Use observability tools that support multiple providers or can ingest from a standard event format.

Eval and regression CI

CI pipelines for AI often embed assumptions about the model provider. Switching means updating test runners, assertion formats, and pass/fail criteria.

How to mitigate: Keep the evaluation layer provider-agnostic. Use evaluation frameworks that accept a standardised input/output format rather than provider-specific API wrappers.

The switching cost you should estimate

Before committing to any provider, estimate what it would cost to leave:

Time to re-integrate. How many person-days to switch model API clients?
Time to re-embed. How long to regenerate embeddings for the vector corpus?
Time to re-evaluate. How long to re-run eval suites and validate output quality?
Data that does not export. Can logs, evaluation results, and configuration be exported? In what format?
Irrecoverable loss. Is there any data or configuration that cannot be moved at all?

If the total switching cost exceeds the value the provider delivers over its next-best alternative, you are locked in.

What teams get wrong

assuming lock-in is only about the model API;
building eval data that works only with one model’s behaviour;
embedding the document corpus with a proprietary model and ignoring the re-embedding cost;
storing logs and traces in a provider-specific schema without a portable export;
waiting until a crisis — deprecation, pricing change, incident — to think about switching.

Practical decision check

Can you switch model APIs with less than a week of engineering work?
Are your embeddings portable (or can you re-embed without downtime)?
Can your eval data be used to evaluate a different model?
Can you export all logs, traces and configuration?
Have you estimated the total switching cost for each layer?

If you cannot answer yes to at least three, your lock-in risk is higher than you think.

Methodology

Data checked: 2026-05-28
Sources consulted: Provider API documentation (OpenAI, Anthropic, Google), embedding model compatibility guides, vector database migration documentation (Pinecone, Weaviate, pgvector), gateway and router documentation (LiteLLM, OpenRouter), evaluation framework documentation (Promptfoo), and MTEB embedding benchmark leaderboard
Assumptions: This guide describes lock-in risk conceptually; specific switching costs depend on architecture, scale, and team size. The mitigation strategies are general guidance and may not suit every stack. Provider pricing, model versions, and export capabilities change frequently.
Limitations: This article does not cover on-premise deployment, fine-tuning lock-in, or hardware-level lock-in (GPU vendor dependency). It does not provide legal or regulatory advice. Switching cost estimates are illustrative, not calculated for specific workloads.
Jurisdiction: Global. No jurisdiction-specific regulatory guidance is included. Data residency requirements vary by region and may add additional switching constraints not covered here.

Source list

LiteLLM documentation — https://docs.litellm.ai/ (accessed 2026-05-28)
OpenRouter documentation — https://openrouter.ai/docs (accessed 2026-05-28)
OpenAI embedding documentation — https://platform.openai.com/docs/guides/embeddings (accessed 2026-05-28)
Hugging Face MTEB leaderboard — https://huggingface.co/spaces/mteb/leaderboard (accessed 2026-05-28)
Promptfoo evaluation framework — https://www.promptfoo.dev/ (accessed 2026-05-28)
Pinecone migration guides — https://docs.pinecone.io/guides/data/export-data (accessed 2026-05-28)

Trust Stack

AI draft model: gemma4:26b
AI review model: deepseek-r1:32b
Human editorial review: No (automated editorial pipeline)
Last substantive check: 2026-05-28
Corrections policy: Contact via Contact page
Affiliation: theLLMs has no vendor affiliation or sponsorship

Change log

2026-05-28: Full editorial review against 16-gate checklist. Added three Editor’s Note aside cards, slugified all heading IDs, added Trust Stack section with corrections policy and affiliation declaration, corrected frontmatter writtenBy label, standardised Methodology and Source List formats with access dates, replaced vague Pinecone/Weaviate reference with a specific Pinecone export documentation link, removed internal process language from Change Log.
2026-05-24: First published.