Vector databases: when semantic search is enough and when it is not

Vector databases are often sold as “search, but smarter”. Sometimes they are. Sometimes a simpler search index, a keyword filter or a well-designed database query is the better answer. The trick is knowing which problem you actually have.

Semantic similarity is not the same as relevance. A vector database can find documents that feel related to the query (similar words, similar phrasing, similar embedding directions) without actually containing the answer. When your retrieval layer cannot explain why it returned a result, you may have bought convenience at the cost of debuggability. This guide helps you decide when the trade-off is worth it and when it is not.

TL;DR

Use a vector database when semantic similarity is the useful part of the problem and when you can maintain embeddings, metadata and retrieval evaluation over time. Start simpler for every other case:

Keyword search is enough when your users know what they are looking for: SKUs, part numbers, document titles, known phrases. Elasticsearch or PostgreSQL full-text search will typically match or beat a vector DB on exact-match queries, usually at lower cost [1].
pgvector is enough when you already use PostgreSQL and your semantic search needs fit within a single database operation — hybrid queries over a few thousand vectors, with metadata filters, without separating embedding storage from the rest of your data [2].
A dedicated vector DB is worth considering when your corpus grows beyond tens of thousands of vectors, when you need low-latency ANN indexing with real-time updates, or when metadata filtering is a first-class requirement that cannot be bolted on after retrieval [3][4].

What this means

A vector store is one retrieval option, not a magical intelligence layer. It converts text (or images, audio, embeddings of any kind) into numerical coordinates in a high-dimensional space. When a query arrives, it finds the coordinates closest to the query’s coordinates. That closeness is mathematical — it measures direction and distance, not truth or usefulness.
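
As a rough illustration of what “closest coordinates” means, the sketch below scores a query against a tiny in-memory index with cosine similarity and brute force. The three-dimensional vectors and document names are invented for the example; real embedding models emit hundreds or thousands of dimensions, and production systems use approximate indexes rather than a full scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for this sketch: a zero vector matches nothing
    return dot / (norm_a * norm_b)

def nearest(query_vec, index, k=3):
    """Brute-force top-k: score every stored vector, sort by score descending."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Hypothetical toy "embeddings", hand-written for illustration only.
index = {
    "river-bank-erosion": [0.9, 0.1, 0.3],
    "bank-account-fees": [0.8, 0.3, 0.1],
    "cooking-pasta": [0.0, 0.1, 0.9],
}
query = [0.85, 0.2, 0.2]
results = nearest(query, index, k=2)
for score, doc_id in results:
    print(f"{doc_id}: {score:.3f}")
```

Both “bank” documents score almost identically here, which is the point: the ranking reflects geometry only, and says nothing about which document answers the question.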

This matters because vector search optimises for semantic proximity, not for relevance to the user’s actual need. A document about “bank” meaning a river bank can sit close to a query about “bank” meaning a financial institution if the embedding model groups them by the word’s surface form rather than by the sense the user intended. The embedding model has no concept of user intent. It has only training data and vector arithmetic.

The practical consequence: teams that reach for a vector DB first — before understanding their search problem — end up debugging retrieval failures that a simpler keyword index would never have produced.

Where teams get it wrong, with specific consequences

Using embeddings for search problems that are not semantic

A team working on an internal knowledge base indexes every document through an embedding pipeline and queries with a dedicated vector DB. Most of their searches are staff looking for specific policy numbers, form references or document titles — things with exact identifiers. The vector DB returns results ranked by cosine similarity, which spreads relevant exact-match results across a list of “also related” documents that share similar words but not the same meaning.

Consequence: The top-5 results for “HR-142 grievance procedure” include the correct document at position 3, behind “grievance case studies” and “HR team contact list.” Users learn to scroll past the first two results to find what they need. The team interprets low click-through on top results as an embedding quality problem and invests in fine-tuning, when the real fix is to promote exact matches before vector results.

Practical fix: Use hybrid search. Run an exact keyword match in parallel with vector search and boost exact matches to the top of the ranking. Pinecone supports hybrid search with sparse-dense retrieval; Weaviate and Qdrant offer combined keyword-and-vector queries natively [3][4][5]. If your corpus is smaller than ~10,000 documents, start with pgvector alongside PostgreSQL’s built-in full-text search in the same database, and avoid the operational cost of a separate vector DB entirely [2].
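
A minimal sketch of the “promote exact matches” idea, assuming identifiers look like “HR-142” (uppercase letters, a hyphen, digits). The regular expression, the function name hybrid_rank and the sample documents are all hypothetical; a real system would run the keyword pass in its search engine rather than scanning texts in Python.

```python
import re

# Assumed identifier shape for this sketch, e.g. "HR-142".
IDENTIFIER = re.compile(r"\b[A-Z]{2,}-\d+\b")

def hybrid_rank(query, vector_hits, documents):
    """Return doc ids with exact identifier matches first, then other vector hits.

    vector_hits: list of (doc_id, similarity) from the vector search.
    documents:   dict of doc_id -> text, used for the keyword pass.
    """
    wanted = set(IDENTIFIER.findall(query))
    # Keyword pass: every document that mentions an identifier from the query.
    exact_ids = [doc_id for doc_id, text in documents.items()
                 if wanted & set(IDENTIFIER.findall(text))]
    by_score = sorted(vector_hits, key=lambda hit: hit[1], reverse=True)
    vector_ids = [doc_id for doc_id, _ in by_score]
    # Exact matches lead, in vector order where they were retrieved at all.
    exact_first = ([d for d in vector_ids if d in exact_ids]
                   + [d for d in exact_ids if d not in vector_ids])
    rest = [d for d in vector_ids if d not in exact_ids]
    return exact_first + rest

documents = {
    "grievance-case-studies": "Case studies of past grievance outcomes.",
    "hr-contact-list": "HR team contact list for grievance queries.",
    "hr-142": "HR-142 grievance procedure: how to raise a formal grievance.",
}
vector_hits = [("grievance-case-studies", 0.91),
               ("hr-contact-list", 0.89),
               ("hr-142", 0.88)]
ranked = hybrid_rank("HR-142 grievance procedure", vector_hits, documents)
print(ranked)  # ['hr-142', 'grievance-case-studies', 'hr-contact-list']
```

When the query contains no identifier, the function falls back to plain vector order, so semantic queries are unaffected.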

Ignoring metadata filters and access control

A team deploys a vector DB for a customer support system. Every chunk is embedded as raw text with no attached metadata. The system retrieves answers from across the entire knowledge base — including internal troubleshooting notes marked “not for customer use” — and surfaces them alongside public documentation. There is no way to filter by audience, document source, or content sensitivity.

Consequence: A customer asking “how do I reset my admin password?” receives instructions intended for support staff, including internal escalation paths and backend credentials embedded in troubleshooting notes. The retrieval system cannot distinguish between a document tagged audience: internal and one tagged audience: public because neither tag exists in the chunk store.

Practical fix: Every chunk must carry a minimal metadata schema before it enters the vector DB. At minimum:

source_path — original document identifier
audience — public, internal, confidential
heading_hierarchy — array of ancestor headings (e.g., ["Troubleshooting", "Password reset", "Admin escalation"])
chunk_index — position within the source document
last_updated — date of the source content

Pinecone, Weaviate and Qdrant all support pre-filtering on metadata before vector search [3][4][5]. A query that filters on audience: public will never return internal content, regardless of how close the embedding match is. This is not a nice-to-have. It is a security and trust requirement in any production system.
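
One way to enforce the schema at ingest time is sketched below, assuming chunks pass through Python before they reach the store. The Chunk class, ALLOWED_AUDIENCES and filter_by_audience are illustrative names, not any vendor’s API; in production the audience filter would be expressed in the vector DB’s own filter syntax so that it runs inside the engine.

```python
from dataclasses import dataclass
from datetime import date

ALLOWED_AUDIENCES = {"public", "internal", "confidential"}

@dataclass(frozen=True)
class Chunk:
    text: str
    source_path: str           # original document identifier
    audience: str              # one of ALLOWED_AUDIENCES
    heading_hierarchy: tuple   # ancestor headings, outermost first
    chunk_index: int           # position within the source document
    last_updated: date         # date of the source content

    def __post_init__(self):
        # Reject chunks with missing or unknown metadata before they are stored.
        if not self.source_path:
            raise ValueError("source_path is required")
        if self.audience not in ALLOWED_AUDIENCES:
            raise ValueError(f"unknown audience: {self.audience!r}")

def filter_by_audience(chunks, allowed):
    """Keep only chunks whose audience tag is in the allowed set."""
    return [c for c in chunks if c.audience in allowed]

chunks = [
    Chunk("Reset your password from the login page.", "docs/reset.md", "public",
          ("Account", "Password reset"), 0, date(2026, 5, 1)),
    Chunk("Escalate admin resets to tier 2.", "runbooks/reset.md", "internal",
          ("Troubleshooting", "Password reset", "Admin escalation"), 3,
          date(2026, 4, 12)),
]
customer_view = filter_by_audience(chunks, {"public"})
print([c.source_path for c in customer_view])  # ['docs/reset.md']
```

Failing loudly on an untagged chunk is a deliberate choice in this sketch: a chunk with no audience tag cannot be filtered later, so it is safer to refuse it than to default it to public.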

Assuming cosine similarity is the same as answer quality

A team measures retrieval quality by cosine similarity score alone. They set a threshold of 0.85 and assume anything above it is a good answer. What they discover: short queries with common words — “what is API” — produce high similarity scores against hundreds of documents that mention “API” in passing, while precise queries about specific implementations score lower because the vocabulary does not perfectly overlap.

Consequence: The team’s precision-at-k looks strong in offline tests because the test set was built from the same corpus the embedding model was trained on. In production, users see irrelevant results ranked high (because they share language with the query) and miss specific technical answers that use different vocabulary to describe the same concept.

Practical fix: Measure retrieval quality with task-specific metrics, not similarity scores alone. Track:

Hit rate: Does the correct document appear in the top-k results?
Mean Reciprocal Rank (MRR): How high is the first relevant result?
Precision at k: How many of the top-k results are actually useful?
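
The three metrics can be computed from ranked result lists and a labelled test set with a few lines of code. The sketch below assumes results maps each query to its ranked document ids and relevant maps each query to the set of ids judged correct; both structures and the sample data are invented for the example.

```python
def hit_rate(results, relevant, k):
    """Fraction of queries with at least one relevant document in the top k."""
    hits = sum(1 for q, ranked in results.items()
               if set(ranked[:k]) & relevant[q])
    return hits / len(results)

def mean_reciprocal_rank(results, relevant):
    """Average of 1/rank of the first relevant result (0 if none is found)."""
    total = 0.0
    for q, ranked in results.items():
        for position, doc_id in enumerate(ranked, start=1):
            if doc_id in relevant[q]:
                total += 1.0 / position
                break
    return total / len(results)

def precision_at_k(ranked, relevant_ids, k):
    """Share of the top-k results for one query that are relevant."""
    return sum(1 for doc_id in ranked[:k] if doc_id in relevant_ids) / k

# Hypothetical test set: two queries with hand-labelled relevant documents.
results = {"q1": ["a", "b", "c"], "q2": ["d", "e", "f"]}
relevant = {"q1": {"c"}, "q2": {"d", "e"}}

print(hit_rate(results, relevant, k=2))                    # 0.5
print(round(mean_reciprocal_rank(results, relevant), 3))   # 0.667
print(round(precision_at_k(results["q2"], relevant["q2"], k=3), 3))  # 0.667
```

None of these functions looks at a similarity score, which is what makes them useful as a check on it.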

Use reranking as a second pass: after vector retrieval returns the top 20–50 candidate documents, a reranker model (such as Cohere Rerank or BGE-reranker) scores each candidate against the query with a stricter relevance model [6]. This separates the fast-but-approximate first stage (vector search) from the slower-but-precise second stage (reranker). The combined pipeline can raise hit rate noticeably without requiring a complete vector DB rebuild; measure the gain on your own test set rather than assuming it.
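
The two-stage shape can be sketched with a pluggable scoring function. In the example below, token_overlap is a deliberately crude stand-in for a reranker model, used only so the code runs without external services; it is not how Cohere Rerank or BGE-reranker score documents. The candidate list stands for the output of the first-stage vector search.

```python
def rerank(query, candidates, score_fn, top_n=5):
    """Second stage: re-score first-stage candidates and keep the best top_n.

    candidates: list of (doc_id, text) from the first stage.
    score_fn:   callable (query, text) -> float, higher means more relevant.
    """
    scored = [(score_fn(query, text), doc_id) for doc_id, text in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:top_n]]

def token_overlap(query, text):
    """Toy scorer: share of query tokens that also appear in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

# Hypothetical first-stage output, in vector-similarity order.
candidates = [
    ("api-overview", "what an api is and why it matters"),
    ("rate-limits", "configure api rate limits per key"),
    ("changelog", "release notes"),
]
top = rerank("configure api rate limits", candidates, token_overlap, top_n=2)
print(top)  # ['rate-limits', 'api-overview']
```

Keeping score_fn as a parameter means the reranker can be swapped or evaluated without touching the retrieval stage.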

Practical decision framework

Before you choose a vector DB, answer these questions in order:

Is the search task semantic, structured, or both?

Semantic: “Find documents similar to this concept.”
Structured: “Find policy HR-142.”
Both: “Find policy HR-142 and similar grievance procedures.”
If structured only, start with keyword search. If both, plan for hybrid from day one.

Do you need metadata filters, permissions or freshness rules?

Does every user see the same corpus, or are results scoped by role, region, or document type?
Can you afford to retrieve all candidates and filter later, or must filtering happen before the search?
If pre-filtering is required, verify your chosen vector DB supports metadata pre-filtering without degrading latency.
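
The difference between filtering before and after retrieval shows up as soon as restricted documents rank highly. The sketch below uses an invented ranked list of (doc_id, audience) pairs, best match first, and two hypothetical helper functions to compare the two orders of operation.

```python
def post_filter(ranked, is_allowed, k):
    """Take the top k by similarity first, then drop disallowed items."""
    return [doc for doc in ranked[:k] if is_allowed(doc)]

def pre_filter(ranked, is_allowed, k):
    """Drop disallowed items first, then take the top k of what remains."""
    return [doc for doc in ranked if is_allowed(doc)][:k]

# Hypothetical similarity ranking, best match first.
ranked = [
    ("int-1", "internal"),
    ("int-2", "internal"),
    ("pub-1", "public"),
    ("pub-2", "public"),
    ("pub-3", "public"),
]

def is_public(doc):
    return doc[1] == "public"

after = post_filter(ranked, is_public, k=3)
before = pre_filter(ranked, is_public, k=3)
print([doc_id for doc_id, _ in after])   # ['pub-1']
print([doc_id for doc_id, _ in before])  # ['pub-1', 'pub-2', 'pub-3']
```

Post-filtering returns one result where the user asked for three, because internal documents consumed two of the slots. Pre-filtering fills all three, at the cost of pushing the filter into the index.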

Can you measure retrieval quality before you add generation?

Do you have a test set of queries with known correct answers?
Can you compute hit rate, MRR and precision@k without a generation step?
If you cannot measure retrieval separately from generation, you will not know which layer is failing.

How large is your corpus, and how fast does it change?

<10K vectors, low update frequency → pgvector is often the cheapest option [2].
10K–1M vectors, moderate updates → Pinecone or Qdrant with their managed indexing [3][5].
>1M vectors, real-time updates → Weaviate’s real-time indexing or a custom FAISS-based pipeline [4].
Choose with growth in mind so that you do not outgrow your initial choice. If you expect to grow by an order of magnitude within a year, pick a platform that scales without a reindex.

What is your tolerance for false positives?

Low tolerance (legal, medical, compliance) → use hybrid search + reranking as a mandatory second stage.
Medium tolerance (internal knowledge base, product docs) → pure vector search with human-in-the-loop fallback may be sufficient.
High tolerance (recommendation, discovery) → vector search alone is often adequate.

What would change this advice

This guidance is current as of May 2026 and reflects documented behaviour in Pinecone (latest managed index), Weaviate 1.26+, Qdrant 1.12+, pgvector 0.8+, Cohere Rerank 3, and common embedding models (text-embedding-3-*, ada-002, BGE).

The advice would need revision if:

Embedding models become search-aware by default — e.g., models that can directly answer “is this document relevant to this query” rather than just producing vectors. This would reduce the gap between similarity and relevance that makes reranking necessary.
Vector DBs absorb reranking as a built-in primitive — some platforms are moving toward integrated reranker stages. If that becomes standard, the separate reranker deployment step may disappear.
Hybrid sparse-dense search reaches production parity with dedicated keyword engines — current hybrid implementations improve recall but still trade latency. If the latency gap closes, the argument for maintaining separate keyword and vector indexes weakens.
Metadata-filtered vector search becomes a universal first-class operation — some vector DBs still penalise pre-filtering latency. If all major platforms offer zero-cost pre-filtering, the metadata schema becomes the only remaining architectural decision.

Methodology

Date checked: 2026-05-28
Sources consulted: Pinecone, Weaviate, Qdrant, pgvector documentation; Elasticsearch and PostgreSQL full-text search docs; Cohere Rerank documentation; retrieval evaluation metrics guidance
Assumptions: Embedding quality, cost and context-window behaviour vary by provider and model version. Retrieval quality must be tested against real query distributions, not benchmark datasets. Reranking adds latency and token cost — the benefit depends on whether the top-1 accuracy gain justifies the second pass.
Limitations: This is architectural guidance, not a benchmark or vendor recommendation. Provider capabilities, pricing, and latency profiles change — verify against current documentation before committing to a platform.
Jurisdiction: Global. No jurisdiction-specific regulatory advice.

Source list

[1] Elasticsearch docs — https://www.elastic.co/guide/en/elasticsearch/reference/current/full-text-queries.html; PostgreSQL full-text search — https://www.postgresql.org/docs/current/textsearch-intro.html (accessed 2026-05-28)
[2] pgvector docs — https://github.com/pgvector/pgvector (accessed 2026-05-28)
[3] Pinecone docs — https://docs.pinecone.io/ (accessed 2026-05-28)
[4] Weaviate docs — https://weaviate.io/developers/weaviate/search/hybrid (accessed 2026-05-28)
[5] Qdrant docs — https://qdrant.tech/documentation/ (accessed 2026-05-28)
[6] Cohere Rerank docs — https://docs.cohere.com/docs/rerank (accessed 2026-05-28)

Change log

2026-05-28: editorial review — corrected writtenBy to “llm-author”, added 3 Editor’s Note cards, Trust Stack, slugified heading IDs, standardized Methodology format, added access dates to Source list
2026-05-25: revised after editorial review — expanded with inline citations, three worked scenarios, decision framework
2026-05-24: first draft