Vector databases: when semantic search is enough and when it is not
Vector databases are often sold as “search, but smarter”. Sometimes they are. Sometimes a simpler search index, a keyword filter or a well-designed database query is the better answer. The trick is knowing which problem you actually have.
Semantic similarity is not the same as relevance. A vector database can find documents that feel related to the query (similar words, similar phrasing, similar embedding directions) without actually containing the answer. When your retrieval layer cannot explain why it returned a result, you may have bought convenience at the cost of debuggability. This guide helps you decide when the trade-off is worth it and when it is not.
Quick answer
Use a vector database when semantic similarity is the useful part of the problem and when you can maintain embeddings, metadata and retrieval evaluation over time. Start simpler for every other case:
- Keyword search is enough when your users know what they are looking for — SKUs, part numbers, document titles, known phrases. Elasticsearch or PostgreSQL full-text search will outperform a vector DB on exact-match queries at a fraction of the cost [1].
- pgvector is enough when you already use PostgreSQL and your semantic search fits in a single SQL query: similarity search over a few thousand vectors, combined with metadata filters, without separating embedding storage from the rest of your data [2].
- A dedicated vector DB is worth considering when your corpus grows beyond tens of thousands of vectors, when you need low-latency ANN indexing with real-time updates, or when metadata filtering is a first-class requirement that cannot be bolted on after retrieval [3][4].
What this means
A vector store is one retrieval option, not a magical intelligence layer. An embedding model converts text (or images, or audio) into numerical coordinates in a high-dimensional space, and the vector store indexes those coordinates. When a query arrives, it is embedded the same way and the store finds the stored coordinates closest to the query’s coordinates. That closeness is mathematical: it measures direction and distance, not truth or usefulness.
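To make “closeness is mathematical” concrete, here is a minimal sketch of cosine similarity in plain Python. The three-dimensional vectors and document names are made-up toy values for illustration, not output from any real embedding model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


# Toy vectors (hypothetical values, chosen by hand).
query = [0.9, 0.1, 0.0]
docs = {
    "river bank erosion": [0.8, 0.2, 0.1],
    "opening a bank account": [0.7, 0.1, 0.6],
    "holiday rota": [0.0, 0.1, 0.9],
}

# Rank documents by closeness to the query, best first.
ranked = sorted(docs, key=lambda name: cosine_similarity(query, docs[name]), reverse=True)
```

Nothing in the ranking step knows what the user wanted. It only compares directions, which is why the nearest vector is not guaranteed to be the most useful document.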
This matters because vector search optimises for semantic proximity, not for relevance to the user’s actual need. A document about “bank” meaning a river bank can sit close to a query about “bank” meaning a financial institution if the embedding model leans on the word’s surface form rather than the sense the user intended. The embedding model has no concept of user intent. It has only training data and vector arithmetic.
The practical consequence: teams that reach for a vector DB first — before understanding their search problem — end up debugging retrieval failures that a simpler keyword index would never have produced.
Where teams get it wrong, with specific consequences
Using embeddings for search problems that are not semantic
A team working on an internal knowledge base indexes every document through an embedding pipeline and queries with a dedicated vector DB. Most of their searches are staff looking for specific policy numbers, form references or document titles — things with exact identifiers. The vector DB returns results ranked by cosine similarity, which spreads relevant exact-match results across a list of “also related” documents that share similar words but not the same meaning.
Consequence: The top-5 results for “HR-142 grievance procedure” include the correct document at position 3, behind “grievance case studies” and “HR team contact list.” Users learn to scroll past the first two results to find what they need. The team interprets low click-through on top results as an embedding quality problem and invests in fine-tuning, when the real fix is to promote exact matches before vector results.
Practical fix: Use hybrid search. Run an exact keyword match in parallel with vector search and boost exact matches to the top of the ranking. Pinecone supports hybrid search with sparse-dense retrieval; Weaviate and Qdrant offer combined keyword-and-vector queries natively [3][4][5]. If your corpus is smaller than ~10,000 documents, start with pgvector alongside PostgreSQL’s built-in full-text search in the same database and avoid the operational cost of a separate vector DB entirely [2].
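The exact-match promotion described above can be sketched in a few lines. This is an illustration under stated assumptions, not any vendor’s API: the identifier pattern (two or more capital letters, a hyphen, digits), the `titles` mapping and the function names are all hypothetical, and the vector hits are assumed to arrive already ranked by similarity.

```python
import re

# Assumed identifier shape such as "HR-142"; adapt to your own naming scheme.
IDENTIFIER = re.compile(r"\b[A-Z]{2,}-\d+\b")


def exact_match_ids(query: str, titles: dict[str, str]) -> list[str]:
    """Keyword leg: ids of documents whose title contains an identifier from the query."""
    identifiers = IDENTIFIER.findall(query)
    return [doc_id for doc_id, title in titles.items()
            if any(ident in title for ident in identifiers)]


def hybrid_rank(query: str, vector_hits: list[str], titles: dict[str, str]) -> list[str]:
    """Put exact identifier matches first, then the vector hits in their original order."""
    exact = exact_match_ids(query, titles)
    seen = set(exact)
    return exact + [doc_id for doc_id in vector_hits if doc_id not in seen]


titles = {
    "doc-a": "Grievance case studies",
    "doc-b": "HR team contact list",
    "doc-c": "HR-142 Grievance procedure",
}
vector_hits = ["doc-a", "doc-b", "doc-c"]  # similarity order from the scenario above

reordered = hybrid_rank("HR-142 grievance procedure", vector_hits, titles)
```

With these toy inputs the policy document moves from third place to first, and a query without an identifier leaves the vector ranking untouched. A production system would more likely fuse scores from both legs than hard-promote, but the hard rule is easier to reason about as a first step.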
Ignoring metadata filters and access control
A team deploys a vector DB for a customer support system. Every chunk is embedded as raw text with no attached metadata. The system retrieves answers from across the entire knowledge base — including internal troubleshooting notes marked “not for customer use” — and surfaces them alongside public documentation. There is no way to filter by audience, document source, or content sensitivity.
Consequence: A customer asking “how do I reset my admin password?” receives instructions intended for support staff, including internal escalation paths and backend credentials embedded in troubleshooting notes. The retrieval system cannot distinguish between a document tagged audience: internal and one tagged audience: public because neither tag exists in the chunk store.
Practical fix: Every chunk must carry a minimal metadata schema before it enters the vector DB. At minimum:
- source_path: original document identifier
- audience: public, internal, confidential
- heading_hierarchy: array of ancestor headings (e.g., ["Troubleshooting", "Password reset", "Admin escalation"])
- chunk_index: position within the source document
- last_updated: date of the source content
Pinecone, Weaviate and Qdrant all support pre-filtering on metadata before vector search [3][4][5]. A query that filters on audience: public will never return internal content, regardless of how close the embedding match is. This is not a nice-to-have. It is a security and trust requirement in any production system.
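As a rough sketch of what pre-filtering means, the snippet below models the minimal schema as a dataclass and applies the audience filter before any similarity ranking. It is a brute-force, in-memory stand-in rather than any vector DB’s client API; the `Chunk` class, the `search` function, the two-dimensional embeddings and the file paths are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Chunk:
    text: str
    embedding: list[float]
    source_path: str
    audience: str  # "public", "internal" or "confidential"
    heading_hierarchy: list[str] = field(default_factory=list)
    chunk_index: int = 0
    last_updated: Optional[date] = None


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def search(query_embedding: list[float], chunks: list[Chunk], audience: str, k: int = 3) -> list[Chunk]:
    """Pre-filter on audience, then rank only the chunks that survive the filter."""
    allowed = [c for c in chunks if c.audience == audience]
    allowed.sort(key=lambda c: dot(query_embedding, c.embedding), reverse=True)
    return allowed[:k]


chunks = [
    Chunk("Escalate to tier 2 and use the backend console.", [0.9, 0.1],
          "kb/internal/reset.md", "internal"),
    Chunk("Open Settings, then choose Reset password.", [0.7, 0.3],
          "kb/public/reset.md", "public"),
    Chunk("Our opening hours are 9 to 5.", [0.1, 0.9],
          "kb/public/hours.md", "public"),
]

# The internal chunk is the closest match, but it is never a candidate.
public_results = search([1.0, 0.0], chunks, audience="public", k=2)
```

The internal note has the highest similarity to the toy query, yet it cannot appear in `public_results` because it is removed before ranking starts.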
Assuming cosine similarity is the same as answer quality
A team measures retrieval quality by cosine similarity score alone. They set a threshold of 0.85 and assume anything above it is a good answer. What they discover: short queries with common words — “what is API” — produce high similarity scores against hundreds of documents that mention “API” in passing, while precise queries about specific implementations score lower because the vocabulary does not perfectly overlap.
Consequence: The team’s precision-at-k looks strong in offline tests because the test set was built from the same corpus the embedding model was trained on. In production, users see irrelevant results ranked high (because they share language with the query) and miss specific technical answers that use different vocabulary to describe the same concept.
Practical fix: Measure retrieval quality with task-specific metrics, not similarity scores alone. Track:
- Hit rate: Does the correct document appear in the top-k results?
- Mean Reciprocal Rank (MRR): How high is the first relevant result?
- Precision at k: How many of the top-k results are actually useful?
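The three metrics above are small enough to compute by hand. The sketch below assumes a hypothetical test set shaped as two dictionaries: `results` maps each query to its ranked document ids, and `relevant` maps each query to the set of ids judged correct.

```python
def hit_rate(results: dict[str, list[str]], relevant: dict[str, set[str]], k: int) -> float:
    """Fraction of queries with at least one relevant document in the top k."""
    hits = sum(1 for q, ranked in results.items()
               if any(doc in relevant[q] for doc in ranked[:k]))
    return hits / len(results)


def mean_reciprocal_rank(results: dict[str, list[str]], relevant: dict[str, set[str]]) -> float:
    """Average of 1/rank of the first relevant result (0 if none is returned)."""
    total = 0.0
    for q, ranked in results.items():
        for rank, doc in enumerate(ranked, start=1):
            if doc in relevant[q]:
                total += 1.0 / rank
                break
    return total / len(results)


def precision_at_k(results: dict[str, list[str]], relevant: dict[str, set[str]], k: int) -> float:
    """Mean over queries of (relevant documents in the top k) / k."""
    per_query = [sum(1 for doc in ranked[:k] if doc in relevant[q]) / k
                 for q, ranked in results.items()]
    return sum(per_query) / len(per_query)


# Toy test set (hypothetical ids).
results = {"q1": ["a", "b", "c"], "q2": ["d", "e", "f"]}
relevant = {"q1": {"c"}, "q2": {"d", "e"}}

hr = hit_rate(results, relevant, k=2)          # q1 misses, q2 hits
mrr = mean_reciprocal_rank(results, relevant)  # (1/3 + 1/1) / 2
p2 = precision_at_k(results, relevant, k=2)    # (0/2 + 2/2) / 2
```

None of these functions looks at a similarity score. They only need ranked ids and human judgements, which is what lets them expose cases where high similarity and usefulness disagree.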
Use reranking as a second pass: after vector retrieval returns the top 20–50 candidate documents, a reranker model (such as Cohere Rerank or BGE-reranker) scores each candidate against the query with a stricter relevance model [6]. This separates the fast-but-approximate first stage (vector search) from the slower-but-precise second stage (reranker). The combined pipeline can raise hit rate substantially without requiring a complete vector DB rebuild; confirm the size of the gain on your own query set.
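The two-stage shape can be sketched with a pluggable scoring function. The `token_overlap` scorer here is a deliberately crude stand-in so the example runs without any model or network call; it is not how Cohere Rerank or BGE-reranker work, and in practice you would pass in a function that wraps a real reranker.

```python
from typing import Callable


def rerank(query: str, candidates: list[str],
           score: Callable[[str, str], float], top_n: int = 5) -> list[str]:
    """Second stage: re-order first-stage candidates by a stricter scorer."""
    ordered = sorted(candidates, key=lambda doc: score(query, doc), reverse=True)
    return ordered[:top_n]


def token_overlap(query: str, doc: str) -> float:
    """Placeholder scorer: share of query tokens that appear in the document."""
    query_tokens = set(query.lower().split())
    doc_tokens = set(doc.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & doc_tokens) / len(query_tokens)


# Pretend these came back from vector search, in similarity order.
candidates = [
    "api overview and glossary",
    "rotating an api key for the billing service",
    "office seating plan",
]

top = rerank("rotate billing api key", candidates, score=token_overlap, top_n=2)
```

Keeping the scorer as a parameter means the first stage and the reranker can be evaluated and swapped independently, which matters when deciding whether the extra latency is worth it.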
Practical decision framework
Before you choose a vector DB, answer these questions in order:
1. Is the search task semantic, structured, or both?
- Semantic: “Find documents similar to this concept.”
- Structured: “Find policy HR-142.”
- Both: “Find policy HR-142 and similar grievance procedures.”
- If structured only, start with keyword search. If both, plan for hybrid from day one.
2. Do you need metadata filters, permissions or freshness rules?
- Does every user see the same corpus, or are results scoped by role, region, or document type?
- Can you afford to retrieve all candidates and filter later, or must filtering happen before the search?
- If pre-filtering is required, verify your chosen vector DB supports metadata pre-filtering without degrading latency.
3. Can you measure retrieval quality before you add generation?
- Do you have a test set of queries with known correct answers?
- Can you compute hit rate, MRR and precision@k without a generation step?
- If you cannot measure retrieval separately from generation, you will not know which layer is failing.
4. How large is your corpus, and how fast does it change?
- <10K vectors, low update frequency → pgvector is often the cheapest option [2].
- 10K–1M vectors, moderate updates → Pinecone or Qdrant with their managed indexing [3][5].
- >1M vectors, real-time updates → Weaviate’s real-time indexing or a custom FAISS-based pipeline [4].
- Plan for growth rather than outgrowing your initial choice. If you expect to grow by an order of magnitude within a year, pick a platform that scales without a reindex.
5. What is your tolerance for false positives?
- Low tolerance (legal, medical, compliance) → use hybrid search + reranking as a mandatory second stage.
- Medium tolerance (internal knowledge base, product docs) → pure vector search with human-in-the-loop fallback may be sufficient.
- High tolerance (recommendation, discovery) → vector search alone is often adequate.
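The first two questions above lend themselves to a quick sketch. The snippet below is a heuristic illustration, not a production classifier: the identifier pattern, the `FILLER` word list and every function name are assumptions made for the example. The first function labels a query as structured, semantic or both. The other two show why it matters whether filtering happens before or after the top-k cut.

```python
import re

# Assumed identifier shape such as "HR-142".
IDENTIFIER = re.compile(r"\b[A-Z]{2,}-\d+\b")
# Hypothetical filler words that carry no semantic intent on their own.
FILLER = {"find", "policy", "form", "document", "the", "a", "and"}


def classify_query(query: str) -> str:
    """Label a query as 'structured', 'semantic' or 'both'."""
    has_identifier = bool(IDENTIFIER.search(query))
    remainder = IDENTIFIER.sub(" ", query).lower()
    content_words = [w for w in re.findall(r"[a-z]+", remainder) if w not in FILLER]
    if has_identifier and content_words:
        return "both"
    if has_identifier:
        return "structured"
    return "semantic"


def post_filter(ranked_ids: list[str], allowed: set[str], k: int) -> list[str]:
    """Take the top k by similarity first, then drop disallowed results."""
    return [doc_id for doc_id in ranked_ids[:k] if doc_id in allowed]


def pre_filter(ranked_ids: list[str], allowed: set[str], k: int) -> list[str]:
    """Drop disallowed documents first, then take the top k of what remains."""
    return [doc_id for doc_id in ranked_ids if doc_id in allowed][:k]


# Similarity order for one query; most of the closest matches are internal.
ranked_ids = ["int-1", "int-2", "pub-1", "int-3", "pub-2", "pub-3"]
allowed = {"pub-1", "pub-2", "pub-3"}

after = post_filter(ranked_ids, allowed, k=3)
before = pre_filter(ranked_ids, allowed, k=3)
```

With these toy ids, filtering after the cut leaves a single result while filtering before it fills all three slots. A real ANN index applies the filter during index traversal rather than over a fully ranked list, so treat this as a picture of the outcome, not of the mechanism.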
What would change this advice
This guidance is current as of May 2026 and reflects documented behaviour in Pinecone (latest managed index), Weaviate 1.26+, Qdrant 1.12+, pgvector 0.8+, Cohere Rerank 3, and common embedding models (text-embedding-3-*, ada-002, BGE).
The advice would need revision if:
- Embedding models become search-aware by default — e.g., models that can directly answer “is this document relevant to this query” rather than just producing vectors. This would reduce the gap between similarity and relevance that makes reranking necessary.
- Vector DBs absorb reranking as a built-in primitive — some platforms are moving toward integrated reranker stages. If that becomes standard, the separate reranker deployment step may disappear.
- Hybrid sparse-dense search reaches production parity with dedicated keyword engines — current hybrid implementations improve recall but still trade latency. If the latency gap closes, the argument for maintaining separate keyword and vector indexes weakens.
- Metadata-filtered vector search becomes a universal first-class operation — some vector DBs still penalise pre-filtering latency. If all major platforms offer zero-cost pre-filtering, the metadata schema becomes the only remaining architectural decision.
Methodology and sources
Check date: 2026-05-24
What was checked: Vector database documentation (Pinecone, Weaviate, Qdrant, pgvector), hybrid search patterns, reranker documentation (Cohere), and retrieval evaluation metrics guidance.
What the sources were used for:
- Elasticsearch / PostgreSQL comparison for non-semantic search problems [1]
- pgvector architecture and hybrid search capabilities [2]
- Pinecone hybrid search and metadata filtering [3]
- Weaviate hybrid search and metadata filtering patterns [4]
- Qdrant metadata filtering and indexing documentation [5]
- Cohere Rerank reranking methodology and hit-rate improvements [6]
Assumptions and limits:
- embedding quality, cost and context-window behaviour vary by provider and model version
- retrieval quality must be tested against real query distributions, not benchmark datasets
- reranking adds latency and token cost — the benefit depends on whether the top-1 accuracy gain justifies the second pass
- this is architectural guidance, not a benchmark or vendor recommendation
Change log
- 2026-05-24: first draft built from the llm-editor-approved brief.
- 2026-05-25: revised after editorial review — expanded to full 1500+ word draft with inline citations, three worked scenarios (hybrid search gap, metadata/access-control failure, cosine similarity as answer-quality proxy), decision framework, evidence-change section, production route links, and strengthened source descriptions. Editor’s Notes integrated into finished copy.
Source list
- [1] Elasticsearch docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/full-text-queries.html; PostgreSQL full-text search: https://www.postgresql.org/docs/current/textsearch-intro.html
- [2] pgvector docs: https://github.com/pgvector/pgvector
- [3] Pinecone docs: https://docs.pinecone.io/
- [4] Weaviate docs: https://weaviate.io/developers/weaviate/search/hybrid
- [5] Qdrant docs: https://qdrant.tech/documentation/
- [6] Cohere Rerank docs: https://docs.cohere.com/docs/rerank