# Access control for RAG: why retrieval permissions matter before generation

## TL;DR

If your RAG system retrieves documents the user should not see, it does not matter what instructions you put in the generation prompt. The model sees the content and may expose it through direct quoting, inference, or accidental leakage in reasoning traces. Access control must be enforced at indexing and retrieval time, not delegated to the prompt.

**Editor's Note**

Most teams discover this problem the hard way: during a security review or, worse, after a user reports seeing another team's data. If you are building a RAG system today, treat retrieval-time access control as a launch blocker, not a v2 feature.

## What it means

In a typical RAG pipeline, the user sends a query, the system retrieves relevant documents from a vector database, and the retrieved context is inserted into the LLM prompt for generation. The problem is obvious once you state it: if the retrieval step returns a confidential document, the prompt, and therefore the model, sees that document.
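To make the exposure concrete, here is a minimal sketch of the prompt-assembly step. The `Document` type, the `build_prompt` function, and the sample texts are hypothetical and not taken from any particular framework; the only point is that whatever retrieval returns lands verbatim in the prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    confidential: bool  # hypothetical flag; note that nothing below checks it


def build_prompt(question: str, retrieved: list[Document]) -> str:
    """Naive prompt assembly: every retrieved document is pasted in as context."""
    context = "\n\n".join(doc.text for doc in retrieved)
    return (
        "Do not reveal internal documents.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Pretend the retriever returned one public and one confidential document.
retrieved = [
    Document("handbook", "Expenses are reimbursed within 30 days.", False),
    Document("salaries", "Director salary band: 180k-220k.", True),
]
prompt = build_prompt("How long do reimbursements take?", retrieved)

# The instruction is present, but so is the confidential text.
print("Director salary band" in prompt)
```

The instruction at the top of the prompt does not remove the second document from the model's input; only the retrieval step could have done that.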

Prompt-level mitigations (“do not reveal internal documents”, “only answer from authorised sources”) are fragile because:

- Models can accidentally quote or paraphrase retrieved content even when instructed not to.
- Safety refusals or reasoning traces may reveal that a confidential document was retrieved.
- The model has no way to distinguish document-level permissions; it sees everything in the prompt as context.
- Prompt injection or prompt manipulation can bypass instruction-level restrictions.

Access control in RAG must be a retrieval-layer concern, not a generation-layer concern.

**Editor's Note**

A useful rule of thumb: if a document appears anywhere in the prompt context, assume the user can see it. Models are not access-control systems; they are pattern matchers with no internal concept of confidentiality boundaries.

## Where teams misuse it

“We use a system prompt that says ‘only answer from authorised documents’.” This is the most common and most dangerous approach. The model has already seen the unauthorised document. Whether it exposes the content depends on the model’s instruction-following, the user’s prompt, and luck, not on a secure access boundary.

“We store all documents in one index and filter by metadata at query time.” Metadata filtering is better than nothing, but it depends on the filter being correctly applied in every query path. A bug in the filter logic, a missing filter parameter, or a vector search that returns results before the filter is applied can all leak documents.
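One way to reduce the "missing filter parameter" risk is to make the filter impossible to omit at the call site. The sketch below is illustrative only: `Chunk`, `search`, `MissingPrincipalError`, and the in-memory `INDEX` are hypothetical names, and a keyword match stands in for a real vector search.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str
    allowed_groups: frozenset[str]


class MissingPrincipalError(Exception):
    """Raised when a search is attempted without the caller's groups."""


def search(
    index: list[Chunk], query: str, *, user_groups: Optional[frozenset[str]]
) -> list[Chunk]:
    """Keyword match stands in for vector similarity; the filter is not optional."""
    if user_groups is None:
        raise MissingPrincipalError("refusing to search without user groups")
    # Permission filter first, relevance second.
    visible = [c for c in index if c.allowed_groups & user_groups]
    terms = query.lower().split()
    return [c for c in visible if any(t in c.text.lower() for t in terms)]


INDEX = [
    Chunk("eng-1", "Q3 budget for the platform team", frozenset({"eng"})),
    Chunk("hr-1", "Q3 budget for salary adjustments", frozenset({"hr"})),
]

hits = search(INDEX, "budget", user_groups=frozenset({"eng"}))
print([c.chunk_id for c in hits])
```

Because `user_groups` is a required keyword-only argument with no default, a query path that forgets it fails with a `TypeError` instead of silently searching the whole index, and an explicit `None` is rejected as well.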

“We just give each user their own collection.” This works if the mapping between users and collections is correct and maintained. It breaks when users change roles, when documents are shared across teams, or when collections are administratively misconfigured.
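The role-change failure mode is easier to avoid if the collection list is derived from current group membership on every query rather than stored per user or cached in a session. A minimal sketch, assuming a hypothetical `GROUP_COLLECTIONS` mapping and a `current_groups` lookup that would be backed by your identity provider in practice:

```python
from typing import Callable

# Hypothetical mapping from access group to collection name.
GROUP_COLLECTIONS = {"eng": "kb_eng", "hr": "kb_hr", "all-staff": "kb_shared"}


def collections_for(
    user_id: str, current_groups: Callable[[str], set[str]]
) -> list[str]:
    """Resolve searchable collections from the user's groups at query time."""
    groups = current_groups(user_id)
    # Unknown groups map to nothing, so the default is "no access".
    return sorted(GROUP_COLLECTIONS[g] for g in groups if g in GROUP_COLLECTIONS)


# Stand-in for an identity-provider lookup.
directory = {"dana": {"eng", "all-staff"}}


def lookup(user_id: str) -> set[str]:
    return directory.get(user_id, set())


before = collections_for("dana", lookup)
directory["dana"] = {"all-staff"}  # Dana leaves the engineering group
after = collections_for("dana", lookup)
print(before, after)
```

Here the very next query after the role change no longer routes to `kb_eng`. Shared documents still need a deliberate home (such as the `kb_shared` collection in this sketch), which is part of the maintenance cost of this approach.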

**Editor's Note**

Metadata filtering at query time is the most common middle ground, and the one most likely to fail silently under load. If you go this route, add a separate authorisation check after retrieval, before feeding results into the prompt. Logging any filtered-out results as a security event gives you an early warning signal when things drift.
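A post-retrieval check of this kind can be quite small. The following is a sketch under stated assumptions: `Retrieved`, `authorise_results`, and the `rag.security` logger name are hypothetical, and group membership is assumed to be available for both the user and each document.

```python
import logging
from dataclasses import dataclass

security_log = logging.getLogger("rag.security")  # hypothetical logger name


@dataclass(frozen=True)
class Retrieved:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]


def authorise_results(
    user_id: str, user_groups: frozenset[str], results: list[Retrieved]
) -> list[Retrieved]:
    """Second line of defence: re-check every result before it reaches the prompt."""
    allowed = []
    for doc in results:
        if doc.allowed_groups & user_groups:
            allowed.append(doc)
        else:
            # The query-time filter should have excluded this document already,
            # so a hit here means the filter was skipped or is wrong.
            security_log.warning(
                "retrieval filter miss: user=%s doc=%s", user_id, doc.doc_id
            )
    return allowed


RESULTS = [
    Retrieved("runbook", "Restart the service with systemctl.", frozenset({"eng"})),
    Retrieved("payroll", "Payroll export schedule.", frozenset({"hr"})),
]

safe = authorise_results("u-123", frozenset({"eng"}), RESULTS)
print([d.doc_id for d in safe])
```

If the query-time filter is working, this function never logs anything. Any warning it emits is evidence that an upstream filter was bypassed, which is what makes it useful as a drift signal.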

## Practical decision check

Before deploying a RAG system to production, verify:

- Does retrieval enforce document-level permissions before returning results to the prompt? Not just in the embedding query, but as a separate authorisation step.
- Is access control tested for every role and document sensitivity level? Create test queries that should return no results for each role/document combination.
- Is there a “needle test” for leakage? Plant a clearly identifiable “secret” document that only an admin should retrieve, and verify no other role can get it via direct query, paraphrasing, or reasoning-trace exposure.
- Is access control logged and auditable? Can you identify which user retrieved which documents, and whether any unauthorised access attempt occurred?
- Does the system handle document access changes retroactively? When a user loses access to a document, do existing RAG sessions stop retrieving it?
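The needle test in the checklist above can be automated as a small regression check. The sketch below uses a toy in-memory corpus and a stand-in `retrieve` function; the marker string, role names, probe queries, and function names are all hypothetical, and in a real system `retrieve` would call your actual retrieval path.

```python
from typing import Callable, Iterable

NEEDLE = "NEEDLE-7f3a9c"  # hypothetical marker planted in an admin-only document

# Toy corpus: (text, roles allowed to read it).
CORPUS = [
    ("Office opening hours are 9 to 5.", {"admin", "staff", "contractor"}),
    (f"Acquisition codename {NEEDLE} is confidential.", {"admin"}),
]


def retrieve(role: str, query: str) -> list[str]:
    """Stand-in for the real retrieval call; returns texts the role may read."""
    terms = query.lower().split()
    return [
        text
        for text, roles in CORPUS
        if role in roles and any(t in text.lower() for t in terms)
    ]


def roles_that_leak(
    roles: Iterable[str],
    probes: Iterable[str],
    retrieve_fn: Callable[[str, str], list[str]],
    marker: str = NEEDLE,
) -> set[str]:
    """Return every role for which any probe query surfaces the marker."""
    probes = list(probes)
    leaking = set()
    for role in roles:
        for probe in probes:
            if any(marker in text for text in retrieve_fn(role, probe)):
                leaking.add(role)
    return leaking


PROBES = ["acquisition codename", "what is confidential", NEEDLE]
print(roles_that_leak(["staff", "contractor"], PROBES, retrieve))
```

An empty set for every non-admin role is the passing condition. The same marker check can be run over generated answers and any exposed reasoning traces, which covers the paraphrasing and trace-exposure cases that a retrieval-only check would miss.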

## Architectural patterns that work

1. Pre-filter at retrieval time: apply document-level permissions before the vector search, not after. Use a separate authorisation service or index traversal that checks each candidate document’s access control list.
2. Post-filter with separation guarantee: if pre-filtering is not possible (e.g., because the vector store does not support it), retrieve a large candidate set, apply authorisation filtering, then use only the authorised subset in the prompt. Log any filtered-out documents as a security event.
3. Per-user or per-role partitioned indexes: create separate vector indexes for each access group. Query routing goes to the correct index based on user role. This is operationally heavier but provides the strongest access boundary.
4. Audit-trail-aware chunking: attach document-level permissions to each chunk at indexing time so the permission follows the content wherever it is stored.
5. Never rely on the prompt alone: every unauthorised document that reaches the prompt is a security incident waiting to happen.
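Patterns 1 and 4 combine naturally: if every chunk carries its parent document's permissions, the search can discard unauthorised chunks before ranking. The sketch below is a toy illustration, not a production design. `IndexedChunk`, `chunk_document`, `prefiltered_search`, and especially `toy_embed` (a vowel-count vector standing in for a real embedding model) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class IndexedChunk:
    doc_id: str
    text: str
    embedding: tuple[float, ...]
    allowed_groups: frozenset[str]  # copied from the parent document at indexing time


def toy_embed(text: str) -> tuple[float, ...]:
    """Placeholder embedding: vowel counts. A real system would call a model."""
    return tuple(float(text.lower().count(ch)) for ch in "aeiou")


def chunk_document(
    doc_id: str,
    text: str,
    allowed_groups: Iterable[str],
    embed: Callable[[str], tuple[float, ...]],
    size: int = 40,
) -> list[IndexedChunk]:
    """Split into fixed-size pieces; each piece inherits the document's ACL."""
    acl = frozenset(allowed_groups)
    pieces = [text[i : i + size] for i in range(0, len(text), size)]
    return [IndexedChunk(doc_id, p, embed(p), acl) for p in pieces]


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def prefiltered_search(
    index: list[IndexedChunk],
    query_vec: tuple[float, ...],
    user_groups: frozenset[str],
    k: int = 3,
) -> list[IndexedChunk]:
    """Drop unauthorised chunks first, then rank only what remains."""
    candidates = [c for c in index if c.allowed_groups & user_groups]
    ranked = sorted(
        candidates, key=lambda c: cosine(c.embedding, query_vec), reverse=True
    )
    return ranked[:k]


index = chunk_document(
    "wiki", "Deploys run every weekday at noon.", {"eng", "hr"}, toy_embed
) + chunk_document(
    "payroll", "Payroll adjustments for Q3 are pending review.", {"hr"}, toy_embed
)

results = prefiltered_search(index, toy_embed("payroll adjustments"), frozenset({"eng"}))
print({c.doc_id for c in results})
```

Even though the query is about payroll, a user in only the `eng` group gets nothing from the `payroll` document because its chunks never enter the ranking. Many vector stores expose an equivalent as a metadata filter passed alongside the query vector; the exact syntax and whether it is applied before or after the nearest-neighbour search varies, so check your provider's documentation.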

## Caveats and scope boundaries

- This guide addresses document-level access control within a RAG pipeline. It does not cover network-level access, API authentication, or identity provider integration; those are prerequisites, not substitutes.
- Pre-filtering at retrieval time adds latency to the search step; benchmark your chosen vector store with your document volume and access-control rules.
- Per-user partitioned indexes multiply storage and maintenance overhead. This approach is appropriate for high-sensitivity environments (legal, healthcare, finance) but may be overkill for internal knowledge bases with coarse permission models.
- No single access-control approach is perfect. Layered defence (retrieval filtering, prompt instructions, and output monitoring) is the appropriate posture for sensitive document collections.
- Vector database access control features evolve rapidly. Check your provider’s current documentation as implementations change; the capabilities described here were accurate as of May 2026.

## Methodology

- Data checked: 2026-05-28
- Sources consulted: OWASP Top 10 for LLM Applications (genai.owasp.org), Pinecone access control documentation, Weaviate developer documentation, Qdrant filtering documentation, Chroma filtering documentation, AWS IAM documentation, Azure RBAC documentation, NIST AI RMF
- Assumptions: The reader operates or designs a RAG system that indexes documents with varying sensitivity levels across multiple users or teams
- Limitations: This article covers document-level access control within RAG pipelines only. It does not address network segmentation, encryption at rest, or inference-time access control at the model provider level
- Jurisdiction: Global. Relevant frameworks include OWASP (global), NIST AI RMF (US), and GDPR (EU/UK) where personal data is indexed

## Source list

- OWASP LLM Top 10: https://genai.owasp.org/llm-top-10/ (accessed 2026-05-28)
- Pinecone access control documentation: https://docs.pinecone.io/docs/manage-access (accessed 2026-05-28)
- Weaviate access control: https://weaviate.io/developers/weaviate/manage-data/access (accessed 2026-05-28)
- Qdrant filtering and access control: https://qdrant.tech/documentation/concepts/filtering/ (accessed 2026-05-28)
- Chroma filtering documentation: https://docs.trychroma.com/usage-guide#filtering (accessed 2026-05-28)
- AWS IAM documentation: https://aws.amazon.com/iam/ (accessed 2026-05-28)
- Azure Role-Based Access Control (RBAC): https://learn.microsoft.com/en-us/azure/role-based-access-control/ (accessed 2026-05-28)
- NIST AI RMF: https://www.nist.gov/ai (accessed 2026-05-28)

## Related guides

- Data leakage in LLM apps: logs, prompts, files and vendor retention
- Citation quality in AI answers: source-grounded does not mean source-faithful
- Building a minimum viable RAG system without overengineering
- PII handling for LLM apps: minimisation before redaction

## Trust Stack

- AI draft model: gemma4:26b
- AI review model: deepseek-r1:32b
- Human editorial review: No (automated editorial pipeline)
- Last substantive check: 2026-05-28
- Corrections policy: If you spot an error, contact us via the Contact page
- Affiliation: theLLMs has no vendor affiliation, sponsorship, or commercial relationship with any AI provider mentioned

## Change log

- 2026-05-28: Full editorial review against 16-gate checklist. Added Editor’s Notes, Methodology, Source list, Trust Stack, slugified heading IDs, and standalone Caveats section. Renamed “Short answer” to “TL;DR”. Updated dates to 2026-05-28.
- 2026-05-25: Initial audit revision. Added direct source URLs to evidence section; changed source listing from named-only references to linked citations. No material changes to claims or guidance.