---
hero_image: "/images/hero/ai-incident-response-what-to-do-when-a-model-gives-harmful-or-wrong-advice.png"
layout: ../../layouts/GuideLayout.astro
title: "AI incident response: what to do when a model gives harmful or wrong advice"
description: "A practical guide to adapting incident response for prompts, outputs, evals, rollbacks and customer-facing AI failures."
writtenBy: "gemma4:26b"
reviewedBy: "deepseek-r1:32b"
lastChecked: "2026-05-28"
scope: "Global. Provider and standards sources checked as of 2026-05-28."
---

# AI incident response: what to do when a model gives harmful or wrong advice

When an AI feature gives bad advice, the incident is not just the answer. It is the path that let the answer reach a user, and the path that failed to stop it in time.

Treat harmful outputs like product incidents: contain, preserve evidence, roll back or disable the risky path, notify the right people, and only then ask how the model got there.

The right playbook depends on severity. A wrong suggestion is not the same as a safety-critical mistake, but both need a clear path for containment and review.

## TL;DR

Treat harmful AI outputs as product incidents that require immediate containment and investigation. When a model generates harmful or incorrect advice, first preserve all session evidence (prompts, context, and logs), then roll back or disable the risky path. Only after securing the system should you investigate whether the failure occurred in the prompt, retrieval augmentation, or tool authorisation.

## What this means

AI incident response follows the same structure as any product incident (contain, triage, fix, learn) but with an extra step: you need to separate the model output from the product path that delivered it. The output was wrong. But was the failure in the model, the prompt, the retrieved context, the tool permission, the approval gate, or the output validation? The answer determines whether you fix the system prompt, add a guardrail, update the RAG source, or disable the tool.

The incident severity determines the response speed and escalation path. A P1 incident (harmful output reaching a wide audience, safety-critical error, data exposure) needs immediate containment: disable the feature or route. A P3 incident (wrong advice that is annoying but not harmful, formatting error in a low-traffic surface) can be queued for the next sprint. Teams that skip severity classification end up treating every model mistake as urgent, which means nothing is actually urgent.
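
Disabling "the feature or route" is easier when each risky path has its own switch. The sketch below is one minimal way to do that in Python, under stated assumptions: the `PathSwitches` class, the path names `assistant` and `tool:refund`, and the keyword-based routing in `answer` are all hypothetical stand-ins for whatever feature-flag system and router your product already uses.

```python
from dataclasses import dataclass, field


@dataclass
class PathSwitches:
    """In-memory kill switches keyed by path name (names are hypothetical)."""
    disabled: set[str] = field(default_factory=set)

    def disable(self, path: str) -> None:
        self.disabled.add(path)

    def is_enabled(self, path: str) -> bool:
        return path not in self.disabled


def answer(question: str, switches: PathSwitches) -> str:
    # Whole-feature switch: the containment option for a P1 incident.
    if not switches.is_enabled("assistant"):
        return "The assistant is temporarily unavailable."
    # Path-level switch: turn off one risky capability and keep the rest.
    # The keyword match is a placeholder for real intent or tool routing.
    if "refund" in question.lower() and not switches.is_enabled("tool:refund"):
        return "Refund requests are being handled by our support team right now."
    return f"(model answer to: {question})"


switches = PathSwitches()
switches.disable("tool:refund")
print(answer("Can I get a refund?", switches))
print(answer("What are your opening hours?", switches))
```

In a real deployment the switch state would live in a shared store so that every instance sees the change at once. The in-memory set here only illustrates the shape of the check.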

## Where teams misuse it

- No severity ladder for AI incidents. Without a severity classification, a model generating a slightly wrong product description gets the same response as a model generating harmful medical advice. Both get escalated, both get a post-mortem, and the team fatigues on the low-severity ones while the high-severity ones get lost in the noise.
- Fixing the prompt but leaving the tool path open. A model generated an incorrect refund amount. The team updates the system prompt to say “always verify refund amounts.” But the tool that executes refunds has no approval gate and no upper limit check. The prompt fix reduces the surface but does not fix the product: the same failure can recur with a differently phrased user request.
- Skipping evidence preservation before rolling back. When a harmful output is detected, the team rolls back the model version or disables the feature immediately, which is the right instinct. But they do not preserve the prompt, context, tool calls, and output that caused the incident. When the post-mortem starts, nobody has the data to understand what happened.
- Treating every incident as a model problem. A model gave a wrong answer. The investigation stops at “the model hallucinated.” But the real failure was in retrieval (the wrong document was returned), tool permissions (the model had write access it should not have had), or output validation (the check that should have caught the wrong fact was missing). Model behaviour is the last thing to blame, not the first.
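
The refund example above suggests a guard that sits in the tool path itself, independent of the prompt. The following Python sketch shows what an upper limit check plus an approval gate might look like. The thresholds `AUTO_APPROVE_LIMIT` and `HARD_LIMIT`, the `RefundDecision` type, and the `guard_refund` function are illustrative assumptions, not values or names from any real refund system.

```python
from dataclasses import dataclass

AUTO_APPROVE_LIMIT = 50.00   # assumed policy value, for illustration only
HARD_LIMIT = 500.00          # assumed policy value, for illustration only


@dataclass
class RefundDecision:
    status: str   # "approved", "needs_approval", or "rejected"
    reason: str


def guard_refund(amount: float, order_total: float) -> RefundDecision:
    """Check a model-proposed refund before the refund tool is called."""
    if amount <= 0 or amount > order_total:
        return RefundDecision("rejected", "amount outside the order total")
    if amount > HARD_LIMIT:
        return RefundDecision("rejected", "amount above hard limit")
    if amount > AUTO_APPROVE_LIMIT:
        # Approval gate: a human confirms before any money moves.
        return RefundDecision("needs_approval", "human approval required")
    return RefundDecision("approved", "within auto-approve limit")


print(guard_refund(20.00, 100.00))
print(guard_refund(120.00, 200.00))
print(guard_refund(900.00, 1000.00))
```

Because the check runs on the tool call rather than on the wording of the request, a differently phrased user message cannot bypass it.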

> **Editor's Note:** When the post-mortem defaults to "the model hallucinated," you have a process problem, not a model problem. Start every AI incident investigation by asking: what validated this output before it reached the user? If the answer is "nothing," the model did not fail; the product design did.
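
One way to make "what validated this output?" answerable is to run every response through a small set of checks that know nothing about the model that produced it. The Python sketch below is a minimal illustration. The two checks and their regular expressions are deliberately crude, hypothetical examples; real checks would be tuned to your product's risk profile.

```python
import re
from typing import Callable, Optional

# A check returns a reason string when the output should be held, else None.
Check = Callable[[str], Optional[str]]


def no_email_addresses(text: str) -> Optional[str]:
    if re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", text):
        return "possible email address in output"
    return None


def no_dosage_advice(text: str) -> Optional[str]:
    if re.search(r"\b\d+(\.\d+)?\s?(mg|ml|mcg)\b", text, re.IGNORECASE):
        return "possible medication dosage in output"
    return None


def validate_output(text: str, checks: list[Check]) -> list[str]:
    """Run every check and return the reasons the output should be held."""
    return [reason for check in checks if (reason := check(text)) is not None]


CHECKS = [no_email_addresses, no_dosage_advice]
print(validate_output("Take 500 mg twice a day.", CHECKS))
print(validate_output("Your order has shipped.", CHECKS))
```

An empty list means no check objected. Logging the returned reasons alongside the response gives the post-mortem a record of which checks ran and what they found.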

### Severity ladder reference

A practical severity classification for AI incidents:

- P1 (Critical): The model generated output that caused or could cause real harm (safety advice, medical, financial, legal); PII was exposed to unauthorised users; the model executed a destructive tool call without approval. Response: disable the feature or route immediately, preserve evidence, notify stakeholders within 1 hour, full post-mortem within 48 hours.
- P2 (High): The model generated consistently wrong factual answers in a customer-facing surface; a minor policy violation in output; a tool call that caused a non-destructive but visible side effect (wrong ticket created, wrong email sent). Response: disable if practical, otherwise add a warning banner or output filter; notify within 4 hours; post-mortem within 1 week.
- P3 (Moderate): The model generated an incorrect answer in an internal tool; a formatting issue; a single wrong fact that the user can reasonably disregard. Response: queue fix for next sprint; log the incident; review in next team triage.
- P4 (Low): The model was less helpful than expected; a response was verbose; a suggestion was technically correct but not useful. Response: log as eval feedback; no incident process needed.
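
The ladder above can be encoded as data so that tooling computes deadlines instead of people remembering them. This Python sketch takes its time windows from the ladder. The `SeverityPolicy` type, the `LADDER` table, and the `deadlines` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class SeverityPolicy:
    label: str
    disable_immediately: bool
    notify_within: Optional[timedelta]      # None: no notification deadline
    postmortem_within: Optional[timedelta]  # None: no formal post-mortem


LADDER = {
    "P1": SeverityPolicy("Critical", True, timedelta(hours=1), timedelta(hours=48)),
    "P2": SeverityPolicy("High", False, timedelta(hours=4), timedelta(weeks=1)),
    "P3": SeverityPolicy("Moderate", False, None, None),
    "P4": SeverityPolicy("Low", False, None, None),
}


def _due(start: datetime, window: Optional[timedelta]) -> Optional[datetime]:
    return None if window is None else start + window


def deadlines(severity: str, detected_at: datetime) -> dict:
    """Return the notification and post-mortem deadlines for an incident."""
    policy = LADDER[severity]
    return {
        "notify_by": _due(detected_at, policy.notify_within),
        "postmortem_by": _due(detected_at, policy.postmortem_within),
    }


detected = datetime(2026, 1, 1, 12, 0)
print(deadlines("P1", detected))
print(deadlines("P3", detected))
```

P2 is marked `disable_immediately=False` because the ladder says "disable if practical", which is a judgment call that a lookup table should not make for you.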

## Practical decision check

When an AI failure is reported, ask:

- What severity class does this belong to? Use the ladder above. A wrong answer to “how do I cancel my account?” is different from a wrong answer to “what medication should I take?”
- What evidence needs to be preserved before anything changes? Capture the exact prompt, retrieved context, tool calls (if any), model response, timestamp, model version, and deployment ID. Preserve before rolling back.
- What part of the stack failed? Model behaviour, retrieval quality, prompt design, tool permission, approval gate, output validation, or user context? Fix the layer that failed, not the nearest symptom.
- Can the risky path be disabled without disabling the whole feature? If a specific tool or retrieval source caused the failure, disable that path rather than turning off the entire AI surface.
- What would stop the same failure from recurring? A prompt fix alone is rarely enough. Add an automated guardrail, a human review gate, a retrieval boundary, or a model-agnostic output validator.
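
The evidence fields listed above map naturally onto a single record that is captured before any rollback. The Python sketch below serialises that record to JSON and computes a SHA-256 digest so that later changes to the stored copy can be detected. The `IncidentEvidence` type, the `snapshot` function, and the example values are hypothetical; where the record is stored is left to your existing logging setup.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class IncidentEvidence:
    prompt: str
    retrieved_context: list[str]
    tool_calls: list[dict]
    model_response: str
    model_version: str
    deployment_id: str
    # Timestamp in UTC, recorded when the evidence object is created.
    captured_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def snapshot(evidence: IncidentEvidence) -> tuple[str, str]:
    """Serialise the evidence and return (json_text, sha256_digest)."""
    payload = json.dumps(asdict(evidence), sort_keys=True, ensure_ascii=False)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return payload, digest


evidence = IncidentEvidence(
    prompt="How much can I get refunded?",
    retrieved_context=["refund-policy-v3.md"],
    tool_calls=[{"name": "issue_refund", "args": {"amount": 900}}],
    model_response="I have refunded 900.",
    model_version="example-model-1",   # placeholder value
    deployment_id="deploy-123",        # placeholder value
)
payload, digest = snapshot(evidence)
print(digest)
```

Storing the digest separately from the payload lets the post-mortem confirm that the evidence it reviews is the evidence that was captured.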

> **Editor's Note:** The question "what part of the stack failed?" is the one most teams skip because it requires cross-disciplinary investigation: prompt engineering, retrieval, tool permissions, and output validation all need to be checked. Assign one person to coordinate the full-stack trace. Letting each team investigate their own layer produces four partial answers and zero root cause.

## Caveats and scope boundaries

- This guide covers AI-specific incident response for LLM-powered features. It assumes your organisation already has a general incident response process; it adds the AI-specific layers, not the fundamentals.
- The severity ladder is a practical starting point, not a regulatory requirement. Adapt thresholds to your product’s risk profile and your jurisdiction’s reporting obligations.
- Provider incident response documentation (Anthropic, OpenAI) and security guidance (NCSC, CISA, NIST) evolve. The sources cited here were current as of May 2026.
- This guide does not cover regulatory breach notification timelines (GDPR, SEC, sector-specific). Those are separate obligations that may trigger before or alongside the incident response steps described here.

## Methodology

- Data checked: 2026-05-28
- Sources consulted: UK NCSC AI security guidance, CISA incident response guidance, NIST AI RMF, Anthropic and OpenAI provider incident response documentation
- Assumptions: The reader operates or is responsible for an AI-powered product feature that serves end users; the organisation has a baseline incident response process
- Limitations: This is an operational guidance article, not a benchmark report or a legal compliance document. It does not cover jurisdiction-specific breach notification requirements or sector-specific regulatory frameworks in depth
- Jurisdiction: Global. Sources referenced include UK NCSC, US CISA, US NIST, and global provider documentation

## Source list

- UK NCSC AI security guidance: https://www.ncsc.gov.uk/collection/ai-security-and-safety (accessed 2026-05-28)
- CISA incident response guidance: https://www.cisa.gov/ (accessed 2026-05-28)
- NIST AI RMF: https://www.nist.gov/itl/ai-risk-management-framework (accessed 2026-05-28)
- Anthropic documentation: https://docs.anthropic.com/ (accessed 2026-05-28)
- OpenAI documentation: https://platform.openai.com/docs/ (accessed 2026-05-28)

## Related guides

- Jailbreaks vs product safety: what operators can realistically control
- Tool-use safety: stopping agents from taking dangerous actions
- AI output monitoring: what to log, sample and review
- Prompt injection explained for business users

## Trust Stack

- AI draft model: gpt-5.4-mini
- AI review model: deepseek-v4-pro
- Human editorial review: No (automated editorial pipeline)
- Last substantive check: 2026-05-28
- Corrections policy: If you spot an error, contact us via the Contact and web forums.
- Affiliation: theLLMs has no vendor affiliation, sponsorship, or commercial relationship with any AI provider mentioned

## Change log

- 2026-05-28: Full editorial review against 16-gate checklist. Removed internal scaffolding sections and brief references. Added 3 Editor’s Note asides. Rewrote Methodology, added Source list with access dates, added Trust Stack in standard format, added slugified heading IDs, and standalone Caveats section. Fixed frontmatter writtenBy label. Consolidated and corrected related guide paths.
- 2026-05-27: First published. Added direct source URLs to all named providers and services. Content unchanged.