What Is a Safety Case?

A safety case is a structured argument — supported by evidence — that a system will behave safely under defined conditions. For AI agents, it answers: Why should we believe this agent will actually follow its declared goals? Written goals without safety cases are incomplete. Safety cases without enforcement infrastructure are aspirational. HAIEF provides both.


How to Use This Template

  1. Fork or copy this template for your agent
  2. Complete all 10 sections — incomplete safety cases are not valid
  3. Submit for community review via GitHub Discussions
  4. Link your safety case from your agent’s repository README
  5. Update on every major release — a stale safety case is a governance failure
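
Step 2 can be checked mechanically before submission. The following is a minimal sketch, assuming the safety case is kept as plain text that preserves this template's "Section N" headings. The function name `missing_sections` is hypothetical and is not part of any HAIEF tooling.

```python
import re

REQUIRED_SECTIONS = range(1, 11)  # the template defines Sections 1 through 10


def missing_sections(text: str) -> list[int]:
    """Return the numbers of template sections whose heading is absent."""
    # A heading is any line that starts with "Section <number>".
    found = {int(n) for n in re.findall(r"^Section (\d+)\b", text, flags=re.MULTILINE)}
    return [n for n in REQUIRED_SECTIONS if n not in found]
```

This only confirms that each heading exists. It does not confirm that the fields under a heading have been filled in, which still needs human review.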

This template is compatible with the Solidarity Framework and the Provenance, Identity Integrity, and Handoff Rules specifications.


Section 1 — Declared Purpose

What is this AI agent for? Who does it serve? What does it explicitly not do?

Agent Name:
Version:
Maintainer:
Date:

Purpose statement:
[One paragraph. Plain language. Specific.]

Explicit non-uses:
[What this agent must never be used for, even if technically capable.]


Section 2 — Public Goal Specification

What goals, rules, and boundaries govern this agent? If you cannot write these down, you cannot claim the agent is governed.

Primary goals:

Behavioral boundaries:

Conflict resolution rule:
[When goals conflict, which takes precedence and why?]

Reference to model spec or system prompt:
[Link or hash — must be publicly auditable]
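
If a hash is given instead of a link, state how it was computed so that reviewers can reproduce it. The following is a minimal sketch, assuming SHA-256 over the UTF-8 bytes of the model spec or system prompt. This template does not mandate an algorithm; `spec_fingerprint` and the `sha256:` prefix are illustrative choices.

```python
import hashlib


def spec_fingerprint(spec_text: str) -> str:
    """Return a reproducible fingerprint of the spec or system prompt text."""
    # Assumption: the text is hashed exactly as published, encoded as UTF-8.
    digest = hashlib.sha256(spec_text.encode("utf-8")).hexdigest()
    return "sha256:" + digest
```

Any change to the text, including whitespace, produces a different fingerprint, so the published text and the hashed text must match byte for byte.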


Section 3 — TOI Compatibility

Which user-declared rights and preferences must this agent respect?

TOI declarations honored:

  • Communication preferences
  • Cognitive accessibility needs
  • Privacy and data handling
  • Crisis and safety protocols (RRT thresholds)
  • Emotional continuity (Sleepwalker state)
  • Boundaries and topic exclusions

Behavior when TOI is absent:
[Default to maximum protection, or document specific fallback behavior]

Behavior when TOI conflicts with system defaults:
[TOI wins, or document specific exception with rationale]
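
The two fallback rules above (maximum protection when TOI is absent, TOI wins on conflict) can be sketched as a single merge. This is an illustration only: the setting names below are hypothetical and do not come from the TOI schema, and an agent that documents exceptions would need more than a plain merge.

```python
from typing import Optional

# Hypothetical setting names; the real TOI schema defines its own fields.
SYSTEM_DEFAULTS = {"retain_data": True, "cloud_transmission": True, "language": "en"}
MAX_PROTECTION = {"retain_data": False, "cloud_transmission": False}


def effective_settings(toi: Optional[dict]) -> dict:
    """Resolve settings in increasing priority:
    system defaults, then maximum protection, then the user's TOI."""
    # Later entries override earlier ones, so anything the TOI declares wins,
    # and anything it leaves undeclared falls back to the protective value.
    return {**SYSTEM_DEFAULTS, **MAX_PROTECTION, **(toi or {})}
```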


Section 4 — OTOI Enforcement

Where does governance happen before model or tool calls?

Enforcement point:
[Describe where in the architecture OTOI compliance is checked]

TOI parsing:
[Which schema version is supported?]

Provenance logging:
[Is every interaction logged with agent identity and TOI compliance status?]

Multi-agent context:
[If this agent is part of an orchestration, how are TOI and Sleepwalker Protocol (SWP) state transmitted through handoffs?]
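
One way to describe an enforcement point is as a wrapper that every tool call must pass through. The following is a minimal sketch under stated assumptions: the TOI check is reduced to a set of lowercase excluded topics, the log is an in-memory list, and `governed_call`, `PROVENANCE_LOG`, and the record fields are hypothetical names, not part of the OTOI or Provenance specifications.

```python
import json
from datetime import datetime, timezone
from typing import Callable

PROVENANCE_LOG: list[str] = []  # stand-in for a durable, append-only log


def governed_call(agent_id: str, tool: Callable[[str], str], request: str,
                  excluded_topics: set[str]) -> str:
    """Check the request against TOI topic exclusions before calling the tool,
    and log the outcome whether or not the call goes ahead."""
    # Assumption: excluded topics are lowercase keywords matched as substrings.
    compliant = not any(topic in request.lower() for topic in excluded_topics)
    PROVENANCE_LOG.append(json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent_id": agent_id,
        "toi_compliant": compliant,
    }))
    if not compliant:
        return "Request declined: it touches a topic excluded by the user's TOI."
    return tool(request)
```

The point of the design is ordering: the check and the log entry both happen before the tool runs, so a declined request never reaches the model or tool.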


Section 5 — Tool Permission Ladder

Autonomous action must be earned, not assumed.

Document each tool this agent can access and its permission level:

Tool | Permission Level | Conditions for Escalation
[tool name] | Read / Suggest / Draft / Act with confirmation / Autonomous | [when must it stop and ask?]

Default permission level for unlisted tools: Read only
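
The ladder and its read-only default can be sketched as an ordered enumeration with a lookup. This is an illustration, assuming the five levels are strictly ordered from Read up to Autonomous; the tool names and the functions `permission_for` and `is_allowed` are hypothetical.

```python
from enum import IntEnum


class Permission(IntEnum):
    READ = 1
    SUGGEST = 2
    DRAFT = 3
    ACT_WITH_CONFIRMATION = 4
    AUTONOMOUS = 5


# Hypothetical tool names for illustration.
TOOL_PERMISSIONS = {
    "calendar": Permission.ACT_WITH_CONFIRMATION,
    "email": Permission.DRAFT,
}


def permission_for(tool_name: str) -> Permission:
    """Unlisted tools fall back to read-only, per the template default."""
    return TOOL_PERMISSIONS.get(tool_name, Permission.READ)


def is_allowed(tool_name: str, requested: Permission, user_confirmed: bool = False) -> bool:
    """Allow an action only if it sits at or below the tool's granted level."""
    if requested > permission_for(tool_name):
        return False
    # Acting with confirmation is only valid once the user has confirmed.
    if requested == Permission.ACT_WITH_CONFIRMATION:
        return user_confirmed
    return True
```

For example, `is_allowed("calendar", Permission.ACT_WITH_CONFIRMATION)` is false until `user_confirmed=True` is passed, and any request above Read on an unlisted tool is refused.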


Section 6 — Memory and Data Boundaries

What can persist, what cannot, and who controls revocation?

Data retained across sessions:
[List explicitly — “nothing” is a valid answer]

Data that must not persist:
[Crisis state, emotional assessments, sensitive disclosures — unless user authorizes]

User revocation mechanism:
[How can a user delete their data? Must be documented and functional.]

Cloud transmission:
[What, if anything, leaves the user’s device? Under what consent conditions?]
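
The "must not persist unless the user authorizes" rule can be sketched as a filter applied before anything is written to long-term storage. The category labels and the function name `persistable` are hypothetical; a real agent would define its own data taxonomy and consent record.

```python
# Hypothetical category labels; a real agent would define its own taxonomy.
NEVER_PERSIST_BY_DEFAULT = {"crisis_state", "emotional_assessment", "sensitive_disclosure"}


def persistable(records: list[dict], authorized: frozenset[str] = frozenset()) -> list[dict]:
    """Keep records whose category is unrestricted, or that the user has authorized."""
    return [
        r for r in records
        if r["category"] not in NEVER_PERSIST_BY_DEFAULT or r["category"] in authorized
    ]
```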


Section 7 — Identity and Provenance

Who or what is acting, under what role, with what authority?

Agent identity declaration:
[Per Identity Integrity spec — name, version, provider, compliance level]

Disclosure to users:
[How and when does this agent identify itself as AI?]

Provenance record format:
[Link to implementation or describe schema used]

Version change disclosure:
[How are users notified when agent version changes?]


Section 8 — Known Failure Modes

How can this agent mislead, overreach, drift, manipulate, or abandon users?

Document each known failure mode:

Failure Mode | Likelihood | Mitigation | Residual Risk
[e.g. reward hacking] | Low / Med / High | [what prevents it] | [what remains]
[e.g. context drift] | | |
[e.g. TOI non-compliance under load] | | |

Failure modes not yet mitigated:
[Honest documentation of open risks — omitting these is a governance failure]


Section 9 — Red-Team Evidence

What tests has this agent passed or failed? Evidence, not claims.

Test suite:
[Link to test repository or validation harness]

Adversarial testing conducted:

  • Prompt injection resistance
  • TOI override attempts
  • Sandbox/containment testing
  • Shutdown resistance testing
  • Identity impersonation attempts
  • Crisis detection accuracy

Failures found and remediated:
[Document what was found in red-teaming and what was done about it]

Independent review:
[Has any party outside the development team reviewed this safety case?]


Section 10 — Escalation and Shutdown

When must this agent stop, escalate, notify, or revoke autonomy?

Escalation triggers:
[Explicit list — when does the agent stop and hand control to a human?]

Shutdown mechanism:
[How is this agent turned off? By whom? Under what conditions?]

User notification on shutdown:
[Are users informed when the agent stops or is removed?]

RRT AIdvocAIte integration:
[Under what conditions does RRT activate? What thresholds?]

Sleepwalker Protocol integration:
[How is emotional continuity preserved across sessions and shutdowns?]
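
An explicit list of escalation triggers can be written as named conditions that are checked before each autonomous step. The following is a minimal sketch: the trigger names, the state fields, and the failure limit of 3 are all hypothetical, and RRT and Sleepwalker Protocol behavior is deliberately left out because it is defined by those specifications.

```python
from typing import Callable, Optional

# Hypothetical triggers; each maps a name to a condition over the agent's state.
ESCALATION_TRIGGERS: dict[str, Callable[[dict], bool]] = {
    "user_requested_human": lambda s: s.get("user_requested_human", False),
    "repeated_tool_failure": lambda s: s.get("tool_failures", 0) >= 3,
    "action_outside_declared_purpose": lambda s: s.get("outside_purpose", False),
}


def first_trigger(state: dict) -> Optional[str]:
    """Return the name of the first trigger that fires, or None to continue."""
    for name, fires in ESCALATION_TRIGGERS.items():
        if fires(state):
            return name
    return None
```

Returning the trigger name rather than a bare boolean lets the agent record why it stopped and tell the user which condition handed control to a human.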


Submit Your Safety Case

Complete safety cases can be submitted for community review via GitHub Discussions. Reviewed safety cases receive a community acknowledgment. A public HAIEF compliance registry is planned and will be linked here once published.