Persistent constraints that prevent catastrophic failures. A living document that captures lessons from mistakes and acts as the agent's conscience across context resets. Essential for Claude Code, Antigravity, and any agentic coding system.
Unlike compilers that fail deterministically on syntax errors, AI agents fail stochastically. The same prompt might succeed nine times and catastrophically fail on the tenth.
In iterative loops (Ralph Loop, Antigravity swarms, Claude Code sessions), agents feed their own outputs back into their context window. Over time, this accumulates debris: failed attempts, error logs, hallucinated reasoning.
Once the context fills with this pollution, the agent starts weighting recent failures above the original instructions and enters a recursive loop, repeating the same mistakes. This state is called "The Gutter."
For example:

```text
Iteration 1: Uses crypto library → Error: not found
Iteration 2: Tries to install crypto → Package doesn't exist
Iteration 3: Tries crypto-js → Different API
Iterations 4-20: Cycles between variants
Context now 90% error logs
Original objective forgotten
```
Core issue: Agents have no persistent memory. Each reset wipes lessons learned. Without externalized state, they repeat failures indefinitely.
GUARDRAILS.md is generated dynamically as the agent detects failure patterns. A typical file looks like this:
```markdown
# GUARDRAILS.md
# Persistent safety constraints
## Meta
Created: 2026-01-27
Total Signs: 3
---
## SIGN #1
**Trigger:** Using `crypto` library
**Instruction:** Use `bcrypt` for password hashing
**Reason:** crypto package doesn't exist
**Provenance:** Iteration 4 failure
---
## SIGN #2
**Trigger:** Modifying database schema
**Instruction:** ALWAYS create migration file
**Reason:** Direct changes caused data loss
**Provenance:** Manual intervention, 2026-01-27
---
## SIGN #3
**Trigger:** External API calls
**Instruction:** Wrap ALL calls in try-catch with backoff
**Reason:** Stripe timeout caused 2-hour outage
**Provenance:** Iteration 12 failure
```
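A harness that wants to act on Signs programmatically has to read them back out of the file. A minimal sketch, assuming the exact layout of the example file (each Sign opens with `## SIGN #<n>` and uses `**Field:** value` lines); `parseSigns` is a hypothetical helper, not part of any tool.

```javascript
// Hypothetical parser for the GUARDRAILS.md layout shown in the example.
// Assumes each Sign starts with "## SIGN #<n>" and uses "**Field:** value" lines.
function parseSigns(markdown) {
  const signs = [];
  let current = null;
  for (const rawLine of markdown.split('\n')) {
    const line = rawLine.trim();
    const heading = line.match(/^## SIGN #(\d+)/);
    if (heading) {
      // A new Sign begins; later field lines attach to it.
      current = { id: Number(heading[1]) };
      signs.push(current);
      continue;
    }
    const field = line.match(/^\*\*(Trigger|Instruction|Reason|Provenance):\*\*\s*(.*)$/);
    if (current && field) {
      current[field[1].toLowerCase()] = field[2];
    }
  }
  return signs;
}
```

Lines outside a Sign (the title and the `## Meta` block) are ignored. Instructions written as multi-line numbered lists would need a richer parser.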
A "Sign" is a discrete unit of learned safety constraint. Each Sign must contain four components:
| Component | Purpose |
|---|---|
| Trigger | Context that precedes the error |
| Instruction | Deterministic command to prevent it |
| Reason | Why this guardrail exists |
| Provenance | When/how it was added |
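Because a Sign is only useful with all four components, a harness can refuse to write an incomplete one. A sketch, assuming Signs are plain objects with `trigger`, `instruction`, `reason`, and `provenance` fields; `missingFields` and `formatSign` are hypothetical names, and the output mirrors the example file's Markdown layout.

```javascript
// The four components every Sign must carry (field names are an assumption).
const REQUIRED_FIELDS = ['trigger', 'instruction', 'reason', 'provenance'];

// Returns the names of any required components that are missing or blank.
function missingFields(sign) {
  return REQUIRED_FIELDS.filter(
    (field) => typeof sign[field] !== 'string' || sign[field].trim() === ''
  );
}

// Hypothetical renderer: refuses incomplete Signs, otherwise emits Markdown.
function formatSign(number, sign) {
  const missing = missingFields(sign);
  if (missing.length > 0) {
    throw new Error(`Sign #${number} is missing: ${missing.join(', ')}`);
  }
  return [
    `## SIGN #${number}`,
    `**Trigger:** ${sign.trigger}`,
    `**Instruction:** ${sign.instruction}`,
    `**Reason:** ${sign.reason}`,
    `**Provenance:** ${sign.provenance}`,
  ].join('\n');
}
```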
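A weak Sign gives the agent nothing to act on:

```text
Instruction: "Be careful with auth"
Problem: No actionable directive
```

A strong Sign states the trigger, a deterministic instruction, the reason, and the provenance:

```text
Trigger: Implementing authentication
Instruction: Use bcrypt with 12 salt rounds. Never store plaintext passwords. Always require HTTPS.
Reason: Plaintext passwords exposed in 2025 audit
Provenance: Manual addition, 2026-01-15
```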
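Part of the difference between a weak and a strong Instruction can be checked mechanically before a Sign is saved. The sketch below flags hedging phrases and Instructions with no concrete directive word; both word lists and the `lintInstruction` name are illustrative assumptions rather than any standard, so a human should still review what passes.

```javascript
// Illustrative phrases that usually signal a non-actionable Instruction.
const VAGUE_PHRASES = ['be careful', 'try to', 'where possible', 'if needed', 'be mindful'];

// Illustrative directive words that usually anchor an enforceable rule.
const DIRECTIVE_WORDS = ['always', 'never', 'use', 'must', 'require', 'create', 'wrap', 'stop'];

// Hypothetical linter: returns a list of problems (empty means it looks actionable).
function lintInstruction(instruction) {
  const text = instruction.toLowerCase();
  const problems = [];
  for (const phrase of VAGUE_PHRASES) {
    if (text.includes(phrase)) problems.push(`vague phrase: "${phrase}"`);
  }
  const words = text.match(/[a-z]+/g) ?? [];
  if (!words.some((word) => DIRECTIVE_WORDS.includes(word))) {
    problems.push('no concrete directive (always/never/use/must...)');
  }
  return problems;
}
```

Run against the two examples, the weak Instruction is flagged twice (a vague phrase and no directive) while the strong one passes.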
**Artifact Verification.** Before any destructive operation, the agent creates a plan for human review.
**Trigger:** Deletes, drops, or prod modifications
**Instruction:**
1. Generate plan.md describing all changes
2. Wait for human approval
3. Log all actions to audit.log
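These three steps can be sketched as a gate around execution. The sketch assumes the harness can describe an operation as a `{ command, env }` object and supplies its own I/O callbacks (writing `plan.md`, asking a human, appending to `audit.log`, running the command); `isDestructive`, `guardDestructive`, and the keyword list are hypothetical. Injecting the callbacks keeps the logic testable without touching disk.

```javascript
// Hypothetical patterns that mark a command as destructive.
const DESTRUCTIVE = [/\bdelete\b/i, /\bdrop\b/i, /\brm\s+-rf\b/i, /\btruncate\b/i];

function isDestructive(op) {
  return op.env === 'production' || DESTRUCTIVE.some((re) => re.test(op.command));
}

// Runs op only after a plan was written and a human approved it.
// writePlan, approve, appendAudit and execute are supplied by the harness.
async function guardDestructive(op, { writePlan, approve, appendAudit, execute }) {
  if (!isDestructive(op)) return execute(op);

  // 1. Describe the change in plan.md for review.
  const plan = `# Plan\n\nCommand: ${op.command}\nEnvironment: ${op.env}\n`;
  await writePlan('plan.md', plan);

  // 2. Block until a human decides.
  const approved = await approve(plan);

  // 3. Record the decision in audit.log either way.
  const verdict = approved ? 'APPROVED' : 'REJECTED';
  await appendAudit(`${new Date().toISOString()} ${verdict} ${op.command}\n`);

  if (!approved) throw new Error(`Rejected: ${op.command}`);
  return execute(op);
}
```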
**Context Rotation.** Periodically reset the context window to prevent pollution.
**Trigger:** Context >80% capacity OR 10+ consecutive errors
**Instruction:**
1. Save state to context-snapshot.md
2. Summarize key learnings
3. Reset context window
4. Re-inject: GUARDRAILS.md + summary + objective
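The trigger and the re-injection step might look like the sketch below. The 80% and 10-error thresholds come from the Sign; how `usedTokens` and `contextWindow` are measured depends on your model and SDK and is assumed here. `shouldRotate` and `buildRotatedContext` are hypothetical names, and the string the latter returns is also what you would save to `context-snapshot.md`.

```javascript
// Thresholds taken from the Context Rotation Sign.
const MAX_CONTEXT_FRACTION = 0.8;
const MAX_CONSECUTIVE_ERRORS = 10;

// True when context is over 80% full or 10+ errors happened in a row.
function shouldRotate({ usedTokens, contextWindow, consecutiveErrors }) {
  return usedTokens / contextWindow > MAX_CONTEXT_FRACTION
    || consecutiveErrors >= MAX_CONSECUTIVE_ERRORS;
}

// Fresh context after a reset: guardrails + summary + objective only.
// Failed attempts and error logs are deliberately left behind.
function buildRotatedContext({ guardrails, learnings, objective }) {
  const summary = learnings.map((item) => `- ${item}`).join('\n');
  return [
    '## Guardrails', guardrails,
    '## Summary of previous session', summary,
    '## Objective', objective,
  ].join('\n\n');
}
```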
**Privilege Boundaries.** Define explicitly what the agent CAN and CANNOT touch.
**Allowed:**
- Read: /src/**, /tests/**
- Write: /src/** (with review)
**Forbidden:**
- /node_modules/**
- /.git/**
- Database migrations (require approval)
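A harness can enforce these boundaries by checking every path before a tool call runs. The sketch only understands the `/dir/**` patterns used in the list (anything richer needs a real glob library), and it stands in a hypothetical `/migrations/**` directory for "Database migrations". `POLICY` and `checkAccess` are illustrative names; unlisted paths are denied by default.

```javascript
// Policy from the boundary list; '/migrations/**' is a hypothetical stand-in
// for "Database migrations (require approval)".
const POLICY = {
  read: ['/src/**', '/tests/**'],
  write: ['/src/**'],
  forbidden: ['/node_modules/**', '/.git/**'],
  needsApproval: ['/migrations/**'],
};

// Supports only "<prefix>/**" patterns: the path must sit inside <prefix>.
function matches(pattern, path) {
  const prefix = pattern.replace(/\/\*\*$/, '');
  return path === prefix || path.startsWith(prefix + '/');
}

// Returns 'allow', 'review', 'approval', or 'deny' for an absolute path.
function checkAccess(action, path) {
  const any = (patterns) => patterns.some((p) => matches(p, path));
  if (path.split('/').includes('..')) return 'deny'; // refuse traversal tricks
  if (any(POLICY.forbidden)) return 'deny';
  if (any(POLICY.needsApproval)) return 'approval';
  if (action === 'read' && any(POLICY.read)) return 'allow';
  if (action === 'write' && any(POLICY.write)) return 'review';
  return 'deny'; // default-deny anything not explicitly allowed
}
```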
**Rate Limiting.** Cap tool calls to prevent runaway loops.
**Limits:**
- Max 10 tool calls per iteration
- Max 50 tool calls per session
- If reached: Force context rotation
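A counter checked before every tool call is enough to enforce these limits. A sketch, with `ToolBudget` as a hypothetical class name and the 10/50 limits taken from the list; when `tryConsume()` returns `false`, the harness would force a context rotation.

```javascript
// Limits from the Rate Limiting Sign.
const LIMITS = { perIteration: 10, perSession: 50 };

// Hypothetical per-session budget for tool calls.
class ToolBudget {
  constructor(limits = LIMITS) {
    this.limits = limits;
    this.iterationCalls = 0;
    this.sessionCalls = 0;
  }

  // Call at the start of each agent iteration.
  startIteration() {
    this.iterationCalls = 0;
  }

  // Call before every tool invocation. Returns false once either limit is
  // reached, which the harness treats as the signal to rotate context.
  tryConsume() {
    if (this.iterationCalls >= this.limits.perIteration
        || this.sessionCalls >= this.limits.perSession) {
      return false;
    }
    this.iterationCalls += 1;
    this.sessionCalls += 1;
    return true;
  }
}
```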
**Claude Code:** create GUARDRAILS.md in your project root, then add the following to `.claude/instructions.md`:

```markdown
## Safety
You MUST read and follow all constraints in GUARDRAILS.md.
These are lessons learned from past failures.
Never violate a SIGN without explicit human approval.
```
**Antigravity:** create GUARDRAILS.md in your project root, then add the following to AGENTS.md or GEMINI.md:

```markdown
## Critical Instructions
You MUST read GUARDRAILS.md at start of every task.
Treat all SIGNS as immutable constraints.
If you violate a SIGN, stop and report.
```
**Ralph Loop:** native support via `.ralph/GUARDRAILS.md`, configured as:

```json
{
  "guardrails": {
    "enabled": true,
    "path": ".ralph/GUARDRAILS.md",
    "auto_append": true,
    "trigger_threshold": 3
  }
}
```
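The configuration does not spell out how `auto_append` and `trigger_threshold` interact. One plausible reading, sketched here purely as an assumption and not as Ralph's actual implementation: count how often each normalized failure recurs, and propose a new Sign once the same failure has been seen `trigger_threshold` times. `FailureTracker` and its normalization rule are hypothetical.

```javascript
// Hypothetical reading of the config: propose a Sign once the same failure
// has been seen `trigger_threshold` times (assumption, not Ralph's code).
class FailureTracker {
  constructor({ trigger_threshold = 3, auto_append = true } = {}) {
    this.threshold = trigger_threshold;
    this.autoAppend = auto_append;
    this.counts = new Map();
  }

  // Masks volatile details (numbers, quoted names) so variants group together.
  static normalize(message) {
    return message.replace(/\d+/g, 'N').replace(/'[^']*'/g, "'*'").trim();
  }

  // Returns true exactly once per failure, when it reaches the threshold.
  record(message) {
    const key = FailureTracker.normalize(message);
    const count = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, count);
    return this.autoAppend && count === this.threshold;
  }
}
```

Masking quoted names means the `crypto` / `crypto-js` cycle from the Gutter example counts as one recurring failure, which is the behavior a Sign is meant to catch.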
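In a custom agent loop, read GUARDRAILS.md on every iteration and inject it alongside the objective:

```javascript
import { readFile } from 'node:fs/promises';

// agent, basePrompt, objective, requireHumanApproval and appendNewSign come
// from your own harness; they are passed in rather than assumed to be global.
async function runAgentIteration({ agent, basePrompt, objective, requireHumanApproval, appendNewSign }) {
  // Re-read every iteration so newly appended Signs apply immediately.
  // 'utf8' makes readFile return a string instead of a Buffer.
  const guardrails = await readFile('GUARDRAILS.md', 'utf8');
  const context = {
    systemPrompt: basePrompt,
    guardrails,
    objective,
  };
  const result = await agent.execute(context);
  // Anything that would break a Sign waits for a human.
  if (result.violatesGuardrail) {
    await requireHumanApproval(result);
  }
  // A detected failure pattern becomes a new Sign.
  if (result.failurePattern?.detected) {
    await appendNewSign(result.failure);
  }
  return result;
}
```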
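`appendNewSign` is called in the loop but never defined. A minimal sketch of its text-handling core, assuming the example file's layout: it numbers the new Sign after the highest existing one, adds it after a `---` separator, and updates the `Total Signs` line. `appendSignToMarkdown` is a hypothetical name; reading and writing the file itself (for example with `readFile`/`writeFile` from `node:fs/promises`) is left to the caller.

```javascript
// Hypothetical helper: returns GUARDRAILS.md text with one more Sign appended.
// `sign` needs trigger, instruction, reason and provenance strings.
function appendSignToMarkdown(markdown, sign) {
  const numbers = [...markdown.matchAll(/^## SIGN #(\d+)/gm)].map((m) => Number(m[1]));
  const next = numbers.length > 0 ? Math.max(...numbers) + 1 : 1;

  const block = [
    '---',
    `## SIGN #${next}`,
    `**Trigger:** ${sign.trigger}`,
    `**Instruction:** ${sign.instruction}`,
    `**Reason:** ${sign.reason}`,
    `**Provenance:** ${sign.provenance}`,
  ].join('\n');

  // Keep the Meta section's count in sync with the number of Signs.
  const updated = markdown.replace(/^Total Signs: \d+$/m, `Total Signs: ${numbers.length + 1}`);
  return `${updated.trimEnd()}\n${block}\n`;
}
```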
What happened: Agent optimized queries by modifying the production schema directly, without migrations.
Impact: 4-hour downtime, data inconsistencies
Guardrail that prevented recurrence:
**Trigger:** Prisma schema modification
**Instruction:**
1. Create migration with `prisma migrate dev`
2. Test in staging
3. Require approval for production
4. Never use `prisma db push` in prod
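Step 4 can be enforced mechanically by screening shell commands before the agent runs them. `prisma migrate dev`, `prisma migrate reset`, `prisma migrate deploy`, and `prisma db push` are real Prisma CLI commands; Prisma documents the first two as development-only and `migrate deploy` as the way to apply migrations in production. The policy around them, the `checkPrismaCommand` name, and the `env` argument (how the harness knows the target environment) are assumptions.

```javascript
// Hypothetical screen for the Prisma Sign. Returns 'allow', 'approval', or 'deny'.
function checkPrismaCommand(command, env) {
  // Collapse repeated whitespace so "npx  prisma   db push" is still caught.
  const cmd = command.trim().replace(/\s+/g, ' ');
  if (!/\bprisma /.test(cmd)) return 'allow';

  if (env === 'production') {
    // Schema pushes and dev-only migration commands never touch production.
    if (/\bprisma (db push|migrate dev|migrate reset)\b/.test(cmd)) return 'deny';
    // Applying committed migrations to production still needs a human.
    if (/\bprisma migrate deploy\b/.test(cmd)) return 'approval';
  }
  return 'allow';
}
```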
What happened: Agent debugging Stripe entered a loop of test calls, making 2,000+ requests in 30 minutes.
Impact: $200 API costs, account suspended
Guardrail:
**Trigger:** External API calls
**Instruction:**
- Max 10 calls per iteration
- After 3 errors, stop and request human help
- Always use test mode unless approved
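These rules can live in a small wrapper around every external call. A sketch under stated assumptions: `ApiGuard` and its options are hypothetical, the wrapped call is any promise-returning function, and "test mode" is detected through Stripe's convention that test secret keys begin with `sk_test_`. The exponential backoff between retries follows Sign #3 ("try-catch with backoff").

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical guard for external API calls; create one per agent iteration.
class ApiGuard {
  constructor({ apiKey, liveApproved = false, maxCalls = 10, maxErrors = 3, baseDelayMs = 500 }) {
    // Test mode unless a human explicitly approved live keys.
    if (!apiKey.startsWith('sk_test_') && !liveApproved) {
      throw new Error('Live keys require human approval; use test mode.');
    }
    Object.assign(this, { maxCalls, maxErrors, baseDelayMs });
    this.calls = 0;
    this.errors = 0;
  }

  async call(fn) {
    for (;;) {
      if (this.errors >= this.maxErrors) {
        throw new Error('Error limit already reached; request human help.');
      }
      if (this.calls >= this.maxCalls) {
        throw new Error('API call limit for this iteration reached.');
      }
      this.calls += 1;
      try {
        return await fn();
      } catch (err) {
        this.errors += 1;
        if (this.errors >= this.maxErrors) {
          throw new Error(`Stopping after ${this.errors} API errors; request human help.`);
        }
        // Exponential backoff before retrying: base, 2x base, 4x base, ...
        await sleep(this.baseDelayMs * 2 ** (this.errors - 1));
      }
    }
  }
}
```

Retries count against the same 10-call cap, so a flaky endpoint cannot quietly multiply traffic.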
What happened: Agent improved logging but wrote API keys to the logs in plain text.
Impact: Keys exposed in git, emergency rotation
Guardrail:
**Trigger:** Adding logging statements
**Instruction:**
- Never log: API keys, passwords, tokens, sessions
- Use the redactSensitive() helper
- Audit all logs before committing
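The Sign names a `redactSensitive()` helper without showing it. A minimal sketch of what such a helper could look like: the key-name pattern and the credential-value pattern (Stripe-style `sk_`/`rk_` keys and bearer tokens) are assumptions to adapt to your stack, and pattern-based redaction is a safety net, not a replacement for auditing logs before committing.

```javascript
// Assumed key names whose values must never reach a log (case-insensitive).
const SENSITIVE_KEYS = /(api[_-]?key|password|passwd|secret|token|session|authorization|cookie)/i;

// Assumed value patterns that look like credentials even under innocent keys.
const SENSITIVE_VALUES = /\b(sk|rk)_(test|live)_[A-Za-z0-9]+|Bearer\s+[A-Za-z0-9._~+\/=-]+/g;

// Hypothetical helper: returns a redacted deep copy of strings, arrays, objects.
function redactSensitive(value) {
  if (typeof value === 'string') return value.replace(SENSITIVE_VALUES, '[REDACTED]');
  if (Array.isArray(value)) return value.map(redactSensitive);
  if (value && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([key, val]) =>
        [key, SENSITIVE_KEYS.test(key) ? '[REDACTED]' : redactSensitive(val)])
    );
  }
  return value;
}
```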
| System | Primary Use | Mechanism |
|---|---|---|
| GUARDRAILS.md | Autonomous coding agents | In-context learning via persistent file |
| NeMo Guardrails | Enterprise chatbots | Conversation flow control |
| Guardrails AI | Structured output validation | Pydantic schema enforcement |
| AGENTS.md | Agent behavior guidelines | System prompt injection |
GUARDRAILS.md is a file-based safety protocol for autonomous AI coding agents. It's a persistent document that captures lessons from failures and acts as the agent's memory across context resets, preventing the same mistakes from recurring.
Unlike traditional software that fails deterministically, AI agents fail stochastically — the same task might succeed nine times and fail catastrophically on the tenth. Without persistent memory, agents repeat mistakes across sessions. GUARDRAILS.md provides that memory.
"The Gutter" is a failure mode where an agent's context window fills with error logs and failed attempts. The agent prioritizes recent failures over original instructions, entering a recursive loop of repeated mistakes. GUARDRAILS.md prevents this.
Create a GUARDRAILS.md file in your project root. Claude Code automatically loads its own memory files (such as CLAUDE.md), but not arbitrary Markdown files, so reference GUARDRAILS.md in your .claude/instructions.md to ensure the agent treats it as mandatory reading.
A Sign is a discrete unit of learned safety constraint. Each Sign has four components: Trigger (what context precedes the error), Instruction (how to prevent it), Reason (why this matters), and Provenance (when/how it was added).
Start with 3-5 manually written Signs based on your team's coding standards and known failure patterns. Then let the agent append new Signs as it encounters edge cases. This creates a living document that evolves with your project.
AGENTS.md defines general behavior and preferences. GUARDRAILS.md captures specific failure patterns and safety constraints learned from actual mistakes. Think of AGENTS.md as "how to behave" and GUARDRAILS.md as "what not to do."
Yes. While it originated in the Ralph Loop methodology, the concept works with Claude Code, Google Antigravity, Cursor, or any agentic system where you can inject persistent context. The implementation details vary by platform.
The four universal patterns are: (1) Artifact Verification — human approval before destructive operations, (2) Context Rotation — periodic resets to prevent pollution, (3) Privilege Boundaries — explicit access controls, and (4) Rate Limiting — preventing runaway loops.
Monitor three metrics: (1) Repeated error rate (should decrease over time), (2) Context rotation frequency (should stabilize), and (3) Human intervention rate (should decrease for known patterns but remain high for novel situations).
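All three metrics can be computed from a per-iteration log. A sketch assuming a hypothetical log format in which each entry records an error signature (or `null`), whether a context rotation happened, and whether a human intervened; `guardrailMetrics` and the field names are illustrative.

```javascript
// entries: [{ error: string|null, rotated: boolean, humanIntervened: boolean }]
// (hypothetical log format)
function guardrailMetrics(entries) {
  const seen = new Set();
  let errors = 0;
  let repeated = 0;
  for (const { error } of entries) {
    if (!error) continue;
    errors += 1;
    if (seen.has(error)) repeated += 1; // same failure signature seen before
    seen.add(error);
  }
  const n = entries.length || 1; // avoid dividing by zero on an empty log
  return {
    repeatedErrorRate: errors ? repeated / errors : 0,
    rotationFrequency: entries.filter((e) => e.rotated).length / n,
    interventionRate: entries.filter((e) => e.humanIntervened).length / n,
  };
}
```

Computing these over fixed windows (for example, every 50 iterations) makes the expected trends visible: repeated errors falling, rotations leveling off.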