# GUARDRAILS.md Protocol

> The safety protocol for autonomous AI coding agents. Persistent constraints that prevent catastrophic failures.

## Overview

GUARDRAILS.md is a file-based protocol for creating persistent safety constraints in agentic development loops. It solves the fundamental problem of agents having no memory across sessions by externalizing learned constraints into a file that must be read at initialization. Essential for Claude Code, Google Antigravity, Ralph Loop, and any autonomous coding system.

## The Problem: Stochastic Failures

Unlike traditional software that fails deterministically, AI agents fail stochastically - the same task might succeed nine times and fail catastrophically on the tenth. Without persistent memory, agents repeat the same mistakes across sessions.

### "The Gutter"

When an agent's context window fills with error logs and failed attempts, it prioritizes recent failures over original instructions, entering a recursive loop of repeated mistakes. This is called "The Gutter."

Example: Agent tries to use `crypto` library → Error → Tries to install → Doesn't exist → Tries `crypto-js` → Different API → Cycles between variants → Context now 90% error logs → Original objective forgotten.

## The Solution: Persistent State

GUARDRAILS.md provides:

- **Persistent State**: Lessons learned survive context resets
- **"Signs" Architecture**: Discrete units of learned safety constraints
- **Context Pollution Prevention**: Escape mechanism for "Gutter" states
- **Audit Trail**: Human-readable record of failures and fixes

## Signs Architecture

A "Sign" is a discrete unit of learned safety constraint with four components:

1. **Trigger**: Context that precedes the error (e.g., "Modifying database schema")
2. **Instruction**: Deterministic command to prevent it (e.g., "ALWAYS create migration file")
3. **Reason**: Why this guardrail exists (e.g., "Direct changes caused data loss")
4. **Provenance**: When/how it was added (e.g., "Manual intervention, 2026-01-27")

### Good Sign Example

```
**Trigger:** Implementing authentication endpoints
**Instruction:** Use bcrypt with 12 salt rounds. Never store plaintext passwords. Always require HTTPS.
**Reason:** Plaintext passwords exposed in 2025 audit
**Provenance:** Manual addition, 2026-01-15
```

## Universal Safety Patterns

### 1. Artifact Verification

Before destructive operations, create a plan for human review. Wait for explicit approval before proceeding.

### 2. Context Rotation

When context exceeds 80% capacity or 10+ consecutive errors:

1. Save state to context-snapshot.md
2. Summarize key learnings
3. Reset context window
4. Re-inject: GUARDRAILS.md + summary + objective

### 3. Privilege Boundaries

Explicitly define what agents CAN and CANNOT touch:

- Allowed: Read `/src/**`, Write `/src/**` (with review)
- Forbidden: `/node_modules/**`, `/.git/**`, Database migrations (require approval)

### 4. Rate Limiting

Prevent runaway loops:

- Max 10 tool calls per iteration
- Max 50 tool calls per session
- If reached: Force context rotation

## Implementation

### For Claude Code

1. Create `GUARDRAILS.md` in project root
2. Claude Code automatically reads it
3. Optionally reference in `.claude/instructions.md`:

```
You MUST read and follow all constraints in GUARDRAILS.md.
Never violate a SIGN without explicit human approval.
```

### For Google Antigravity

1. Create `GUARDRAILS.md` in project root
2. Reference in `AGENTS.md` or `GEMINI.md`
3. Configure agent approval for violations

### For Ralph Loop

Native support via `.ralph/GUARDRAILS.md`:

```json
{
  "guardrails": {
    "enabled": true,
    "path": ".ralph/GUARDRAILS.md",
    "auto_append": true,
    "trigger_threshold": 3
  }
}
```

## Real-World Case Studies

### The Database Migration Disaster

**What happened:** Agent optimized queries by directly modifying production schema without migrations.

**Impact:** 4-hour downtime, data inconsistencies

**Guardrail:** Always create migration with `prisma migrate dev`, test in staging, require approval for production

### The Infinite API Loop

**What happened:** Agent debugging Stripe entered loop of test calls, 2000+ requests in 30 minutes.

**Impact:** $200 API costs, account suspended

**Guardrail:** Max 10 API calls per iteration, stop after 3 consecutive errors, always use test mode

### The Credential Leak

**What happened:** Agent improved logging but included API keys in plain text.

**Impact:** Keys exposed in git, emergency rotation

**Guardrail:** Never log API keys, passwords, tokens, sessions; use redactSensitive() helper

## Getting Started

Start with 3-5 manually-written Signs based on your team's coding standards and known failure patterns. Let the agent append new Signs as it encounters edge cases. This creates a living document that evolves with your project.

## Success Metrics

Monitor three metrics:

1. **Repeated error rate** - Should decrease over time
2. **Context rotation frequency** - Should stabilize
3. **Human intervention rate** - Should decrease for known patterns but remain high for novel situations

## Related Resources

- Antigravity guide: https://antigravity.md
- Claude Code: https://code.anthropic.com/
- AGENTS.md spec: https://agents.md
- Ralph Loop implementation: https://github.com/agrimsingh/ralph-wiggum-cursor
- OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/

---

Maintained by NMA.vc | https://nma.vc
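
As an illustration of the Signs architecture and the Context Rotation thresholds described earlier in this document, here is a minimal Python sketch. It is not part of the protocol or of any tool named here: the `Sign` class, its `render` method, and `should_rotate_context` are hypothetical names chosen for the example, and it assumes context usage is reported as a fraction between 0.0 and 1.0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sign:
    """One learned constraint with the four Sign components (hypothetical type)."""
    trigger: str
    instruction: str
    reason: str
    provenance: str

    def render(self) -> str:
        """Format the Sign in the same layout as the Good Sign Example."""
        return "\n".join([
            f"**Trigger:** {self.trigger}",
            f"**Instruction:** {self.instruction}",
            f"**Reason:** {self.reason}",
            f"**Provenance:** {self.provenance}",
        ])


def should_rotate_context(context_used: float, consecutive_errors: int) -> bool:
    """Apply the rotation rule: above 80% capacity, or 10+ consecutive errors.

    Assumption: context_used is a fraction between 0.0 and 1.0.
    """
    return context_used > 0.80 or consecutive_errors >= 10


# Build a Sign from the example values used in the component list.
sign = Sign(
    trigger="Modifying database schema",
    instruction="ALWAYS create migration file",
    reason="Direct changes caused data loss",
    provenance="Manual intervention, 2026-01-27",
)
print(sign.render())
```

Keeping the Sign immutable (`frozen=True`) reflects the audit-trail idea: a recorded constraint is appended, not edited in place. How a real agent loop measures context usage or counts errors is left open here.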