Files
openclaw-backups/skills/openclaw-self-healing/memory/reddit-post-draft.md

45 lines
1.3 KiB
Markdown

# Reddit r/selfhosted 게시문 초안
## Title
[Project] Self-Healing System for AI Agents - Uses Claude Code as Emergency Doctor
## Body
**TL;DR:** Built a 4-tier watchdog system that can autonomously diagnose and fix crashes using AI.
---
## The Problem
Running AI agents (OpenClaw, similar to n8n but for AI) means dealing with random crashes at 3am. Traditional watchdogs just restart - they don't understand *why* things broke.
## The Solution
4-tier escalation system:
| Level | Trigger | Action |
|-------|---------|--------|
| 1 | Process dead | Restart (180s) |
| 2 | HTTP unhealthy | Retry 3x, then escalate (300s) |
| 3 | L2 failed | Launch Claude Code in tmux, diagnose, fix (30min) |
| 4 | L3 failed | Discord alert to human |
## The Interesting Part
Level 3 launches Claude Code (Anthropic's CLI) in a tmux PTY session. It:
- Reads system logs
- Analyzes error patterns
- Identifies root cause
- Executes fixes autonomously
- Reports results
All logged. All auditable. 30-minute timeout for human intervention.
## Security
- Isolated tmux session
- Cannot access main credentials
- Read-only config access
- Cleanup trap prevents orphans
## Links
- GitHub: https://github.com/Ramsbaby/openclaw-self-healing
- Demo GIF in README
Would love feedback from the self-hosted community. Anyone else running AI agents 24/7?