AI Newsletter Digest improvements: fixed QP soft line break decoding, URL extraction, and content cleaning

2026-03-04 13:29:22 +00:00
parent 29a98137a7
commit 57dd294675
13706 changed files with 2114953 additions and 237629 deletions
--- a/skills/openclaw-self-healing/references/LINUX_SETUP.md
+++ b/skills/openclaw-self-healing/references/LINUX_SETUP.md
@@ -0,0 +1,99 @@
+# Linux Setup Guide (systemd)
+
+> ⚠️ **Work in Progress** - This is a community contribution template. Full Linux support is on the roadmap.
+
+## Overview
+
+This guide provides systemd equivalents for the macOS LaunchAgent-based self-healing system.
+
+## Prerequisites
+
+- Linux (Ubuntu 20.04+, Debian 11+, or similar)
+- systemd
+- OpenClaw Gateway installed
+- tmux (`apt install tmux`)
+- Claude CLI (`npm install -g @anthropic-ai/claude-code`)
+
+## Level 1: Watchdog (systemd)
+
+Create `/etc/systemd/system/openclaw-gateway.service`:
+
+```ini
+[Unit]
+Description=OpenClaw Gateway
+After=network.target
+
+[Service]
+Type=simple
+User=YOUR_USER
+WorkingDirectory=/home/YOUR_USER
+ExecStart=/usr/local/bin/openclaw gateway start
+Restart=always
+RestartSec=180
+
+[Install]
+WantedBy=multi-user.target
+```
+
+Enable and start:
+```bash
+sudo systemctl enable openclaw-gateway
+sudo systemctl start openclaw-gateway
+```
+
+## Level 2: Health Check (systemd timer)
+
+Create `/etc/systemd/system/openclaw-healthcheck.service`:
+
+```ini
+[Unit]
+Description=OpenClaw Health Check
+
+[Service]
+Type=oneshot
+User=YOUR_USER
+ExecStart=/home/YOUR_USER/openclaw/scripts/gateway-healthcheck.sh
+```
+
+Create `/etc/systemd/system/openclaw-healthcheck.timer`:
+
+```ini
+[Unit]
+Description=Run OpenClaw Health Check every 5 minutes
+
+[Timer]
+OnBootSec=5min
+OnUnitActiveSec=5min
+
+[Install]
+WantedBy=timers.target
+```
+
+Enable:
+```bash
+sudo systemctl enable openclaw-healthcheck.timer
+sudo systemctl start openclaw-healthcheck.timer
+```
+
+## Level 3 & 4
+
+The Level 3 and Level 4 scripts work the same way on Linux. Update the paths in `.env`:
+
+```bash
+OPENCLAW_DIR=/home/YOUR_USER/openclaw
+LOG_DIR=/home/YOUR_USER/openclaw/memory
+```
+
+## Script Modifications
+
+Replace macOS-specific commands and paths in the scripts:
+
+| macOS | Linux |
+|-------|-------|
+| `launchctl` | `systemctl` |
+| `~/Library/LaunchAgents/` | `/etc/systemd/system/` |
+| `open` | `xdg-open` |
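
A script that has to run on both platforms can branch on the output of `uname -s` instead of hard-coding one service manager. The following is only a sketch under assumptions: the function name `healthcheck_reload_cmd` is hypothetical, the plist name is the one used in the troubleshooting guide, and the timer name is the unit defined earlier in this guide.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of OpenClaw): print the command that reloads
# the health check job for a given OS name, as reported by `uname -s`.
healthcheck_reload_cmd() {
  case "$1" in
    Darwin)
      # macOS: unload the LaunchAgent plist, then load it again.
      local plist="$HOME/Library/LaunchAgents/com.openclaw.healthcheck.plist"
      echo "launchctl unload $plist && launchctl load $plist"
      ;;
    Linux)
      # Linux: restart the systemd timer defined in this guide.
      echo "sudo systemctl restart openclaw-healthcheck.timer"
      ;;
    *)
      echo "unsupported OS: $1" >&2
      return 1
      ;;
  esac
}
```

Calling `healthcheck_reload_cmd "$(uname -s)"` prints the command for the current machine, which makes it easy to review before running it.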
+
+## Contributing
+
+Help us improve Linux support! See [CONTRIBUTING.md](/CONTRIBUTING.md).
--- a/skills/openclaw-self-healing/references/self-healing-TROUBLESHOOTING.md
+++ b/skills/openclaw-self-healing/references/self-healing-TROUBLESHOOTING.md
@@ -0,0 +1,584 @@
+# Troubleshooting Guide
+
+> **Common issues and solutions for OpenClaw Self-Healing System**
+
+---
+
+## 🔍 Diagnostic Commands
+
+Before diving into specific issues, run these diagnostic commands:
+
+```bash
+# 1. Check LaunchAgent status
+launchctl list | grep openclaw
+
+# 2. Check Health Check logs
+tail -50 ~/openclaw/memory/healthcheck-$(date +%Y-%m-%d).log
+
+# 3. Check Emergency Recovery logs
+ls -lt ~/openclaw/memory/emergency-recovery-*.log | head -5
+
+# 4. Check Gateway status
+openclaw status
+
+# 5. Check cron jobs
+openclaw cron list | grep -i "emergency\|health"
+
+# 6. Check script permissions
+ls -lh ~/openclaw/scripts/*.sh
+```
+
+---
+
+## 🚨 Level 1: Watchdog Issues
+
+### Issue: Watchdog not restarting Gateway
+
+**Symptoms:**
+- Gateway crashes but doesn't restart
+- No automatic recovery after 3 minutes
+
+**Diagnosis:**
+```bash
+# Check if Watchdog LaunchAgent is loaded
+launchctl list | grep openclaw.watchdog
+
+# Expected output:
+# -    0    ai.openclaw.watchdog
+```
+
+**Solution 1: Watchdog not loaded**
+```bash
+# Check if plist exists
+ls ~/Library/LaunchAgents/ai.openclaw.watchdog.plist
+
+# If missing, reinstall OpenClaw:
+npm install -g openclaw
+openclaw onboard --install-daemon
+```
+
+**Solution 2: Watchdog disabled**
+```bash
+# Reload Watchdog
+launchctl unload ~/Library/LaunchAgents/ai.openclaw.watchdog.plist
+launchctl load ~/Library/LaunchAgents/ai.openclaw.watchdog.plist
+```
+
+---
+
+## 🏥 Level 2: Health Check Issues
+
+### Issue: Health Check not running
+
+**Symptoms:**
+- No `healthcheck-*.log` files in `~/openclaw/memory/`
+- LaunchAgent listed but no activity
+
+**Diagnosis:**
+```bash
+# Check LaunchAgent status
+launchctl list | grep openclaw.healthcheck
+
+# Check LaunchAgent logs
+tail -f ~/Library/Logs/com.openclaw.healthcheck.log
+```
+
+**Solution 1: LaunchAgent not loaded**
+```bash
+launchctl load ~/Library/LaunchAgents/com.openclaw.healthcheck.plist
+```
+
+**Solution 2: Script path wrong**
+```bash
+# Check plist file
+grep -A 2 ProgramArguments ~/Library/LaunchAgents/com.openclaw.healthcheck.plist
+
+# Should point to: ~/openclaw/scripts/gateway-healthcheck.sh
+# If wrong, edit plist:
+nano ~/Library/LaunchAgents/com.openclaw.healthcheck.plist
+
+# Reload after edit:
+launchctl unload ~/Library/LaunchAgents/com.openclaw.healthcheck.plist
+launchctl load ~/Library/LaunchAgents/com.openclaw.healthcheck.plist
+```
+
+**Solution 3: Script not executable**
+```bash
+chmod +x ~/openclaw/scripts/gateway-healthcheck.sh
+```
+
+**Solution 4: Run manually to test**
+```bash
+bash ~/openclaw/scripts/gateway-healthcheck.sh
+
+# Check for errors in output
+```
+
+---
+
+### Issue: Health Check false positives
+
+**Symptoms:**
+- Health Check reports failure but Gateway is running fine
+- Unnecessary restarts
+
+**Diagnosis:**
+```bash
+# Check Gateway URL
+curl -I http://localhost:18789/
+
+# Check environment variable
+source ~/.openclaw/.env
+echo $OPENCLAW_GATEWAY_URL
+```
+
+**Solution: Wrong Gateway URL**
+```bash
+# Edit .env
+nano ~/.openclaw/.env
+
+# Set correct URL:
+OPENCLAW_GATEWAY_URL="http://localhost:18789/"
+
+# Reload LaunchAgent
+launchctl unload ~/Library/LaunchAgents/com.openclaw.healthcheck.plist
+launchctl load ~/Library/LaunchAgents/com.openclaw.healthcheck.plist
+```
+
+---
+
+### Issue: Health Check restarts too aggressively
+
+**Symptoms:**
+- Gateway restarts multiple times per hour
+- Unstable system
+
+**Diagnosis:**
+```bash
+# Check retry settings
+source ~/.openclaw/.env
+echo "Max retries: ${HEALTH_CHECK_MAX_RETRIES:-3}"
+echo "Retry delay: ${HEALTH_CHECK_RETRY_DELAY:-30}s"
+```
+
+**Solution: Increase thresholds**
+```bash
+# Edit .env
+nano ~/.openclaw/.env
+
+# Add/modify:
+HEALTH_CHECK_MAX_RETRIES=5
+HEALTH_CHECK_RETRY_DELAY=60
+HEALTH_CHECK_ESCALATION_WAIT=600
+
+# Reload LaunchAgent
+launchctl unload ~/Library/LaunchAgents/com.openclaw.healthcheck.plist
+launchctl load ~/Library/LaunchAgents/com.openclaw.healthcheck.plist
+```
+
+---
+
+## 🧠 Level 3: Claude Recovery Issues
+
+### Issue: Claude CLI not found
+
+**Symptoms:**
+- Emergency Recovery logs show: `❌ Missing dependencies: claude`
+- Level 3 skips to Level 4
+
+**Diagnosis:**
+```bash
+# Check Claude installation
+which claude
+claude --version
+```
+
+**Solution: Install Claude CLI**
+```bash
+npm install -g @anthropic-ai/claude-code
+
+# Verify
+claude --version
+```
+
+---
+
+### Issue: Claude session fails to start
+
+**Symptoms:**
+- Emergency Recovery logs show: `Starting Claude Code session...`
+- Then: `⚠️ Claude workspace trust prompt not detected`
+
+**Diagnosis:**
+```bash
+# Check tmux
+which tmux
+tmux -V
+
+# Test Claude manually
+claude
+# Does it prompt "trust this workspace"?
+```
+
+**Solution 1: tmux not installed**
+```bash
+brew install tmux
+```
+
+**Solution 2: Claude workspace already trusted**
+```bash
+# This is actually OK — script proceeds anyway
+# Check recovery logs for actual failure reason
+tail -50 ~/openclaw/memory/emergency-recovery-*.log
+```
+
+---
+
+### Issue: Claude API quota exceeded
+
+**Symptoms:**
+- Emergency Recovery logs show: `⚠️ Claude API rate limited or quota exceeded`
+- Level 3 fails immediately
+
+**Diagnosis:**
+```bash
+# Check Claude usage
+claude
+# Type: /usage
+# Check remaining quota
+```
+
+**Solution: Wait for quota reset**
+```bash
+# Claude API resets every 5 hours
+# Check exact reset time in Claude CLI: /usage
+
+# Meanwhile, system escalates to Level 4 (human alert)
+```
+
+**Workaround: Increase timeout for next attempt**
+```bash
+# Edit .env
+nano ~/.openclaw/.env
+
+# Increase timeout:
+EMERGENCY_RECOVERY_TIMEOUT=3600  # 1 hour instead of 30 min
+```
+
+---
+
+### Issue: Claude recovery times out
+
+**Symptoms:**
+- Emergency Recovery runs for 30 minutes
+- Gateway still unhealthy
+- No clear failure reason in logs
+
+**Diagnosis:**
+```bash
+# Check Claude session log
+tail -200 ~/openclaw/memory/claude-session-*.log
+
+# Look for:
+# - Errors executing commands
+# - Stuck waiting for input
+# - Network issues
+```
+
+**Solution 1: Increase timeout**
+```bash
+# Edit .env
+nano ~/.openclaw/.env
+
+# Increase timeout:
+EMERGENCY_RECOVERY_TIMEOUT=3600  # 1 hour
+```
+
+**Solution 2: Check manual recovery**
+```bash
+# What would you do manually?
+openclaw status
+tail -100 ~/.openclaw/logs/gateway.log
+
+# Apply the fix yourself, then analyze why Claude couldn't
+```
+
+---
+
+## 🚨 Level 4: Discord Notification Issues
+
+### Issue: No Discord notifications
+
+**Symptoms:**
+- Level 4 should trigger but no messages in Discord
+- Emergency Recovery Monitor cron runs but stays silent
+
+**Diagnosis:**
+```bash
+# Check webhook URL
+source ~/.openclaw/.env
+echo $DISCORD_WEBHOOK_URL
+
+# Test webhook manually
+curl -X POST "$DISCORD_WEBHOOK_URL" \
+  -H "Content-Type: application/json" \
+  -d '{"content": "Test notification"}'
+```
+
+**Solution 1: Webhook URL not set**
+```bash
+# Edit .env
+nano ~/.openclaw/.env
+
+# Add:
+DISCORD_WEBHOOK_URL="https://discord.com/api/webhooks/YOUR_ID/YOUR_TOKEN"
+```
+
+**Solution 2: Webhook URL invalid**
+```bash
+# Get new webhook from Discord:
+# Server Settings > Integrations > Webhooks > New Webhook
+
+# Copy URL and update .env
+nano ~/.openclaw/.env
+```
+
+**Solution 3: Network issues**
+```bash
+# Test internet connectivity
+ping -c 3 discord.com
+
+# Test DNS resolution
+nslookup discord.com
+
+# If behind proxy, check proxy settings
+```
+
+---
+
+### Issue: Duplicate Discord notifications
+
+**Symptoms:**
+- Same alert sent multiple times
+- Alert flood in Discord channel
+
+**Diagnosis:**
+```bash
+# Check alert tracking file
+cat ~/openclaw/memory/.emergency-alert-sent
+
+# Check Monitor cron frequency
+openclaw cron list | grep "Emergency Recovery"
+```
+
+**Solution: Alert file corrupted**
+```bash
+# Remove alert tracking file
+rm ~/openclaw/memory/.emergency-alert-sent
+
+# Next alert will reset tracking
+```
+
+---
+
+## 🔧 General Issues
+
+### Issue: Logs filling up disk
+
+**Symptoms:**
+- `~/openclaw/memory/` grows to several GB
+- Old logs not deleted
+
+**Diagnosis:**
+```bash
+# Check disk usage
+du -sh ~/openclaw/memory/
+
+# Count log files
+ls ~/openclaw/memory/*.log | wc -l
+```
+
+**Solution: Manual cleanup**
+```bash
+# Delete logs older than 14 days
+find ~/openclaw/memory -name "healthcheck-*.log" -mtime +14 -delete
+find ~/openclaw/memory -name "emergency-recovery-*.log" -mtime +14 -delete
+find ~/openclaw/memory -name "claude-session-*.log" -mtime +14 -delete
+```
+
+**Prevention: Add cleanup cron**
+```bash
+openclaw cron add \
+  --name "Log Rotation (Self-Healing)" \
+  --schedule '0 3 * * *' \
+  --command 'find ~/openclaw/memory \( -name "*healthcheck*.log" -o -name "*emergency-recovery*.log" -o -name "*claude-session*.log" \) -mtime +14 -delete' \
+  --session isolated
+```
+
+---
+
+### Issue: Scripts fail with "Permission denied"
+
+**Symptoms:**
+- LaunchAgent logs show: `Permission denied: gateway-healthcheck.sh`
+
+**Solution:**
+```bash
+chmod +x ~/openclaw/scripts/*.sh
+```
+
+---
+
+### Issue: Environment variables not loading
+
+**Symptoms:**
+- Scripts use default values instead of custom `.env` settings
+
+**Diagnosis:**
+```bash
+# Check .env exists
+ls -lh ~/.openclaw/.env
+
+# Check .env syntax
+cat ~/.openclaw/.env | grep -v '^#' | grep '='
+```
+
+**Solution: Fix .env syntax**
+```bash
+# Edit .env
+nano ~/.openclaw/.env
+
+# Correct format:
+# KEY="value"  ✅
+# KEY='value'  ✅
+# KEY=value    ✅
+#
+# KEY = "value"  ❌ (spaces around =)
+# KEY="value   ❌ (missing closing quote)
+
+# Reload LaunchAgent after fixing
+launchctl unload ~/Library/LaunchAgents/com.openclaw.healthcheck.plist
+launchctl load ~/Library/LaunchAgents/com.openclaw.healthcheck.plist
+```
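
The format rules above can also be checked mechanically. This is a minimal sketch, assuming a plain `KEY=value` file with no `export` prefixes or multi-line values; `lint_env_file` is a hypothetical helper, not something shipped with OpenClaw.

```bash
#!/usr/bin/env bash
# Hypothetical linter for the .env rules listed above; not part of OpenClaw.
lint_env_file() {
  local file="$1" line quotes n=0 bad=0
  local skip_re='^[[:space:]]*(#|$)'          # blank lines and comments
  local assign_re='^[A-Za-z_][A-Za-z0-9_]*='  # KEY= with no spaces around "="
  while IFS= read -r line || [[ -n "$line" ]]; do
    n=$((n + 1))
    [[ "$line" =~ $skip_re ]] && continue
    if [[ ! "$line" =~ $assign_re ]]; then
      echo "line $n: expected KEY=value with no spaces around '='"
      bad=1
      continue
    fi
    # An odd number of double quotes suggests a missing closing quote.
    quotes="$(printf '%s' "$line" | tr -cd '"' | wc -c)"
    if (( quotes % 2 )); then
      echo "line $n: unbalanced double quotes"
      bad=1
    fi
  done < "$file"
  return "$bad"
}
```

`lint_env_file ~/.openclaw/.env` prints one message per suspicious line and returns a non-zero status if it found any, so it can be run before reloading the LaunchAgent.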
+
+---
+
+## 🧪 Testing & Validation
+
+### Force trigger each level
+
+#### Test Level 1: Watchdog
+```bash
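+# Force-kill the Gateway process
+pkill -9 -f openclaw-gateway
+# Wait one Watchdog interval (180 s), then confirm the Gateway answers again
+sleep 180
+curl http://localhost:18789/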
+```
+
+#### Test Level 2: Health Check
+```bash
+# Stop Gateway
+openclaw gateway stop
+
+# Wait for Health Check (5 min max)
+tail -f ~/openclaw/memory/healthcheck-$(date +%Y-%m-%d).log
+
+# Should see restart attempts
+```
+
+#### Test Level 3: Claude Recovery
+```bash
+# Inject config error (backup first!)
+cp ~/.openclaw/openclaw.json ~/.openclaw/openclaw.json.bak
+
+# Break config (e.g., change port to invalid value)
+# Then restart Gateway and wait ~8 min
+
+# Watch Level 3 trigger
+tail -f ~/openclaw/memory/emergency-recovery-*.log
+```
+
+#### Test Level 4: Discord Alert
+```bash
+# Simulate Level 3 failure
+cat > ~/openclaw/memory/emergency-recovery-test-$(date +%Y-%m-%d-%H%M).log << 'EOF'
+[2026-02-06 20:00:00] === Emergency Recovery Started ===
+[2026-02-06 20:30:00] Gateway still unhealthy (HTTP 500)
+
+=== MANUAL INTERVENTION REQUIRED ===
+Level 1 (Watchdog) ❌
+Level 2 (Health Check) ❌
+Level 3 (Claude Recovery) ❌
+EOF
+
+# Run monitor
+bash ~/openclaw/scripts/emergency-recovery-monitor.sh
+
+# Check Discord for alert
+```
+
+---
+
+## 📚 Advanced Troubleshooting
+
+### Enable debug logging
+
+Add to scripts (temporary):
+
+```bash
+# In gateway-healthcheck.sh
+set -x  # Enable bash debug mode
+
+# View verbose output
+tail -f ~/openclaw/memory/healthcheck-$(date +%Y-%m-%d).log
+```
+
+### Check macOS system logs
+
+```bash
+# Filter for openclaw-related errors
+log show --predicate 'process == "launchd" AND eventMessage CONTAINS "openclaw"' --last 1h
+
+# Check LaunchAgent errors
+log show --predicate 'subsystem == "com.apple.launchd"' --last 1h
+```
+
+### Verify Gateway port
+
+```bash
+# Check what's listening on 18789
+lsof -i :18789
+
+# Expected: openclaw-gateway process
+```
+
+### Check for port conflicts
+
+```bash
+# Find processes using common ports
+lsof -i :18789
+lsof -i :8080
+lsof -i :3000
+
+# If conflict, change Gateway port in ~/.openclaw/openclaw.json
+```
+
+---
+
+## 🆘 Still Stuck?
+
+### Get help from the community
+
+1. **GitHub Issues:** [github.com/ramsbaby/openclaw-self-healing/issues](https://github.com/ramsbaby/openclaw-self-healing/issues)
+2. **OpenClaw Discord:** [discord.com/invite/clawd](https://discord.com/invite/clawd)
+3. **Include in your report:**
+   - macOS version: `sw_vers`
+   - OpenClaw version: `openclaw version`
+   - Self-Healing logs: Last 50 lines of `healthcheck-*.log` and `emergency-recovery-*.log`
+   - Script versions: `head -5 ~/openclaw/scripts/*.sh`
+
+---
+
+<p align="center">
+  <strong>Most issues are config or permissions related.</strong><br>
+  When in doubt, check <code>.env</code> and re-run <code>chmod +x</code>.
+</p>
--- a/skills/openclaw-self-healing/references/self-healing-system.md
+++ b/skills/openclaw-self-healing/references/self-healing-system.md
@@ -0,0 +1,414 @@
+# OpenClaw Self-Healing System
+
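+> "When the system cannot heal itself, it calls in an outside doctor": meta-level self-healing
+
+## Overview
+
+OpenClaw Gateway uses a four-level self-healing system that attempts automatic recovery when a failure occurs.
+
+**Design philosophy:**
+- Level 1-2: fast automatic recovery (on the order of seconds)
+- Level 3: intelligent diagnosis and recovery (on the order of minutes)
+- Level 4: request for human intervention (notification)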
+
+---
+
+## Architecture
+
+```
+┌─────────────────────────────────────────────────────────┐
+│ Level 1: Watchdog (every 180 s)                         │
+│ ├─ LaunchAgent: ai.openclaw.watchdog                    │
+│ └─ Checks that the process exists → restarts it         │
+└─────────────────────────────────────────────────────────┘
+                         ↓ (process is alive but unresponsive)
+┌─────────────────────────────────────────────────────────┐
+│ Level 2: Health Check (every 300 s)                     │
+│ ├─ Script: gateway-healthcheck.sh                       │
+│ ├─ LaunchAgent: com.openclaw.healthcheck                │
+│ ├─ Verifies an HTTP 200 response                        │
+│ ├─ On failure, retries 3 times (30 s apart)             │
+│ └─ Still failing → Level 3 escalation                   │
+└─────────────────────────────────────────────────────────┘
+                         ↓ (not recovered after 5 minutes)
+┌─────────────────────────────────────────────────────────┐
+│ Level 3: Claude Emergency Recovery (30 min timeout)     │
+│ ├─ Script: emergency-recovery.sh                        │
+│ ├─ Starts a Claude Code PTY session via tmux            │
+│ ├─ Automatic diagnosis:                                 │
+│ │   - openclaw status                                   │
+│ │   - Log analysis (~/.openclaw/logs/*.log)             │
+│ │   - Config validation (openclaw.json)                 │
+│ │   - Port conflict check (lsof -i :18789)              │
+│ │   - Dependency check (npm list, node --version)       │
+│ ├─ Recovery attempt (config fix, process restart)       │
+│ ├─ Generates recovery reports:                          │
+│ │   - memory/emergency-recovery-report-*.md            │
+│ │   - memory/claude-session-*.log                      │
+│ └─ Success/failure verdict (HTTP 200 check)             │
+└─────────────────────────────────────────────────────────┘
+                         ↓ (Claude recovery also failed)
+┌─────────────────────────────────────────────────────────┐
+│ Level 4: Discord Notification (monitors every 300 s)    │
+│ ├─ Script: emergency-recovery-monitor.sh                │
+│ ├─ Cron: eddd4e18-b995-4420-8465-7c6927280228           │
+│ ├─ Watches emergency-recovery logs from the last 30 min │
+│ ├─ Searches for "MANUAL INTERVENTION REQUIRED"          │
+│ └─ Sends an alert to the #jarvis-health channel         │
+└─────────────────────────────────────────────────────────┘
+```
+
+---
+
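+## Components
+
+### Level 1: Watchdog
+
+**Files:**
+- `~/Library/LaunchAgents/ai.openclaw.watchdog.plist`
+
+**Behavior:**
+- Checks every 180 seconds that the OpenClaw process exists
+- Restarts the process automatically if it is missing
+
+**Limitation:**
+- Cannot detect a process that is alive but no longer answers HTTP requests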
+
+---
+
+### Level 2: Health Check
+
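+**Files:**
+- `~/openclaw/scripts/gateway-healthcheck.sh`
+- `~/Library/LaunchAgents/com.openclaw.healthcheck.plist`
+
+**Behavior:**
+1. HTTP GET `http://localhost:18789/` and check for a 200 response
+2. On failure, restart the Gateway (then wait 30 seconds)
+3. Retry up to 3 times
+4. If it is still failing, wait 5 minutes
+5. If it is still failing after those 5 minutes, trigger Level 3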
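
The check, restart, and retry steps above can be sketched roughly as follows. This is an illustration under assumptions, not the contents of `gateway-healthcheck.sh`: the function names are hypothetical, the environment variables and their defaults are the ones shown in the troubleshooting guide, and the 5-minute wait before escalation is left to the caller.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the Level 2 loop; not the real gateway-healthcheck.sh.
GATEWAY_URL="${OPENCLAW_GATEWAY_URL:-http://localhost:18789/}"
MAX_RETRIES="${HEALTH_CHECK_MAX_RETRIES:-3}"
RETRY_DELAY="${HEALTH_CHECK_RETRY_DELAY:-30}"

# Print the Gateway's HTTP status code (curl prints 000 if it cannot connect).
probe_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$GATEWAY_URL" || true
}

# Restart the Gateway; the real script may do this differently.
restart_gateway() {
  openclaw gateway restart
}

# Return 0 as soon as the Gateway answers 200, restarting after each failure.
# Return 1 after MAX_RETRIES failed attempts so the caller can escalate.
check_with_retries() {
  local attempt status
  for (( attempt = 1; attempt <= MAX_RETRIES; attempt++ )); do
    status="$(probe_status)"
    if [[ "$status" == "200" ]]; then
      echo "healthy on attempt $attempt"
      return 0
    fi
    echo "attempt $attempt failed (HTTP $status), restarting"
    restart_gateway
    sleep "$RETRY_DELAY"
  done
  return 1
}
```

Keeping the probe and the restart in separate functions makes the loop easy to exercise without a running Gateway, since both can be overridden in a test shell.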
+
+**Logs:**
+- `~/openclaw/memory/healthcheck-YYYY-MM-DD.log`
+
+**Install:**
+```bash
+launchctl load ~/Library/LaunchAgents/com.openclaw.healthcheck.plist
+```
+
+**Uninstall:**
+```bash
+launchctl unload ~/Library/LaunchAgents/com.openclaw.healthcheck.plist
+```
+
+---
+
+### Level 3: Claude Emergency Recovery
+
+**Files:**
+- `~/openclaw/scripts/emergency-recovery.sh`
+
+**Behavior:**
+1. Create a tmux session: `emergency_recovery_TIMESTAMP`
+2. Launch Claude Code (`claude`)
+3. Trust the workspace (automatic Enter)
+4. Send the emergency recovery prompt:
+   ```
+   The OpenClaw gateway has been restarted for 5 minutes but has not recovered.
+   Begin emergency diagnosis and recovery.
+   
+   Steps:
+   1. Check openclaw status
+   2. Analyze logs (~/.openclaw/logs/*.log)
+   3. Validate the configuration (~/.openclaw/openclaw.json)
+   4. Check for port conflicts (lsof -i :18789)
+   5. Check dependencies (npm list, node --version)
+   6. Attempt recovery (fix configuration, restart process)
+   7. Record the result in memory/emergency-recovery-report-*.md
+   ```
+5. Wait up to 30 minutes
+6. Verify the recovery result (HTTP 200 check)
+7. Capture the tmux session and terminate it
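
The tmux part of this flow might look like the sketch below. It is an assumption-laden outline, not the contents of `emergency-recovery.sh`: the function names, the startup delays, and the timestamp format in the session name (borrowed from the `YYYY-MM-DD-HHMM` pattern used in the alert text) are all hypothetical. The tmux subcommands themselves (`new-session -d`, `send-keys`, `capture-pane -p`, `kill-session`) are standard.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the tmux flow; not the real emergency-recovery.sh.

# Build a session name such as emergency_recovery_2026-02-05-2000.
recovery_session_name() {
  echo "emergency_recovery_$(date +%Y-%m-%d-%H%M)"
}

# Start Claude Code in a detached tmux session and send it a prompt.
start_recovery_session() {
  local session="$1" prompt="$2"
  tmux new-session -d -s "$session" 'claude'
  sleep 5                                    # assumed startup delay
  tmux send-keys -t "$session" Enter         # accept the workspace trust prompt
  sleep 2                                    # assumed settle time
  tmux send-keys -t "$session" -l "$prompt"  # -l sends the text literally
  tmux send-keys -t "$session" Enter
}

# Save the pane contents (including scrollback) to a file, then end the session.
capture_and_close() {
  local session="$1" out="$2"
  tmux capture-pane -t "$session" -p -S - > "$out"
  tmux kill-session -t "$session"
}
```

A caller would generate the name once with `recovery_session_name`, pass it to `start_recovery_session`, wait for the timeout or a healthy Gateway, and then call `capture_and_close` with the session log path.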
+
+**Output files:**
+- `~/openclaw/memory/emergency-recovery-TIMESTAMP.log` (execution log)
+- `~/openclaw/memory/claude-session-TIMESTAMP.log` (Claude session capture)
+- `~/openclaw/memory/emergency-recovery-report-TIMESTAMP.md` (generated by Claude, optional)
+
+**Manual run:**
+```bash
+~/openclaw/scripts/emergency-recovery.sh
+```
+
+---
+
+### Level 4: Discord Notification
+
+**Files:**
+- `~/openclaw/scripts/emergency-recovery-monitor.sh`
+
+**Behavior:**
+1. Find `emergency-recovery-*.log` files from the last 30 minutes
+2. Search them for the "MANUAL INTERVENTION REQUIRED" pattern
+3. If found, send an alert to the #jarvis-health channel
+4. Prevent duplicate alerts (`.emergency-alert-sent` file)
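
The scan and de-duplication steps above can be sketched as follows. This is a hypothetical outline rather than the real `emergency-recovery-monitor.sh`: the function names are illustrative, the notifier only prints instead of posting to Discord, and the format of `.emergency-alert-sent` (one reported log path per line) is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the Level 4 scan; not the real monitor script.

# Placeholder notifier: prints instead of posting to Discord.
send_alert() {
  echo "ALERT: manual intervention required, see $1"
}

# Alert once for each recent recovery log that contains the failure marker.
scan_recent_failures() {
  local dir="$1" log
  local sent_file="$dir/.emergency-alert-sent"
  touch "$sent_file"
  # -mmin -30 keeps only files modified within the last 30 minutes.
  find "$dir" -maxdepth 1 -name 'emergency-recovery-*.log' -mmin -30 -print |
    while IFS= read -r log; do
      grep -q 'MANUAL INTERVENTION REQUIRED' "$log" || continue
      # Assumed format: one already-reported log path per line.
      grep -qxF -- "$log" "$sent_file" && continue
      send_alert "$log"
      echo "$log" >> "$sent_file"
    done
}
```

Running `scan_recent_failures ~/openclaw/memory` twice in a row should alert only the first time, which is the behavior the tracking file exists to guarantee.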
+
+**Cron settings:**
+- **ID:** `eddd4e18-b995-4420-8465-7c6927280228`
+- **Interval:** 5 minutes (`everyMs: 300000`)
+- **Session:** isolated
+- **Model:** claude-haiku-4-5
+- **Channel:** Discord #jarvis-health (1468429321738911947)
+
+**Alert format:**
+```
+🚨 Urgent: OpenClaw self-healing failed
+
+Time: YYYY-MM-DD-HHMM
+Status:
+- Level 1 (Watchdog) ❌
+- Level 2 (Health Check) ❌
+- Level 3 (Claude Recovery) ❌
+
+Manual intervention is required.
+
+Logs:
+- ~/openclaw/memory/emergency-recovery-*.log
+- ~/openclaw/memory/claude-session-*.log
+- ~/openclaw/memory/emergency-recovery-report-*.md (generated by Claude)
+```
+
+---
+
+## Test Scenarios
+
+### 1. Level 1 Test (Watchdog)
+
+**Scenario:** Force-kill the process
+
+```bash
+# Find the Gateway PID
+ps aux | grep openclaw-gateway | grep -v grep
+
+# Force-kill it
+kill -9 <PID>
+
+# Confirm the automatic restart within 3 minutes
+sleep 180
+curl http://localhost:18789/
+```
+
+**Expected result:**
+- Watchdog restarts the process within 180 seconds
+- HTTP 200 responses resume
+
+---
+
+### 2. Level 2 Test (Health Check)
+
+**Scenario:** HTTP response failure (blocked port)
+
+```bash
+# Block the port (firewall rule or proxy setting),
+# or set a wrong port in openclaw.json
+
+# Monitor the Health Check log
+tail -f ~/openclaw/memory/healthcheck-$(date +%Y-%m-%d).log
+```
+
+**Expected result:**
+- Health Check detects the HTTP failure
+- It retries 3 times (30 seconds apart)
+- It triggers Level 3 if the Gateway is still failing after 5 minutes
+
+---
+
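+### 3. Level 3 Test (Claude Recovery)
+
+**Scenario:** Inject a configuration error
+
+```bash
+# Back up openclaw.json
+cp ~/.openclaw/openclaw.json ~/.openclaw/openclaw.json.bak
+
+# Inject a deliberate error (e.g. an invalid port)
+# (requires manual editing)
+
+# Restart the Gateway
+openclaw gateway restart
+
+# Wait for Emergency Recovery to trigger (up to 8 minutes)
+# - Health Check detection: ~5 min
+# - Level 3 run after it starts: up to +30 min (timeout)
+
+# Monitor the logs
+tail -f ~/openclaw/memory/emergency-recovery-*.log
+```
+
+**Expected result:**
+- Claude detects the configuration error
+- It attempts to fix the configuration
+- It generates a recovery report
+- HTTP 200 is restored, or a failure report is written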
+
+---
+
+### 4. Level 4 Test (Discord Notification)
+
+**Scenario:** Simulate a Level 3 failure
+
+```bash
+# Manually create a Level 3 failure log
+cat > ~/openclaw/memory/emergency-recovery-test-$(date +%Y-%m-%d-%H%M).log << 'EOF'
+[2026-02-05 20:00:00] === Emergency Recovery Started ===
+[2026-02-05 20:30:00] Gateway still unhealthy after Claude recovery (HTTP 500)
+
+=== MANUAL INTERVENTION REQUIRED ===
+Level 1 (Watchdog) ❌
+Level 2 (Health Check) ❌
+Level 3 (Claude Recovery) ❌
+EOF
+
+# Run the monitor script (or wait for the cron job)
+~/openclaw/scripts/emergency-recovery-monitor.sh
+```
+
+**Expected result:**
+- An alert is sent to Discord #jarvis-health
+- A record is written to prevent duplicate alerts
+
+---
+
+## Operations Guide
+
+### Status Checks
+
+```bash
+# LaunchAgent status
+launchctl list | grep openclaw
+
+# Health Check log
+tail -f ~/openclaw/memory/healthcheck-$(date +%Y-%m-%d).log
+
+# Emergency Recovery logs
+ls -lt ~/openclaw/memory/emergency-recovery-*.log | head -5
+
+# Cron status
+openclaw cron list | grep "Emergency Recovery"
+```
+
+---
+
+### Manual Recovery
+
+Manual recovery procedure when Level 3 fails:
+
+```bash
+# 1. Check the logs
+tail -100 ~/.openclaw/logs/gateway.log
+tail -100 ~/.openclaw/logs/gateway.err.log
+
+# 2. Validate the configuration
+openclaw doctor --non-interactive
+
+# 3. Check for port conflicts
+lsof -i :18789
+
+# 4. Check dependencies
+node --version
+npm list -g openclaw
+
+# 5. Fully restart the Gateway
+openclaw gateway stop
+sleep 5
+openclaw gateway start
+
+# 6. Confirm recovery
+curl -i http://localhost:18789/
+```
+
+---
+
+### Disabling
+
+During system maintenance or debugging:
+
+```bash
+# Disable the Health Check
+launchctl unload ~/Library/LaunchAgents/com.openclaw.healthcheck.plist
+
+# Disable the Emergency Recovery Monitor cron job
+openclaw cron disable eddd4e18-b995-4420-8465-7c6927280228
+
+# Re-enable both
+launchctl load ~/Library/LaunchAgents/com.openclaw.healthcheck.plist
+openclaw cron enable eddd4e18-b995-4420-8465-7c6927280228
+```
+
+---
+
+## Monitoring Metrics
+
+**Tracked metrics:**
+
+| Metric | Source | Target |
+|------|----------|------|
+| Health Check success rate | healthcheck-*.log | > 99% |
+| Level 1 recoveries | watchdog.log | < 1/day |
+| Level 2 recoveries | healthcheck-*.log | < 1/week |
+| Level 3 triggers | emergency-recovery-*.log | 0/month |
+| Level 4 alerts | Discord #jarvis-health | 0/month |
+| Mean recovery time | healthcheck-*.log | < 5 min (Level 1-2) |
+
+**Weekly review (audit cron, Sundays at 23:30):**
+- Analyze the Health Check logs
+- Review the Level 3 trigger history
+- Identify recurring patterns
+- Propose system improvements
+
+---
+
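+## Limitations
+
+1. **Claude Code dependency**
+   - Level 3 requires the Claude CLI to be installed
+   - Level 3 can fail when the Claude API quota is exhausted
+
+2. **tmux dependency**
+   - The PTY session requires tmux
+   - Level 3 is unavailable if tmux is not installed
+
+3. **Network outages**
+   - Level 3 fails if the Claude API is unreachable
+   - Level 4 alerts fail if the Discord API is unreachable
+
+4. **macOS only**
+   - LaunchAgents are macOS-specific
+   - Linux requires conversion to systemd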
+
+---
+
+## Roadmap
+
+**Phase 2 (future):**
+- [ ] Automatic GitHub Issues creation (when Level 4 fails)
+- [ ] Telegram alerts (redundancy)
+- [ ] Prometheus metrics collection
+- [ ] Grafana dashboard
+- [ ] Multi-node support (clustered environments)
+
+---
+
+## References
+
+- [OpenClaw Docs](https://docs.openclaw.ai)
+- [Moltbook: Nightly Build Pattern](https://moltbook.com) (inspiration for Level 3)
+- [Moltbook: Reliability Check](https://moltbook.com) (inspiration for Health Check)
+- [Claude Code CLI](https://docs.anthropic.com/en/docs/claude-code)
+
+---
+
+**Created:** 2026-02-05  
+**Last updated:** 2026-02-05  
+**Author:** Jarvis (Self-Healing System Implementation)