# Gateway Reliability - Simple Approach 🦀
## Philosophy (from zach-highley/openclaw-starter-kit)
**Don't build:**
- ❌ Custom watchdog scripts
- ❌ Meta-monitors that watch the watchers
- ❌ Anything with "monitor" or "watchdog" in the name
**Do this instead:**
```
systemd (Restart=always) → 5 AM daily cron (openclaw doctor --fix)
```
That's it. Simple is better.
## How It Works
### 1. systemd Service Manager
- The gateway runs as a systemd user service with automatic restarts (`Restart=always`)
- If it crashes, systemd restarts it automatically
- No custom scripts needed
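If the unit doesn't already set a restart policy, a drop-in override is the least invasive way to add one, because it is merged on top of the unit without editing the unit file. This is a minimal sketch that assumes the unit is installed as the user service `openclaw-gateway.service`. The file name `restart.conf` and `RestartSec=5` are illustrative choices, not OpenClaw defaults.

```bash
#!/usr/bin/env bash
# Sketch: make systemd restart the gateway whenever it exits.
# Assumption: the unit is a user service named openclaw-gateway.service.
UNIT=openclaw-gateway.service
UNIT_DIR="${XDG_CONFIG_HOME:-$HOME/.config}/systemd/user"
DROPIN_DIR="$UNIT_DIR/$UNIT.d"

mkdir -p "$DROPIN_DIR"

# Drop-in override. restart.conf is a hypothetical name; any *.conf in the .d dir is read.
cat > "$DROPIN_DIR/restart.conf" <<'EOF'
[Service]
Restart=always
RestartSec=5
EOF

# Make systemd re-read unit files (warns instead of failing if no user manager is reachable).
if command -v systemctl >/dev/null 2>&1; then
  systemctl --user daemon-reload || echo "daemon-reload failed; run it from a login session" >&2
fi
```

`systemctl --user cat openclaw-gateway.service` prints the unit together with its drop-ins, so it shows whether the override took effect. With `RestartSec=5`, restarts stay under systemd's default start-rate limit (5 starts within 10 seconds), so a crash loop keeps retrying instead of leaving the unit failed. For a user service to keep running after logout and to start at boot, lingering generally has to be enabled with `loginctl enable-linger "$USER"`.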
### 2. Daily Maintenance Cron
- Runs at 5 AM daily
- Executes `openclaw doctor --fix` to clean up issues
- Prevents accumulation of problems
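The daily job can be a single crontab line. This is a minimal sketch that assumes a plain cron daemon rather than OpenClaw's own scheduler, and that `openclaw` is on your `PATH` when you run the script. The log file `~/.openclaw/doctor-cron.log` is a hypothetical location. Cron starts jobs with a minimal `PATH`, so the script resolves the absolute path of `openclaw` up front.

```bash
#!/usr/bin/env bash
# Sketch: install the daily 5 AM `openclaw doctor --fix` job in the user crontab.
# Assumptions: a plain cron daemon; the log path below is hypothetical.

# Resolve the binary now, because cron runs jobs with a minimal PATH.
OPENCLAW_BIN="$(command -v openclaw || echo openclaw)"

# Fields: minute hour day-of-month month day-of-week command
CRON_LINE="0 5 * * * $OPENCLAW_BIN doctor --fix >> \"\$HOME/.openclaw/doctor-cron.log\" 2>&1"

if command -v crontab >/dev/null 2>&1; then
  # Idempotent: remove any existing line containing "doctor --fix", then append ours.
  { crontab -l 2>/dev/null | grep -vF 'doctor --fix' || true; echo "$CRON_LINE"; } | crontab -
fi
```

Afterwards, `crontab -l` should show the new line. If `openclaw` is a Node.js script and `node` lives under a version manager, cron may not find `node` either. A `PATH=...` line at the top of the crontab is the usual fix.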
## Current Setup
### Service Status
```bash
# Check if running
systemctl --user status openclaw-gateway.service
# View logs
journalctl --user -u openclaw-gateway.service -n 50
```
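`systemctl status` shows whether the gateway is up. `systemctl show` exposes the restart policy and the counter behind "systemd restarts it automatically." This sketch assumes systemd 235 or newer, where the `NRestarts` property exists. `summarize` is a hypothetical helper that condenses the key=value output into one line.

```bash
#!/usr/bin/env bash
# Sketch: report the gateway's state, restart policy, and restart count from systemd.
# NRestarts requires systemd >= 235.
UNIT=openclaw-gateway.service

# Read `systemctl show` key=value lines on stdin and print a one-line summary.
# Properties that are missing from the input are shown as "?".
summarize() {
  local restart="?" nrestarts="?" state="?" key value
  while IFS='=' read -r key value; do
    case "$key" in
      Restart) restart=$value ;;
      NRestarts) nrestarts=$value ;;
      ActiveState) state=$value ;;
    esac
  done
  echo "state=$state restart=$restart restarts=$nrestarts"
}

if command -v systemctl >/dev/null 2>&1; then
  systemctl --user show "$UNIT" -p ActiveState -p Restart -p NRestarts | summarize
fi
```

A steadily rising `restarts` value with `state=active` means systemd is absorbing crashes. That is worth a look in `journalctl` even though the gateway stays up.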
### Daily Maintenance
The 5 AM cron runs:
- `openclaw doctor --fix` - fixes config issues
- Health checks
- Cleans up any accumulated problems
## Commands
```bash
# Check overall status
openclaw status
# Check gateway health
openclaw health
# View recent logs
openclaw logs --tail 200
# Run diagnostics
openclaw doctor --non-interactive
# Manual restart (if needed)
systemctl --user restart openclaw-gateway.service
```
## Why This Is Better
**Before (over-engineered):**
- Custom monitor scripts
- Health check every 2 minutes
- Auto-restart logic
- Multiple logging paths
- Complex failure modes
**After (simple & reliable):**
- systemd handles restarts (built-in, tested)
- One daily maintenance cron
- Uses OpenClaw's built-in health checks
- Easier to debug
- Less to break
## Troubleshooting
### Gateway won't start
```bash
# Check service status
systemctl --user status openclaw-gateway.service
# View logs
journalctl --user -u openclaw-gateway.service --since "1 hour ago"
# Try doctor
openclaw doctor --fix
# Restart service
systemctl --user restart openclaw-gateway.service
```
### High memory usage
- Check session count: `openclaw sessions list`
- Sessions auto-reset daily at 4 AM
- Memory compaction enabled
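systemd also tracks how much memory the gateway uses, which gives a baseline before digging into sessions. This sketch assumes memory accounting is enabled for the user manager. That is the case on many current distributions, but not all. `to_mib` is a hypothetical helper. When accounting is off, systemd reports `[not set]` or, on older versions, 18446744073709551615, and the helper prints `n/a` for either.

```bash
#!/usr/bin/env bash
# Sketch: show the gateway's current memory use as reported by systemd.
UNIT=openclaw-gateway.service

# Convert a byte count from `systemctl show` to whole MiB.
# Unavailable values ("[not set]", empty, or UINT64_MAX on older systemd) become "n/a".
to_mib() {
  case "$1" in
    ''|*[!0-9]*|18446744073709551615) echo "n/a" ;;
    *) echo "$(( $1 / 1048576 )) MiB" ;;
  esac
}

if command -v systemctl >/dev/null 2>&1; then
  bytes="$(systemctl --user show "$UNIT" -p MemoryCurrent --value 2>/dev/null)"
  echo "$UNIT memory: $(to_mib "$bytes")"
fi
```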
### Config issues
- Run: `openclaw doctor --fix`
- Check: `openclaw status`
- A backup is always kept at: `~/.openclaw/openclaw.json.bak`
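If `doctor --fix` can't repair the config, you can put the backup back by hand. This is a minimal sketch. `restore_config_backup` is a hypothetical helper, and `CONFIG_DIR` defaults to the `~/.openclaw` directory mentioned above. The current file is set aside with a timestamp suffix instead of being overwritten, so the broken version can still be inspected.

```bash
#!/usr/bin/env bash
# Sketch: restore openclaw.json from openclaw.json.bak, keeping the current file aside.
CONFIG_DIR="${CONFIG_DIR:-$HOME/.openclaw}"

restore_config_backup() {
  local cfg="$CONFIG_DIR/openclaw.json"
  local bak="$CONFIG_DIR/openclaw.json.bak"
  if [ ! -f "$bak" ]; then
    echo "no backup found at $bak" >&2
    return 1
  fi
  # Keep the (possibly broken) current config for later inspection.
  if [ -f "$cfg" ]; then
    cp -p "$cfg" "$cfg.broken.$(date +%Y%m%d-%H%M%S)"
  fi
  cp -p "$bak" "$cfg"
}
```

Usage: `restore_config_backup && systemctl --user restart openclaw-gateway.service`, then confirm with `openclaw status`.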
## Lessons Learned
From zach-highley/openclaw-starter-kit:
1. **Keep it simple** - systemd + daily cron is enough
2. **Don't reinvent** - use built-in tools first
3. **MECE** (mutually exclusive, collectively exhaustive) - no overlapping systems
4. **Official docs first** - docs.openclaw.ai
5. **Learn & fix** - error → investigate → fix → document
## Reference
- [zach-highley/openclaw-starter-kit](https://github.com/zach-highley/openclaw-starter-kit)
- [OpenClaw Docs](https://docs.openclaw.ai/)
- [Incident Postmortem](https://github.com/zach-highley/openclaw-starter-kit/blob/main/docs/INCIDENT_POSTMORTEM.md)