# Gateway Reliability - Simple Approach 🦀

## Philosophy (from zach-highley/openclaw-starter-kit)

**Don't build:**

- ❌ Custom watchdog scripts
- ❌ Meta-monitors that watch the watchers
- ❌ Anything with "monitor" or "watchdog" in the name

**Do this instead:**

```
systemd (Restart=always) → 5 AM daily cron (openclaw doctor --fix)
```

`KeepAlive=true` is launchd's (macOS) keep-running key. systemd has no such directive; its closest equivalent is `Restart=always` in the unit's `[Service]` section.

That's it. Simple is better.

## How It Works

### 1. systemd Service Manager

- The gateway runs as a systemd user service (`openclaw-gateway.service`)
- If it crashes, systemd restarts it automatically
- No custom scripts needed
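
A minimal sketch of pinning and verifying the restart policy, assuming the unit name `openclaw-gateway.service` used throughout this page. It uses a systemd drop-in instead of editing the installed unit file, so the policy survives if OpenClaw rewrites the unit. The drop-in name `restart.conf` and `RestartSec=5` are illustrative choices, not settings taken from the starter kit. It also checks lingering: without it, systemd stops a user's services when their last login session ends and does not start them at boot. Depending on your polkit policy, `enable-linger` may need `sudo`.

```bash
#!/usr/bin/env bash
# Sketch: pin Restart=always for the gateway's user unit via a drop-in.
# Assumptions (hypothetical): drop-in file "restart.conf", RestartSec=5.

# Write the drop-in into a systemd user config dir (default ~/.config/systemd/user)
# and print the path of the file it wrote.
write_restart_override() {
  local config_dir="${1:-$HOME/.config/systemd/user}"
  local dropin_dir="$config_dir/openclaw-gateway.service.d"
  mkdir -p "$dropin_dir"
  cat >"$dropin_dir/restart.conf" <<'EOF'
[Service]
Restart=always
RestartSec=5
EOF
  echo "$dropin_dir/restart.conf"
}

# Write the drop-in, reload systemd, and print the effective Restart= value.
apply_restart_override() {
  write_restart_override >/dev/null
  systemctl --user daemon-reload
  systemctl --user show openclaw-gateway.service -p Restart --value
}

# Enable lingering so user services keep running without an open login session.
ensure_linger() {
  local user="${USER:-$(id -un)}"
  if [ "$(loginctl show-user "$user" -p Linger --value)" != "yes" ]; then
    loginctl enable-linger "$user"
  fi
}
```

Typical use is `apply_restart_override && ensure_linger`; the first command should print `always`.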

### 2. Daily Maintenance Cron

- Runs at 5 AM daily
- Executes `openclaw doctor --fix` to clean up issues
- Prevents accumulation of problems
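
A minimal sketch of registering the 5 AM job, assuming a plain user crontab (the starter kit may schedule it some other way). Two details matter in practice: cron runs with a minimal `PATH` (typically `/usr/bin:/bin`), so the sketch resolves the absolute path of `openclaw` at install time and prepends its directory to `PATH`, and re-running the installer should not add a second copy of the job. The log file `~/.openclaw/doctor-cron.log` is a hypothetical location.

```bash
#!/usr/bin/env bash
# Sketch: add the daily "openclaw doctor --fix" job to the user crontab once.
# Assumption (hypothetical): output is appended to ~/.openclaw/doctor-cron.log.

# Build the cron line. cron's PATH is minimal, so bake in the directory that
# holds openclaw (and, for nvm-style installs, the node binary next to it).
doctor_cron_line() {
  local bin log="$HOME/.openclaw/doctor-cron.log"
  bin="$(command -v openclaw)" || { echo "openclaw not found in PATH" >&2; return 1; }
  printf '0 5 * * * PATH=%s:/usr/bin:/bin %s doctor --fix >> %s 2>&1\n' \
    "$(dirname "$bin")" "$bin" "$log"
}

# Read an existing crontab on stdin; print it with the job appended if missing.
with_doctor_job() {
  local line="$1" current
  current="$(cat)"
  if [ -n "$current" ]; then
    printf '%s\n' "$current"
  fi
  if ! grep -qF -- "doctor --fix" <<<"$current"; then
    printf '%s\n' "$line"
  fi
}

install_doctor_cron() {
  local line
  line="$(doctor_cron_line)" || return 1
  # `crontab -l` fails when no crontab exists yet; treat that as empty.
  { crontab -l 2>/dev/null || true; } | with_doctor_job "$line" | crontab -
}
```

Run `install_doctor_cron` once, then confirm the entry with `crontab -l`.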

## Current Setup

### Service Status

```bash
# Check if running
systemctl --user status openclaw-gateway.service

# View logs
journalctl --user -u openclaw-gateway.service -n 50
```
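
To go one step beyond `status`, here is a sketch that prints a one-line summary from systemd's own unit properties: the active state, how many times systemd has restarted the unit (`NRestarts`, which needs systemd v235 or later), and when it last became active. A rising restart count gives the early warning a custom watchdog would otherwise provide. The function name and output format are illustrative; nothing here is an OpenClaw command.

```bash
#!/usr/bin/env bash
# Sketch: summarize the gateway unit using systemctl properties only.
# Returns 0 when the unit is active, non-zero otherwise.

gateway_summary() {
  local unit="${1:-openclaw-gateway.service}" state restarts since
  # is-active prints the state and exits non-zero when not active.
  state="$(systemctl --user is-active "$unit")" || true
  # Number of automatic restarts performed by systemd for this unit.
  restarts="$(systemctl --user show "$unit" -p NRestarts --value)"
  # Time the unit last entered the active state.
  since="$(systemctl --user show "$unit" -p ActiveEnterTimestamp --value)"
  printf '%s: %s, restarts=%s, since=%s\n' \
    "$unit" "$state" "${restarts:-?}" "${since:-n/a}"
  [ "$state" = "active" ]
}
```

Because of the return code, it also works in scripts, e.g. `gateway_summary || echo "gateway is down"`.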

### Daily Maintenance

The 5 AM cron runs:

- `openclaw doctor --fix` - fixes config issues
- Health checks
- Cleanup of any accumulated problems
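
A sketch of a wrapper the 5 AM job could call instead of running `openclaw doctor --fix` bare, so every run leaves a timestamped record. It assumes `openclaw health` exits non-zero when the gateway is unhealthy, which is worth confirming against the OpenClaw docs; the function name and the log path `~/.openclaw/maintenance.log` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: daily maintenance = doctor --fix, then a health check, all logged.
# Assumption (hypothetical): log file at ~/.openclaw/maintenance.log.

daily_maintenance() {
  local log="${1:-$HOME/.openclaw/maintenance.log}" rc=0
  mkdir -p "$(dirname "$log")"
  # The brace group runs in the current shell, so updates to rc persist.
  {
    echo "=== $(date '+%Y-%m-%d %H:%M:%S') daily maintenance ==="
    openclaw doctor --fix || rc=$?
    # Assumption: `openclaw health` exits non-zero when the gateway is unhealthy.
    openclaw health || rc=$?
    echo "=== done (exit $rc) ==="
  } >>"$log" 2>&1
  return "$rc"
}
```

Saved as a script that ends with a `daily_maintenance` call, it can stand in for the bare command in the crontab entry; a non-zero exit then marks a run that needs attention.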

## Commands

```bash
# Check overall status
openclaw status

# Check gateway health
openclaw health

# View recent logs
openclaw logs --tail 200

# Run diagnostics
openclaw doctor --non-interactive

# Manual restart (if needed)
systemctl --user restart openclaw-gateway.service
```
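
When something does go wrong, capturing the diagnostic commands above at the moment of failure makes the "error → investigate → fix → document" loop easier. A minimal sketch, assuming none of these commands waits for input (`doctor` runs with `--non-interactive` for that reason); the helper names and the snapshot directory `~/.openclaw/snapshots` are hypothetical choices.

```bash
#!/usr/bin/env bash
# Sketch: save the output of the diagnostic commands into one timestamped file.
# Assumption (hypothetical): snapshots live in ~/.openclaw/snapshots.

# Print a header, run the command, and note its exit code if it failed.
run_section() {
  echo "### $*"
  "$@" 2>&1 || echo "(exit $?)"
  echo
}

snapshot_diagnostics() {
  local dir="${1:-$HOME/.openclaw/snapshots}" out
  mkdir -p "$dir"
  out="$dir/diag-$(date +%Y%m%d-%H%M%S).txt"
  {
    run_section openclaw status
    run_section openclaw health
    run_section openclaw logs --tail 200
    run_section openclaw doctor --non-interactive
  } >"$out"
  # Print the snapshot path so it can be attached to an incident note.
  echo "$out"
}
```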

## Why This Is Better

**Before (over-engineered):**

- Custom monitor scripts
- Health check every 2 minutes
- Auto-restart logic
- Multiple logging paths
- Complex failure modes

**After (simple & reliable):**

- systemd handles restarts (built-in, tested)
- One daily maintenance cron
- Uses OpenClaw's built-in health checks
- Easier to debug
- Less to break

## Troubleshooting

### Gateway won't start

```bash
# Check service status
systemctl --user status openclaw-gateway.service

# View logs
journalctl --user -u openclaw-gateway.service --since "1 hour ago"

# Try doctor
openclaw doctor --fix

# Restart service
systemctl --user restart openclaw-gateway.service
```
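
One failure mode these steps can miss: if the gateway crashes repeatedly in a short window, systemd's start rate limit (by default 5 starts within 10 seconds, controlled by `StartLimitBurst=` and `StartLimitIntervalSec=`) puts the unit in the `failed` state and `Restart=` stops retrying. Manual restarts can also be refused until the counter is cleared with `systemctl --user reset-failed`. A sketch of that recovery order, assuming the unit name used on this page; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: recover a gateway unit that systemd has given up on.
# Order: show why it failed, run doctor --fix, clear the failed state, restart.

UNIT="openclaw-gateway.service"

recover_gateway() {
  if ! systemctl --user is-failed --quiet "$UNIT"; then
    echo "$UNIT is not in the failed state; nothing to recover"
    return 0
  fi
  # systemd's recorded reason, e.g. "exit-code" or "start-limit-hit".
  echo "result: $(systemctl --user show "$UNIT" -p Result --value)"
  # Last 50 journal lines for the unit, printed without a pager.
  journalctl --user -u "$UNIT" -n 50 --no-pager
  # Let OpenClaw repair config problems before trying again.
  openclaw doctor --fix
  # Clear the failed state (and its start-limit counter), then start fresh.
  systemctl --user reset-failed "$UNIT"
  systemctl --user restart "$UNIT"
}
```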

### High memory usage

- Check session count: `openclaw sessions list`
- Sessions auto-reset daily at 4 AM
- Memory compaction is enabled
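
Before cleaning up sessions, it helps to know how much memory the gateway actually uses. A sketch that reads systemd's `MemoryCurrent` property for the unit and compares it with a threshold; the 1024 MiB default is an arbitrary example rather than an OpenClaw recommendation, the function names are hypothetical, and the property only holds a number while systemd's memory accounting is active for the unit.

```bash
#!/usr/bin/env bash
# Sketch: report the gateway's memory use (in MiB) as tracked by systemd.
# Assumption (hypothetical): 1024 MiB is used as the default warning threshold.

gateway_memory_mib() {
  local bytes
  bytes="$(systemctl --user show openclaw-gateway.service -p MemoryCurrent --value)"
  # Without memory accounting (or when inactive) the value is not a number.
  case "$bytes" in
    '' | *[!0-9]*) echo "memory accounting unavailable ($bytes)" >&2; return 1 ;;
  esac
  echo $((bytes / 1024 / 1024))
}

# Exit 0 if under the limit, 2 if over it, 1 if the value is unavailable.
check_gateway_memory() {
  local limit_mib="${1:-1024}" used
  used="$(gateway_memory_mib)" || return 1
  if [ "$used" -gt "$limit_mib" ]; then
    echo "gateway using ${used} MiB (> ${limit_mib} MiB); check: openclaw sessions list"
    return 2
  fi
  echo "gateway using ${used} MiB"
}
```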

### Config issues

- Run: `openclaw doctor --fix`
- Check: `openclaw status`
- A backup is always kept at: `~/.openclaw/openclaw.json.bak`
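
If `doctor --fix` cannot repair the config, rolling back to the backup is the next step. A minimal sketch, assuming the backup path listed above is current; it sets the existing file aside with a `.broken-<timestamp>` suffix (a hypothetical naming choice, as is the function name) instead of overwriting it, so the bad config can still be inspected afterwards.

```bash
#!/usr/bin/env bash
# Sketch: restore ~/.openclaw/openclaw.json from openclaw.json.bak.
# Assumption (hypothetical): the broken file is kept as openclaw.json.broken-<timestamp>.

restore_config_backup() {
  local dir="${1:-$HOME/.openclaw}"
  local cfg="$dir/openclaw.json" bak="$dir/openclaw.json.bak"
  # Refuse to restore from a missing or empty backup.
  if [ ! -s "$bak" ]; then
    echo "no usable backup at $bak" >&2
    return 1
  fi
  # Keep the current (possibly broken) config for later inspection.
  if [ -e "$cfg" ]; then
    cp -p "$cfg" "$cfg.broken-$(date +%Y%m%d-%H%M%S)"
  fi
  cp -p "$bak" "$cfg"
  echo "restored $cfg from $bak"
}
```

After restoring, restart with `systemctl --user restart openclaw-gateway.service` and confirm with `openclaw status`.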

## Lessons Learned

From zach-highley/openclaw-starter-kit:

1. **Keep it simple** - systemd + daily cron is enough
2. **Don't reinvent** - use built-in tools first
3. **MECE** (mutually exclusive, collectively exhaustive) - no overlapping systems
4. **Official docs first** - docs.openclaw.ai
5. **Learn & fix** - error → investigate → fix → document

## Reference

- [zach-highley/openclaw-starter-kit](https://github.com/zach-highley/openclaw-starter-kit)
- [OpenClaw Docs](https://docs.openclaw.ai/)
- [Incident Postmortem](https://github.com/zach-highley/openclaw-starter-kit/blob/main/docs/INCIDENT_POSTMORTEM.md)