openclaw-backups/memory/2026-02-26-gateway-crash-fix.md


Session: 2026-02-26 03:17:25 UTC

  • Session Key: agent:main:main
  • Session ID: beeb3665-dcbf-436e-b5ea-39a5a7853235
  • Source: telegram

Conversation Summary

user: Read HEARTBEAT.md if it exists (workspace context). Follow it strictly. Do not infer or repeat old tasks from prior chats. If nothing needs attention, reply HEARTBEAT_OK. Current time: Thursday, February 26th, 2026 — 1:36 AM (UTC)

assistant: The gateway is returning HTML instead of JSON, and defaultModel is missing again. Let me check further:

assistant: Issues found! Let me fix them:

assistant: The gateway's API endpoints are returning HTML (the control UI). Let me check the actual health:

assistant: Both services are running. Let me check the watchdog that was sending repeated notifications:

assistant: The watchdog is running. Let me check the systemd override that was triggering notifications:

assistant: Let me check for crash loops:

assistant: FOUND THE ISSUE!

The gateway had a crash loop at 01:00-01:01 UTC. Each restart triggered ExecStartPost → gateway-telegram-notify.sh → Telegram message. That's why you got spammed.

Summary

| Issue | Status | Details |
|---|---|---|
| defaultModel missing | Fixed | Set to sub-claude/claude-sonnet-4-6 |
| Gateway crash loop | ⚠️ Was happening | 10+ restarts in 1 minute at 01:00 UTC |
| Notification spam | 🔴 Root cause | ExecStartPost fires on EVERY start |
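The crash-loop row boils down to a sliding-window count: restarts count as a loop when many of them land inside a short window. Below is a minimal Python sketch, assuming the restart timestamps have already been extracted from the gateway's logs (that parsing step is not shown). The function name `is_crash_loop` is hypothetical, and its defaults simply mirror the "10+ restarts in 1 minute" figure above.

```python
from datetime import datetime, timedelta


def is_crash_loop(restarts, threshold=10, window=timedelta(minutes=1)):
    """Return True if some span no longer than `window` holds >= `threshold` restarts.

    `restarts` is an iterable of datetime objects; how they are collected
    (journal, log files) is an assumption left outside this sketch.
    """
    times = sorted(restarts)
    start = 0
    for end, t in enumerate(times):
        # Advance the left edge until the window covers at most `window` of time.
        while t - times[start] > window:
            start += 1
        if end - start + 1 >= threshold:
            return True
    return False


if __name__ == "__main__":
    base = datetime(2026, 2, 26, 1, 0, 0)
    # Ten hypothetical restarts, five seconds apart (45 s total).
    burst = [base + timedelta(seconds=5 * i) for i in range(10)]
    print(is_crash_loop(burst))  # True
```

A two-pointer window keeps the check linear in the number of restarts after sorting, which is plenty for a watchdog-style script.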

Recommendation

The systemd override should be modified to not send notifications during crash loops. Options:

  1. Remove the override entirely (no notifications)
  2. Add cooldown logic to the script (don't notify if last notification was <5 min ago)
  3. Remove ExecStartPost and rely on watchdog instead
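Option 2 can be sketched as a timestamp check that runs before the Telegram message is sent. This Python version is illustrative only: the real hook is gateway-telegram-notify.sh, and both the state-file path `/tmp/gateway-notify.last` and the `should_notify` helper are hypothetical. The 5-minute cooldown comes from the option text.

```python
import time
from pathlib import Path

# Hypothetical location for the "last notified" timestamp; not taken from the real script.
STATE_FILE = Path("/tmp/gateway-notify.last")
COOLDOWN_SECONDS = 5 * 60  # option 2: skip if the last notification was < 5 min ago


def should_notify(state_file=STATE_FILE, cooldown=COOLDOWN_SECONDS, now=None):
    """Return True and record `now` if the cooldown has elapsed; otherwise return False."""
    now = time.time() if now is None else now
    try:
        last = float(state_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        last = None  # no earlier notification recorded, or the state file is unreadable
    if last is not None and now - last < cooldown:
        return False  # still inside the cooldown: suppress this notification
    state_file.write_text(str(now))
    return True


if __name__ == "__main__":
    if should_notify():
        print("send Telegram notification")  # placeholder for the real send step
    else:
        print("suppressed: within cooldown")
```

With this guard, every restart in a burst like the one at 01:00 UTC after the first would hit the cooldown branch, so the burst yields one message rather than one per restart. The read-then-write is not atomic; that is tolerable for a single unit's sequential starts, but concurrent callers would need a lock.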

Would you like me to implement one of these fixes?

user: Read HEARTBEAT.md if it exists (workspace context). Follow it strictly. Do not infer or repeat old tasks from prior chats. If nothing needs attention, reply HEARTBEAT_OK. Current time: Thursday, February 26th, 2026 — 2:00 AM (UTC)

assistant: Good - defaultModel is set. Let me check gateway status and recent stability:

assistant: Gateway stable. Quick checks:

assistant: Cron jobs are mostly healthy. Let me verify context usage:

assistant: Context at 29% - healthy. Quick final checks:

assistant: ## Heartbeat Check Complete

| Check | Status |
|---|---|
| Gateway | Stable (no crashes in 30 min) |
| defaultModel | Set to sub-claude/claude-sonnet-4-6 |
| Context | 29% (healthy) |
| Cron jobs | ⚠️ Birthday Tracker (2 errors), Weekend Planner (model issue) |
| NIM API | ⚠️ Rate limits hit earlier (01:06 UTC) |
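The heartbeat rule ("if nothing needs attention, reply HEARTBEAT_OK") can be sketched as a reduction over check results. In this run the ⚠️ rows were reported but did not prevent HEARTBEAT_OK, so the sketch assumes a three-level status where only a hypothetical CRITICAL level counts as needing attention. The `Status`, `Check`, and `heartbeat_reply` names are illustrative and are not part of the agent's actual tooling.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OK = "ok"
    WARN = "warn"          # worth reporting, not actionable on its own (assumption)
    CRITICAL = "critical"  # needs attention now (hypothetical level)


@dataclass
class Check:
    name: str
    status: Status
    detail: str


def heartbeat_reply(checks):
    """Return "HEARTBEAT_OK" unless some check is CRITICAL; otherwise list those checks."""
    critical = [c for c in checks if c.status is Status.CRITICAL]
    if not critical:
        return "HEARTBEAT_OK"
    return "\n".join(f"{c.name}: {c.detail}" for c in critical)


# The rows from the table above, with ⚠️ mapped to WARN (an interpretation).
CHECKS = [
    Check("Gateway", Status.OK, "Stable (no crashes in 30 min)"),
    Check("defaultModel", Status.OK, "Set to sub-claude/claude-sonnet-4-6"),
    Check("Context", Status.OK, "29% (healthy)"),
    Check("Cron jobs", Status.WARN, "Birthday Tracker (2 errors), Weekend Planner (model issue)"),
    Check("NIM API", Status.WARN, "Rate limits hit earlier (01:06 UTC)"),
]

if __name__ == "__main__":
    print(heartbeat_reply(CHECKS))  # HEARTBEAT_OK
```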

No immediate action required. The earlier crash loop and notification spam have stabilized.

HEARTBEAT_OK

user: Read HEARTBEAT.md if it exists (workspace context). Follow it strictly. Do not infer or repeat old tasks from prior chats. If nothing needs attention, reply HEARTBEAT_OK. Current time: Thursday, February 26th, 2026 — 3:00 AM (UTC)