GitHub Issue Draft - Gateway Crash on Fetch Failed
Repository: clawdbot/clawdbot
Title: Gateway crash on unhandled fetch rejection (network instability)
Problem
The gateway crashes whenever a network request fails, because the rejected promise from the fetch call is never handled.
Frequency
- 28+ crashes in one day (2026-01-31)
- First crash: 11:27 KST
- Last crash: 16:26 KST
- Average: ~1 crash every 15-20 minutes during active hours
- Still occurring as of 2026-02-01
Environment
- OS: macOS (Darwin 25.2.0 arm64)
- Node.js: v25.5.0
- Gateway: Clawdbot (latest npm version)
Stack Trace
Unhandled promise rejection: TypeError: fetch failed
    at node:internal/deps/undici/undici:16416:13
    at processTicksAndRejections (node:internal/process/task_queues:104:5)
    at runNextTicks (node:internal/process/task_queues:69:3)
    at processTimers (node:internal/timers:538:9)
Crash Pattern
- "fetch failed" error logged
- Gateway process terminates immediately
- Watchdog detects crash and restarts
- New process starts with new PID
- Cycle repeats on next network failure
Impact
- Gateway must be restarted after each crash
- Session interruptions
- Memory search disrupted when using Gemini embeddings
- Stability degradation during high usage
Temporary Workaround
Switched embedding provider to reduce rate limit pressure:
memorySearch.provider: gemini → openai
memorySearch.model: gemini-embedding-001 → text-embedding-3-small
This reduced Gemini API calls but doesn't address the underlying crash issue.
Expected Behavior
Gateway should gracefully handle fetch failures without crashing:
- Wrap all fetch calls in try-catch
- Retry logic with exponential backoff (3 retries recommended)
- Circuit breaker pattern for repeated failures
- Fallback mechanisms (e.g., skip embedding if API down)
- Error logging without process termination
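To make the retry item concrete, here is a minimal sketch of retrying with exponential backoff. The names (retryWithBackoff, RetryOptions, backoffDelay) and the default delays are illustrative assumptions, not existing Clawdbot APIs. The sketch has no jitter and retries on every rejection, which a real implementation may want to refine.

```typescript
// Hypothetical helper: retries an async operation with exponential backoff.
// All names and default values here are assumptions for illustration.
interface RetryOptions {
  retries: number;     // retries after the first attempt (this issue suggests 3)
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number;  // upper bound on any single delay
}

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelay(attempt: number, opts: RetryOptions): number {
  return Math.min(opts.baseDelayMs * Math.pow(2, attempt), opts.maxDelayMs);
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

async function retryWithBackoff<T>(
  op: () => Promise<T>,
  opts: RetryOptions = { retries: 3, baseDelayMs: 500, maxDelayMs: 8000 },
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= opts.retries; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      // Wait only if another attempt will follow.
      if (attempt < opts.retries) {
        await sleep(backoffDelay(attempt, opts));
      }
    }
  }
  // Every attempt failed: rethrow the last error so the caller can handle it.
  throw lastError;
}
```

A caller would use it as `retryWithBackoff(() => fetch(url))`, still inside a try-catch, so that the final failure is logged instead of becoming another unhandled rejection.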
Root Cause (Suspected)
This appears to be a general problem with fetch error handling in the gateway core, not specific to any single API.
Triggers observed:
- Network timeouts
- DNS resolution failures
- API rate limiting or unavailability (HTTP 429, 503)
- Connection refused
- Telegram media downloads (voice files, photos)
- Any fetch() call that rejects
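The stack trace only says "fetch failed", which hides which of these triggers fired. Node's fetch (undici) normally attaches the underlying network error as the `cause` of the TypeError, often with a `code` such as ECONNREFUSED or ENOTFOUND. A small sketch of surfacing that in logs follows; describeFetchError is a hypothetical helper name, not something in the gateway today.

```typescript
// Hypothetical helper: summarises a rejected fetch() for logging.
// Assumes the rejection may carry the underlying network error in `cause`,
// and that the cause may carry a string `code` (for example "ECONNREFUSED").
function describeFetchError(err: unknown): string {
  if (!(err instanceof Error)) return String(err);
  const cause = (err as Error & { cause?: unknown }).cause;
  if (cause instanceof Error) {
    const code = (cause as Error & { code?: unknown }).code;
    // Prefer the error code; fall back to the error class name.
    const tag = typeof code === "string" ? code : cause.name;
    return `${err.message} (${tag}: ${cause.message})`;
  }
  return err.message;
}
```

Logging this string in the catch block would make it possible to tell a DNS failure from a timeout or a refused connection in future reports.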
Why it crashes:
- Promise rejection from undici (Node.js native fetch) not caught
- No top-level error handler for fetch failures
- Process exits due to unhandled rejection
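As a last-resort guard for the missing top-level handler, Node lets a process register a listener for the `unhandledRejection` event. When a listener is registered, the rejection is passed to it instead of being raised as an uncaught exception. The sketch below is only a safety net under that assumption, and formatRejection and the log prefix are made-up names. It does not replace catching errors at each fetch call site.

```typescript
// Hypothetical helper: turn any rejection reason into a loggable string.
function formatRejection(reason: unknown): string {
  if (reason instanceof Error) {
    return reason.stack || reason.message;
  }
  return String(reason);
}

// Last-resort guard: log unhandled rejections instead of letting the
// default behaviour terminate the gateway process.
process.on("unhandledRejection", (reason: unknown) => {
  console.error("[gateway] unhandled rejection:", formatRejection(reason));
});
```

Because this hides bugs as well as network errors, it would make sense to pair it with the monitoring suggested elsewhere in this issue.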
Which APIs are affected: likely any external API call:
- OpenAI (embeddings, completions)
- Anthropic (Claude API)
- Memory search providers (Gemini, OpenAI)
- Telegram (media downloads: voice/photo/video)
- Web fetch operations
- GitHub API calls
Example error (Telegram media):
[telegram] handler failed: MediaFetchError: Failed to fetch media from
https://api.telegram.org/file/bot.../voice/file_113.oga: TypeError: fetch failed
Additional Context
- Rate limit errors should be caught and handled, not crash the process
- Network instability is common in production environments
- Gateway restart automation helps but doesn't solve the root issue
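One detail worth noting for rate limits: fetch() resolves normally for HTTP 429 and 503 responses, so they need an explicit status check rather than a catch block. A sketch of deciding how long to wait, using the standard Retry-After header (either a number of seconds or an HTTP date), is below. The helper names, the MinimalResponse shape and the 1000 ms fallback are assumptions for illustration.

```typescript
// Structural subset of a fetch Response, so the helpers are easy to test.
interface MinimalResponse {
  status: number;
  headers: { get(name: string): string | null };
}

// Hypothetical helper: convert a Retry-After header value to milliseconds.
function retryAfterMs(headerValue: string | null, nowMs: number, fallbackMs: number): number {
  if (headerValue === null || headerValue.trim() === "") return fallbackMs;
  const seconds = Number(headerValue);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
  // Not a number: try to read it as an HTTP date.
  const dateMs = Date.parse(headerValue);
  return Number.isNaN(dateMs) ? fallbackMs : Math.max(0, dateMs - nowMs);
}

function isRetryableStatus(status: number): boolean {
  return status === 429 || status === 503;
}

// Returns the delay before retrying, or null when no retry is needed.
function retryDelayFor(res: MinimalResponse, nowMs: number): number | null {
  if (!isRetryableStatus(res.status)) return null;
  return retryAfterMs(res.headers.get("retry-after"), nowMs, 1000);
}
```

After `const res = await fetch(url)`, a caller could check `retryDelayFor(res, Date.now())` and either wait and retry or continue with the response.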
Suggested Fix
- Wrap all fetch calls in try-catch with proper error handling
- Implement retry strategy (e.g., 3 retries with exponential backoff)
- Add circuit breaker for repeated API failures
- Log errors to monitoring system instead of crashing
- Consider graceful degradation (e.g., skip embedding if API unavailable)
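The circuit breaker and graceful degradation items could be combined roughly as follows. This is a minimal sketch, not a proposal for the exact implementation: the class name, the threshold and cooldown parameters, and the fallback callback are all assumptions. It does not limit concurrent trial calls while half-open.

```typescript
// Minimal circuit breaker sketch (hypothetical names and parameters).
// closed -> open after `threshold` consecutive failures;
// open -> half-open once `cooldownMs` has passed;
// the next call then closes the breaker on success or reopens it on failure.
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: BreakerState = "closed";
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(threshold: number, cooldownMs: number, now: () => number = Date.now) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  currentState(): BreakerState {
    if (this.state === "open" && this.now() - this.openedAt >= this.cooldownMs) {
      this.state = "half-open";
    }
    return this.state;
  }

  // Runs `op`; on failure, or while the breaker is open, returns `fallback()`.
  async call<T>(op: () => Promise<T>, fallback: () => T): Promise<T> {
    if (this.currentState() === "open") return fallback();
    try {
      const result = await op();
      this.failures = 0;
      this.state = "closed";
      return result;
    } catch (err) {
      console.error("[breaker] call failed:", err);
      this.failures++;
      if (this.state === "half-open" || this.failures >= this.threshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      return fallback();
    }
  }
}
```

For the embedding case, the operation would be the embedding request and the fallback would return an empty result, so memory search is skipped while the provider is down instead of crashing the gateway.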
Reporter: @Ramsbaby
Date: 2026-02-01
Priority: High (affects daily stability)