AI Newsletter Digest improvements: fixed QP soft line break decoding, URL extraction, and content cleaning
This commit is contained in:
108
skills/openclaw-self-healing/memory/github-issue-draft.md
Normal file
108
skills/openclaw-self-healing/memory/github-issue-draft.md
Normal file
@@ -0,0 +1,108 @@
|
||||
# GitHub Issue Draft - Gateway Crash on Fetch Failed
|
||||
|
||||
**Repository:** clawdbot/clawdbot
|
||||
|
||||
**Title:** Gateway crash on unhandled fetch rejection (network instability)
|
||||
|
||||
---
|
||||
|
||||
## Problem
|
||||
Gateway crashes when network requests fail due to **unhandled promise rejection** in fetch calls.
|
||||
|
||||
## Frequency
|
||||
- **28+ crashes in one day** (2026-01-31)
|
||||
- First crash: 11:27 KST
|
||||
- Last crash: 16:26 KST
|
||||
- Average: ~1 crash every 15-20 minutes during active hours
|
||||
- Still occurring as of 2026-02-01
|
||||
|
||||
## Environment
|
||||
- **OS:** macOS (Darwin 25.2.0 arm64)
|
||||
- **Node.js:** v25.5.0
|
||||
- **Gateway:** Clawdbot (latest npm version)
|
||||
|
||||
## Stack Trace
|
||||
```
|
||||
Unhandled promise rejection: TypeError: fetch failed
|
||||
at node:internal/deps/undici/undici:16416:13
|
||||
at processTicksAndRejections (node:internal/process/task_queues:104:5)
|
||||
at runNextTicks (node:internal/process/task_queues:69:3)
|
||||
at processTimers (node:internal/timers:538:9)
|
||||
```
|
||||
|
||||
## Crash Pattern
|
||||
1. `fetch failed` error logged
|
||||
2. Gateway process terminates immediately
|
||||
3. Watchdog detects crash and restarts
|
||||
4. New process starts with new PID
|
||||
5. Cycle repeats on next network failure
|
||||
|
||||
## Impact
|
||||
- Gateway requires manual restart after each crash
|
||||
- Session interruptions
|
||||
- Memory search disrupted when using Gemini embeddings
|
||||
- Stability degradation during high usage
|
||||
|
||||
## Temporary Workaround
|
||||
Switched embedding provider to reduce rate limit pressure:
|
||||
- `memorySearch.provider`: gemini → openai
|
||||
- `memorySearch.model`: gemini-embedding-001 → text-embedding-3-small
|
||||
|
||||
This reduced Gemini API calls but doesn't address the underlying crash issue.
|
||||
|
||||
## Expected Behavior
|
||||
Gateway should **gracefully handle fetch failures** without crashing:
|
||||
- Wrap all fetch calls in try-catch
|
||||
- Retry logic with exponential backoff (3 retries recommended)
|
||||
- Circuit breaker pattern for repeated failures
|
||||
- Fallback mechanisms (e.g., skip embedding if API down)
|
||||
- Error logging without process termination
|
||||
|
||||
## Root Cause (Suspected)
|
||||
Appears to be a general issue with **fetch error handling** in gateway core, not specific to any single API:
|
||||
|
||||
**Triggers observed:**
|
||||
- Network timeouts
|
||||
- DNS resolution failures
|
||||
- API rate limits (429, 503)
|
||||
- Connection refused
|
||||
- **Telegram media downloads** (voice files, photos)
|
||||
- Any fetch() call that rejects
|
||||
|
||||
**Why it crashes:**
|
||||
- Promise rejection from undici (Node.js native fetch) not caught
|
||||
- No top-level error handler for fetch failures
|
||||
- Process exits due to unhandled rejection
|
||||
|
||||
**Which APIs affected:**
|
||||
Likely any external API call:
|
||||
- OpenAI (embeddings, completions)
|
||||
- Anthropic (Claude API)
|
||||
- Memory search providers (Gemini, OpenAI)
|
||||
- **Telegram (media downloads: voice/photo/video)**
|
||||
- Web fetch operations
|
||||
- GitHub API calls
|
||||
|
||||
**Example error (Telegram media):**
|
||||
```
|
||||
[telegram] handler failed: MediaFetchError: Failed to fetch media from
|
||||
https://api.telegram.org/file/bot.../voice/file_113.oga: TypeError: fetch failed
|
||||
```
|
||||
|
||||
## Additional Context
|
||||
- Rate limit errors should be caught and handled, not crash the process
|
||||
- Network instability is common in production environments
|
||||
- Gateway restart automation helps but doesn't solve the root issue
|
||||
|
||||
## Suggested Fix
|
||||
1. Wrap all fetch calls in try-catch with proper error handling
|
||||
2. Implement retry strategy (e.g., 3 retries with exponential backoff)
|
||||
3. Add circuit breaker for repeated API failures
|
||||
4. Log errors to monitoring system instead of crashing
|
||||
5. Consider graceful degradation (e.g., skip embedding if API unavailable)
|
||||
|
||||
---
|
||||
|
||||
**Reporter:** @Ramsbaby
|
||||
**Date:** 2026-02-01
|
||||
**Priority:** High (affects daily stability)
|
||||
Reference in New Issue
Block a user