- 18 comprehensive monitoring checks
- 5 systemd timers (5min, 15min, hourly, daily, weekly)
- Complete documentation
- NTFY secure notification system
- Fixed debianvm disk space (91% to 57%)
- Fixed CloudReve integration

Date: 2026-01-07
✅ HOMELAB MONITORING - FULLY OPERATIONAL
Status: ALL SYSTEMS ACTIVE & SECURE
Date: January 7, 2026
Implementation: Complete
Security: Secure (obscure topic names)
🔒 Your Secure NTFY Topics
CRITICAL: anthony-homelab-95ccf258e17eba20-critical
WARNING: anthony-homelab-95ccf258e17eba20-warning
INFO: anthony-homelab-95ccf258e17eba20-info
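The random hex string makes these topic names impractical to guess. Unless the ntfy server enforces access control, the names are the only protection: anyone who learns one can subscribe to it, so treat them like passwords and keep this file private.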
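For reference, publishing to an ntfy topic is a plain HTTP POST. The sketch below is not the contents of /usr/local/bin/send-ntfy.sh. The server URL (`NTFY_SERVER`, defaulting here to https://ntfy.sh), the function names, and the level-to-priority mapping are assumptions for illustration; only the topic prefix comes from the list above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: publish one message to an ntfy topic over HTTP.
# NTFY_SERVER is an assumption; point it at whichever ntfy server you use.
NTFY_SERVER="${NTFY_SERVER:-https://ntfy.sh}"
TOPIC_PREFIX="${TOPIC_PREFIX:-anthony-homelab-95ccf258e17eba20}"

# Map an alert level to an ntfy priority name (assumed mapping).
ntfy_priority() {
    case "$1" in
        critical) echo "urgent" ;;
        warning)  echo "high" ;;
        info)     echo "default" ;;
        *)        return 1 ;;
    esac
}

# Usage: ntfy_send LEVEL TITLE MESSAGE
ntfy_send() {
    local level="$1" title="$2" message="$3" priority
    priority="$(ntfy_priority "$level")" || {
        echo "unknown level: $level" >&2
        return 2
    }
    # --fail makes curl exit non-zero on HTTP errors such as 403 or 404.
    curl --fail --silent --show-error \
        -H "Title: $title" \
        -H "Priority: $priority" \
        -d "$message" \
        "$NTFY_SERVER/$TOPIC_PREFIX-$level"
}
```

Calling `ntfy_send warning "Disk usage" "debianvm at 85%"` would post to the `-warning` topic with high priority.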
📊 What's Being Monitored (18 Systems)
Every 5 Minutes:
- Container status (docker, cloudreve, gitea, sftpgo)
- VM/Container unexpected shutdowns
Every 15 Minutes:
- Service health (CloudReve, Home Assistant HTTP)
- Database health (PostgreSQL, Redis, MongoDB, aria2)
- Docker container restarts
Every Hour:
- PVE Host (disk, RAM, CPU, services)
- ALL VM disk space (debianvm, ubuntu-server-xfce, haos)
- Network storage (Fred NFS, iMacHDD CIFS)
- LVM Thin Pools (CRITICAL - can freeze VMs!)
- Ceph cluster health
- Tailscale VPN connectivity
- OOM killer detection
- Temperature monitoring
- Public IP changes
- Failed login attempts
Daily (3 AM):
- Backup job status
- SSL certificate expiry
- System updates
Weekly (Sunday 2 AM):
- Internet speed test
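The five schedules above map onto systemd calendar expressions. The sketch below prints a minimal timer unit for each tier. The tier names, the Description text, and `Persistent=true` are assumptions, and the real homelab-monitor timer files may be written differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: OnCalendar expressions matching the schedule above.
on_calendar() {
    case "$1" in
        5min)   echo "*:0/5" ;;                 # every 5 minutes
        15min)  echo "*:0/15" ;;                # every 15 minutes
        hourly) echo "hourly" ;;                # top of every hour
        daily)  echo "*-*-* 03:00:00" ;;        # daily at 3 AM
        weekly) echo "Sun *-*-* 02:00:00" ;;    # Sundays at 2 AM
        *)      echo "unknown tier: $1" >&2; return 1 ;;
    esac
}

# Print a minimal timer unit for one tier to stdout.
render_timer() {
    local tier="$1" expr
    expr="$(on_calendar "$tier")" || return 1
    cat <<EOF
[Unit]
Description=Homelab monitoring ($tier)

[Timer]
OnCalendar=$expr
Persistent=true

[Install]
WantedBy=timers.target
EOF
}
```

An expression can be checked before use with `systemd-analyze calendar "Sun *-*-* 02:00:00"`, which prints the normalized form and the next elapse time.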
🎯 Alert Levels
🔴 CRITICAL (Urgent):
- Disk >90% on any system
- Services completely down
- Thin pool >90% (VMs will freeze!)
- Databases down
- VMs/containers stopped unexpectedly
🟡 WARNING (High Priority):
- Disk 80-90%
- High CPU/RAM usage
- Thin pool 80-90%
- Network storage issues
- Slow internet speed
🔵 INFO (Informational):
- System updates available
- Public IP changed
- Backup completed
- Speed test results
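The disk and thin-pool thresholds above reduce to a small classification step. This is a minimal sketch assuming integer percentages; the `classify_usage` name and the `ok` level for values under 80% are illustrative and not taken from the actual check scripts.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: map a usage percentage to an alert level using the
# thresholds listed above (over 90 is critical, 80 to 90 is warning).
classify_usage() {
    local pct="${1%\%}"     # accept "87" or "87%"
    case "$pct" in
        ''|*[!0-9]*)
            echo "not an integer percentage: $1" >&2
            return 2
            ;;
    esac
    if [ "$pct" -gt 90 ]; then
        echo "critical"
    elif [ "$pct" -ge 80 ]; then
        echo "warning"
    else
        echo "ok"
    fi
}

# Example with GNU df (prints the level for the root filesystem):
#   classify_usage "$(df --output=pcent / | tail -n 1 | tr -d ' ')"
```

With these rules, exactly 90% is still a warning, and VM 280 at 87% after its disk expansion remains in the warning band.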
✅ What We Fixed Today
- Freed 46GB on debianvm (91% → 57%)
- Fixed CloudReve/aria2 integration
- Expanded VM 280 disk by 7GB (97% → 87%)
- Implemented 18 comprehensive monitors
- Secured notifications (obscure topics)
- Centralized everything on PVE host
📱 Management Commands
View active timers: systemctl list-timers 'homelab-monitor-*'
View recent logs: journalctl -t homelab-monitor -n 50
Run checks manually:
    /usr/local/bin/check-pve-host.sh
    /usr/local/bin/check-all-vm-disks.sh
    /usr/local/bin/check-thin-pools.sh
    /usr/local/bin/check-databases.sh
Test notifications:
    /usr/local/bin/send-ntfy.sh critical Test Message test
    /usr/local/bin/send-ntfy.sh warning Test Message test
    /usr/local/bin/send-ntfy.sh info Test Message test
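To run every check in one pass, a loop like the following could work. It assumes each check script exits non-zero when its check fails, which this document does not confirm. The `run_checks` name and its optional directory argument are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run every check-*.sh in a directory and count failures.
# Assumes each script signals a failed check with a non-zero exit status.
run_checks() {
    local dir="${1:-/usr/local/bin}" script failed=0
    for script in "$dir"/check-*.sh; do
        # Skip non-executables; this also skips the literal pattern
        # when no file matches the glob.
        [ -x "$script" ] || continue
        if "$script"; then
            echo "OK    $script"
        else
            echo "FAIL  $script"
            failed=$((failed + 1))
        fi
    done
    echo "$failed check(s) failed"
    [ "$failed" -eq 0 ]
}
```

Running `run_checks` with no argument scans /usr/local/bin, and the function's own exit status is non-zero if any check failed.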
📍 Important Files
Scripts: /usr/local/bin/check-*.sh
Main sender: /usr/local/bin/send-ntfy.sh
Topic names: /root/.ntfy-topics
Timers: /etc/systemd/system/homelab-monitor-*.timer
This doc: /root/MONITORING-FINAL-SUMMARY.md
🔧 Old Monitoring (DEBIANVM)
Status: Still running in parallel; will be disabled after 1 week of successful new monitoring
Location: /usr/local/bin/ on DEBIANVM
To disable old monitoring later:
    ssh root@DEBIANVM
    systemctl stop homelab-hourly.timer homelab-daily.timer homelab-weekly.timer disk-monitor.timer
    systemctl disable homelab-hourly.timer homelab-daily.timer homelab-weekly.timer disk-monitor.timer
🎉 You're All Set!
Your entire homelab is now comprehensively monitored with:
- 18 different health checks
- Clear, contextual alerts
- Secure, private notifications
- Centralized management
- Proactive issue detection
You'll be alerted within one check interval (as little as 5 minutes for container and VM status) if anything goes wrong!