PVE Monitoring System 3a14fd2736 Initial backup: 18 monitoring scripts + timers + docs
- 18 comprehensive monitoring checks
- 5 systemd timers (5min, 15min, hourly, daily, weekly)
- Complete documentation
- NTFY secure notification system
- Fixed debianvm disk space (91% to 57%)
- Fixed CloudReve integration
- Date: 2026-01-07
2026-01-07 16:30:34 +08:00

HOMELAB MONITORING - FULLY OPERATIONAL

Status: ALL SYSTEMS ACTIVE & SECURE

Date: January 7, 2026
Implementation: Complete
Security: Secure (obscure topic names)


🔒 Your Secure NTFY Topics

CRITICAL: anthony-homelab-95ccf258e17eba20-critical
WARNING: anthony-homelab-95ccf258e17eba20-warning
INFO: anthony-homelab-95ccf258e17eba20-info

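The random 16-character hex string (64 bits) makes these names impractical to guess. An ntfy topic name still works like a password, though. Anyone who learns it can subscribe to that topic or publish to it, so keep the names out of shared repositories, screenshots, and logs.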
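If the topics ever need rotating (for example, after a name leaks), you can generate new names in the same pattern locally. The sketch below assumes the `anthony-homelab` prefix and the level suffixes shown above. `random_hex` and `make_topics` are hypothetical helper names, and where you store the result (such as /root/.ntfy-topics) is left open.

```shell
#!/usr/bin/env bash
# Sketch: generate a fresh set of ntfy topic names in the pattern used above.
# Assumption: prefix and level suffixes mirror the listed topics; helper names
# are hypothetical and nothing is written to /root/.ntfy-topics here.
set -euo pipefail

# 8 random bytes -> 16 lowercase hex characters (64 bits of entropy).
random_hex() {
  # od prints the bytes as space-separated hex; tr strips spaces and newline
  od -An -N8 -tx1 /dev/urandom | tr -d ' \n'
}

# make_topics PREFIX: print one topic per alert level, sharing one suffix.
make_topics() {
  local prefix=$1 suffix level
  suffix=$(random_hex)
  for level in critical warning info; do
    printf '%s-%s-%s\n' "$prefix" "$suffix" "$level"
  done
}
```

For example, `make_topics anthony-homelab` prints three new names that share one random suffix. All three topics share one suffix, so a single leak exposes every level. That matches the current setup.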


📊 What's Being Monitored (18 Systems)

Every 5 Minutes:

  • Container status (docker, cloudreve, gitea, sftpgo)
  • VM/Container unexpected shutdowns
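One way to run the 5-minute container check is as a filter over `pct list` output. This sketch assumes the four watched containers are Proxmox LXC guests with exactly those names, which the summary does not state. `WATCHED` and `find_down_containers` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: report watched containers that are stopped or missing.
# Assumption: docker, cloudreve, gitea and sftpgo are Proxmox LXC guests; input
# is `pct list`-style text (columns: VMID Status Lock Name, Lock often empty).
set -euo pipefail

WATCHED=(docker cloudreve gitea sftpgo)

# Reads `pct list` output on stdin; prints "name status" for each watched
# container that is not running ("missing" if it is not listed at all).
find_down_containers() {
  awk -v watched="${WATCHED[*]}" '
    NR == 1 { next }              # skip the header row
    { status[$NF] = $2 }          # Name is the last field, Status the second
    END {
      n = split(watched, names, " ")
      for (i = 1; i <= n; i++) {
        s = (names[i] in status) ? status[names[i]] : "missing"
        if (s != "running") print names[i], s
      }
    }'
}
```

On a PVE host, `pct list | find_down_containers` prints nothing when all four are running. Reading the name from the last field keeps the parse correct when the Lock column is empty.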

Every 15 Minutes:

  • Service health (CloudReve, Home Assistant HTTP)
  • Database health (PostgreSQL, Redis, MongoDB, aria2)
  • Docker container restarts
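A service health probe like the CloudReve and Home Assistant HTTP checks can be sketched with curl. This sketch assumes two things: that a 2xx or 3xx response counts as healthy, and that you substitute your real service URLs. `http_state` and `check_http` are hypothetical names, and the example URL is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch of an HTTP health probe. Assumption: 2xx/3xx = healthy; the URL used
# with check_http is a placeholder, not the real service address.
set -euo pipefail

# Map an HTTP status code (as printed by curl's %{http_code}) to a state.
# curl prints 000 when no HTTP response was received at all.
http_state() {
  case "$1" in
    2??|3??) echo ok ;;
    000)     echo down ;;
    *)       echo error ;;
  esac
}

# check_http URL: probe with a 10-second timeout and print the state.
check_http() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1") || true
  http_state "${code:-000}"
}
```

For example, `check_http http://cloudreve.lan/` prints `ok`, `down`, or `error`. Separating "down" (no response) from "error" (a response with a bad status) matches the split between "Services completely down" and lesser problems.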

Every Hour:

  • PVE Host (disk, RAM, CPU, services)
  • ALL VM disk space (debianvm, ubuntu-server-xfce, haos)
  • Network storage (Fred NFS, iMacHDD CIFS)
  • LVM Thin Pools (CRITICAL - can freeze VMs!)
  • Ceph cluster health
  • Tailscale VPN connectivity
  • OOM killer detection
  • Temperature monitoring
  • Public IP changes
  • Failed login attempts
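Thin pools get their own hourly check because a full pool stops accepting writes, and I/O from the VMs on it stalls or fails. Below is a sketch of such a check. It assumes the check reads `lvs` output and applies the 80/90% thresholds to both data and metadata usage. `check_thin_pools` is a hypothetical name, not the contents of check-thin-pools.sh.

```shell
#!/usr/bin/env bash
# Sketch of a thin pool check. Input shape is what this produces:
#   lvs --noheadings --separator , -o lv_name,lv_attr,data_percent,metadata_percent
# Assumption: the same 80/90% thresholds apply to data and metadata usage.
set -euo pipefail

# Prints "LEVEL name data=X% meta=Y%" for each thin pool at or above 80%.
check_thin_pools() {
  awk -F, '
    { gsub(/^[ \t]+/, "", $1) }   # lvs indents each row
    $2 !~ /^t/ { next }           # lv_attr starting with "t" marks a thin pool
    {
      worst = ($3 + 0 > $4 + 0) ? $3 + 0 : $4 + 0
      if (worst > 90)       level = "CRITICAL"
      else if (worst >= 80) level = "WARNING"
      else next
      printf "%s %s data=%s%% meta=%s%%\n", level, $1, $3, $4
    }'
}
```

On the host you would run `lvs --noheadings --separator , -o lv_name,lv_attr,data_percent,metadata_percent | check_thin_pools`. Filtering on `lv_attr` skips ordinary LVs and the thin volumes inside the pool, which report their own usage.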

Daily (3 AM):

  • Backup job status
  • SSL certificate expiry
  • System updates

Weekly (Sunday 2 AM):

  • Internet speed test

🎯 Alert Levels

🔴 CRITICAL (Urgent):

  • Disk >90% on any system
  • Services completely down
  • Thin pool >90% (VMs freeze if the pool fills up!)
  • Databases down
  • VMs/containers stopped unexpectedly

🟡 WARNING (High Priority):

  • Disk 80-90%
  • High CPU/RAM usage
  • Thin pool 80-90%
  • Network storage issues
  • Slow internet speed

🔵 INFO (Informational):

  • System updates available
  • Public IP changed
  • Backup completed
  • Speed test results
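The disk thresholds above translate into a small classifier. This sketch assumes the percentages come from the Use% column of `df -P`. `disk_level` and `scan_df` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch of the disk thresholds: >90% is critical, 80-90% is warning.
# Assumption: usage figures come from the Use% column of `df -P`.
set -euo pipefail

# disk_level PERCENT -> critical | warning | ok
disk_level() {
  local pct=$1
  if (( pct > 90 )); then echo critical
  elif (( pct >= 80 )); then echo warning
  else echo ok
  fi
}

# Reads `df -P` output on stdin; prints "level mountpoint use%" for every
# filesystem at or above the warning threshold.
scan_df() {
  local _fs _size _used _avail use mount level
  read -r _                       # skip the header line
  while read -r _fs _size _used _avail use mount; do
    [[ $use =~ ^[0-9]+%$ ]] || continue   # ignore rows without a percentage
    level=$(disk_level "${use%\%}")
    [[ $level == ok ]] || echo "$level $mount $use"
  done
}
```

Run locally as `df -P | scan_df`. For a VM, you could pipe in `df -P` output fetched over SSH. The boundaries follow the table: exactly 90% is a warning, and 91% is critical.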

What We Fixed Today

  1. Freed 46GB on debianvm (91% → 57%)
  2. Fixed CloudReve/aria2 integration
  3. Expanded VM 280 disk by 7GB (97% → 87%)
  4. Implemented 18 comprehensive monitors
  5. Secured notifications (obscure topics)
  6. Centralized everything on PVE host

📱 Management Commands

View active timers: systemctl list-timers 'homelab-monitor-*'

View recent logs: journalctl -t homelab-monitor -n 50

Run checks manually:
  /usr/local/bin/check-pve-host.sh
  /usr/local/bin/check-all-vm-disks.sh
  /usr/local/bin/check-thin-pools.sh
  /usr/local/bin/check-databases.sh

Test notifications:
  /usr/local/bin/send-ntfy.sh critical Test Message test
  /usr/local/bin/send-ntfy.sh warning Test Message test
  /usr/local/bin/send-ntfy.sh info Test Message test
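The source of send-ntfy.sh is not shown here. The following sketch shows one way a sender could map the three alert levels onto ntfy priorities (CRITICAL to `urgent`, WARNING to `high`, INFO to `default`) and publish with curl. `Title` and `Priority` are standard ntfy publish headers. The server URL, the function names, and the argument order are assumptions.

```shell
#!/usr/bin/env bash
# Sketch of an ntfy sender; NOT the actual send-ntfy.sh.
# Assumptions: NTFY_SERVER default, function names and argument order.
set -euo pipefail

NTFY_SERVER=${NTFY_SERVER:-https://ntfy.sh}

# Map an alert level to an ntfy priority name; fail on unknown levels.
ntfy_priority() {
  case "$1" in
    critical) echo urgent ;;
    warning)  echo high ;;
    info)     echo default ;;
    *)        return 1 ;;
  esac
}

# send_alert LEVEL TOPIC TITLE MESSAGE: publish one message via HTTP POST.
send_alert() {
  local priority
  priority=$(ntfy_priority "$1")
  curl -fsS -H "Title: $3" -H "Priority: $priority" \
       -d "$4" "$NTFY_SERVER/$2" > /dev/null
}
```

For example: `send_alert warning anthony-homelab-95ccf258e17eba20-warning "Disk warning" "debianvm at 85%"`. Mapping CRITICAL to `urgent` lets the phone treat it differently from routine INFO messages.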


📍 Important Files

Scripts: /usr/local/bin/check-*.sh
Main sender: /usr/local/bin/send-ntfy.sh
Topic names: /root/.ntfy-topics
Timers: /etc/systemd/system/homelab-monitor-*.timer
This doc: /root/MONITORING-FINAL-SUMMARY.md


🔧 Old Monitoring (DEBIANVM)

Status: Still running in parallel
Plan: Disable after 1 week of successful new monitoring
Location: /usr/local/bin/ on DEBIANVM

To disable old monitoring later:
  ssh root@DEBIANVM
  systemctl stop homelab-hourly.timer homelab-daily.timer homelab-weekly.timer disk-monitor.timer
  systemctl disable homelab-hourly.timer homelab-daily.timer homelab-weekly.timer disk-monitor.timer
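`systemctl disable --now` stops and disables units in one step, so the stop and disable commands can also run as a single remote call. The sketch below adds a dry-run switch. `DRY_RUN`, `OLD_TIMERS`, and `disable_old_monitoring` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: stop and disable the legacy DEBIANVM timers in one ssh call.
# DRY_RUN=1 prints the command instead of running it (hypothetical switch).
set -euo pipefail

OLD_TIMERS=(homelab-hourly.timer homelab-daily.timer homelab-weekly.timer disk-monitor.timer)

disable_old_monitoring() {
  # `disable --now` also stops each unit, replacing separate stop/disable calls
  local cmd=(ssh root@DEBIANVM systemctl disable --now "${OLD_TIMERS[@]}")
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Run `DRY_RUN=1 disable_old_monitoring` first to check the exact command, then run it without the variable once the new monitoring has proven itself.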


🎉 You're All Set!

Your entire homelab is now comprehensively monitored with:

  • 18 different health checks
  • Clear, contextual alerts
  • Secure, private notifications
  • Centralized management
  • Proactive issue detection

You'll be alerted within one check interval (5 minutes to one week, depending on the check) if anything goes wrong!