WHM Server Monitor: Essential Features & Setup Guide

WHM Server Monitor Troubleshooting: Common Issues & Fixes

1. Service showing as down (HTTP, FTP, SSH, etc.)

  • Likely causes: service crashed, misconfiguration, port blocked by firewall, resource exhaustion.
  • Immediate fixes:
    1. Restart the service in WHM or via SSH:

      Code

      systemctl restart httpd
      systemctl restart sshd
    2. Check service status and recent errors:

      Code

      systemctl status httpd --no-pager
      journalctl -u httpd -n 100
      tail -n 200 /var/log/apache2/error_log   # or /etc/apache2 paths
    3. Verify firewall rules (iptables/nftables/csf):

      Code

      ss -tulpn | grep :80
      iptables -L -n
      csf -l
    4. Confirm the port is listening and that SELinux/AppArmor isn’t blocking it.
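The port check in step 3 can be wrapped in a small helper so that it gives a yes/no answer instead of output to read by eye. This is a minimal sketch, not part of WHM: `port_listening` is a hypothetical function name, and it assumes the input is `ss -tln` style text where the local address ends in `:PORT`.

```shell
# port_listening PORT
# Reads socket listing text (for example from `ss -tln`) on stdin and
# succeeds if any field ends in ":PORT". Hypothetical helper for illustration.
port_listening() {
    awk -v p="$1" '
        {
            for (i = 1; i <= NF; i++)
                if ($i ~ (":" p "$"))
                    found = 1
        }
        END { exit !found }'
}

# Example wiring (not executed here):
#   ss -tln | port_listening 80 && echo "port 80 is listening"
```

Matching on the trailing `:PORT` keeps port 80 from being confused with 8080, and works for both `0.0.0.0:80` and `[::]:80` style addresses.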

2. False positives / intermittent alerts

  • Likely causes: transient network issues, aggressive thresholds, monitoring daemon restarts.
  • Fixes:
    • Reduce alert sensitivity or lengthen the grace period in WHM monitoring settings.
    • Ensure monitoring host has stable network and low packet loss (use ping/traceroute).
    • Check for cron jobs or automated tasks that restart services during maintenance windows.
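To illustrate what a grace period buys you, here is a rough sketch of a check that only reports failure after several consecutive failed attempts. It is not how WHM implements its checks; `check_with_grace` and its parameters are hypothetical names chosen for this example.

```shell
# check_with_grace MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds once; returns 1 only if every
# attempt fails. Hypothetical helper for illustration.
check_with_grace() {
    local tries="$1"
    local delay="$2"
    local attempt=1
    shift 2
    while [ "$attempt" -le "$tries" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Pause between attempts, but not after the last one.
        if [ "$attempt" -lt "$tries" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Example wiring (not executed here):
#   check_with_grace 3 10 systemctl is-active --quiet httpd || echo "httpd looks down"
```

A single dropped packet or a brief restart then no longer raises an alert, at the cost of detecting real outages a little later.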

3. High resource usage (CPU, RAM, I/O) causing degraded checks

  • Likely causes: runaway processes, DDoS, backups, cron-heavy tasks, insufficient hardware.
  • Fixes:
    1. Identify top consumers:

      Code

      top -o %CPU
      ps aux --sort=-%mem | head -n 15
      iotop -o
    2. Limit or reschedule heavy tasks (backups, mass mailings).
    3. Tune Apache/nginx, PHP-FPM worker counts, and database settings.
    4. Add swap temporarily or scale resources if consistently saturated.
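One way to decide whether the server is "consistently saturated" is to compare the load average with the number of CPU cores. The sketch below assumes a Linux host with `/proc/loadavg` and `nproc`; `load_high` and the factor of 1.5 in the example are illustrative choices, not WHM defaults.

```shell
# load_high LOAD CORES FACTOR
# Succeeds if LOAD is greater than CORES * FACTOR. awk is used because
# the shell itself cannot compare decimal numbers. Hypothetical helper.
load_high() {
    awk -v l="$1" -v c="$2" -v f="$3" 'BEGIN { exit !(l + 0 > c * f) }'
}

# Example wiring (not executed here):
#   load_high "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)" 1.5 && echo "load is high"
```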

4. Disk space alerts but df shows space available

  • Likely causes: deleted files still held open by processes, different mountpoint, inode exhaustion.
  • Fixes:
    • Find deleted-but-open files:

      Code

      lsof | grep '(deleted)'

      then restart the owning process.

    • Check inodes:

      Code

      df -i
    • Verify that the correct filesystem/mountpoint is being monitored.
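Because both block usage and inode usage can trigger a disk alert, it can help to run the same threshold test against `df -P` and `df -Pi`. The following is a sketch under the assumption that the fifth column holds the percentage and the sixth the mountpoint (true for POSIX-format `df` output when mountpoints contain no spaces); `over_threshold` is a hypothetical name.

```shell
# over_threshold LIMIT
# Reads `df -P` or `df -Pi` output on stdin and prints "mountpoint percent"
# for every filesystem whose usage is at or above LIMIT percent.
over_threshold() {
    awk -v limit="$1" '
        NR > 1 {
            pct = $5
            sub(/%$/, "", pct)      # strip the trailing percent sign
            # Skip rows such as "-" where no percentage is reported.
            if (pct ~ /^[0-9]+$/ && pct + 0 >= limit + 0)
                print $6, pct
        }'
}

# Example wiring (not executed here):
#   df -P  | over_threshold 90   # block usage
#   df -Pi | over_threshold 90   # inode usage
```

If the second command prints a mountpoint and the first does not, inode exhaustion is the more likely cause of the alert.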

5. Monitoring service fails to start or crashes

  • Likely causes: corrupt config, missing dependencies, permission changes.
  • Fixes:
    1. Check service logs and systemd status.
    2. Restore config from a known-good backup or compare with default config.
    3. Reinstall monitoring package if corrupted.
    4. Ensure correct user/group ownership and filesystem permissions.

6. Alert emails not received

  • Likely causes: SMTP misconfiguration, queued mail, spam filtering.
  • Fixes:
    • Test mail sending from server:

      Code

      echo "test" | mail -s "monitor test" [email protected]
    • Check mail queue and mail logs (/var/log/maillog or /var/log/exim_mainlog).
    • Verify monitoring alarm recipient addresses and SMTP credentials.
    • Use an external mailbox to rule out local delivery issues.

7. Incorrect or stale status metrics

  • Likely causes: agent-server time drift, caching, metric collection interval too long.
  • Fixes:
    • Sync server time (chrony/ntpd):

      Code

      timedatectl status
      systemctl restart chronyd
    • Reduce metric cache TTL or collection interval in monitor settings.
    • Restart monitoring agent.
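A quick way to tell stale data from clock drift is to compare a metric's last-update time with the current time: an age above the collection interval suggests staleness, while a negative age (a timestamp in the future) points to drift. This sketch works on plain epoch seconds; `metric_age_ok` is a hypothetical helper, and the file path in the usage comment is a placeholder.

```shell
# metric_age_ok LAST_UPDATE_EPOCH NOW_EPOCH MAX_AGE_SECONDS
# Succeeds only if the metric is neither older than MAX_AGE_SECONDS
# nor timestamped in the future.
metric_age_ok() {
    local age
    age=$(( $2 - $1 ))
    [ "$age" -ge 0 ] && [ "$age" -le "$3" ]
}

# Example wiring (not executed here; GNU stat, placeholder path):
#   metric_age_ok "$(stat -c %Y /path/to/metric.file)" "$(date +%s)" 300 || echo "metric is suspect"
```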

8. Database (MySQL/MariaDB) reported as down or slow

  • Likely causes: table corruption, high connections, slow queries, insufficient buffers.
  • Fixes:
    1. Check DB status and error log: /var/lib/mysql/*.err or system journal.
    2. Inspect slow query log and optimize queries/indexes.
    3. Increase max_connections or tune innodb_buffer_pool_size.
    4. Repair corrupted tables with mysqlcheck or myisamchk (as appropriate).
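Before raising max_connections, it is worth checking how close the server actually gets to the limit. The sketch below computes that as a whole-number percentage; `conn_usage_pct` is a hypothetical helper, and the commented wiring assumes the `mysql` client can log in without prompting (for example via a root option file).

```shell
# conn_usage_pct THREADS_CONNECTED MAX_CONNECTIONS
# Prints current connections as an integer percentage of the limit
# (rounded down). Fails if the limit is not a positive number.
conn_usage_pct() {
    if [ "$2" -le 0 ]; then
        return 1
    fi
    echo $(( $1 * 100 / $2 ))
}

# Example wiring (not executed here):
#   used=$(mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'Threads_connected'" | awk '{print $2}')
#   max=$(mysql -N -B -e "SELECT @@max_connections")
#   echo "connections: $(conn_usage_pct "$used" "$max")% of max_connections"
```

If the figure rarely approaches 100, slow queries or buffer sizing are more likely culprits than the connection limit.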

9. Permission or ownership errors in monitoring checks

  • Likely causes: updates changed UID/GID, SELinux contexts altered, config files moved.
  • Fixes:
    • Verify file ownership and permissions for agent configs and scripts.
    • Restore SELinux context if enforced:

      Code

      restorecon -Rv /path/to/monitor

10. Persistent SSL/TLS certificate warnings

  • Likely causes: expired certs, wrong chain, hostname mismatch.
  • Fixes:
    • Check cert expiry and chain:

      Code

      openssl s_client -connect yourhost:443 -showcerts
    • Renew certs (Let’s Encrypt certbot or provider) and ensure full chain is installed.
    • Verify hostnames used by monitoring match CN/SAN.
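To catch expiring certificates before the monitor does, the expiry date can be turned into a "days remaining" figure. This is a sketch rather than a complete checker: `days_left` is a hypothetical helper that works on epoch seconds, and the commented wiring assumes GNU `date` and uses `yourhost` as the same placeholder as above.

```shell
# days_left EXPIRY_EPOCH NOW_EPOCH
# Prints whole days between now and expiry. The result is negative once
# the certificate has been expired for at least a full day.
days_left() {
    echo $(( ($1 - $2) / 86400 ))
}

# Example wiring (not executed here):
#   end=$(openssl s_client -connect yourhost:443 -servername yourhost </dev/null 2>/dev/null \
#           | openssl x509 -noout -enddate | cut -d= -f2)
#   echo "days left: $(days_left "$(date -d "$end" +%s)" "$(date +%s)")"
```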

Quick troubleshooting checklist (run in order)

  1. Check service status and logs.
  2. Confirm ports are listening and firewall allows traffic.
  3. Inspect resource usage and disk/inode availability.
  4. Verify monitoring agent/service status and time sync.
  5. Test alert delivery (send a manual test email).
  6. Restart services/agents after config fixes; monitor for recurrence.
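The checklist above lends itself to a small wrapper that runs each diagnostic and prints a pass/fail line. The sketch below shows one possible shape; `run_step` is a hypothetical helper, and the service names in the usage comments (httpd, chronyd) are taken from the examples earlier in this guide and may differ on your server.

```shell
# run_step LABEL COMMAND [ARGS...]
# Runs COMMAND quietly and prints "[ OK ] LABEL" or "[FAIL] LABEL".
# Returns non-zero when the command fails.
run_step() {
    local label="$1"
    shift
    if "$@" >/dev/null 2>&1; then
        echo "[ OK ] $label"
    else
        echo "[FAIL] $label"
        return 1
    fi
}

# Example wiring (not executed here):
#   run_step "httpd active"      systemctl is-active --quiet httpd
#   run_step "chronyd active"    systemctl is-active --quiet chronyd
#   run_step "port 80 listening" sh -c 'ss -tln | grep -q ":80 "'
```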

