Metadata-Version: 2.4
Name: server-guardian-mcp
Version: 1.0.5
Summary: MCP server to monitor and manage remote Linux servers via SSH. 63 tools: health checks, log search, APM, SLOs, anomaly detection, auto-remediation, live dashboard, CIS benchmarks, CVE scanning, database monitoring, compliance reports, team RBAC, PagerDuty/Telegram/OpsGenie.
Author: Md Nazish Arman
License: Server Guardian MCP - Proprietary License
        
        Copyright (c) 2026 Md Nazish Arman. All rights reserved.
        
        TERMS AND CONDITIONS
        
        1. DEFINITIONS
           "Software" refers to Server Guardian MCP, including all source code,
           documentation, binaries, and associated files.
           "Author" refers to Md Nazish Arman.
           "User" refers to any individual or entity using the Software.
        
        2. GRANT OF LICENSE
           The Author grants you a limited, non-exclusive, non-transferable,
           revocable license to use the Software for PERSONAL, NON-COMMERCIAL
           evaluation purposes only.
        
        3. RESTRICTIONS
           You may NOT, without prior written permission and a paid license
           from the Author:
        
           a) Use the Software for any commercial, business, or revenue-generating
              purpose, including but not limited to:
              - Using it within a company, organization, or business of any size
              - Offering it as part of a paid service or product
              - Using it to manage servers or infrastructure for clients
              - Integrating it into any commercial product or service
              - Using it in any way that generates direct or indirect revenue
        
           b) Redistribute, sublicense, sell, lease, rent, or otherwise transfer
              the Software or any portion thereof to any third party.
        
           c) Modify, adapt, translate, reverse engineer, decompile, or
              disassemble the Software for the purpose of creating a competing
              product or service.
        
           d) Remove, alter, or obscure any copyright notices, proprietary
              legends, or attribution contained in the Software.
        
           e) Use the Software to provide managed services, consulting services,
              or any form of service bureau.
        
        4. COMMERCIAL LICENSING
           For commercial use, business use, or any use beyond personal
           evaluation, you MUST purchase a commercial license from the Author.
           Contact: Md Nazish Arman
        
        5. INTELLECTUAL PROPERTY
           The Software and all copies thereof are proprietary to the Author
           and title thereto remains in the Author. All applicable rights to
           patents, copyrights, trademarks, and trade secrets in the Software
           are and shall remain in the Author.
        
        6. TERMINATION
           This license is effective until terminated. It will terminate
           automatically without notice if you fail to comply with any term
           of this license. Upon termination, you must destroy all copies of
           the Software in your possession.
        
        7. DISCLAIMER OF WARRANTIES
           THE SOFTWARE IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND,
           EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
           MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND
           NONINFRINGEMENT.
        
        8. LIMITATION OF LIABILITY
           IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY CLAIM, DAMAGES, OR
           OTHER LIABILITY ARISING FROM THE USE OF THE SOFTWARE.
        
        9. GOVERNING LAW
           This license shall be governed by and construed in accordance with
           applicable copyright law.
        
        For commercial licensing inquiries, contact: Md Nazish Arman
License-File: LICENSE
Keywords: anomaly-detection,compliance,dashboard,devops,docker,linux,mcp,monitoring,playbooks,server,ssh,vps
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: System :: Monitoring
Classifier: Topic :: System :: Systems Administration
Requires-Python: >=3.10
Requires-Dist: mcp>=1.0.0
Requires-Dist: paramiko>=3.0.0
Requires-Dist: starlette>=0.36.0
Requires-Dist: uvicorn>=0.27.0
Description-Content-Type: text/markdown

# Server Guardian MCP

A server management MCP server with **63 tools**, **8 connection types**, and **16 modules**: log search, access log APM, SLO tracking, anomaly detection, auto-remediation playbooks, CIS benchmarks, CVE scanning, database monitoring, network monitoring, file integrity, live web dashboard, compliance reports, public status pages, team RBAC, and PagerDuty/Telegram/OpsGenie alerting, all through Claude. No agents. Just SSH.

> **"The AI SRE that lives in your terminal. SSH into any server, diagnose any problem, fix it automatically — all through a conversation with Claude. No agents. No SaaS bills. No PromQL."**

## Live Dashboard

```bash
python -m server_guardian_mcp dashboard           # start on port 8080
python -m server_guardian_mcp dashboard --port 9090
```

Web UI that auto-refreshes every 30 seconds. Dark theme, Chart.js charts for CPU/memory/disk trends, active alerts feed, incident timeline.

## Why Server Guardian?

| What you say to Claude | What happens |
|------------------------|-------------|
| "Is my server okay?" | SSH in, check CPU/RAM/disk/temp, detect anomalies vs baseline |
| "Why is production slow?" | Check processes, disk, logs, access log APM, identify the bottleneck |
| "Search logs for OOM errors" | Index logs in SQLite, search with pattern detection, show error rates |
| "Show me endpoint latency" | Parse nginx access logs — p50/p95/p99 latency, error rates, slowest endpoints |
| "Are we meeting our SLOs?" | Track uptime/latency/error targets, calculate error budget remaining |
| "What happened overnight?" | Generate incident narrative from alerts, service events, playbook runs |
| "Fix it automatically" | Run playbooks: clear disk, restart services, renew SSL certs |
| "Run a security audit" | 61 CIS benchmark checks + CVE scan + rootkit detection + FIM |
| "Generate a compliance report" | Branded HTML report with score (A-F) for SOC2/ISO prep |
| "How's the database?" | Slow query analysis, connection counts, replication lag, table sizes |
| "Am I overpaying?" | Rightsizing analysis: "CPU at 0.4%, memory at 7.7% — downsize to save 50%" |
| "What connects to what?" | Map service dependencies from active network connections |
| "Write the postmortem" | Auto-generate structured postmortem from incident timeline |
| "Create a status page" | Public-facing uptime page for customers (replaces $29/mo tools) |

## Feature Comparison vs Alternatives

| Feature | Server Guardian | ssh-mcp | mcp-ssh-manager | HomeButler |
|---------|:-:|:-:|:-:|:-:|
| **Total tools** | **63** | 2 | 37 | 20 |
| **Connection types** | **8** | 1 | 1 | 1 |
| Log search + pattern detection | **Yes** | - | - | - |
| Access log APM (p50/p95/p99) | **Yes** | - | - | - |
| SLO tracking + error budgets | **Yes** | - | - | - |
| Smart anomaly detection | **Yes** | - | - | - |
| Auto-remediation playbooks | **Yes** | - | - | - |
| CIS benchmark (61 checks) | **Yes** | - | - | - |
| CVE scanning + rootkit detection | **Yes** | - | - | - |
| File integrity monitoring | **Yes** | - | - | - |
| Database monitoring (MySQL/PG) | **Yes** | - | - | - |
| Network bandwidth monitoring | **Yes** | - | - | - |
| Service dependency mapping | **Yes** | - | - | - |
| Root cause correlation | **Yes** | - | - | - |
| Resource rightsizing | **Yes** | - | - | - |
| Multi-step API tests | **Yes** | - | - | - |
| Maintenance windows | **Yes** | - | - | - |
| Public status page | **Yes** | - | - | - |
| AI postmortem generation | **Yes** | - | - | - |
| Live web dashboard (Chart.js) | **Yes** | - | - | - |
| Compliance report (SOC2/ISO) | **Yes** | - | - | - |
| Team RBAC (admin/operator/viewer) | **Yes** | - | - | - |
| PagerDuty / Telegram / OpsGenie | **Yes** | - | - | - |
| Background watchdog daemon | **Yes** | - | - | Yes |
| Email / Slack / Discord alerts | **Yes** | - | - | Yes |
| Multi-cloud (AWS/GCP/Azure) | **Yes** | - | - | - |
| Docker container management | **Yes** | - | Yes | Yes |

## Quick Install

### Claude Code (recommended)
```bash
claude mcp add server-guardian -- uvx server-guardian-mcp
```

### pip
```bash
pip install server-guardian-mcp
claude mcp add server-guardian -- python -m server_guardian_mcp
```

### From source
```bash
pip install -e .
claude mcp add server-guardian -- python -m server_guardian_mcp
```

## Setup (2 minutes)

### 1. Create your .env
```bash
cp .env.example .env
```

### 2. Add your servers

```env
# SSH (most common)
SERVER_PROD=ssh,203.0.113.10,22,deploy,key,~/.ssh/prod_key,Production

# Local machine
SERVER_LOCAL=local,,,,,My Machine

# Docker / Kubernetes / AWS SSM / GCP / Azure / WinRM also supported
```
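
The `SERVER_PROD` example above reads as seven comma-separated fields. The sketch below splits such a value into named parts. The field names (`conn_type`, `host`, `port`, `user`, `auth`, `key_path`, `label`) and the helper `parse_ssh_entry` are assumptions inferred from the example, not the package's documented schema.

```python
import os
from dataclasses import dataclass


@dataclass
class ServerEntry:
    # Field names are inferred from the example line, not from the package.
    conn_type: str
    host: str
    port: int
    user: str
    auth: str
    key_path: str
    label: str


def parse_ssh_entry(value: str) -> ServerEntry:
    """Split a seven-field SSH entry into a ServerEntry."""
    parts = [p.strip() for p in value.split(",", 6)]
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}")
    conn_type, host, port, user, auth, key_path, label = parts
    return ServerEntry(conn_type, host, int(port), user, auth,
                       os.path.expanduser(key_path), label)


entry = parse_ssh_entry("ssh,203.0.113.10,22,deploy,key,~/.ssh/prod_key,Production")
print(entry.host, entry.port, entry.label)  # 203.0.113.10 22 Production
```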

### 3. Auto-discover existing servers
> "Discover my SSH servers" — reads ~/.ssh/config and shows ready-to-paste .env lines.

### 4. Add aliases (optional)
```env
SERVER_ALIASES=prod:PROD,stg:STAGING,dev:DEV
```
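
One plausible reading of the alias line is a list of `alias:NAME` pairs, where `NAME` matches the suffix of a `SERVER_<NAME>` variable. That interpretation is an assumption drawn from the example, and `parse_aliases` is a hypothetical helper.

```python
def parse_aliases(value: str) -> dict:
    """Turn 'prod:PROD,stg:STAGING' into {'prod': 'PROD', 'stg': 'STAGING'}."""
    aliases = {}
    for pair in value.split(","):
        if not pair.strip():
            continue  # tolerate trailing commas
        alias, _, target = pair.partition(":")
        aliases[alias.strip().lower()] = target.strip()
    return aliases


aliases = parse_aliases("prod:PROD,stg:STAGING,dev:DEV")
print(aliases["stg"])  # STAGING
```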

## All 63 Tools

### Core Server Management (6)
| Tool | What it does |
|------|-------------|
| `list_all_servers` | Show all servers with online/offline status and latency |
| `check_server_health` | Full snapshot: CPU, RAM, disk, swap, temp, load, top processes, network |
| `run_shell_commands` | Run one or more shell commands on any server |
| `run_shell_script` | Run multi-line bash scripts with shared variables |
| `fetch_system_logs` | Fetch dmesg/syslog/journal/auth/nginx/custom logs with grep filter |
| `list_running_processes` | Processes sorted by CPU or memory, with name filter |

### Service Management (5)
| Tool | What it does |
|------|-------------|
| `manage_systemd_service` | Start/stop/restart/enable/disable/status/logs for any systemd service |
| `list_all_services` | List ALL systemd services, filter by running/failed/inactive |
| `find_failed_services` | Find every crashed/failed service in one call |
| `restart_failed_services` | Bulk restart failed services — pass names or "ALL_FAILED" |
| `watch_service_status` | Quick is-active + is-enabled check for specific services |

### Monitoring & Alerting (5)
| Tool | What it does |
|------|-------------|
| `check_ssl_certificate` | SSL cert expiry, chain, issuer for any domain (no SSH) |
| `check_http_endpoint` | HTTP status, response time, headers for any URL (no SSH) |
| `monitor_server_health` | Health check + store in SQLite + auto-alert on thresholds |
| `monitor_endpoints` | Check HTTP/SSL targets + store + alert on failures |
| `get_active_alerts` | Show unresolved alerts grouped by severity |

### Log Search & APM (2)
| Tool | What it does |
|------|-------------|
| `search_logs` | Index logs in SQLite, search with pattern detection, extract error rates |
| `analyze_access_logs` | Nginx/Apache APM — per-endpoint p50/p95/p99 latency, error rates, throughput, top IPs |

### SLO Tracking & Reporting (4)
| Tool | What it does |
|------|-------------|
| `manage_slos` | Define uptime/latency/error rate targets, track compliance, error budgets |
| `generate_postmortem_tool` | Structured incident postmortem from alerts, services, playbook data |
| `generate_status_page_tool` | Public-facing status page for customers (replaces Better Stack $29/mo) |
| `get_weekly_report` | Weekly health summary for email or team review |

### Database Monitoring (2)
| Tool | What it does |
|------|-------------|
| `query_database` | Run SQL queries on MySQL, PostgreSQL, or SQLite on any server |
| `monitor_database` | Slow queries, connections, replication lag, table sizes (MySQL/PostgreSQL auto-detected) |

### Network Monitoring (2)
| Tool | What it does |
|------|-------------|
| `inspect_network` | Listening ports, active connections, interfaces, DNS, routing |
| `monitor_network` | Bandwidth per interface, connection states, TCP retransmissions, throughput rates |

### Security & Compliance (6)
| Tool | What it does |
|------|-------------|
| `run_security_audit` | 10-point security check (SSH, firewall, logins, updates, sudo) |
| `run_cis_benchmark` | 61 CIS Linux Benchmark checks across filesystem, network, SSH, PAM, logging |
| `scan_vulnerabilities` | CVE scanning (package versions), rootkit detection, crypto miner detection |
| `check_file_integrity` | FIM — hash critical files (/etc/passwd, sshd_config, etc.), detect unauthorized changes |
| `manage_firewall` | UFW/iptables: status, allow, deny, delete rules, enable/disable |
| `generate_compliance_report_tool` | Branded HTML report with score (A-F), suitable for SOC2/ISO |

### Docker (2)
| Tool | What it does |
|------|-------------|
| `list_docker_containers` | Containers with CPU, memory, network, block I/O stats |
| `fetch_docker_logs` | Container logs with grep filter and time range |

### Disk & Files (4)
| Tool | What it does |
|------|-------------|
| `analyze_disk_usage` | Find largest items, files >100MB, inode usage |
| `read_remote_file` | Read files on server (tail/head/all) with metadata |
| `upload_file_to_server` | SFTP upload with size verification |
| `download_file_from_server` | SFTP download |

### Multi-Server (2)
| Tool | What it does |
|------|-------------|
| `run_on_all_servers` | Same commands on multiple servers — pass ["ALL"] for all |
| `compare_across_servers` | Spot config drift: same command, side-by-side results |

### System Administration (4)
| Tool | What it does |
|------|-------------|
| `manage_cron_jobs` | List, add, remove cron jobs on any server |
| `manage_users` | List users, user info, add SSH keys, list keys, who is logged in |
| `manage_packages` | List/install/remove/upgrade packages (apt, yum, dnf, apk auto-detected) |
| `manage_nginx` | Status, list sites, show config, test, reload, restart, access/error logs |

### Git Deploy (1)
| Tool | What it does |
|------|-------------|
| `git_deploy` | Status, pull, log, branch, switch, stash, diff on server git repos |

### Discovery (1)
| Tool | What it does |
|------|-------------|
| `discover_ssh_servers` | Auto-discover servers from ~/.ssh/config with ready-to-paste .env lines |

### Dashboard & Analytics (6)
| Tool | What it does |
|------|-------------|
| `multi_server_dashboard` | One-call summary of ALL servers: health, CPU, RAM, disk, failed services |
| `get_monitoring_history` | Query health trends, service events, endpoint checks from SQLite |
| `get_incident_timeline` | Chronological event log for a server |
| `forecast_disk_usage` | Predict when disk will be full based on growth rate |
| `generate_html_dashboard` | Self-contained HTML status page — open in any browser |
| `resolve_alert` | Mark an alert as resolved |

### Intelligence & Automation (3)
| Tool | What it does |
|------|-------------|
| `detect_anomalies_tool` | Statistical anomaly detection — flags metrics >2.5 sigma from baseline |
| `replay_incident` | Generate chronological narrative from alerts, service events, playbook runs |
| `manage_playbooks` | Auto-remediation: disk cleanup, service restart, SSL renewal, custom playbooks |

### Team & Integrations (3)
| Tool | What it does |
|------|-------------|
| `team_manage` | RBAC user management: admin/operator/viewer roles with API keys |
| `check_integrations` | Status and test for PagerDuty, Telegram, OpsGenie |
| `live_dashboard_info` | How to start the live web dashboard and available API endpoints |

### Advanced Operations (5)
| Tool | What it does |
|------|-------------|
| `run_api_test_tool` | Multi-step API tests with variable extraction and assertions |
| `manage_maintenance_windows` | Suppress alerts during planned work |
| `get_rightsizing_recommendations` | Identify over/under-provisioned resources to save costs |
| `map_service_dependencies` | Discover service topology from active network connections |
| `analyze_root_cause` | Correlate anomalies across metrics, services, alerts for root cause analysis |

## Access Log APM

Request-level APM metrics without installing an agent. Parses nginx/Apache access logs for:

```
Tell Claude: "analyze access logs on PROD"
```

- Per-endpoint latency percentiles (p50, p95, p99)
- Error rates (4xx, 5xx) per endpoint
- Throughput (requests per endpoint)
- Slowest endpoints ranked
- Status code breakdown
- Top IPs by request volume
- URL normalization (replaces IDs/UUIDs with placeholders)
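
To make the metrics above concrete, here is a minimal sketch of URL normalization and nearest-rank percentiles. It assumes each log line has already been parsed into a `(path, status, seconds)` tuple, which in turn assumes the log format records a request duration (for nginx, the `$request_time` variable). The function names and the `:id`/`:uuid` placeholders are illustrative, not the package's own.

```python
import math
import re
from collections import defaultdict

UUID_RE = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")
ID_RE = re.compile(r"/\d+(?=/|$)")


def normalize(path: str) -> str:
    """Collapse UUIDs and numeric IDs so similar URLs group together."""
    path = path.split("?", 1)[0]  # drop the query string
    path = UUID_RE.sub(":uuid", path)
    return ID_RE.sub("/:id", path)


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(requests):
    """requests: iterable of (path, status, seconds) tuples."""
    by_endpoint = defaultdict(list)
    for path, status, seconds in requests:
        by_endpoint[normalize(path)].append((status, seconds))
    report = {}
    for endpoint, rows in by_endpoint.items():
        times = sorted(sec for _, sec in rows)
        errors = sum(1 for status, _ in rows if status >= 500)
        report[endpoint] = {
            "count": len(rows),
            "p50": percentile(times, 50),
            "p95": percentile(times, 95),
            "p99": percentile(times, 99),
            "error_rate_5xx": errors / len(rows),
        }
    return report


sample = [("/users/1", 200, 0.10), ("/users/2", 200, 0.20),
          ("/users/3?x=1", 500, 0.90), ("/users/4", 200, 0.30)]
report = summarize(sample)
print(report["/users/:id"])
```

With only four samples, p95 and p99 both land on the slowest request; percentiles become meaningful once an endpoint has many more data points.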

## Log Search & Pattern Detection

```
Tell Claude: "search logs on PROD for OOM" or "show me log patterns"
```

- Fetches logs via SSH, indexes in SQLite for future searching
- Pattern detection — clusters similar log lines, shows frequency
- Error rate extraction (log-to-metrics)
- Supports journal, syslog, auth, nginx, or any custom log path
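
The index-then-cluster idea can be sketched with the standard library alone. This example masks numbers so that similar lines share one pattern, stores both in an in-memory SQLite table, and counts pattern frequency. The table layout and the `template` helper are assumptions for illustration; the package's actual schema is not shown in this README.

```python
import re
import sqlite3
from collections import Counter


def template(line: str) -> str:
    """Mask variable parts so similar lines share one pattern."""
    line = re.sub(r"0x[0-9a-fA-F]+", "<hex>", line)
    return re.sub(r"\d+", "<n>", line)


lines = [
    "Out of memory: Killed process 4123 (java)",
    "Out of memory: Killed process 977 (java)",
    "Accepted publickey for deploy from 203.0.113.7 port 50122",
]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE logs (line TEXT, pattern TEXT)")
db.executemany("INSERT INTO logs VALUES (?, ?)",
               [(text, template(text)) for text in lines])

# Search the indexed lines, then count how often each pattern occurs.
hits = db.execute("SELECT line FROM logs WHERE line LIKE ?",
                  ("%memory%",)).fetchall()
patterns = Counter(p for (p,) in db.execute("SELECT pattern FROM logs"))
print(len(hits), patterns.most_common(1))
```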

## SLO Tracking & Error Budgets

```
Tell Claude: "create an SLO for 99.9% uptime on PROD"
Tell Claude: "show me SLO status"
```

- Define uptime, latency, or error rate targets
- Track compliance from stored health/endpoint data
- Calculate error budget remaining and burn rate
- Configurable measurement windows (7d, 30d, 90d)
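
The error budget arithmetic for an uptime SLO is short enough to show in full: a 99.9% target over 30 days allows about 43.2 minutes of downtime. The function below is a sketch of that calculation; its name and return keys are illustrative rather than the tool's actual output format.

```python
def error_budget(target_pct: float, window_days: int, downtime_minutes: float) -> dict:
    """Uptime SLO arithmetic: allowed downtime, remainder, and fraction used."""
    window_minutes = window_days * 24 * 60
    budget = window_minutes * (1 - target_pct / 100)
    return {
        "budget_minutes": round(budget, 2),
        "remaining_minutes": round(budget - downtime_minutes, 2),
        "budget_used": round(downtime_minutes / budget, 3),
    }


result = error_budget(99.9, 30, downtime_minutes=10)
print(result)
```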

## CIS Benchmark & Vulnerability Scanning

```
Tell Claude: "run CIS benchmark on PROD"
Tell Claude: "scan for vulnerabilities on PROD"
```

- **61 CIS Linux Benchmark checks** across: filesystem, software updates, boot security, process hardening, network config, SSH, PAM, user management, logging, cron
- **CVE scanning** — lists installed packages, checks for security updates
- **Rootkit detection** — hidden processes, suspicious kernel modules, SUID files, crypto miners, suspicious cron jobs
- **File integrity monitoring** — hashes critical files, alerts on unauthorized changes
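
File integrity monitoring boils down to hashing a set of files and comparing against a stored baseline. A minimal sketch, assuming SHA-256 and an in-memory baseline (the README does not say which hash or storage the tool uses beyond "file hashes" in SQLite):

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(paths):
    """Map each existing file to its SHA-256 digest."""
    return {str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest()
            for p in paths if Path(p).is_file()}


def changed_files(baseline: dict, current: dict) -> list:
    """Paths added, removed, or modified since the baseline."""
    return sorted(p for p in baseline.keys() | current.keys()
                  if baseline.get(p) != current.get(p))


# Demo on a temporary file standing in for a real config file.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "sshd_config"
    target.write_text("PermitRootLogin no\n")
    baseline = snapshot([target])
    unchanged = changed_files(baseline, snapshot([target]))
    target.write_text("PermitRootLogin yes\n")
    drift = changed_files(baseline, snapshot([target]))

print(unchanged, drift)
```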

## Database Monitoring

```
Tell Claude: "monitor database on PROD"
```

- **MySQL**: slow query log, connection stats, replication lag, table sizes, processlist
- **PostgreSQL**: pg_stat_statements, connections, replication, table sizes, lock analysis, cache hit ratio
- Auto-detects which database is installed

## Network Monitoring

```
Tell Claude: "monitor network on PROD"
```

- Bandwidth per interface (bytes/sec, Mbps)
- Connection state tracking (ESTABLISHED, TIME_WAIT, CLOSE_WAIT)
- TCP retransmission rates
- Historical trends stored in SQLite
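
Per-interface bandwidth can be derived from two samples of the kernel's byte counters. The sketch below assumes the counters come from `/proc/net/dev` on Linux, which is one common source; the README does not state which source the tool reads. Function names are illustrative.

```python
def parse_proc_net_dev(text: str) -> dict:
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        name, _, data = line.partition(":")
        fields = data.split()
        if len(fields) >= 9:
            # Column 0 is received bytes, column 8 is transmitted bytes.
            counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def rates_mbps(before: dict, after: dict, seconds: float) -> dict:
    """Per-interface (receive, transmit) rate in megabits per second."""
    out = {}
    for iface, (rx1, tx1) in after.items():
        rx0, tx0 = before.get(iface, (rx1, tx1))
        out[iface] = ((rx1 - rx0) * 8 / seconds / 1e6,
                      (tx1 - tx0) * 8 / seconds / 1e6)
    return out


HEADER = ("Inter-|   Receive                            |  Transmit\n"
          " face |bytes packets errs drop fifo frame compressed multicast|bytes packets\n")
t0 = HEADER + "  eth0: 1000000 10 0 0 0 0 0 0 2000000 20 0 0 0 0 0 0\n"
t1 = HEADER + "  eth0: 2250000 20 0 0 0 0 0 0 2500000 30 0 0 0 0 0 0\n"

rates = rates_mbps(parse_proc_net_dev(t0), parse_proc_net_dev(t1), seconds=10)
print(rates)
```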

## Resource Rightsizing

```
Tell Claude: "rightsizing recommendations for PROD"
```

- Analyzes CPU, memory, disk usage over time
- Identifies over-provisioned resources ("CPU at 0.4% — downsize from 16 to 8 cores")
- Identifies under-provisioned resources ("Memory at 92% — upgrade RAM")
- Cost savings estimates

## Service Dependency Mapping

```
Tell Claude: "map dependencies on PROD"
```

- Parses active TCP connections to discover what processes talk to what
- Groups by process (nginx -> database:5432, app -> redis:6379)
- Stored in SQLite for historical tracking
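
As an illustration of grouping connections by process, the sketch below parses output in the style of `ss -tnp` and collects the peer `address:port` of each established connection. Using `ss` is an assumption, since the README only says active TCP connections are parsed. The sample text is made up, and the sketch does not distinguish inbound from outbound connections.

```python
import re
from collections import defaultdict

PROC_RE = re.compile(r'users:\(\("([^"]+)"')

SAMPLE = """\
State  Recv-Q Send-Q Local Address:Port Peer Address:Port Process
ESTAB  0      0      10.0.0.5:44312     10.0.0.9:5432     users:(("nginx",pid=812,fd=7))
ESTAB  0      0      10.0.0.5:51820     10.0.0.8:6379     users:(("app",pid=901,fd=12))
LISTEN 0      128    0.0.0.0:80         0.0.0.0:*         users:(("nginx",pid=812,fd=6))
"""


def map_dependencies(ss_output: str) -> dict:
    """Group established connections as {process: {peer address:port}}."""
    deps = defaultdict(set)
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[0] != "ESTAB":
            continue  # skip the header, listeners, and other states
        match = PROC_RE.search(line)
        if match:
            deps[match.group(1)].add(fields[4])
    return dict(deps)


deps = map_dependencies(SAMPLE)
print(deps)
```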

## Root Cause Analysis

```
Tell Claude: "analyze root cause on PROD"
```

- Correlates metric spikes with service failures and alerts
- Detects cascading failure patterns
- Identifies resource exhaustion as cause of service crashes
- Temporal correlation across all monitoring data

## Smart Anomaly Detection

```
Tell Claude: "detect anomalies on PROD"
```

- Builds baselines per metric grouped by hour and day of week
- Flags values >2.5 standard deviations from the mean
- No ML dependencies — pure statistics from SQLite data
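
The approach described above can be sketched in a few lines: bucket samples by weekday and hour, compute mean and standard deviation per bucket, and flag values more than 2.5 standard deviations away. Function names and the minimum of two samples per bucket are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev


def build_baseline(samples):
    """samples: iterable of (datetime, value) -> {(weekday, hour): (mean, std)}."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[(ts.weekday(), ts.hour)].append(value)
    return {slot: (mean(vals), pstdev(vals))
            for slot, vals in buckets.items() if len(vals) >= 2}


def is_anomaly(baseline, ts, value, threshold=2.5):
    """True when value is more than `threshold` sigma from the slot's mean."""
    stats = baseline.get((ts.weekday(), ts.hour))
    if stats is None or stats[1] == 0:
        return False  # no usable baseline for this slot
    mu, sigma = stats
    return abs(value - mu) / sigma > threshold


# Four weeks of CPU readings taken at the same weekday and hour.
start = datetime(2026, 1, 5, 9)
history = [(start + timedelta(weeks=i), v) for i, v in enumerate([20, 22, 21, 23])]
baseline = build_baseline(history)

next_week = start + timedelta(weeks=4)
print(is_anomaly(baseline, next_week, 30), is_anomaly(baseline, next_week, 22))
```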

## Auto-Remediation Playbooks

**5 built-in playbooks:**

| Playbook | Trigger | Action |
|----------|---------|--------|
| `disk_cleanup` | Disk > 90% | Clear journal, /tmp, old logs, package cache |
| `restart_failed_services` | Failed services detected | Restart each failed service |
| `high_memory_cleanup` | Memory > 95% | Drop filesystem caches |
| `high_cpu_investigation` | CPU load > 3x cores | Log top CPU consumers |
| `ssl_renewal` | SSL cert < 7 days | Run certbot renew, reload nginx |

Custom playbooks: drop JSON files in `~/.server-guardian-mcp/playbooks/`

## Public Status Page

```
Tell Claude: "generate a status page"
```

- Self-hosted uptime page for customers
- Shows server and endpoint health
- Active incidents section
- Auto-refreshes every 60 seconds
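- An alternative to Better Stack ($29/mo) and Instatus ($20/mo); free under the personal, non-commercial evaluation license (commercial use requires a paid license)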

## Multi-Step API Tests

```
Tell Claude: "test my API"
```

- Chain API calls: login -> extract token -> call API with token -> verify response
- Variable extraction from JSON responses
- Assertions: status code, body content, response time
- Save and re-run named tests

## Maintenance Windows

```
Tell Claude: "create maintenance window for PROD for 2 hours"
```

- Suppress alerts during planned work
- Configurable duration
- List and delete windows

## Compliance Reports

```
Tell Claude: "generate a compliance report for PROD"
```

- Security score (0-100) with letter grade (A-F)
- Detailed check results with pass/fail/warning badges
- Active alerts section
- Print-friendly, works in any browser
- Suitable for SOC2/ISO prep and client deliverables

## Team Mode (RBAC)

```env
GUARDIAN_TEAM_MODE=true
GUARDIAN_API_KEY=sg_your_api_key_here
```

| Role | Permissions |
|------|------------|
| **admin** | Full access — all tools, user management |
| **operator** | Run commands, restart services, deploy — no user management |
| **viewer** | Read-only — view health, logs, alerts, dashboards |
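
The three roles form a simple hierarchy, so a permission check can be a rank comparison. In this sketch the per-tool minimum roles are hypothetical, chosen to match the table above; the package's real mapping is not documented here.

```python
ROLE_RANK = {"viewer": 0, "operator": 1, "admin": 2}

# Hypothetical minimum role per tool, chosen to match the roles table.
MIN_ROLE = {
    "check_server_health": "viewer",
    "manage_systemd_service": "operator",
    "team_manage": "admin",
}


def is_allowed(role: str, tool: str) -> bool:
    """True when the caller's role meets the tool's minimum role."""
    required = MIN_ROLE.get(tool, "admin")  # unknown tools default to admin-only
    return ROLE_RANK[role] >= ROLE_RANK[required]


print(is_allowed("operator", "manage_systemd_service"))  # True
print(is_allowed("viewer", "manage_systemd_service"))    # False
```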

## External Integrations

```env
PAGERDUTY_ROUTING_KEY=your-routing-key
TELEGRAM_BOT_TOKEN=your-bot-token
TELEGRAM_CHAT_ID=your-chat-id
OPSGENIE_API_KEY=your-api-key
```

## Background Watchdog

Runs independently of Claude, with no AI calls and no API cost. It monitors continuously and sends alerts via email, Slack, Discord, PagerDuty, Telegram, or OpsGenie.

```bash
python -m server_guardian_mcp watchdog           # run forever
python -m server_guardian_mcp watchdog --once    # run one cycle
```

### Alert thresholds
| Condition | Severity |
|-----------|----------|
| Disk > 90% | Critical |
| Disk > 80% | Warning |
| CPU load > 2x cores | Warning |
| Temperature > 85C | Warning |
| Server unreachable | Critical |
| Failed services | Warning |
| HTTP endpoint down | Critical |
| SSL cert < 7 days | Critical |
| SSL cert < 30 days | Warning |
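
The numeric thresholds in the table translate directly into a small classifier. The metric names (`disk_pct`, `load`, `temp_c`, `ssl_days_left`) are hypothetical labels for this sketch. The boolean conditions (server unreachable, failed services, endpoint down) are omitted because they need no threshold logic.

```python
def classify(metric: str, value: float, cores: int = 1):
    """Return 'critical', 'warning', or None per the threshold table."""
    if metric == "disk_pct":
        return "critical" if value > 90 else "warning" if value > 80 else None
    if metric == "load":
        return "warning" if value > 2 * cores else None
    if metric == "temp_c":
        return "warning" if value > 85 else None
    if metric == "ssl_days_left":
        return "critical" if value < 7 else "warning" if value < 30 else None
    raise ValueError(f"unknown metric: {metric}")


print(classify("disk_pct", 91), classify("load", 9, cores=4), classify("ssl_days_left", 10))
```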

## Connection Types

| Type | Connects to | Requires |
|------|------------|----------|
| `ssh` | Linux/Mac servers | paramiko (included) |
| `local` | Your own machine | nothing |
| `docker` | Docker containers | docker CLI |
| `winrm` | Windows servers | `pip install pywinrm` |
| `k8s` | Kubernetes pods | kubectl CLI |
| `aws-ssm` | AWS EC2 instances | aws CLI |
| `gcloud` | GCP Compute Engine | gcloud CLI |
| `azure` | Azure VMs | az CLI |

## Security

- **Command blocklist** — blocks rm -rf, fork bombs, reverse shells
- **Sensitive file protection** — blocks .pem, .key, .env, /etc/shadow
- **SQL safety** — read-only by default
- **Read-only mode** — `GUARDIAN_MODE=readonly`
- **Rate limiting** — 30 calls/min per tool
- **Audit logging** — all invocations logged with sensitive param redaction
- **Shell injection prevention** — shlex.quote on all inputs
- **Output capped at 512KB** per command
- **File integrity monitoring** — detect unauthorized file changes
- **CIS benchmark compliance** — 61 security checks
- **CVE + rootkit scanning** — detect known vulnerabilities and malware
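
Two of the safeguards above are easy to illustrate with the standard library: quoting inputs with `shlex.quote` and a sliding-window limit of 30 calls per minute per tool. The helper `build_grep` and the `RateLimiter` class are sketches under those stated rules, not the package's internal code.

```python
import shlex
import time
from collections import defaultdict, deque
from typing import Optional


def build_grep(pattern: str, log_path: str) -> str:
    """Quote user-supplied values before placing them in a shell command."""
    return f"grep -F -- {shlex.quote(pattern)} {shlex.quote(log_path)}"


class RateLimiter:
    """Sliding-window limiter: at most `limit` calls per `window` seconds per tool."""

    def __init__(self, limit: int = 30, window: float = 60.0):
        self.limit, self.window = limit, window
        self.calls = defaultdict(deque)

    def allow(self, tool: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        recent = self.calls[tool]
        while recent and now - recent[0] >= self.window:
            recent.popleft()  # forget calls that left the window
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True


print(build_grep("OOM; rm -rf /", "/var/log/syslog"))
# grep -F -- 'OOM; rm -rf /' /var/log/syslog

limiter = RateLimiter()
results = [limiter.allow("run_shell_commands", now=0.0) for _ in range(31)]
print(results.count(True), results[-1])  # 30 False
```

The injected `; rm -rf /` stays inside a single quoted argument, so the shell treats it as part of the search pattern rather than as a second command.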

## Architecture

- **63 MCP tools** across 16 modules
- **8 connection adapters** (SSH, Local, Docker, WinRM, K8s, AWS SSM, GCloud, Azure)
- **15 SQLite tables** (health, services, endpoints, alerts, audit, baselines, playbooks, users, logs, SLOs, file hashes, network, maintenance, API tests, dependencies)
- **Background watchdog** with email/Slack/Discord/PagerDuty/Telegram/OpsGenie alerts
- **Live web dashboard** (Starlette + Chart.js)
- **Statistical anomaly detection** engine
- **Auto-remediation** playbook engine
- **Access log APM** parser
- **CIS benchmark + CVE scanner**
- **Database monitoring** (MySQL + PostgreSQL)
- **Network monitoring** with bandwidth tracking
- **SLO tracking** with error budgets
- **Team RBAC** (admin/operator/viewer)
- **Compliance report** generator
- **Public status page** generator

## Requirements

- Python 3.10+
- `mcp>=1.0.0`
- `paramiko>=3.0.0`
- `uvicorn>=0.27.0`
- `starlette>=0.36.0`

## License

**Proprietary** — Copyright (c) 2026 Md Nazish Arman. All rights reserved.

Free for personal, non-commercial evaluation only. Commercial use, business use, or any revenue-generating use requires a paid license. See [LICENSE](LICENSE) for full terms.

## Author

**Md Nazish Arman**
