Metadata-Version: 2.4
Name: dc-overview
Version: 1.0.9
Summary: GPU Datacenter Monitoring Suite - Prometheus, Grafana & Exporters
Author-email: CryptoLabs <info@cryptolabs.co.za>
License: MIT
Project-URL: Homepage, https://github.com/cryptolabsza/dc-overview
Project-URL: Documentation, https://cryptolabs.co.za/dc-monitoring/
Project-URL: Repository, https://github.com/cryptolabsza/dc-overview
Project-URL: Issues, https://github.com/cryptolabsza/dc-overview/issues
Project-URL: Discord, https://discord.gg/7yeHdf5BuC
Keywords: gpu,datacenter,monitoring,prometheus,grafana,nvidia,dcgm,vast.ai
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: System :: Monitoring
Classifier: Topic :: System :: Systems Administration
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: click>=8.1.0
Requires-Dist: rich>=13.0.0
Requires-Dist: questionary>=2.0.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: requests>=2.31.0
Requires-Dist: jinja2>=3.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"

# DC Overview

[![PyPI](https://img.shields.io/pypi/v/dc-overview.svg)](https://pypi.org/project/dc-overview/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

**Complete GPU datacenter monitoring suite.** Monitor your GPU servers with Prometheus, Grafana, and optional AI-powered insights.

![Dashboard](docs/images/grafana-overview.png)

## ✨ What's Included

| Component | Description | Port |
|-----------|-------------|------|
| **Prometheus** | Time-series database for metrics | 9090 |
| **Grafana** | Beautiful dashboards and alerting | 3000 |
| **node_exporter** | CPU, RAM, disk, network metrics | 9100 |
| **dcgm-exporter** | NVIDIA GPU metrics (utilization, temp, power) | 9400 |
| **dc-exporter** | VRAM temperature, hotspot, fan speed | 9500 |
| **vastai-exporter** | Vast.ai earnings and reliability (optional) | 8622 |
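
To see which of the components above are reachable on a given host, a TCP port check is usually enough. The following is a minimal sketch using only the Python standard library; the port numbers come from the table, while `port_is_open` and `check_components` are illustrative helper names and not part of dc-overview.

```python
import socket

# Default ports taken from the component table; adjust if your deployment differs.
DEFAULT_PORTS = {
    "prometheus": 9090,
    "grafana": 3000,
    "node_exporter": 9100,
    "dcgm-exporter": 9400,
    "dc-exporter": 9500,
    "vastai-exporter": 8622,
}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_components(host: str, ports: dict = DEFAULT_PORTS) -> dict:
    """Map each component name to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in ports.items()}
```

Calling `check_components("192.168.1.101")` returns a dictionary of component names to booleans. An open port only shows that something is listening, not that the exporter is healthy.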

---

## 🚀 Quick Start

### Prerequisites

- **Linux** (Ubuntu 20.04+, Debian, CentOS)
- **Python 3.9+** with pip
- **Root/sudo access** for installing services

### One Command Setup

**Ubuntu 24.04+ / Python 3.12+** (uses pipx):
```bash
sudo apt install pipx -y
pipx install dc-overview
pipx ensurepath && source ~/.bashrc
sudo dc-overview quickstart
```

**Ubuntu 22.04 / Python 3.10** (direct pip):
```bash
pip install dc-overview
sudo dc-overview quickstart
```

**Alternative** (if pip fails with an "externally-managed-environment" error):
```bash
pip install dc-overview --break-system-packages
sudo dc-overview quickstart
```

> **For remote worker deployment**, set up passwordless sudo on each worker (replace `YOUR_USER` with the account used for SSH):
> ```bash
> sudo bash -c 'echo "YOUR_USER ALL=(ALL) NOPASSWD: ALL" > /etc/sudoers.d/nopasswd && chmod 440 /etc/sudoers.d/nopasswd'
> ```

The wizard guides you through everything:

```
╭──────────────────────────────────────────────────╮
│           DC Overview - Quick Setup              │
╰──────────────────────────────────────────────────╯

Step 1: What is this machine?
  ○ GPU Worker (has GPUs to monitor)
  ● Master Server (monitors other machines)
  ○ Both (has GPUs + monitors others)

Step 2: Setting up Monitoring Dashboard
  Set Grafana admin password: ******
  ✓ Prometheus running on port 9090
  ✓ Grafana running on port 3000

Step 3: Add Machines to Monitor
  How do you want to add servers?
    ● Import from file/paste (recommended)
    ○ Enter manually

  Paste your server list:
  global:root,mypassword
  192.168.1.101
  192.168.1.102
  192.168.1.103
  [Enter]

  Installing on 192.168.1.101... ✓
  Installing on 192.168.1.102... ✓
  Installing on 192.168.1.103... ✓
  ✓ Added 3 workers to Prometheus

Step 4: Vast.ai Integration (Optional)
  Are you a Vast.ai provider? [y/N]: y
  Vast.ai API Key: ******
  ✓ vastai-exporter running (port 8622)

✓ Setup Complete!
  Grafana: http://192.168.1.100:3000
```
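
Once the wizard reports that workers were added, one way to confirm that Prometheus is scraping them is to ask its HTTP API for the built-in `up` metric. The sketch below assumes Prometheus is reachable at `http://localhost:9090` without authentication; `parse_up_result` and `query_up` are illustrative names, not dc-overview functions.

```python
import json
import urllib.parse
import urllib.request


def parse_up_result(payload: dict) -> dict:
    """Turn a Prometheus instant-query response for `up` into {instance: is_up}."""
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload.get('error')}")
    targets = {}
    for sample in payload["data"]["result"]:
        instance = sample["metric"].get("instance", "unknown")
        # Each sample value is [unix_timestamp, "<value as string>"].
        targets[instance] = sample["value"][1] == "1"
    return targets


def query_up(base_url: str = "http://localhost:9090", timeout: float = 5.0) -> dict:
    """Run the instant query `up` against the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": "up"})
    url = f"{base_url.rstrip('/')}/api/v1/query?{query}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_up_result(json.load(resp))
```

Running `query_up()` on the master returns one entry per scrape target. A value of `False` means Prometheus knows about the target but its last scrape failed.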

---

## 📋 Import File Format

Create a plain text file with one server per line to add many servers at once:

### Option 1: Global credentials (same for all)
```
global:root,mypassword
192.168.1.101
192.168.1.102
192.168.1.103
192.168.1.104
```

### Option 2: Per-server credentials
```
192.168.1.101,root,password1
192.168.1.102,ubuntu,password2
192.168.1.103,admin,password3
```

### Option 3: Mixed (global default + overrides)
```
global:root,defaultpass
192.168.1.101
192.168.1.102,ubuntu,custompass
192.168.1.103
```
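
To validate a server list before pasting it into the wizard, the three layouts above can be parsed with a few lines of Python. This is a sketch of the format as documented here, not the parser dc-overview itself uses. It assumes the `global:` line appears before any line that relies on it, and `Server` and `parse_server_list` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Server:
    host: str
    user: str
    password: str


def parse_server_list(text: str) -> List[Server]:
    """Parse 'global:user,pass', 'host' and 'host,user,pass' lines."""
    default: Optional[Tuple[str, str]] = None
    servers: List[Server] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("global:"):
            user, _, password = line[len("global:"):].partition(",")
            default = (user.strip(), password)
            continue
        # Split at most twice so a password may itself contain commas.
        parts = line.split(",", 2)
        if len(parts) == 3:
            servers.append(Server(parts[0].strip(), parts[1].strip(), parts[2]))
        elif len(parts) == 1:
            if default is None:
                raise ValueError(f"no credentials for {parts[0]!r} and no global line")
            servers.append(Server(parts[0], default[0], default[1]))
        else:
            raise ValueError(f"malformed line: {line!r}")
    return servers
```

With the mixed example above, `192.168.1.101` and `192.168.1.103` receive the global `root` credentials, while `192.168.1.102` keeps its own `ubuntu` login.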

---

## 🔧 Manual Installation

### On Master Server (monitoring hub)

```bash
pip install dc-overview
sudo dc-overview quickstart
# Select "Master Server"
```

### On GPU Workers

```bash
pip install dc-overview
sudo dc-overview quickstart
# Select "GPU Worker"
```

Alternatively, run the wizard on the master and provide SSH credentials for the workers; it then installs the exporters remotely.

---

## 📊 Available Commands

```bash
dc-overview quickstart          # ⚡ One-command setup (recommended)
dc-overview status              # Check what's running
dc-overview add-machine IP      # Add another machine to monitor
dc-overview install-exporters   # Install exporters on current machine
dc-overview setup-ssl           # Set up reverse proxy with SSL
```

---

## 🔒 Reverse Proxy & SSL Setup

Set up a secure HTTPS frontend with a branded landing page:

### Self-Signed Certificate (Default)

```bash
# Basic setup (IP access only)
sudo dc-overview setup-ssl

# With custom site name
sudo dc-overview setup-ssl --site-name "My GPU Farm"

# Include IPMI Monitor and Vast.ai
sudo dc-overview setup-ssl --ipmi --vastai
```

### Let's Encrypt (Free SSL)

For a valid SSL certificate (no browser warnings):

```bash
sudo dc-overview setup-ssl \
  --domain monitor.example.com \
  --letsencrypt \
  --email admin@example.com \
  --ipmi --vastai
```

### DNS Setup (Required for Domain)

Add these DNS records pointing to your server IP:

| Type | Name | Value | Purpose |
|------|------|-------|---------|
| A | `monitor.example.com` | `<server-ip>` | Main dashboard |
| A | `grafana.monitor.example.com` | `<server-ip>` | Grafana subdomain (optional) |
| A | `ipmi.monitor.example.com` | `<server-ip>` | IPMI subdomain (optional) |
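
Before requesting a certificate, it can help to confirm that the records above resolve to the server. The sketch below uses the standard library resolver; `check_dns` is an illustrative helper, and the `resolve` parameter exists only so the lookup can be swapped out in tests.

```python
import socket
from typing import Callable, Dict, Iterable


def check_dns(
    names: Iterable[str],
    expected_ip: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, bool]:
    """Return {name: True} when the name resolves to expected_ip (IPv4 only)."""
    results = {}
    for name in names:
        try:
            results[name] = resolve(name) == expected_ip
        except OSError:
            # socket.gaierror (a subclass of OSError) is raised for unknown names.
            results[name] = False
    return results
```

For example, `check_dns(["monitor.example.com", "grafana.monitor.example.com"], "203.0.113.10")` reports which names already point at that address. The IP here is a placeholder from the documentation range.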

### After Setup

Access your monitoring at:

```
https://<server-ip>/             # Landing page
https://<server-ip>/grafana/     # Grafana dashboards
https://<server-ip>/prometheus/  # Prometheus UI
https://<server-ip>/ipmi/        # IPMI Monitor (if enabled)
```

Or with domain:
```
https://monitor.example.com/
https://grafana.monitor.example.com/  (if subdomain configured)
```

---

## 🐳 Docker Alternative

If you prefer Docker Compose, a minimal file for Prometheus and Grafana looks like this:

```yaml
# docker-compose.yml
version: '3.8'
services:
  prometheus:
    image: prom/prometheus:latest
    ports: ["9090:9090"]
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

  grafana:
    image: grafana/grafana:latest
    ports: ["3000:3000"]
    environment:
      # Example only: set a strong admin password before exposing Grafana
      - GF_SECURITY_ADMIN_PASSWORD=admin
```
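
The Compose file mounts `./prometheus.yml`, which is not shown here. One way to produce it is to generate a static scrape configuration from the worker IPs and the exporter ports listed earlier. The sketch below is an assumption about a reasonable layout, not the file dc-overview writes; the job names and the 15-second scrape interval are illustrative choices.

```python
from typing import Iterable

# Exporter ports from the component table; the job names are illustrative.
EXPORTER_PORTS = {
    "node_exporter": 9100,
    "dcgm-exporter": 9400,
    "dc-exporter": 9500,
}


def render_prometheus_config(workers: Iterable[str], scrape_interval: str = "15s") -> str:
    """Render a prometheus.yml with one static scrape job per exporter."""
    workers = list(workers)
    lines = [
        "global:",
        f"  scrape_interval: {scrape_interval}",
        "",
        "scrape_configs:",
    ]
    for job, port in EXPORTER_PORTS.items():
        lines.append(f"  - job_name: {job}")
        lines.append("    static_configs:")
        lines.append("      - targets:")
        for host in workers:
            lines.append(f'          - "{host}:{port}"')
    return "\n".join(lines) + "\n"
```

Writing the result of `render_prometheus_config(["192.168.1.101", "192.168.1.102"])` to `prometheus.yml` next to `docker-compose.yml` gives Prometheus three jobs, each targeting every worker.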

---

## 🔗 Related Tools

| Tool | Purpose | Install |
|------|---------|---------|
| [IPMI Monitor](https://github.com/cryptolabsza/ipmi-monitor) | Server health, SEL logs, ECC errors | `pip install ipmi-monitor` |
| [dc-exporter](https://github.com/cryptolabsza/dc-exporter) | GPU VRAM temperatures | Included in quickstart |

---

## 📖 Full Suite Setup (Master + Workers)

For a complete datacenter setup with IPMI monitoring:

### 1. On Master Server
```bash
# Install dc-overview (Grafana + Prometheus)
pip install dc-overview
sudo dc-overview quickstart
# Select "Master Server", add your workers

# Install ipmi-monitor (optional - for BMC/IPMI)
pip install ipmi-monitor
sudo ipmi-monitor quickstart
```

### 2. Workers are configured automatically
The quickstart installs exporters on workers via SSH.

### 3. Import your servers
Create `servers.txt`:
```
global:root,sshpassword
192.168.1.101
192.168.1.102
192.168.1.103
```

Paste the file contents when the wizard prompts for a server list, or add machines one at a time:
```bash
dc-overview add-machine 192.168.1.101 --ssh-pass sshpassword
```

---

## 💬 Support

- **Discord**: https://discord.gg/7yeHdf5BuC
- **Issues**: https://github.com/cryptolabsza/dc-overview/issues

---

## 📄 License

MIT License - see [LICENSE](LICENSE) for details.
