Metadata-Version: 2.4
Name: nomad-hpc
Version: 1.4.0
Summary: A lightweight HPC monitoring and predictive analytics tool
Author-email: João Tonini <jtonini@richmond.edu>
Maintainer-email: João Tonini <jtonini@richmond.edu>
License: AGPL-3.0-or-later
Project-URL: Homepage, https://nomad-hpc.com
Project-URL: Documentation, https://jtonini.github.io/nomad-hpc/
Project-URL: Repository, https://github.com/jtonini/nomad-hpc
Project-URL: Issues, https://github.com/jtonini/nomad-hpc/issues
Keywords: hpc,monitoring,slurm,cluster,predictive-analytics,machine-learning,anomaly-detection,graph-neural-network
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: System Administrators
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: System :: Monitoring
Classifier: Topic :: System :: Systems Administration
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: click>=8.0
Requires-Dist: toml>=0.10
Requires-Dist: numpy>=1.21
Requires-Dist: pandas>=1.3
Requires-Dist: scipy>=1.7
Provides-Extra: ml
Requires-Dist: scikit-learn>=1.0; extra == "ml"
Requires-Dist: torch>=2.0; extra == "ml"
Requires-Dist: torch-geometric>=2.0; extra == "ml"
Provides-Extra: dashboard
Requires-Dist: jinja2>=3.0; extra == "dashboard"
Provides-Extra: alerts
Provides-Extra: all
Requires-Dist: nomad-hpc[dashboard,ml]; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: ruff>=0.1; extra == "dev"
Requires-Dist: black>=23.0; extra == "dev"
Requires-Dist: mypy>=1.0; extra == "dev"
Requires-Dist: pre-commit>=3.0; extra == "dev"

# NØMAD-HPC

**NØde Monitoring And Diagnostics** — Lightweight HPC monitoring, visualization, and predictive analytics.

> *"Travels light, adapts to its environment, and doesn't need permanent infrastructure."*

[![PyPI](https://img.shields.io/pypi/v/nomad-hpc.svg)](https://pypi.org/project/nomad-hpc/)
[![License: AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.18614517.svg)](https://doi.org/10.5281/zenodo.18614517)

---

📖 **[Full Documentation](https://jtonini.github.io/nomad-hpc/)** — Installation guides, configuration, CLI reference, network methodology, ML framework, and more.

---

## Quick Start
```bash
pip install nomad-hpc
nomad demo                    # Try with synthetic data
```

For production:
```bash
nomad init                    # Configure for your cluster
nomad collect                 # Start data collection
nomad dashboard               # Launch web interface
```

---

## Features

| Feature | Description | Command |
|---------|-------------|---------|
| **Dashboard** | Real-time multi-cluster monitoring with partition views | `nomad dashboard` |
| **Workstation Monitoring** | Track departmental workstations (CPU, memory, disk, users) | Dashboard → Workstations |
| **Storage Monitoring** | Monitor NFS servers, ZFS pools, IOPS, and client connections | Dashboard → Storage |
| **Interactive Sessions** | Monitor RStudio/Jupyter sessions with memory and age | Dashboard → Interactive |
| **Data Readiness** | Assess ML model readiness with sample size and variance analysis | `nomad readiness` |
| **Diagnostics** | Analyze network, storage, and node-level bottlenecks | `nomad diag` |
| **Educational Analytics** | Track computational proficiency development | `nomad edu explain <job>` |
| **Alerts** | Threshold + predictive alerts (email, Slack, webhook) | `nomad alerts` |
| **ML Prediction** | Job failure prediction using similarity networks | `nomad predict` |
| **Insight Engine** | Operational narratives from multi-signal analysis | `nomad insights brief` |
| **Cloud Monitoring** | AWS/Azure/GCP metrics with cost and utilization analysis | `nomad cloud status` |
| **Community Export** | Anonymized datasets for cross-institutional research | `nomad community export` |
| **System Dynamics** | Ecological and economic metrics for resource analysis | `nomad dyn` |
| **Reference** | Built-in documentation, code navigation, and search | `nomad ref` |
| **Developer Toolchain** | Scaffolding, validation, and contribution pipeline | `nomad dev` |
| **Issue Reporting** | Submit bugs, features, questions from any interface | `nomad issue report` |

---


### Developer Toolchain

Scaffolding, codebase validation, and contribution pipeline for NØMAD development.

```bash
nomad dev guide                # Interactive contribution wizard
nomad dev new collector zfs    # Scaffold a new module
nomad dev check                # Validate codebase health
nomad dev check --fix          # Auto-fix registration issues
nomad dev test changed         # Test only modified files
nomad dev status               # Current branch and readiness
nomad dev submit               # Full contribution pipeline
nomad dev setup                # One-time dev environment config
nomad dev bump patch           # Version management
nomad dev deps collector disk  # Module dependency graph
```

`nomad dev new` supports 8 module types: collector, command, analysis, metric, view, page, alert, and insight.
Every scaffolded module includes a source file, test stubs, schema/config templates,
and next-step instructions. Quality by construction, not by review.

## Architecture
```
┌────────────────────────────────────────────────────────────────────────────┐
│                                   NØMAD                                    │
├───────────────┬───────────────┬───────────────┬───────────┬────────────────┤
│  Collectors   │   Analysis    │     Viz       │  Alerts   │  Intelligence  │
├───────────────┼───────────────┼───────────────┼───────────┼────────────────┤
│ disk          │ derivatives   │ dashboard     │ thresholds│ insights       │
│ iostat        │ similarity    │ network 3D    │ predictive│ dynamics       │
│ nfs           │ community     │ partitions    │ flapping  │ reference      │
│ slurm         │ ML ensemble   │ workstations  │ email     │ edu scoring    │
│ gpu           │ readiness     │ storage       │ slack     │                │
│ workstation   │ diagnostics   │ interactive   │ webhooks  │                │
│ storage       │               │               │           │                │
│ cloud         │               │               │           │                │
└───────────────┴───────────────┴───────────────┴───────────┴────────────────┘
                                │
                      ┌─────────┴─────────┐
                      │  SQLite Database  │
                      └───────────────────┘
```
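
The diagram shows every component sharing a single SQLite database. As a rough illustration of that pattern, the sketch below has a collector append samples to one table and a reader fetch the latest value. The `samples` table, its columns, and the helper names `open_db`, `record`, and `latest` are assumptions made for this example and are not NØMAD's actual schema or API.

```python
import sqlite3
import time

# Hypothetical schema: one row per metric sample. NØMAD's real tables may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS samples (
    ts        REAL NOT NULL,   -- Unix timestamp of the sample
    collector TEXT NOT NULL,   -- e.g. "disk", "slurm", "gpu"
    metric    TEXT NOT NULL,   -- metric name within the collector
    value     REAL NOT NULL
)
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record(conn, collector, metric, value, ts=None):
    """Append one sample to the shared table."""
    ts = time.time() if ts is None else ts
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO samples (ts, collector, metric, value) VALUES (?, ?, ?, ?)",
            (ts, collector, metric, value),
        )


def latest(conn, collector, metric):
    """Return the most recent value for a metric, or None if absent."""
    row = conn.execute(
        "SELECT value FROM samples WHERE collector = ? AND metric = ? "
        "ORDER BY ts DESC LIMIT 1",
        (collector, metric),
    ).fetchone()
    return row[0] if row else None


conn = open_db()
record(conn, "disk", "used_pct", 71.5, ts=1.0)
record(conn, "disk", "used_pct", 72.0, ts=2.0)
print(latest(conn, "disk", "used_pct"))  # 72.0
```

A single file-backed database keeps the deployment free of extra services, which matches the "no permanent infrastructure" goal stated at the top of this README.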

---

## CLI Reference

### Core Commands
```bash
nomad init                    # Setup wizard
nomad collect                 # Start collectors
nomad dashboard               # Web interface
nomad dashboard --db file.db  # Use specific database
nomad demo                    # Demo mode with synthetic data
nomad status                  # System status
```

### Data Readiness & Diagnostics
```bash
nomad readiness               # Check ML training readiness
nomad readiness -v            # Verbose with feature details
nomad diag network            # Network performance analysis
nomad diag storage            # Storage health and I/O patterns
nomad diag node               # Node-level resource bottlenecks
```
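
The feature table describes data readiness as a sample-size and variance analysis. The sketch below shows one way such a check could look, using only the standard library. The thresholds `MIN_SAMPLES` and `MIN_VARIANCE`, the `readiness` function, and the feature names are hypothetical and are not taken from `nomad readiness`.

```python
from statistics import pvariance

# Hypothetical thresholds, chosen only for illustration.
MIN_SAMPLES = 30
MIN_VARIANCE = 1e-6


def readiness(features, min_samples=MIN_SAMPLES, min_variance=MIN_VARIANCE):
    """Summarize whether a feature table looks usable for training.

    `features` maps a feature name to its list of observed values.
    """
    n = min((len(v) for v in features.values()), default=0)
    # Near-constant features carry no signal for a model.
    flat = sorted(
        name for name, values in features.items()
        if len(values) < 2 or pvariance(values) < min_variance
    )
    return {
        "n_samples": n,
        "enough_samples": n >= min_samples,
        "low_variance_features": flat,
        "ready": n >= min_samples and not flat,
    }


jobs = {
    "cpu_hours": [float(i % 7) for i in range(40)],
    "mem_gb": [4.0] * 40,  # constant, so it should be flagged
}
print(readiness(jobs))
```

Here the sample count passes the threshold, but the constant `mem_gb` column is flagged, so the table is reported as not ready.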

### Educational Analytics
```bash
nomad edu explain <job_id>    # Job analysis with recommendations
nomad edu trajectory <user>   # User proficiency over time
nomad edu report <group>      # Course/group report
```

### Analysis & Prediction
```bash
nomad disk /path              # Filesystem trends
nomad jobs --user <user>      # Job history
nomad similarity              # Network analysis
nomad train                   # Train ML models
nomad predict                 # Run predictions
```
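
Job failure prediction is described above as being based on similarity networks. The sketch below illustrates the general idea under stated assumptions: jobs are small feature vectors, an edge links two jobs whose cosine similarity meets a threshold, and a job's risk is estimated from the failure rate of its neighbors. The similarity measure, the threshold, the feature choice, and all function names are assumptions for illustration. The method NØMAD actually uses is described in the Network Methodology documentation.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_edges(jobs, threshold=0.9):
    """Return (job_a, job_b, similarity) for every pair at or above threshold."""
    edges = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(jobs.items(), 2):
        sim = cosine(vec_a, vec_b)
        if sim >= threshold:
            edges.append((id_a, id_b, sim))
    return edges


def neighbor_failure_rate(job_id, edges, failed):
    """Fraction of a job's network neighbors that failed (None if isolated)."""
    neighbors = [b if a == job_id else a for a, b, _ in edges if job_id in (a, b)]
    if not neighbors:
        return None
    return sum(n in failed for n in neighbors) / len(neighbors)


# Hypothetical features: (cpus, memory in GB, runtime in hours)
jobs = {
    "j1": (4, 16, 2.0),
    "j2": (4, 16, 2.1),
    "j3": (64, 8, 0.1),
}
edges = similarity_edges(jobs)
print([(a, b) for a, b, _ in edges])                         # [('j1', 'j2')]
print(neighbor_failure_rate("j1", edges, failed={"j2"}))     # 1.0
```

The raw features are left unscaled to keep the example short. A real pipeline would normally standardize them first so that no single feature dominates the similarity.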

### Community & Alerts
```bash
nomad community export        # Export anonymized data
nomad community preview       # Preview export
nomad alerts                  # View alerts
nomad alerts --unresolved     # Unresolved only
```


### System Dynamics
```bash
nomad dyn summary             # Full dynamics narrative
nomad dyn diversity           # Workload diversity indices
nomad dyn diversity --by partition  # By partition
nomad dyn niche               # Resource overlap between groups
nomad dyn capacity            # Carrying capacity, binding constraint
nomad dyn resilience          # Recovery time after disturbances
nomad dyn externality         # Inter-group impact scoring
```
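
`nomad dyn diversity` reports workload diversity indices. This README does not say which indices are computed, so the sketch below uses two standard ecological measures, the Shannon index and the Simpson diversity index, applied to job counts per partition. The partition names and function names are illustrative assumptions.

```python
import math
from collections import Counter


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over non-empty categories."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Simpson diversity 1 - sum(p_i^2).

    This is the probability that two jobs drawn with replacement
    fall in different categories.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical workload: the partition each of six jobs ran on.
partitions = ["cpu", "cpu", "gpu", "gpu", "himem", "himem"]
counts = list(Counter(partitions).values())
print(round(shannon(counts), 4))  # 1.0986, i.e. ln(3) for three equal groups
print(round(simpson(counts), 4))  # 0.6667
```

Both indices are zero when all jobs fall in one category and grow as the workload spreads more evenly across categories.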

### Insight Engine
```bash
nomad insights brief          # Executive summary
nomad insights full           # Comprehensive report
nomad insights signals        # Raw signal detection
nomad insights correlations   # Cross-signal analysis
nomad insights enrich         # Alert enrichment with context
```

### Reference
```bash
nomad ref                     # Browse all 60 topics
nomad ref dyn diversity       # Look up any topic
nomad ref search "regime"     # Search across documentation
nomad ref alerts thresholds   # Alert threshold reference
nomad ref config              # Configuration reference
```

### Issue Reporting
```bash
nomad issue report            # Interactive bug/feature/question form
nomad issue report -c bug -m alerts  # Pre-select category and component
nomad issue report --email    # Send via email instead of GitHub
nomad issue search disk       # Search existing issues
nomad issue info              # Preview auto-collected system info
nomad issue info --json       # JSON output for scripting
```

---

## Dashboard Views

The web dashboard includes multiple views accessible via tabs:

- **Cluster Overview**: Real-time node status with health rings showing CPU utilization
- **Network View**: 3D job similarity network with failure clustering analysis
- **Resources**: CPU-hours, GPU-hours, and usage breakdown by group/user
- **Activity**: Job submission heatmap showing patterns by day and hour
- **Interactive**: Active RStudio and Jupyter sessions with memory usage
- **Workstations**: Departmental machines with CPU, memory, disk, and logged-in users
- **Storage**: NFS servers with ZFS pool health, capacity, and client connections
- **Cloud**: AWS, Azure, and GCP resource utilization and cost tracking
- **Insights**: Operational narratives from multi-signal analysis
- **Dynamics**: Ecological and economic metrics (diversity, niche, capacity, resilience)
- **Report Issue**: Submit bugs, feature requests, and questions with auto-populated system info

Toggle between light and dark themes with the Theme button.

---

## Installation

### From PyPI
```bash
pip install nomad-hpc
```

### From Source
```bash
git clone https://github.com/jtonini/nomad-hpc
cd nomad-hpc && pip install -e .
```

### Requirements
- Python 3.10+
- SQLite 3.35+
- sysstat package (`iostat`, `mpstat`)
- Optional: SLURM, nvidia-smi, nfsiostat

### System Check
```bash
nomad syscheck
```
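
For hosts where NØMAD is not yet installed, the requirements listed above can also be checked by hand. The sketch below is not the implementation of `nomad syscheck`. It is a minimal standard-library approximation: the Python minimum comes from the package metadata (`Requires-Python: >=3.10`), and using `sinfo` as the probe for SLURM is an assumption.

```python
import shutil
import sqlite3
import sys

MIN_PYTHON = (3, 10)   # from the package metadata (Requires-Python: >=3.10)
MIN_SQLITE = (3, 35)   # from the Requirements list
REQUIRED_TOOLS = ("iostat", "mpstat")                   # sysstat package
OPTIONAL_TOOLS = ("sinfo", "nvidia-smi", "nfsiostat")   # sinfo stands in for SLURM


def check_requirements():
    """Return a dict of check name -> bool for the current host."""
    results = {
        "python": sys.version_info[:2] >= MIN_PYTHON,
        "sqlite": sqlite3.sqlite_version_info[:2] >= MIN_SQLITE,
    }
    for tool in REQUIRED_TOOLS + OPTIONAL_TOOLS:
        results[tool] = shutil.which(tool) is not None
    return results


def missing_required(results):
    """Names of failed checks that are not optional."""
    return [name for name, ok in results.items()
            if not ok and name not in OPTIONAL_TOOLS]


results = check_requirements()
for name, ok in results.items():
    print(f"{name:12s} {'ok' if ok else 'missing'}")
```

Note that `sqlite3.sqlite_version_info` reports the SQLite library linked into the Python interpreter, which may differ from the `sqlite3` command-line tool on the same machine.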

---

## Documentation

📖 **[jtonini.github.io/nomad-hpc](https://jtonini.github.io/nomad-hpc/)**

- [Installation & Configuration](https://jtonini.github.io/nomad-hpc/installation/)
- [System Install (`--system`)](https://jtonini.github.io/nomad-hpc/system-install/)
- [Dashboard Guide](https://jtonini.github.io/nomad-hpc/dashboard/)
- [Educational Analytics](https://jtonini.github.io/nomad-hpc/edu/)
- [Network Methodology](https://jtonini.github.io/nomad-hpc/network/)
- [ML Framework](https://jtonini.github.io/nomad-hpc/ml/)
- [Proficiency Scoring](https://jtonini.github.io/nomad-hpc/proficiency/)
- [CLI Reference](https://jtonini.github.io/nomad-hpc/cli/)
- [Configuration Options](https://jtonini.github.io/nomad-hpc/config/)
- [Issue Reporting](https://jtonini.github.io/nomad-hpc/issue/)
- [System Dynamics](https://jtonini.github.io/nomad-hpc/dynamics/)
- [Cloud Monitoring](https://jtonini.github.io/nomad-hpc/cloud/)
- [Reference System](https://jtonini.github.io/nomad-hpc/reference/)

---

## License

Dual-licensed:
- **AGPL v3** — Free for academic, educational, and open-source use
- **Commercial License** — Available for proprietary deployments

---

## Citation
```bibtex
@software{nomad2026,
  author = {Tonini, João Filipe Riva},
  title = {NØMAD: Lightweight HPC Monitoring with Machine Learning-Based Failure Prediction},
  year = {2026},
  url = {https://github.com/jtonini/nomad-hpc},
  doi = {10.5281/zenodo.18614517}
}

@article{tonini2026nomad,
  author = {Tonini, João Filipe Riva},
  title = {NØMAD: Lightweight HPC Monitoring with Machine Learning-Based Failure Prediction},
  journal = {Journal of Open Research Software},
  volume = {14},
  pages = {17},
  year = {2026},
  doi = {10.5334/jors.686}
}
```

---

## Contributing

See [CONTRIBUTING.md](docs/CONTRIBUTING.md) for guidelines.

---

## Contact

- **Author**: João Tonini
- **Email**: jtonini@richmond.edu
- **Issues**: [GitHub Issues](https://github.com/jtonini/nomad-hpc/issues)