Metadata-Version: 2.4
Name: hpc-stats-scripts
Version: 1.3.3
Summary: Utilities for HPC clusters including PBS/Slurm job statistics and a psutil-based (CPU/GPU/Memory) resource monitor.
Author: hpc-stats-scripts contributors
License-Expression: MIT
Keywords: hpc,pbs,slurm,monitoring,cluster,psutil
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: psutil
Provides-Extra: plot
Requires-Dist: matplotlib; extra == "plot"
Requires-Dist: numpy; extra == "plot"
Provides-Extra: gpu
Requires-Dist: nvidia-ml-py3; extra == "gpu"
Provides-Extra: all
Requires-Dist: matplotlib; extra == "all"
Requires-Dist: numpy; extra == "all"
Requires-Dist: nvidia-ml-py3; extra == "all"
Dynamic: license-file
Dynamic: requires-python

# hpc-stats-scripts

Utilities for working with high-performance computing (HPC) environments. The scripts
help inspect PBS/Slurm job efficiency and monitor CPU/GPU and memory usage on a
running system or process tree.

Made with Codex :)

## Dependencies

Install the required Python packages with pip:

| Feature | Packages | Install command |
| ------- | -------- | ---------------- |
| Core utilities | psutil | `pip install psutil` |
| Plotting for `psutil-monitor` | matplotlib, numpy | `pip install matplotlib numpy` |
| GPU monitoring for `psutil-monitor --gpu` | nvidia-ml-py3 (pynvml) | `pip install nvidia-ml-py3` |
| Plot + GPU combo | psutil, matplotlib, numpy, nvidia-ml-py3 | `pip install psutil matplotlib numpy nvidia-ml-py3` |
| All extras via pip extras | plot + GPU | `pip install .[all]` |

The `pbs-bulk-user-stats` command also expects the PBS `qstat` utility to be
available in your environment.
The `slurm-bulk-user-stats` command expects Slurm's `sacct` utility to be
available in your environment.

## Installation

Clone the repository and install with pip:

```bash
Install from PyPI (once published):
```bash
pip install hpc-stats-scripts
pip install hpc-stats-scripts[plot]
pip install hpc-stats-scripts[gpu]
pip install hpc-stats-scripts[all]
```

Or install directly from GitHub (the latest version):
```bash
pip install "hpc-stats-scripts @ git+https://github.com/avnikonenko/hpc-stats-scripts.git"
pip install "hpc-stats-scripts[plot] @ git+https://github.com/avnikonenko/hpc-stats-scripts.git"
pip install "hpc-stats-scripts[gpu] @ git+https://github.com/avnikonenko/hpc-stats-scripts.git"
pip install "hpc-stats-scripts[all] @ git+https://github.com/avnikonenko/hpc-stats-scripts.git"
```

The base installation depends on [psutil](https://pypi.org/project/psutil/).
The `plot` extra pulls in `matplotlib` and `numpy` for the `--plot` feature of `psutil-monitor`.
The `gpu` extra installs `nvidia-ml-py3` to enable `--gpu`.

## CLI tools

### `pbs-bulk-user-stats`

Summarize CPU and memory usage for PBS jobs and show which nodes the jobs are
allocated to. The command relies on `qstat` being available in your `PATH`.
The table now includes `NGPUS` (requested GPUs) when present.

Examples:

```bash
# Summarize a specific job and write CSV output
pbs-bulk-user-stats --job 12345 --csv stats.csv

# Summarize all jobs for the current user (default)
pbs-bulk-user-stats --include-finished

# Summarize all jobs for a specific user
pbs-bulk-user-stats --user myuser --include-finished

# Include finished jobs but only show the 20 most recent
pbs-bulk-user-stats --user myuser --include-finished 20
```

When invoked with no `--user` or `--job` options:
- On a login node (no `$PBS_JOBID` present), it summarizes all jobs for the current user.
- Inside a running PBS job (where `$PBS_JOBID` is set), it automatically summarizes that specific job.

```
pbs-bulk-user-stats
```

State codes (PBS):
- `R` running, `Q` queued/waiting, `X` finished (requires `--include-finished`), other codes are printed under “other” in the summary.

**Expected output (CPU/RAM only):**
```
$ pbs-bulk-user-stats

JOBID    STATE   NAME       NODES    NCPUS  WALL(h)  CPUT(h)  avgCPU  CPUeff  memUsed   memReq   memEff
-------------------------------------------------------------------------------------------------------
0001      R      run1		pbs-1    176    38.55    3632.12  163.6  93.53%  207.4 GiB 256.00 GiB 81.10%
0002      R      run2		pbs-2    176    38.59    3589.72  93.13  52.91%  50.02 GiB 256.00 GiB 19.54%
...
Summary:
  jobs:         5
  unique nodes: 3
  states:       R=4  Q=1  X=0  other=0
  mean CPUeff:  75.20%
  mean avgCPU:  132.35
  mean memEff:  82.50%
  max memUsed:  230.16 GiB

```
or if run inside a running PBS:
```
JOBID  STATE  NAME   NODES  NCPUS  WALL(h)  CPUT(h)  avgCPU  CPUeff  memUsed     memReq     memEff
-----------------------------------------------------------------------------------------------------
0001   R      STDIN  pbs-5  100    0.03     0.01     0.22    0.22%   666.58 MiB  30.00 GiB  2.17% 

Summary:
  jobs:        1
  mean CPUeff: 0.22%
  mean avgCPU: 0.22
  mean memEff: 2.17%
  max memUsed: 666.58 MiB

```

After the table, a summary reports the job count, mean CPU efficiency,
mean average CPU usage, mean memory efficiency, and the peak memory used
across all listed jobs.

### `psutil-monitor`

Real-time CPU and memory monitor for the system or a process tree.
Use `--gpu` to also report aggregate GPU utilization and memory via NVML (requires `nvidia-ml-py3`).
When `--csv`/`--plot` are used, metrics stream live to the terminal during the run; CSV/PNG files are written when the monitor exits (Ctrl+C, duration reached, or proc tree ends).

Example output files (generated with `--plot` and `--csv`):

- Plot (CPU + GPU stacked):

  ![psutil-monitor example plot](docs/psutil-monitor-example.jpg)

- CSV: `docs/psutil-monitor-example.csv`

GPU output fields (when `--gpu` is used):
- **GPU util**: Average utilization across visible GPUs.
- **busyGPUs**: Sum of utilization fractions (e.g., two GPUs at 50% each → 1.0).
- **GPU mem %**: Aggregate GPU memory usage percentage.
- **Per-GPU** (CSV `gpu_pergpu`): `index:util%/used/total` for each device.

Examples:

```bash
# System-wide (by default) monitoring with console output only
psutil-monitor

# System-wide monitoring with CSV and PNG output
psutil-monitor --mode system --csv node.csv --plot node.png

# Monitor the current process tree (useful inside a PBS job)
psutil-monitor --mode proc --pid $$ --include-children --csv job.csv

# For script.py resources monitoring:
python script.py &                   # launch the workload
target=$!                            # PID of script.py
echo $target
# psutil-monitor watches that PID and exits when the process tree is gone
psutil-monitor --mode proc --pid "$target" --include-children --csv stat.csv --plot plot.png

```
**Expected output:**
```
$ psutil-monitor

CPUs available (affinity): 384
Total memory available: 754.76 GiB
CPU basis for %: 384
Memory basis for %: 754.76 GiB
2025-08-14T15:20:14  CPU  79.67%  busyCPUs 305.93  (provided 384)  MEM   9.93%  used 74.96 GiB / total 754.76 GiB
2025-08-14T15:20:16  CPU  69.30%  busyCPUs 266.13  (provided 384)  MEM   9.95%  used 75.12 GiB / total 754.76 GiB
2025-08-14T15:20:18  CPU  61.34%  busyCPUs 235.53  (provided 384)  MEM  10.05%  used 75.82 GiB / total 754.76 GiB
2025-08-14T15:20:20  CPU  61.32%  busyCPUs 235.47  (provided 384)  MEM  10.09%  used 76.15 GiB / total 754.76 GiB
2025-08-14T15:20:22  CPU  74.57%  busyCPUs 286.33  (provided 384)  MEM   9.94%  used 74.99 GiB / total 754.76 GiB
2025-08-14T15:20:24  CPU  85.94%  busyCPUs 330.01  (provided 384)  MEM   9.86%  used 74.44 GiB / total 754.76 GiB
Average busy CPUs over run: 276.570
Peak memory (system): 76.15 GiB

```
With GPUs (`--gpu` and NVIDIA GPUs present):
```
$ psutil-monitor --gpu

CPUs available (affinity): 96
Total memory available: 503.70 GiB
CPU basis for %: 96
Memory basis for %: 503.70 GiB
GPUs detected (NVML): 4
2026-02-03T10:00:14  CPU  45.12%  busyCPUs 43.32  (provided 96)  MEM   8.10%  used 40.80 GiB / total 503.70 GiB  GPU util  57.5% busyGPUs 2.30 mem  42.0%
2026-02-03T10:00:16  CPU  48.33%  busyCPUs 46.39  (provided 96)  MEM   8.20%  used 41.30 GiB / total 503.70 GiB  GPU util  63.0% busyGPUs 2.52 mem  44.1%
2026-02-03T10:00:18  CPU  52.10%  busyCPUs 49.99  (provided 96)  MEM   8.25%  used 41.60 GiB / total 503.70 GiB  GPU util  68.7% busyGPUs 2.75 mem  45.3%
Average busy CPUs over run: 46.567
Average busy GPUs over run: 2.523
Peak memory (system): 41.60 GiB

```

Use the `--help` option of each command to see all available options.

### `slurm-bulk-user-stats`

Summarize CPU and memory usage for Slurm jobs and show which nodes the jobs are
allocated to. The command relies on `sacct` being available in your `PATH`.
The table includes `NGPUS` based on AllocTres/AllocGRES when present.
If TRES GPU usage metrics are available, the summary also reports mean GPU util and GPU hours (used/requested).

State codes (Slurm):
- `R`/`RUNNING`, `PD`/`PENDING`, `CD`/`COMPLETED`; other states (e.g., `F`, `CG`, `S`, `TO`) are grouped under “other” in the summary and listed in the breakdown.

Examples:

```bash
# Summarize a specific job and write CSV output
slurm-bulk-user-stats --job 12345 --csv stats.csv

# Summarize all running jobs for the current user (default)
slurm-bulk-user-stats

# Summarize all jobs (including finished) for a specific user
slurm-bulk-user-stats --user myuser --include-finished

# Include finished jobs but only show the 20 most recent
slurm-bulk-user-stats --user myuser --include-finished 20
```

When invoked with no `--user` or `--job` options:
- On a login node (no `$SLURM_JOB_ID` present), it summarizes pending/running jobs for the current user.
- Inside a running Slurm job (where `$SLURM_JOB_ID` is set), it automatically summarizes that specific job.

```
slurm-bulk-user-stats
```

The output mirrors the PBS version, showing job state, node list, CPU/memory
usage, efficiency metrics, and a summary block with job counts and averages.
