Metadata-Version: 2.4
Name: denpex-sdk
Version: 0.1.0
Summary: Denpex training SDK for SDC detection and checkpoint validation.
Author: Denpex
License: Proprietary
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Provides-Extra: torch
Requires-Dist: torch>=2.1; extra == "torch"

# denpex-sdk

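Python helpers with no required third-party dependencies (PyTorch is an
optional extra, `torch`). The SDK covers two areas:

- Silent data corruption (SDC) detection: per-layer weight and gradient
  norms, NaN/Inf checks, and optimizer state validation.
- Asynchronous checkpoint validation: manifest integrity and content
  checksums for checkpoints stored on S3 or Lustre.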

## SDC detection

```python
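from denpex_sdk import SdcHook, install_sdc_hooks

# `model` and `optimizer` come from your own training loop.
# The thresholds bound the per-layer norms the hook will accept.
hook = install_sdc_hooks(model, SdcHook(max_weight_norm=1e4, max_grad_norm=1e4))

weight_report = hook.inspect()                  # weights
grad_report = hook.inspect_gradients()          # gradients
opt_report = hook.inspect_optimizer(optimizer)  # optimizer state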
```
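
To illustrate the kind of check described above, here is a minimal standard-library sketch of a per-layer norm and NaN/Inf test. It is not the SDK's implementation: `inspect_layers`, the plain-list layer representation, and the finding strings are all hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, Iterable, List


def inspect_layers(layers: Dict[str, Iterable[float]], max_norm: float) -> List[str]:
    """Hypothetical helper: return one finding per layer that looks corrupted."""
    findings = []
    for name, values in layers.items():
        values = list(values)
        # Any NaN or Inf in a layer is reported immediately.
        if any(not math.isfinite(v) for v in values):
            findings.append(f"{name}: non-finite value")
            continue
        # L2 norm of the flattened layer; a norm far above the threshold
        # is a cheap signal that a value may have been corrupted.
        norm = math.sqrt(sum(v * v for v in values))
        if norm > max_norm:
            findings.append(f"{name}: norm {norm:.3g} exceeds {max_norm:.3g}")
    return findings


# Toy data standing in for flattened layer tensors (assumed, not real weights).
layers = {
    "encoder.weight": [0.1, -0.2, 0.3],
    "decoder.weight": [0.5, float("nan")],
    "head.bias": [3e4, 4e4],
}
findings = inspect_layers(layers, max_norm=1e4)
for finding in findings:
    print(finding)
```

In this toy run the healthy layer produces no finding, while the layer containing NaN and the layer whose norm exceeds the threshold are each reported once.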

## Checkpoint validation

```python
from denpex_sdk import (
    build_manifest,
    manifest_from_json,
    manifest_to_json,
    validate_checkpoint,
)

# At save time: build a manifest for the checkpoint and store it alongside.
manifest = build_manifest(["/checkpoints/step-1000/"], metadata={"job_id": "..."})
with open("/checkpoints/step-1000/manifest.json", "w") as handle:
    handle.write(manifest_to_json(manifest))

# At load time: read the stored manifest back and validate against it.
with open("/checkpoints/step-1000/manifest.json") as handle:
    expected = manifest_from_json(handle.read())

result = validate_checkpoint(["/checkpoints/step-1000/"], expected_manifest=expected)
assert result.valid
```
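
The idea behind manifest-based validation can be sketched with the standard library alone. The code below is an illustration under stated assumptions, not the SDK's manifest format or algorithm: `sha256_of`, `sketch_manifest`, and `sketch_validate` are hypothetical names, SHA-256 is an assumed checksum choice, and the sketch runs synchronously against a local temporary directory rather than S3 or Lustre.

```python
import hashlib
import os
import tempfile
from typing import Dict, List


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large shards need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sketch_manifest(root: str) -> Dict[str, Dict[str, object]]:
    """Hypothetical manifest: relative path -> size and content checksum."""
    entries = {}
    for dirpath, _dirs, files in os.walk(root):
        for filename in sorted(files):
            path = os.path.join(dirpath, filename)
            rel = os.path.relpath(path, root)
            entries[rel] = {"size": os.path.getsize(path), "sha256": sha256_of(path)}
    return entries


def sketch_validate(root: str, expected: Dict[str, Dict[str, object]]) -> List[str]:
    """Compare the files on disk with the expected manifest; list any problems."""
    actual = sketch_manifest(root)
    problems = []
    for rel, entry in expected.items():
        if rel not in actual:
            problems.append(f"missing: {rel}")
        elif actual[rel] != entry:
            problems.append(f"mismatch: {rel}")
    problems.extend(f"unexpected: {rel}" for rel in actual if rel not in expected)
    return problems


with tempfile.TemporaryDirectory() as root:
    shard = os.path.join(root, "shard-0.bin")
    with open(shard, "wb") as handle:
        handle.write(b"\x00" * 1024)
    expected = sketch_manifest(root)
    clean = sketch_validate(root, expected)

    # Flip one byte without changing the file size to simulate silent corruption.
    with open(shard, "r+b") as handle:
        handle.seek(10)
        handle.write(b"\x01")
    corrupted = sketch_validate(root, expected)

print(clean)
print(corrupted)
```

Because the corrupted file keeps its original size, only the content checksum catches the change, which is why a manifest that records sizes alone would not be sufficient.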
