Metadata-Version: 2.4
Name: afrilink-sdk
Version: 0.8.7
Summary: AfriLink SDK — One-line access to GPUs, models and datasets from your notebook
Home-page: https://github.com/dataspires/afrilink-sdk
Author: DataSpires
Author-email: DataSpires <info@dataspires.com>
License-Expression: MIT
Project-URL: Homepage, https://dataspires.com
Project-URL: Documentation, https://www.dataspires.com/docs
Project-URL: Repository, https://github.com/DataSpires/afrilink-sdk
Project-URL: Bug Tracker, https://github.com/DataSpires/afrilink-sdk/issues
Keywords: hpc,high-performance-computing,finetuning,llm,lora,notebook,gpu,slurm,afrilink
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: System :: Distributed Computing
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: full
Requires-Dist: requests>=2.28.0; extra == "full"
Requires-Dist: psutil>=5.9.0; extra == "full"
Provides-Extra: build
Requires-Dist: requests>=2.28.0; extra == "build"
Requires-Dist: cryptography>=41.0.0; extra == "build"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# AfriLink SDK

**Version:** 0.8.7

**Last Updated:** May 28, 2026

**Train & Finetune on a Dedicated A100 from Any Notebook**

AfriLink SDK gives you one-line access to a dedicated **NVIDIA A100** (64 GB HBM2e on the default CINECA Leonardo backend) for **training and finetuning** text, vision, and multimodal models. It works on **Google Colab, Kaggle, Jupyter, VS Code**, and any other Python environment.

| Capability | API | What It Does |
|------------|-----|-------------|
| **Training** | `client.train()` | Run any training script (YOLOv8, custom PyTorch, etc.) on the A100 |
| **Finetuning** | `client.finetune()` | LoRA/QLoRA LLM fine-tuning with one line |

```bash
pip install afrilink-sdk
```

---

## Quick Start — Finetune an LLM

```python
import pandas as pd
from afrilink import AfriLinkClient

client = AfriLinkClient()
client.authenticate()   # prompts for DataSpires email/password

# One training example per row, in a single "text" column
data = pd.DataFrame({"text": [
    "Below is an instruction...\n\n### Response:\nHere is the answer..."
]})

job = client.finetune(
    model="qwen2.5-0.5b",
    training_mode="low",
    data=data,
    gpus=1,
    time_limit="01:00:00",
)
result = job.run(wait=True)

if result["status"] == "completed":
    client.download_model(result["job_id"], "./my-model")
```

## Quick Start — Train a Vision Model

```python
from afrilink import AfriLinkClient

client = AfriLinkClient()
client.authenticate()

# Submit a YOLOv8 training job to an A100 GPU
job = client.train(
    script="train_yolo.py",        # your training script
    container="afrilink-yolo",      # pre-built container with YOLOv8 + PyTorch
    data="./dataset.tar.gz",        # dataset (uploaded automatically)
    data_config="dataset.yaml",     # config file (e.g. YOLO dataset.yaml)
    gpus=1,
    time_limit="02:00:00",
)
result = job.run(wait=True)

# Check logs and download results
print(job.get_logs(tail=50))
client.transfer.download_file("$WORK/runs/best.pt", "./best.pt")
```

---

## Installation

```bash
pip install afrilink-sdk
```

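The package has **zero required dependencies**. Heavy libraries (requests, torch, transformers, peft) are imported only when a feature needs them, and many notebook environments already provide them. To install the optional `requests` and `psutil` dependencies up front, use the `full` extra: `pip install "afrilink-sdk[full]"`.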

---

## Authentication

AfriLink uses a two-phase auth flow. Both phases happen inside a single `client.authenticate()` call:

| Phase | What happens | User action |
|-------|-------------|-------------|
| **1. DataSpires** | Validates your DataSpires account for billing/telemetry | Enter email + password when prompted |
| **2. HPC** | Headless Selenium browser automation gets SSH certificates via Smallstep | Fully automatic (org credentials auto-provisioned) |

```python
from afrilink import AfriLinkClient

client = AfriLinkClient()
client.authenticate()   # prompts for DataSpires creds, then auto-handles HPC

# Or pass credentials explicitly:
client.authenticate(
    dataspires_email="you@example.com",
    dataspires_password="...",
)
```

After authentication you get:
- SSH certificate valid for ~12 hours (the SDK warns you before it expires — see [Session Recovery](#session-recovery))
- SLURM job manager ready to submit jobs
- SCP transfer manager ready to move files
- Telemetry tracker logging GPU-minutes to your DataSpires account

---

## Built-in User Guide

The SDK ships with an inline reference manual you can query from any notebook cell using a slash-style syntax:

```python
import afrilink

afrilink/help          # top-level index of all topics
afrilink/quickstart    # step-by-step getting started guide
afrilink/auth          # authentication & session management
afrilink/finetune      # finetune job parameters & training modes
afrilink/specs         # available models and GPU requirements
afrilink/datasets      # dataset formats and upload
afrilink/transfer      # SCP upload/download commands
afrilink/jobs          # SLURM job management
afrilink/inference     # routing inference to HuggingFace endpoints
```

Each page prints a formatted reference to your notebook output — no internet connection required.

---

## API Reference

### `AfriLinkClient`

Main entry point. Created once per notebook session.

| Method / Property | Description |
|--------|-------------|
| `authenticate()` | Full auth flow (DataSpires + HPC) |
| `train(script, container, data, gpus, ...)` | Create a `TrainJob` (general-purpose training) |
| `finetune(model, training_mode, data, gpus, ...)` | Create a `FinetuneJob` (LLM fine-tuning) |
| `inference(prompt, model_id, ...)` | Route inference to a HuggingFace endpoint |
| `download_model(job_id, local_dir)` | Download trained adapter weights |
| `upload_dataset(local_path, dataset_name)` | Upload dataset to HPC |
| `list_containers()` | List available training containers |
| `list_available_models(size=None)` | List models in the registry |
| `list_available_datasets()` | List datasets in the registry |
| `get_model_requirements(model, training_mode)` | GPU/memory recommendations |
| `list_jobs()` | List SLURM queue |
| `recover_session(download_dir=None)` | Re-authenticate + check/download tracked jobs |
| `cancel_job(job_id)` | Cancel a running job |
| `run_command(command)` | Run arbitrary shell command on HPC login node |
| `get_queue_status()` | SLURM partition info |
| `cert_minutes_remaining` | Property: minutes until the SSH certificate expires |

### `client.train()`

General-purpose training on HPC. Use this for any training framework (YOLOv8, custom PyTorch, etc.). For LoRA/QLoRA LLM fine-tuning, use `client.finetune()` instead.

```python
job = client.train(
    script="train_yolo.py",        # local Python script to upload and run
    container="afrilink-yolo",      # pre-built container (see Available Containers)
    data="./dataset/",              # local path, archive, DataFrame, or remote HPC path
    data_config="dataset.yaml",     # config file (e.g. YOLO dataset.yaml)
    gpus=1,                         # number of A100 GPUs (1-32)
    time_limit="04:00:00",          # max wallclock (HH:MM:SS)
    script_args="--epochs 100",     # CLI arguments passed to your script
    extra_files=["weights.pt"],     # additional files to upload
    memory_gb=64,                   # RAM per node (default: 64)
    container_env={"KEY": "val"},   # extra environment variables
)
```

**Available Containers:**

| Name | Frameworks | Use Case |
|------|-----------|----------|
| `afrilink-yolo` | Ultralytics, PyTorch, torchvision | Object detection, segmentation, pose estimation |
| `afrilink-finetune` | PyTorch, Transformers, PEFT, bitsandbytes | LLM fine-tuning |

Alternatively, pass a full SIF path as `container=` to use your own Singularity image.

**Data handling:**

| Input type | What happens |
|------------|-------------|
| Local directory | Uploaded via SCP |
| `.tar.gz` / `.zip` archive | Uploaded and extracted on HPC |
| Single file | Uploaded to job directory |
| `pandas.DataFrame` | Serialised to JSONL, uploaded |
| Path starting with `$` or `/` | Treated as remote HPC path (no upload) |
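
The routing rules in the table above can be pictured with a small sketch. This is not the SDK's implementation: `classify_data_input` is a hypothetical helper that mirrors the table for string inputs only, and it inspects the path string rather than contacting the cluster.

```python
import os

ARCHIVE_SUFFIXES = (".tar.gz", ".zip")

def classify_data_input(data: str) -> str:
    """Hypothetical helper: map a `data=` string to the table's handling rule."""
    if data.startswith(("$", "/")):
        return "remote"      # treated as an HPC path, no upload
    if data.endswith(ARCHIVE_SUFFIXES):
        return "archive"     # uploaded, then extracted on HPC
    if data.endswith(("/", os.sep)) or os.path.isdir(data):
        return "directory"   # uploaded via SCP
    return "file"            # single file, uploaded to the job directory

for example in ["$WORK/datasets/coco", "./dataset.tar.gz", "./dataset/", "train.jsonl"]:
    print(example, "->", classify_data_input(example))
```

By the table's rule, a local absolute path such as `/home/me/data` would be read as a remote HPC path, so a relative path is the safer choice for local uploads.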

**`script_content=` alternative:** Instead of a file path, pass inline Python code as a string:

```python
job = client.train(
    script_content="from ultralytics import YOLO\nYOLO('yolov8n.pt').train(data='coco.yaml')",
    container="afrilink-yolo",
    gpus=1,
)
```

### `TrainJob`

Returned by `client.train()`.

| Method / Property | Description |
|-------------------|-------------|
| `run(wait=True)` | Submit to SLURM. `wait=True` polls until done. |
| `cancel()` | Cancel the SLURM job |
| `get_logs(tail=100)` | Fetch recent log lines |
| `estimated_cost_usd()` | Estimate max cost based on GPUs and time limit |
| `status` | Current status string |
| `job_id` | AfriLink job ID (8-char UUID prefix) |
| `slurm_job_id` | SLURM numeric job ID (set after `run()`) |

`run()` returns a dict:

```python
{
    "job_id": "a1b2c3d4",
    "slurm_job_id": "12345678",
    "status": "completed",        # or "submitted" if wait=False
    "output_dir": "/path/...",
}
```

---

### `client.finetune()`

```python
job = client.finetune(
    model="qwen2.5-0.5b",       # model ID from registry
    training_mode="low",          # "low" | "medium" | "high"
    data=my_dataframe,            # pandas DataFrame, HF Dataset, or file path
    gpus=1,                       # number of A100 GPUs
    time_limit="01:00:00",        # max wallclock (HH:MM:SS)
    backend="cineca",             # HPC backend cluster
    output_dir=None,              # default: $WORK/finetune_outputs
)
```

**HPC Backends:**

| Backend | Provider | Region | Status |
|---------|----------|--------|--------|
| `cineca` | CINECA Leonardo (EuroHPC) | Bologna, Italy | Available (default) |
| `eversetech` | EverseTech | Variable | Coming soon |
| `agh` | AGH | Variable | Coming soon |
| `acf` | ACF | Variable | Coming soon |

**CINECA Leonardo — Hardware Specs:**

Each GPU node on the Leonardo Booster partition (where AfriLink jobs run):

| Component | Specification |
|-----------|---------------|
| GPU per node | 4x NVIDIA A100 (custom) |
| GPU memory | 64 GB HBM2e per GPU (256 GB per node) |
| FP64 performance | 11.2 TFLOPS per GPU |
| FP32 performance | 22.4 TFLOPS per GPU |
| CPU cores per node | 32 |
| System RAM per node | 512 GB DDR4 |
| RAM per GPU (effective) | ~128 GB (shared, not partitioned) |
| Node interconnect | 200 Gb/s HDR InfiniBand |
| SLURM partition | `boost_usr_prod` |

**GPU count guide:**

| Model size | Training mode | Min GPUs recommended |
|------------|---------------|----------------------|
| 0.5B - 1B | low (QLoRA 4-bit) | 1 |
| 3B - 7B | low | 1 |
| 3B - 7B | high (bf16) | 2-4 |
| 13B | low | 2 |
| 13B | high | 4 |
| 30B+ | low or high | 4 |
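
As rough intuition for the table above, the memory needed just to hold the model weights is the parameter count times the bytes per parameter. The sketch below is a back-of-envelope estimate, not how the SDK sizes jobs: it covers weights only and ignores activations, gradients, optimizer state, and adapter parameters, so real requirements are higher. `weight_memory_gb` is a hypothetical helper.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"4-bit": 0.5, "8-bit": 1.0, "bf16": 2.0}

def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * BYTES_PER_PARAM[precision] / 1e9

for size in (0.5, 7, 13, 30):
    row = ", ".join(f"{p}: {weight_memory_gb(size, p):.1f} GB" for p in BYTES_PER_PARAM)
    print(f"{size}B -> {row}")
```

A 7B model, for example, needs about 3.5 GB for 4-bit weights but about 14 GB in bf16 before any training overhead, which is consistent with the higher GPU counts recommended for `high` mode.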

**Billing:** $2.00 / GPU-hour, charged per completed GPU-minute (minimum 1 minute). Credits deducted automatically from your DataSpires balance.
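
To make the billing rule concrete, here is a small sketch of the arithmetic. It assumes that "per completed GPU-minute" means elapsed time is rounded down to whole minutes per GPU, with a one-minute floor; the exact rounding is an assumption. `max_cost_usd` and `billed_cost_usd` are hypothetical helpers, not SDK functions.

```python
RATE_USD_PER_GPU_HOUR = 2.00

def parse_time_limit(time_limit: str) -> int:
    """Convert an HH:MM:SS wallclock string to seconds."""
    hours, minutes, seconds = (int(part) for part in time_limit.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def max_cost_usd(gpus: int, time_limit: str) -> float:
    """Upper bound: every GPU billed for the full time limit."""
    return gpus * parse_time_limit(time_limit) / 3600 * RATE_USD_PER_GPU_HOUR

def billed_cost_usd(gpus: int, elapsed_seconds: int) -> float:
    """Assumed rule: whole minutes completed, minimum one minute, per GPU."""
    minutes = max(1, elapsed_seconds // 60)
    return gpus * minutes * RATE_USD_PER_GPU_HOUR / 60

print(max_cost_usd(gpus=1, time_limit="01:00:00"))              # cap for a 1-hour job
print(round(billed_cost_usd(gpus=4, elapsed_seconds=754), 2))   # 4 GPUs, 12 full minutes
```

Under these assumptions, a one-GPU job with a one-hour limit can cost at most $2.00, and a four-GPU job that stops after 12 full minutes is billed $1.60.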

**Training modes:**

| Mode | Strategy | Quantization | Typical GPUs |
|------|----------|-------------|--------------|
| `low` | QLoRA (rank 8) | 4-bit | 1 |
| `medium` | LoRA (rank 16) | 8-bit / none | 1-2 |
| `high` | LoRA (rank 64) + DDP/FSDP | none | 2-4+ |

### `FinetuneJob`

Returned by `client.finetune()`.

| Method / Property | Description |
|-------------------|-------------|
| `run(wait=True)` | Submit to SLURM. `wait=True` polls until done. |
| `cancel()` | Cancel the SLURM job |
| `get_logs(tail=100)` | Fetch recent log lines |
| `status` | Current status string |
| `job_id` | AfriLink job ID (8-char UUID prefix) |
| `slurm_job_id` | SLURM numeric job ID (set after `run()`) |

`run()` returns a dict:

```python
{
    "job_id": "a1b2c3d4",
    "slurm_job_id": "12345678",
    "status": "completed",        # or "submitted" if wait=False
    "output_dir": "/path/...",
    "model_path": "/path/...",
}
```

### Session Watchdog

The SDK monitors your SSH certificate in the background and prints a warning as it approaches expiry (at 60, 30, 15, and 5 minutes remaining). You can check time remaining at any point:

```python
print(f"{client.cert_minutes_remaining:.0f} minutes remaining on SSH certificate")
```

### Session Recovery

SSH certificates expire after ~12 hours. When the watchdog warning appears, or when you return to a notebook after being away, call `recover_session()` to re-authenticate and pick up where you left off:

```python
# Re-authenticate and check on all tracked jobs
recovery = client.recover_session("./recovered-models")

# recovery.re_authenticated  — True if fresh SSH cert was obtained
# recovery.jobs               — status of each tracked SLURM job
# recovery.files_retrieved    — list of model dirs downloaded for completed jobs
```

What `recover_session()` does:

1. **Re-authenticates with CINECA** — gets a fresh SSH certificate without re-entering credentials
2. **Checks all tracked SLURM jobs** — reports status of every job submitted in this session
3. **Downloads completed models** — if you pass a `download_dir`, finished adapters are pulled automatically
4. **Registers email notification** — for jobs still running, you'll get an email when they finish

Your SLURM jobs keep running on the cluster even after your certificate expires — you just need fresh credentials to check on them or download results.

```python
# Minimal usage (just re-auth, no download)
client.recover_session()

# With download directory for completed jobs
client.recover_session("./my-models")
```
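
One way to combine the watchdog property with recovery is to check the remaining certificate time before a long-running step. The helper below is a hypothetical convenience wrapper, not part of the SDK. It relies only on `cert_minutes_remaining` and `recover_session()` as documented above, and the 15-minute threshold is an arbitrary choice. A stub client stands in for `AfriLinkClient` so the sketch runs without a cluster session.

```python
from types import SimpleNamespace

def ensure_session(client, min_minutes=15, download_dir=None):
    """Hypothetical wrapper: refresh the session if the cert is close to expiry.

    Returns the value from recover_session(), or None if no refresh was needed.
    """
    if client.cert_minutes_remaining >= min_minutes:
        return None
    return client.recover_session(download_dir)


class StubClient:
    """Stand-in for AfriLinkClient, for illustration only."""

    def __init__(self, minutes):
        self.cert_minutes_remaining = minutes
        self.recovered = False

    def recover_session(self, download_dir=None):
        self.recovered = True
        self.cert_minutes_remaining = 12 * 60  # pretend a fresh ~12 hour cert was issued
        return SimpleNamespace(re_authenticated=True)


stub = StubClient(minutes=5)
ensure_session(stub)
print(stub.recovered, stub.cert_minutes_remaining)
```

With a real client, calling `ensure_session(client)` just before `job.run(wait=True)` or a large download would keep the certificate check in one place.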

---

### `client.inference()`

Route an inference request to any Hugging Face Inference Endpoint. No CINECA session is required.

```python
# Public model (no token needed, rate-limited)
result = client.inference(
    "Explain LoRA fine-tuning in one sentence.",
    model_id="HuggingFaceH4/zephyr-7b-beta",
)
print(result.text)

# Gated model with HF token + generation parameters
result = client.inference(
    "What is transfer learning?",
    model_id="meta-llama/Llama-2-7b-chat-hf",
    hf_token="hf_...",
    parameters={"max_new_tokens": 256, "temperature": 0.7},
)

# Private HuggingFace Inference Endpoint
result = client.inference(
    payload={"inputs": "Hello!"},
    endpoint_url="https://xyz.endpoints.huggingface.cloud",
    hf_token="hf_...",
)

# Check result
if result.success:
    print(result.text)
else:
    print(f"Error {result.status_code}: {result.error}")
```

**`InferenceResult` fields:**

| Field | Type | Description |
|-------|------|-------------|
| `text` | str | Generated text (first result) |
| `raw` | Any | Full decoded JSON from HuggingFace |
| `status_code` | int | HTTP status code |
| `success` | bool | True if status_code < 400 |
| `error` | str \| None | Error message, or None |

---

### `client.download_model()`

```python
client.download_model(result["job_id"], "./my-model")
```

Downloads adapter files (`adapter_config.json`, `adapter_model.safetensors`, tokenizer files) flat into the target directory — ready for `PeftModel.from_pretrained()`.

### Working With Your Model

Once you have downloaded adapter weights with `client.download_model()`, the adapter directory is ready for standard Hugging Face tooling.

#### GGUF Conversion & Ollama

Convert your adapter to GGUF format for use with Ollama or llama.cpp:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# 1. Merge adapter into base model
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
model = PeftModel.from_pretrained(base, "./my-model")
merged = model.merge_and_unload()
merged.save_pretrained("./my-model-merged")
AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B").save_pretrained("./my-model-merged")

# 2. Convert to GGUF (requires llama.cpp — see llama.cpp repo for build instructions)
# python convert_hf_to_gguf.py ./my-model-merged --outfile my-model.gguf --outtype f16

# 3. Quantize (optional, 4-bit)
# ./llama-quantize my-model.gguf my-model-q4.gguf Q4_K_M

# 4. Run with Ollama
# Create a Modelfile:
#   FROM ./my-model-q4.gguf
# ollama create my-model -f Modelfile
# ollama run my-model
```

#### Publishing to Hugging Face Hub

```python
from huggingface_hub import HfApi

api = HfApi(token="hf_...")
repo_id = "your-username/my-finetuned-model"
api.create_repo(repo_id, exist_ok=True)

# Option A — adapter only (small, loads on top of base model)
api.upload_folder(folder_path="./my-model", repo_id=repo_id)

# Option B — full merged model
api.upload_folder(folder_path="./my-model-merged", repo_id=repo_id)

# Option C — GGUF file
api.upload_file(path_or_fileobj="./my-model-q4.gguf",
                path_in_repo="my-model-q4.gguf",
                repo_id=repo_id)
```

---

### Model & Dataset Registry

```python
# List all models
client.list_available_models()

# Filter by size
client.list_available_models(size="tiny")   # tiny | small | medium | large

# List datasets
client.list_available_datasets()

# Resource requirements
client.get_model_requirements("qwen2.5-0.5b", "low")
```

**Available models (v0.1.0):**

| ID | Name | Type | Params | Min VRAM |
|----|------|------|--------|----------|
| `qwen2.5-0.5b` | Qwen 2.5 0.5B | text | 0.5B | 4 GB |
| `gemma-3-270m` | Gemma 3 270M | text | 0.27B | 2 GB |
| `llama-3.2-1b` | Llama 3.2 1B | text | 1.0B | 4 GB |
| `deepseek-r1-1.5b` | DeepSeek R1 1.5B | text | 1.5B | 6 GB |
| `ministral-3b` | Ministral 3B | text | 3.3B | 8 GB |
| `florence-2-base` | Florence 2 Base | vision | 0.23B | 4 GB |
| `smolvlm-256m` | SmolVLM 256M | vision | 0.26B | 2 GB |
| `moondream2` | Moondream 2 | vision | 1.9B | 8 GB |
| `internvl2-1b` | InternVL2 1B | vision | 1.0B | 4 GB |
| `llava-1.5-7b` | LLaVA 1.5 7B | vision | 7.0B | 16 GB |

### Data Transfer

```python
# Upload a dataset
client.upload_dataset("./train.jsonl", dataset_name="my-data")

# Download model weights
client.download_model("a1b2c3d4", "./my-model")

# List remote files
client.transfer.list_remote_files("$WORK/finetune_outputs/")

# Run shell commands on HPC
client.run_command("squeue -u $USER")
```

### Dataset Formats

`client.finetune(data=...)` accepts:

| Type | How it's handled |
|------|-----------------|
| `pandas.DataFrame` | Serialised to JSONL, uploaded via SCP |
| `datasets.Dataset` | Saved to disk, uploaded via SCP |
| `str` (local path) | Uploaded via SCP |
| `str` (starts with `$`) | Treated as remote HPC path (no upload) |

Your DataFrame should have a `text` column with the full prompt+response formatted as a single string (Alpaca-style or chat template).
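
As an illustration of the single-string format, the sketch below turns instruction/response pairs into `text` rows and writes them as JSONL. The prompt template is an assumed Alpaca-style layout modeled on the Quick Start example, and `to_text_rows` is a hypothetical helper; adapt the template to whatever your base model expects.

```python
import json

# Assumed Alpaca-style template; not mandated by the SDK.
TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n{response}"
)

def to_text_rows(pairs):
    """Format (instruction, response) pairs as rows with a single `text` field."""
    return [{"text": TEMPLATE.format(instruction=i, response=r)} for i, r in pairs]

rows = to_text_rows([
    ("Translate 'hello' to French.", "Bonjour."),
    ("What is 2 + 2?", "4"),
])

# One JSON object per line; pd.DataFrame(rows) gives the DataFrame form instead.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")
```

Either `pd.DataFrame(rows)` or the path to the written file can then be passed as `data=`, following the table above.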

---

## Architecture

```
Notebook Interface                      High Performance Compute
+--------------+      SSH/SCP          +------------------+
| AfriLink SDK | ------------------->  |  Login Node      |
|              |  (Smallstep certs)    |  +- SLURM sbatch |
| DataSpires   |                       |  +- $WORK/       |
| (billing)    |                       |  |  +- containers|
|              |                       |  |  +- datasets  |
+--------------+                       |  |  +- finetune_ |
                                       |  |     outputs/  |
                                       |  |     +- {jobid}|
                                       |  +- Singularity  |
                                       |     container    |
                                       |     (A100 GPUs)  |
                                       +------------------+
```

---

## Publishing to PyPI

For maintainers:

```bash
cd afrilink-sdk
pip install build twine

# Build wheel + sdist
python -m build

# Upload to PyPI (requires PyPI API token)
twine upload dist/*
```

You'll need a PyPI account at https://pypi.org and an API token configured in `~/.pypirc` or passed via `--username __token__ --password pypi-...`.

---

## License

MIT
