Metadata-Version: 2.4
Name: PLD_accounting
Version: 0.2.0
Summary: Numerical privacy accounting for random allocation and subsampling using PLDs.
Author: Moshe Shenfeld
Maintainer: Moshe Shenfeld
License-Expression: MIT
Project-URL: Homepage, https://github.com/moshenfeld/PLD_accounting
Project-URL: Repository, https://github.com/moshenfeld/PLD_accounting
Project-URL: Issues, https://github.com/moshenfeld/PLD_accounting/issues
Keywords: differential privacy,privacy accounting,privacy loss distribution,pld
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Security
Classifier: Topic :: Scientific/Engineering :: Mathematics
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.23
Requires-Dist: scipy>=1.10
Requires-Dist: numba>=0.58
Requires-Dist: dp-accounting>=0.4.3
Requires-Dist: typing-extensions>=4.0
Provides-Extra: test
Requires-Dist: pytest>=8.0; extra == "test"
Requires-Dist: pytest-cov>=4.0; extra == "test"
Requires-Dist: mypy>=1.10; extra == "test"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: mypy>=1.10; extra == "dev"
Requires-Dist: build>=1.2.0; extra == "dev"
Dynamic: license-file

# PLD_accounting

**Tight numerical privacy accounting for random allocation and subsampling using Privacy Loss Distributions (PLDs).**

This library provides end-to-end DP accounting for federated learning and other privacy-preserving systems that use the random allocation subsampling scheme. It supports both Gaussian mechanisms and custom PLD realizations, with adaptive resolution refinement to trade accuracy against runtime.

---

## Quick Start

Install from PyPI:
```bash
pip install PLD_accounting
```

Compute privacy guarantees with automatic resolution tuning:
```python
from PLD_accounting import gaussian_allocation_epsilon_range

epsilon_upper, epsilon_lower = gaussian_allocation_epsilon_range(
    sigma=3.0,          # Gaussian noise scale
    num_steps=100,      # Total training steps
    num_selected=10,    # Clients selected per step
    delta=1e-6,         # Target delta
)
print(f"ε ∈ [{epsilon_lower:.4f}, {epsilon_upper:.4f}]")
```

---

## What This Library Provides

- **Random allocation accounting**: Tight bounds for selecting `k` out of `n` clients per round
- **Adaptive resolution**: Automatic grid refinement to balance accuracy and runtime
- **Two input modes**:
  - **Gaussian**: Specify `sigma`, `num_steps`, `num_selected` → get ε or δ
  - **Realization**: Provide explicit PLD → compose under random allocation
- **Direction-aware bounds**: Upper/lower bounds for REMOVE, ADD, or BOTH directions
- **Subsampling amplification**: Direct PLD-based amplification for PREAMBLE-style workflows (DOMINATES only)
- **Efficient convolution**: FFT for linear grids, geometric for multiplicative grids

---

## When to Use Each Path

### Gaussian Path (Most Common)
**Use when:** Your mechanism adds Gaussian noise with known `σ`

**Example:**
```python
from PLD_accounting import (
    gaussian_allocation_epsilon_extended,
    PrivacyParams,
    AllocationSchemeConfig,
)

params = PrivacyParams(sigma=3.0, num_steps=100, num_selected=10, delta=1e-6)
config = AllocationSchemeConfig(loss_discretization=0.02, tail_truncation=1e-8)

epsilon = gaussian_allocation_epsilon_extended(params, config)
```

**Adaptive variant** (recommended for exploratory analysis):
```python
epsilon_upper, epsilon_lower = gaussian_allocation_epsilon_range(
    sigma=3.0, num_steps=100, num_selected=10, delta=1e-6
)
```

### Realization Path (Advanced)
**Use when:** You have an explicit PLD realization or a non-Gaussian mechanism

**Example:**
```python
import numpy as np
from PLD_accounting import general_allocation_PLD, PLDRealization, AllocationSchemeConfig

# Define your mechanism's privacy loss distribution on a linear grid
remove_realization = PLDRealization(
    x_min=0.0,
    x_gap=0.1,
    PMF_array=np.array([0.3, 0.25, 0.2, 0.15, 0.1]),
)
add_realization = PLDRealization(
    x_min=0.0,
    x_gap=0.1,
    PMF_array=np.array([0.3, 0.25, 0.2, 0.15, 0.1]),
)

pld = general_allocation_PLD(
    num_steps=10,
    num_selected=5,
    num_epochs=1,
    config=AllocationSchemeConfig(),
    remove_realization=remove_realization,
    add_realization=add_realization,
)

epsilon = pld.get_epsilon_for_delta(1e-6)
```

**Requirements for realizations:**
- Linear grid structure (uniform spacing)
- Valid PLD per Definition 3.1: `E[exp(-L)] ≤ 1`, no mass at `L = -∞`
- Total probability mass = 1
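These requirements are straightforward to check numerically. The helper below is an illustrative sketch of such a check (it is not the library's own validator; `is_valid_realization` is a hypothetical name):

```python
import numpy as np

def is_valid_realization(x_min, x_gap, pmf, tol=1e-9):
    """Sketch of the validity checks listed above (illustrative only)."""
    pmf = np.asarray(pmf, dtype=float)
    losses = x_min + x_gap * np.arange(len(pmf))          # uniform (linear) grid
    has_unit_mass = abs(pmf.sum() - 1.0) <= tol           # total probability mass = 1
    mgf_ok = np.sum(pmf * np.exp(-losses)) <= 1.0 + tol   # E[exp(-L)] <= 1
    return bool(has_unit_mass and mgf_ok)
```

For instance, the five-point PMF from the example above passes both checks, while a unit mass at loss `-1.0` fails the `E[exp(-L)] ≤ 1` condition.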

---

## Key Concepts

### Random Allocation
Selecting `k` clients from `n` candidates provides privacy amplification compared to full-batch training. This library accounts for the composition of:
- `num_steps` total allocation steps
- `num_selected` clients chosen per step
- `num_epochs` passes through the data

For both Gaussian and realization paths, composition semantics are:
- inner per-round composition count: `floor(num_steps / num_selected)`
- outer composition count: `num_selected * num_epochs`
- therefore `num_steps` must satisfy `num_steps >= num_selected`
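The arithmetic above can be sketched as a small helper (hypothetical, not part of the library API):

```python
def composition_counts(num_steps, num_selected, num_epochs=1):
    """Sketch of the composition semantics stated above."""
    if num_steps < num_selected:
        raise ValueError("num_steps must be >= num_selected")
    inner = num_steps // num_selected   # inner per-round composition count
    outer = num_selected * num_epochs   # outer composition count
    return inner, outer
```

For example, `num_steps=500, num_selected=8, num_epochs=2` gives 62 inner and 16 outer compositions.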

### Directions
- **REMOVE**: Privacy loss when removing a data record
- **ADD**: Privacy loss when adding a data record
- **BOTH**: Analyze both directions (most conservative)

### Bound Types
- **DOMINATES**: Upper bound (pessimistic, safe for privacy proofs)
- **IS_DOMINATED**: Lower bound (optimistic, for tightness evaluation)

### PLD Dual
For a PLD realization `L`, the dual `D(L)` is the PLD in the reversed privacy direction (`L_{Q,P}`).
It reflects the support (`l -> -l`), reweights finite mass by `exp(-l)`, and places the residual mass at `+∞`.
In remove-direction internals, we often use `-D(L)`, obtained explicitly by negating `D(L)`.
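On a finite grid, the three operations above can be sketched in a few lines of numpy (illustrative only; the library's internal representation may differ):

```python
import numpy as np

def pld_dual(losses, pmf):
    """Sketch of D(L) as described above: reflect, reweight, send residual to +inf."""
    losses = np.asarray(losses, dtype=float)
    pmf = np.asarray(pmf, dtype=float)
    dual_losses = -losses[::-1]                # reflect support l -> -l (kept ascending)
    dual_pmf = (pmf * np.exp(-losses))[::-1]   # reweight finite mass by exp(-l)
    mass_at_inf = 1.0 - dual_pmf.sum()         # residual mass placed at +infinity
    return dual_losses, dual_pmf, mass_at_inf
```

Note that the residual mass is exactly `1 - E[exp(-L)]`, which is non-negative for any valid PLD.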

### Adaptive Resolution
The `*_range()` functions iteratively refine discretization to achieve target accuracy:
- Start from Poisson-subsampled estimate
- Refine grid spacing and truncation
- Track best upper/lower bounds
- Stop when gap meets target (or after 10 iterations)
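Schematically, the loop looks like the sketch below, where `query_bounds` stands in for a bound computation at a given resolution (a hypothetical callable `(discretization, truncation) -> (upper, lower)`; the starting values and refinement factors are illustrative, not the library's actual schedule):

```python
def refine_bounds(query_bounds, target_gap, max_iters=10):
    """Generic sketch of the adaptive refinement loop described above."""
    disc, trunc = 1e-2, 1e-8                 # illustrative starting resolution
    best_upper, best_lower = float("inf"), 0.0
    for _ in range(max_iters):
        upper, lower = query_bounds(disc, trunc)
        best_upper = min(best_upper, upper)  # track the tightest bounds seen
        best_lower = max(best_lower, lower)
        if best_upper - best_lower <= target_gap:
            break                            # gap meets target
        disc /= 2                            # refine grid spacing
        trunc /= 10                          # tighten tail truncation
    return best_upper, best_lower
```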

---

## Common Workflows

### 1. Simple ε Query
```python
from PLD_accounting import gaussian_allocation_epsilon_range

eps_upper, eps_lower = gaussian_allocation_epsilon_range(
    sigma=5.0, num_steps=500, num_selected=8, delta=1e-5
)
```

### 2. PLD for Multiple Queries
```python
from PLD_accounting import gaussian_allocation_PLD, PrivacyParams, AllocationSchemeConfig

pld = gaussian_allocation_PLD(
    params=PrivacyParams(sigma=3.0, num_steps=100, num_selected=10),
    config=AllocationSchemeConfig(),
)

# Query multiple (ε, δ) pairs efficiently
for delta in [1e-4, 1e-5, 1e-6]:
    eps = pld.get_epsilon_for_delta(delta)
    print(f"δ={delta:.0e} → ε={eps:.4f}")
```

### 3. Subsampling + Composition
```python
from PLD_accounting import gaussian_allocation_PLD, subsample_PLD

# One training round
base_pld = gaussian_allocation_PLD(...)

# Apply subsampling amplification
subsampled = subsample_PLD(base_pld, sampling_probability=0.1)

# Compose across rounds
final_pld = subsampled.self_compose(num_rounds=20)
epsilon = final_pld.get_epsilon_for_delta(1e-6)
```

`subsample_PLD()` / `subsample_PMF()` are DOMINATES-only utilities; they do not accept a bound-type argument.

See [`usage_example.py`](usage_example.py) for complete runnable examples including PREAMBLE-style workflows.

---

## Configuration Parameters

### PrivacyParams
- `sigma`: Gaussian noise scale (higher = more privacy)
- `num_steps`: Total allocation steps
- `num_selected`: Clients per step (`k` in the paper)
- `num_epochs`: Training epochs (default 1)
- `delta` or `epsilon`: Query target

### AllocationSchemeConfig
- `loss_discretization`: Grid spacing (smaller = tighter, slower). Default: 1e-3
- `tail_truncation`: Truncate probability mass below this. Default: 1e-10
- `max_grid_FFT`: FFT grid size limit. Default: 2,000,000
- `max_grid_mult`: Geometric grid size limit. Default: 0 (unlimited)
- `convolution_method`: FFT, GEOM, BEST_OF_TWO, or COMBINED

**Tradeoff:** Smaller `loss_discretization` and `tail_truncation` → tighter bounds but higher memory/runtime.
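One common pattern is to pair a coarse config for exploration with a fine one for final numbers. The specific values below are illustrative starting points, not recommendations:

```python
from PLD_accounting import AllocationSchemeConfig

# Coarse: fast, loose bounds for parameter sweeps
coarse = AllocationSchemeConfig(loss_discretization=0.05, tail_truncation=1e-6)

# Fine: tighter bounds for final reporting (more memory and runtime)
fine = AllocationSchemeConfig(loss_discretization=1e-4, tail_truncation=1e-12)
```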

---

## Requirements

- Python ≥ 3.10
- numpy ≥ 1.23
- scipy ≥ 1.10
- numba ≥ 0.58
- dp-accounting ≥ 0.4.3

All dependencies install automatically with the package.

---

## Development Setup

From source:
```bash
git clone https://github.com/moshenfeld/PLD_accounting.git
cd PLD_accounting
pip install -e ".[dev]"
```

Run tests:
```bash
pytest -q
```

With coverage:
```bash
./tests/run_tests.sh --coverage
```

Build distribution:
```bash
python -m build
```

---

## API Reference

### Gaussian Path
- `gaussian_allocation_epsilon_range()` - adaptive upper/lower ε bounds
- `gaussian_allocation_delta_range()` - adaptive upper/lower δ bounds
- `gaussian_allocation_epsilon_extended()` - single ε value with fixed config
- `gaussian_allocation_delta_extended()` - single δ value with fixed config
- `gaussian_allocation_PLD()` - build PLD object for repeated queries

### Realization Path
- `general_allocation_PLD()` - build PLD from explicit realizations
- `general_allocation_epsilon()` - compute ε from realizations
- `general_allocation_delta()` - compute δ from realizations
- `PLDRealization` - linear-grid PLD realization type

### Composition
- `subsample_PLD()` - apply subsampling amplification to a PLD

---

## Project Structure

```
PLD_accounting/
├── discrete_dist.py              # Distribution types (Linear, Geometric, PLDRealization)
├── random_allocation_api.py      # Public API surface
├── random_allocation_accounting.py  # Shared composition logic
├── random_allocation_gaussian.py    # Gaussian-specific implementation
├── adaptive_random_allocation.py    # Adaptive resolution refinement
├── geometric_convolution.py      # Multiplicative-grid convolution
├── FFT_convolution.py           # Linear-grid convolution
├── subsample_PLD.py             # Subsampling amplification
└── ...

tests/
├── unit/                        # Type checks, validations, edge cases
└── integration/                 # End-to-end workflows, comparisons

usage_example.py                 # Runnable examples
```

See [IMPLEMENTATION_OVERVIEW.md](IMPLEMENTATION_OVERVIEW.md) for architectural details.

---

## Citation

If you use this library in your research, please cite:
```
[Paper citation pending]
```

---

## License

[License information]

---

## Support

For questions or issues:
- GitHub Issues: https://github.com/moshenfeld/PLD_accounting/issues
- Documentation: See `usage_example.py` and docstrings
