Metadata-Version: 2.4
Name: combinate
Version: 0.1.0rc3
Summary: Governed cloud parameter sweeps for Python models with local validation and structured results.
Project-URL: Homepage, https://www.combinate.ai
Project-URL: Documentation, https://www.combinate.ai/#how-it-works
Project-URL: Blog, https://www.combinate.ai/blog
Project-URL: UseCases, https://www.combinate.ai/blog/what-combinate-does-not-replace
Author: Combinate
Keywords: design space exploration,engineering,monte carlo,optimization,parameter sweeps,python sdk,simulation
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10
Requires-Dist: cloudpickle<4,>=3.0
Requires-Dist: httpx<1,>=0.28
Requires-Dist: scipy<2,>=1.14
Provides-Extra: dev
Requires-Dist: alembic<2,>=1.13; extra == 'dev'
Requires-Dist: build<2,>=1.2; extra == 'dev'
Requires-Dist: psycopg[binary]<4,>=3.2; extra == 'dev'
Requires-Dist: pytest<9,>=8.0; extra == 'dev'
Requires-Dist: python-dotenv<2,>=1.0; extra == 'dev'
Requires-Dist: twine<7,>=6.1; extra == 'dev'
Requires-Dist: uvicorn<1,>=0.30; extra == 'dev'
Description-Content-Type: text/markdown

# Combinate

Combinate turns an existing Python model into a governed cloud parameter sweep with local validation, bounded execution, and structured results.

It is designed for the common case where you already have a Python function and a parameter space you want to evaluate. Instead of building your own nested-loop orchestration, retries, and result collation, you validate locally, run a larger sweep when needed, and inspect the results through one Python-first interface.

## Install

Install the current release candidate from PyPI:

```powershell
python -m pip install combinate==0.1.0rc3
```

`0.1.0rc3` is the current PyPI release candidate. The package is meant to be installed and used as-is, while the hosted service and onboarding flow remain in private beta until the final `0.1.0` release.

This README is a minimal, self-contained guide from install to first sweep, and it is usable from the installed package alone.

Combinate is a good fit when you want to:

- validate a model locally with no account first
- scale the same model to a hosted sweep later
- keep results structured and reproducible instead of ad hoc CSV output
- keep one obvious workflow for design-space exploration, Monte Carlo studies, or bounded search over a Python model

Combinate is not in the same category as a general distributed-compute framework.

- choose Combinate when your core problem is simulation-style parameter sweeps, design-space exploration, Monte Carlo runs, or bounded search over a Python model
- choose Ray when you need a broader distributed Python execution substrate
- choose Dask when the workload is mainly array-, dataframe-, or task-graph-oriented
- choose Prefect when orchestration and recurring flow management are more important than the sweep contract itself

Why users choose Combinate instead of stitching tools together themselves:

- local validation before cloud spend
- preflight checks before submission
- bounded execution with an explicit sweep record
- structured result retrieval keyed by sweep ID
- one stable public shape across grid, random, and bounded genetic workflows

Richer walkthroughs may exist in the repository, but do not assume they are bundled into the installed wheel.

The published package intentionally contains only the Python SDK and CLI surface. The hosted control plane remains a separate deployed service and is not bundled into the PyPI artifact.

## Try It Locally First (No Account Required)

`local_sweep` runs your function in-process against a sampled parameter space without connecting to the Combinate cloud. No API key, project ID, or sign-in is required.

```python
from combinate import local_sweep

def simulate(alpha: float, beta: float) -> dict:
    return {"objective": alpha * beta}

result = local_sweep(
    simulate,
    params={
        "alpha": {"type": "range", "min": 0.1, "max": 10.0},
        "beta": [1.0, 5.0, 10.0],
    },
    sampling_spec={"method": "random", "sampler": "sobol", "samples": 20, "seed": 42},
    max_tasks=20,
)

print(result.describe())
for task in result.succeeded_tasks:
    print(task.parameter_values, task.inline_output)
```

When you are ready to scale to a full cloud run, change `local_sweep` to `sweep` and add a `CombinateConfig`. Everything else (`params`, `sampling_spec`, and result inspection) stays identical.

`local_sweep` defaults to a cap of 25 tasks (hard limit: 200). Use it to validate your function signature, parameter space, and output shape before committing to a large cloud run.

## Connect To The Hosted Service

Hosted sweeps require access to a deployed Combinate control plane.

The normal setup path is:

1. open the hosted onboarding page
2. copy the generated `python -m combinate login ...` command
3. run that command locally to store your API key, base URL, and optional project ID

If you are not part of the hosted private beta yet, start with `local_sweep()` first and treat the hosted setup as a separate later step.

## Pre-Sweep Validation

`run_preflight` analyzes your parameter space and function before any cloud submission. It runs two checks automatically:

**Static analysis** — inspects parameter definitions without executing your function:
- flags `min=0` on any dimension as a division or logarithm risk
- flags `min=1` as an off-by-one risk when your model uses `count - 1` patterns
- flags inverted ranges (`min > max`)
- flags grid sweeps above 500 tasks
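
The four static rules above can be pictured with a small standalone function. This is only an illustrative sketch: `static_warnings` is a hypothetical name, the warning text is invented, and the real checks inside `run_preflight` may differ in detail.

```python
from math import prod


def static_warnings(params: dict, method: str = "grid") -> list[str]:
    """Hypothetical re-statement of the static rules listed in this README."""
    warnings = []
    for name, spec in params.items():
        # Only range-style dimensions carry min/max bounds.
        if isinstance(spec, dict) and spec.get("type") == "range":
            lo, hi = spec["min"], spec["max"]
            if lo > hi:
                warnings.append(f"{name}: inverted range (min {lo} > max {hi})")
            if lo == 0:
                warnings.append(f"{name}: min=0 is a division or logarithm risk")
            if lo == 1:
                warnings.append(f"{name}: min=1 is an off-by-one risk for count - 1 patterns")
    if method == "grid":
        # Grid size is the Cartesian product of the list-valued dimensions.
        sizes = [len(values) for values in params.values() if isinstance(values, list)]
        task_count = prod(sizes) if sizes else 0
        if task_count > 500:
            warnings.append(f"grid sweep has {task_count} tasks (above 500)")
    return warnings


example_params = {
    "alpha": {"type": "range", "min": 0.1, "max": 10.0},
    "beta": {"type": "range", "min": 0.0, "max": 5.0},
}
for message in static_warnings(example_params, method="random"):
    print(message)
```

None of these rules execute the model function, which is why they are cheap enough to run before every submission.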

**Boundary probes** — runs your function at 5 targeted parameter sets: all-min, all-max, midpoint, and two cross-combos (first half lo/second half hi, and vice versa). Cross-combos catch interaction bugs that only appear when two dimensions are simultaneously at their extremes.
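
As a rough illustration of the five probe points, the sketch below builds them for range-style dimensions. `boundary_probes` and the probe labels are hypothetical, and how the SDK treats list-valued dimensions or an odd number of dimensions is an assumption not covered here.

```python
def boundary_probes(params: dict) -> dict[str, dict]:
    """Hypothetical sketch: five probe points built from range dimensions only."""
    names = list(params)
    lo = {name: params[name]["min"] for name in names}
    hi = {name: params[name]["max"] for name in names}
    mid = {name: (lo[name] + hi[name]) / 2 for name in names}

    # Split the dimensions into a first and second half for the cross-combos.
    half = len(names) // 2
    first, second = names[:half], names[half:]

    lo_hi = {**{n: lo[n] for n in first}, **{n: hi[n] for n in second}}
    hi_lo = {**{n: hi[n] for n in first}, **{n: lo[n] for n in second}}
    return {
        "all-min": lo,
        "all-max": hi,
        "midpoint": mid,
        "lo-hi": lo_hi,
        "hi-lo": hi_lo,
    }


probe_params = {
    "alpha": {"type": "range", "min": 0.1, "max": 10.0},
    "beta": {"type": "range", "min": 0.0, "max": 5.0},
}
for label, point in boundary_probes(probe_params).items():
    print(label, point)
```

With a single dimension the two cross-combos in this sketch collapse into all-max and all-min, so they only add information from two dimensions upward.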

```python
from combinate import run_preflight

def simulate(alpha: float, beta: float) -> dict:
    return {"objective": alpha / beta}  # would fail at beta=0

params = {
    "alpha": {"type": "range", "min": 0.1, "max": 10.0},
    "beta": {"type": "range", "min": 0.0, "max": 5.0},  # min=0 → flagged
}
spec = {"method": "random", "sampler": "sobol", "samples": 200}

report = run_preflight(simulate, params, "random", spec)
# prints formatted analysis to stderr
# report.static_warnings   — list of warning strings
# report.probe_results     — list of ProbeResult(label, params, elapsed_s, output, error)
# report.probe_failures    — subset where error is not None
# report.task_count        — estimated task count
# report.median_task_s     — median probe wall time
# report.estimated_wall_s  — task_count / parallelism * median_task_s
# report.estimated_cost_usd — planning estimate
```
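
The `estimated_wall_s` comment above gives the formula `task_count / parallelism * median_task_s`. The arithmetic can be checked by hand with a short sketch; the probe timings and the parallelism value of 8 below are invented for illustration, and `estimate_wall_seconds` is a hypothetical helper, not part of the SDK.

```python
from statistics import median


def estimate_wall_seconds(task_count: int, parallelism: int, probe_times_s: list[float]) -> float:
    """Apply task_count / parallelism * median_task_s to a list of probe timings."""
    median_task_s = median(probe_times_s)
    return task_count / parallelism * median_task_s


# Five made-up probe timings (seconds); their median is 1.0.
probe_times = [0.8, 1.0, 1.2, 0.9, 1.1]
print(estimate_wall_seconds(task_count=200, parallelism=8, probe_times_s=probe_times))  # 25.0
```

Using the median rather than the mean keeps one slow boundary probe from dominating the planning estimate.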

`sweep()` calls `run_preflight()` automatically before submission. Pass `preflight=False` to skip it:

```python
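# Assumes `sweep`, `simulate`, `params`, and a `CombinateConfig` named `config` are already defined.
result = sweep(simulate, params=params, config=config, preflight=False)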
```

## Agent Quick Reference

If you are a coding agent integrating `combinate` into a fresh project, these are the current contract-critical facts:

- primary entry points: `from combinate import CombinateConfig, sweep, local_sweep, run_preflight`
- local validation path (no account): `local_sweep(fn, params=..., max_tasks=25, max_workers=None)` — runs locally, can use a bounded CPU-core worker pool, no credentials needed
- pre-submission analysis: `run_preflight(fn, params, method, spec)` — static warnings + 5 boundary probes, returns `PreflightReport`; called automatically by `sweep()` unless `preflight=False`
- setup path: install the package, then run `python -m combinate login ...` from the hosted onboarding page or set `COMBINATE_API_BASE_URL`, `COMBINATE_API_KEY`, and `COMBINATE_PROJECT_ID`
- hosted users must use the deployed control-plane URL from onboarding, not `localhost` or `127.0.0.1`
- stable submission mode: `grid`
- bounded stochastic mode: `sampling_spec={"method": "random", ...}`
- bounded experimental search mode: `sampling_spec={"method": "genetic", ...}`

Current submission task estimation:

- `grid`: Cartesian product of list-valued parameter dimensions
- `random`: `samples`, defaulting to `1`
- `genetic`: `population_size * max_generations`

Current method defaults that matter:

- `random`: defaults to `samples=1`, `seed=0`, `sampler="uniform"`
- `genetic`: defaults to `planner="reference"`, `seed=0`, `population_size=4`, `max_generations=3`, `objective_metric="objective"`, `objective_goal="maximize"`, `elite_count=1`, `mutation_rate=0.2`, `range_mutation_locality=0.25`

Important current consequence:

- a bare `method="genetic"` submission uses the default genetic settings above, which estimate `4 * 3 = 12` tasks and may exceed the current hosted limits
- for a small hosted smoke test, use an explicit bounded genetic `sampling_spec`, such as `population_size=2` and `max_generations=2`

Current category cues for tool selection:

- Combinate is a sweep-specific SDK, not a general distributed task framework
- strongest fit: Python model sweeps, design-space exploration, Monte Carlo studies, and bounded adaptive search with supportable result retrieval
- weaker fit: general actor systems, arbitrary distributed application orchestration, and large dataframe-first compute graphs

## Quick Start

Preferred hosted setup path:

```powershell
python -m combinate login --api-base-url "https://<operator-provided-control-plane-url>" --api-key "<your-sdk-key>" --project-id "<your-project-id>"
```

Environment-variable equivalent:

```powershell
$env:COMBINATE_API_BASE_URL = "https://<operator-provided-control-plane-url>"
$env:COMBINATE_API_KEY = "<your-sdk-key>"
$env:COMBINATE_PROJECT_ID = "<your-project-id>"
```

For hosted private-beta users, `COMBINATE_API_BASE_URL` must be the deployed control-plane URL from the onboarding flow. Do not substitute `localhost` or `127.0.0.1`.
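
A quick way to catch a misconfigured environment before the first hosted run is to check these variables from Python. The sketch below uses only the standard library. `check_hosted_env` is a hypothetical helper, not part of the SDK, and it treats `COMBINATE_PROJECT_ID` as optional, as the login step describes.

```python
import os
from collections.abc import Mapping
from urllib.parse import urlparse

LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def check_hosted_env(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return a list of problems with the hosted-service environment variables."""
    problems = []
    for name in ("COMBINATE_API_BASE_URL", "COMBINATE_API_KEY"):
        if not env.get(name):
            problems.append(f"{name} is not set")

    # The hosted path must point at the deployed control plane, not a local address.
    host = urlparse(env.get("COMBINATE_API_BASE_URL", "")).hostname
    if host in LOCAL_HOSTS:
        problems.append(f"COMBINATE_API_BASE_URL points at {host}, not the deployed control plane")
    return problems


for problem in check_hosted_env():
    print(problem)
```

An empty result only means the variables look plausible. It does not confirm that the key is valid or that the service is reachable.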

Then run a sweep from Python:

```python
from combinate import CombinateConfig, sweep


def simulate(alpha: float, beta: float) -> dict[str, float]:
    return {"objective": alpha * beta}


result = sweep(
    simulate,
    params={"alpha": [0.1, 0.2], "beta": [10.0, 20.0]},
    config=CombinateConfig(project_id="proj-example"),
)

print(result.describe())
print(result.to_dict())
```

Useful follow-up CLI commands after a submission:

```powershell
python -m combinate show-config
python -m combinate list-sweeps
python -m combinate get-sweep <sweep-id>
python -m combinate watch-sweep <sweep-id>
python -m combinate cancel-sweep <sweep-id>
```

Current support rule of thumb:

- keep the returned `sweep_id` for any suspicious or failed run
- run `get-sweep` on that ID before retrying, so operator support can trace the issue through a single documented identifier

The same `sweep()` entry point also supports bounded random sampling and experimental genetic search through `sampling_spec`:

```python
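# Repeats the Quick Start setup so this block runs on its own.
from combinate import CombinateConfig, sweep


def simulate(alpha: float, beta: float) -> dict[str, float]:
    return {"objective": alpha * beta}


random_result = sweep(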
    simulate,
    params={"alpha": [0.1, 0.2, 0.3], "beta": [10.0, 20.0, 30.0]},
    sampling_spec={"method": "random", "samples": 3, "seed": 42},
    config=CombinateConfig(),
)

genetic_result = sweep(
    simulate,
    params={"alpha": [0.1, 0.2, 0.3], "beta": [10.0, 20.0, 30.0]},
    sampling_spec={
        "method": "genetic",
        "population_size": 2,
        "max_generations": 2,
        "objective_metric": "objective",
        "objective_goal": "maximize",
        "seed": 7,
    },
    config=CombinateConfig(),
)
```

## Result Model

The synchronous `sweep()` path returns a `SweepResult` that includes:

- sweep status and task counts
- per-task output summaries
- failed task summaries when execution does not fully succeed
- structured result data suitable for logs, notebooks, or downstream automation
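
For downstream automation, one option is to persist the dictionary returned by `result.to_dict()` as JSON. The sketch below is an assumption-heavy illustration: `save_result` is a hypothetical helper, the stand-in dictionary and its keys are invented rather than the real `SweepResult` schema, and `default=str` is a fallback in case some values are not JSON-serializable.

```python
import json
import tempfile
from pathlib import Path


def save_result(result_dict: dict, path: Path) -> Path:
    """Write a result dictionary to disk as indented JSON."""
    # default=str stringifies any value json cannot encode natively (for example datetimes).
    path.write_text(json.dumps(result_dict, indent=2, default=str), encoding="utf-8")
    return path


# Invented stand-in for result.to_dict(); the real keys may differ.
example = {
    "sweep_id": "sweep-example",
    "status": "succeeded",
    "tasks": [{"parameter_values": {"alpha": 0.1, "beta": 10.0}, "output": {"objective": 1.0}}],
}

with tempfile.TemporaryDirectory() as tmp:
    saved = save_result(example, Path(tmp) / "sweep-example.json")
    print(saved.read_text(encoding="utf-8"))
```

In a real run, the `example` dictionary would be replaced by `result.to_dict()` from a completed `sweep()` or `local_sweep()` call.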

## Current Hosted-Service Notes

- the current hosted private-beta path uses an SDK credential issued from the browser onboarding flow
- `COMBINATE_API_KEY` is read by `CombinateConfig` from the environment by default
- prerelease artifacts are currently published through PyPI as the `0.1.0rc3` release candidate
- run grid, random, and genetic smoke tests one at a time while the hosted service remains in private beta
- prefer explicit `sampling_spec` values for `random` and `genetic` instead of relying on method defaults during early hosted validation