Metadata-Version: 2.4
Name: autoresearch-core
Version: 0.4.4
Summary: Decision-contracts library: deterministic verdict/gate/failure/promotion logic for autoresearch loops.
Project-URL: Homepage, https://github.com/ca1773130n/autoresearch-core
Project-URL: Repository, https://github.com/ca1773130n/autoresearch-core
Project-URL: Issues, https://github.com/ca1773130n/autoresearch-core/issues
Project-URL: Changelog, https://github.com/ca1773130n/autoresearch-core/blob/main/CHANGELOG.md
Author-email: Cameleon X <ca1773130n@gmail.com>
License: MIT
License-File: LICENSE
Keywords: agent,autoresearch,decision-contracts,deterministic,research-loop,verdict
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Classifier: Typing :: Typed
Requires-Python: >=3.11
Provides-Extra: dev
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Description-Content-Type: text/markdown

# autoresearch-core

[![CI](https://github.com/ca1773130n/autoresearch-core/actions/workflows/ci.yml/badge.svg)](https://github.com/ca1773130n/autoresearch-core/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/autoresearch-core.svg)](https://pypi.org/project/autoresearch-core/)
[![Python](https://img.shields.io/pypi/pyversions/autoresearch-core.svg)](https://pypi.org/project/autoresearch-core/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

A tiny, **pure-Python decision-contracts library** for autoresearch / agentic
loops. It provides a *deterministic* verdict (metric / comparator / target),
failure classification, gates, promotion record shapes, and, since 0.4.3,
**life-harness round contracts** (evidence-driven self-improvement policy).
It is the disciplined decision core, with **zero runtime dependencies** and
**no I/O**.

You bring the loop, the retrieval, the runner, and the storage; you bind them to
the library's `Protocol`s and call `measure` / `decide` / `should_promote_dead_end`
at your decision points. The verdict logic is parity-tested against the
[GRD](https://github.com/ca1773130n/GetResearchDone) autoresearch loop.

## Why

Agentic research loops fail in a predictable way: the model grades its own
homework. An LLM proposes a hypothesis, runs an experiment, then *judges*
whether the result supports the hypothesis — and judgment drifts.
`autoresearch-core` removes the judge from the control path:

1. Every hypothesis must carry a **machine-readable contract**
   (`MetricSpec`: *which metric, which comparator, which target*).
2. Experiments report results through a **machine-readable line**
   (`__RESULT__ {"accuracy": 0.93}` on stdout).
3. The verdict is **computed, not judged**: metric vs target →
   `supported` / `refuted` / `inconclusive`.
4. Only a **deterministic refutation** may auto-promote a dead-end. Anything
   judged by an LLM or inferred from an exit code is advisory.
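
The rules above can be sketched in plain Python without importing the library.
This is a minimal illustration, not the library's implementation: the names
`parse_result_line` and `computed_verdict`, the comparator table, and the choice
to treat a missing or non-numeric metric as `inconclusive` are assumptions made
for the example.

```python
import json
import operator

# Hypothetical comparator table; the library's supported set may differ.
COMPARATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "==": operator.eq,
}

RESULT_PREFIX = "__RESULT__"


def parse_result_line(stdout: str) -> dict:
    """Return the metrics from the last `__RESULT__ {json}` line, or {} if none."""
    metrics: dict = {}
    for line in stdout.splitlines():
        line = line.strip()
        if not line.startswith(RESULT_PREFIX):
            continue
        try:
            payload = json.loads(line[len(RESULT_PREFIX):])
        except json.JSONDecodeError:
            continue  # malformed line: skip it rather than guess
        if isinstance(payload, dict):
            metrics = payload
    return metrics


def computed_verdict(metrics: dict, metric_key: str, comparator: str, target: float) -> str:
    """Compare the reported metric against the target; no judgment involved."""
    value = metrics.get(metric_key)
    # Assumption: anything that is not a real number cannot be compared.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return "inconclusive"
    return "supported" if COMPARATORS[comparator](value, target) else "refuted"


stdout = 'epoch 3 done\n__RESULT__ {"accuracy": 0.93}\n'
metrics = parse_result_line(stdout)
print(computed_verdict(metrics, "accuracy", ">=", 0.90))  # supported
print(computed_verdict(metrics, "accuracy", ">=", 0.95))  # refuted
print(computed_verdict(metrics, "f1", ">=", 0.90))        # inconclusive
```

The same inputs always produce the same verdict, which is what makes the
decision trail reproducible.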

## Install

```bash
pip install autoresearch-core
```

Requires Python 3.11+. No runtime dependencies. Fully typed (`py.typed`).

## Quickstart

```python
from autoresearch_core import (
    MetricSpec, ExperimentResult, measure, parse_metrics_line, should_promote_dead_end,
)

spec = MetricSpec(metric_key="recall_at_10", comparator=">=", target=0.8)

# An experiment prints `__RESULT__ {"recall_at_10": 0.83}` on stdout:
stdout = '__RESULT__ {"recall_at_10": 0.83}'  # stand-in for captured experiment output
metrics = parse_metrics_line(stdout)        # -> {"recall_at_10": 0.83}
verdict = measure(spec, ExperimentResult(metrics=metrics, exit_code=0))

verdict.verdict          # "supported" | "refuted" | "inconclusive"  (deterministic)
verdict.evidence_level   # "deterministic"
should_promote_dead_end(verdict)            # True only for a deterministic refutation
```

## Documentation

- **[QUICKSTART](https://github.com/ca1773130n/autoresearch-core/blob/main/QUICKSTART.md)** —
  zero to a working deterministic verdict in five minutes, with a complete
  runnable script.
- **[TUTORIAL](https://github.com/ca1773130n/autoresearch-core/blob/main/TUTORIAL.md)** —
  build a full hypothesis → experiment → measure → learn loop on top of the
  library: contracts, failure classes, gates, dead-end promotion, infrastructure
  ports, and custom verdict strategies.
- **[RELEASING](https://github.com/ca1773130n/autoresearch-core/blob/main/RELEASING.md)** —
  ownership, versioning policy, release procedure, and the shared
  parity-vector contract with GRD.
- **[CHANGELOG](https://github.com/ca1773130n/autoresearch-core/blob/main/CHANGELOG.md)**

## What it owns (and what it doesn't)

**Owns — the decision discipline:**

| Module | Public surface | Job |
|---|---|---|
| `types` | `MetricSpec`, `ExperimentResult`, `VerdictRecord`, `Hypothesis`, `Takeaway`, `GateState`, `GateCheck` | Frozen dataclasses; pure data, no logic |
| `contract` | `parse_metrics_line`, `validate_metric_spec` | The `__RESULT__ {json}` experiment-result contract |
| `verdict` | `compare`, `DeterministicVerdict`, `VerdictStrategy` | Metric vs target → supported / refuted / inconclusive |
| `failures` | `classify_run_failure` | stderr → `H2` (missing dep) / `H3` (missing file / permission) / `H4` (timeout / runtime) / `none` |
| `gates` | `resolve_gates`, `check_gate` | Approval gates (`execute`, `kg_write`) resolved from config |
| `policy` | `measure`, `decide`, `decide_branch`, `should_terminate`, `detect_plateau`, `should_promote_dead_end` | The loop's branch / terminate / promote decisions |
| `promote` | `DeadEndRecord`, `KnowhowRecord`, `approach_hash`, `build_dead_end_record`, `should_skip` | Promotion record shapes + approach dedupe |
| `rounds` | `resolve_autonomy`, `select_evidence`, `validate_round_patch`, `patch_hash`, `should_apply`, `decide_round` (+ `Finding`, `PatchEntry`, `RoundPatch`, `EvalReport`, `AutonomyState`, `RoundRecord` in `types`) | Life-harness rounds: the policy for patching a harness's own primitives from session evidence |

**Doesn't own — bind these via `ports.py` `Protocol`s to your own infra:**
`Spawn` (LLM call), `Retriever`, `KnowledgeGraph`, `ExperimentRunner`, `Store`,
and for rounds: `FindingsSource`, `PatchProposer`, `RoundEvaluator`, `Applier`,
`RoundStore`. No implementations ship in this package; the
[tutorial](https://github.com/ca1773130n/autoresearch-core/blob/main/TUTORIAL.md)
shows minimal bindings.
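
For readers new to `typing.Protocol`, the sketch below shows what structural
binding looks like in general. The `ExperimentRunner` shape here (a single
`run(command, timeout_s)` method returning a `RunOutput`) is a hypothetical
stand-in and is not copied from `ports.py`; check the real signatures before
binding your own infrastructure.

```python
import subprocess
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class RunOutput:
    """Hypothetical result shape for this example."""
    stdout: str
    stderr: str
    exit_code: int


class ExperimentRunner(Protocol):
    """Hypothetical port: anything with a matching `run` method satisfies it."""
    def run(self, command: list[str], timeout_s: float) -> RunOutput: ...


class SubprocessRunner:
    """Real binding: runs the command locally. No inheritance from the Protocol."""
    def run(self, command: list[str], timeout_s: float) -> RunOutput:
        proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout_s)
        return RunOutput(proc.stdout, proc.stderr, proc.returncode)


@dataclass
class CannedRunner:
    """Test double: returns fixed output and records what it was asked to run."""
    stdout: str
    calls: list = field(default_factory=list)

    def run(self, command: list[str], timeout_s: float) -> RunOutput:
        self.calls.append(command)
        return RunOutput(self.stdout, "", 0)


def run_once(runner: ExperimentRunner, command: list[str]) -> RunOutput:
    # The loop depends only on the Protocol, never on a concrete runner.
    return runner.run(command, timeout_s=30.0)


runner = CannedRunner('__RESULT__ {"accuracy": 0.93}\n')
out = run_once(runner, ["python", "train.py"])
print(out.exit_code, out.stdout.strip())
```

Because the binding is structural, swapping `CannedRunner` for
`SubprocessRunner` requires no change to the calling code.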

## Verdict authority

`DeterministicVerdict` is the default and the reason this package exists. Other
strategies (an LLM judge, an exit-code check) can be plugged in via the
`VerdictStrategy` protocol, but **only a deterministic refutation auto-promotes a
dead-end** — non-deterministic verdicts are advisory. Every verdict records its
`strategy` and `evidence_level`, so the decision trail stays auditable.
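
The promotion rule can be stated as a two-condition check. The sketch below is
illustrative only: `Verdict` is a hypothetical stand-in for the library's
`VerdictRecord`, and the `"advisory"` evidence label and the strategy names are
assumptions, not values confirmed by the package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """Hypothetical stand-in for a verdict record."""
    verdict: str         # "supported" | "refuted" | "inconclusive"
    strategy: str        # which strategy produced it (names assumed)
    evidence_level: str  # "deterministic", or "advisory" (assumed label)


def may_auto_promote_dead_end(v: Verdict) -> bool:
    """Only a refutation backed by deterministic evidence promotes a dead-end."""
    return v.verdict == "refuted" and v.evidence_level == "deterministic"


candidates = [
    Verdict("refuted", "deterministic", "deterministic"),
    Verdict("refuted", "llm_judge", "advisory"),
    Verdict("supported", "deterministic", "deterministic"),
    Verdict("inconclusive", "exit_code", "advisory"),
]
for v in candidates:
    print(v.strategy, v.verdict, may_auto_promote_dead_end(v))
```

Only the first candidate is promoted; an LLM-judged refutation is recorded but
left for a human or a later deterministic run to confirm.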

## Life-harness rounds (0.4.3)

The same discipline, pointed at the harness itself: a **round** turns session
evidence (e.g. Tesserae-compiled takeaways) into one eval-gated, reversible
patch to the host's own primitives. The kernel owns the *decisions*: evidence
selection, patch validation (path guards plus a self-protection deny-list, so a
round can never patch its own driver or autonomy config), dedupe, and the apply
gate (precedence: kill switch > eval > review-mode > confidence). Hosts bind the
I/O through five protocols (`FindingsSource`, `PatchProposer`, `RoundEvaluator`,
`Applier`, `RoundStore`) and keep the forge in git: one commit per round,
revert = `git revert`.

```python
from autoresearch_core import (
    AutonomyState, EvalCheck, EvalReport, PatchEntry, RoundPatch, decide_round,
)

patch = RoundPatch(round_id="r1", entries=(
    PatchEntry(path="commands/execute-phase.md", kind="markdown", op="modify",
               content="...", rationale="executor keeps forgetting to commit"),
), summary="commit reminder in executor prompt", confidence=0.9)

decide_round(patch, AutonomyState(), set(), EvalReport(checks=(EvalCheck("lint", 0),)))
# ('evaluated', 'awaiting review (harness_review)')   — review mode is the default
```
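
The apply-gate precedence (kill switch > eval > review-mode > confidence) can be
read as an ordered chain of early returns. The sketch below is a standalone
illustration of that ordering only: `GateInputs`, its field names, the status
strings, and the `0.8` confidence threshold are all hypothetical and do not
reflect `decide_round`'s actual signature or return values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateInputs:
    """Hypothetical inputs; the real kernel derives these from its own types."""
    kill_switch: bool
    eval_passed: bool
    review_mode: bool
    confidence: float
    min_confidence: float = 0.8  # assumed threshold, for illustration


def apply_gate(g: GateInputs) -> tuple[str, str]:
    """Check gates in precedence order; the first one that blocks wins."""
    if g.kill_switch:
        return ("skipped", "kill switch engaged")
    if not g.eval_passed:
        return ("rejected", "eval failed")
    if g.review_mode:
        return ("evaluated", "awaiting review")
    if g.confidence < g.min_confidence:
        return ("evaluated", "confidence below threshold")
    return ("applied", "all gates passed")


# The kill switch outranks everything, even a passing eval with high confidence.
print(apply_gate(GateInputs(True, True, False, 0.99)))
# Review mode holds a passing, confident patch for a human.
print(apply_gate(GateInputs(False, True, True, 0.90)))
# With review mode off, confidence is the last check.
print(apply_gate(GateInputs(False, True, False, 0.90)))
```

Ordering the checks this way means a weaker signal (confidence) can never
override a stronger one (a failed eval or an engaged kill switch).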

Reference host: GRD's `gd harness round`. Design:
[life-harness rounds spec](https://github.com/ca1773130n/autoresearch-core/blob/main/docs/superpowers/specs/2026-06-06-life-harness-rounds-design.md).
Note: the package version is locked to GRD's version line from 0.4.3 onward
(see [RELEASING](https://github.com/ca1773130n/autoresearch-core/blob/main/RELEASING.md)).

## Development

```bash
pip install -e ".[dev]"
pytest -q --cov=autoresearch_core
```

The tests include a parity suite (`tests/test_parity.py`) that pins the
library's behaviour to the GRD TypeScript implementation.

## License

MIT © Cameleon X — see [LICENSE](LICENSE).
