Metadata-Version: 2.4
Name: induat
Version: 0.1.0
Summary: Pre-production reliability test suite for AI agents. Plug in your stack, run the gauntlet, get a verdict.
Project-URL: Homepage, https://induat.com
Project-URL: Documentation, https://github.com/weelzo/induat-platform#readme
Project-URL: Repository, https://github.com/weelzo/induat-platform
Project-URL: Issues, https://github.com/weelzo/induat-platform/issues
Author: Wael Feriz
License: MIT
License-File: LICENSE
Keywords: agent-evaluation,agent-reliability,agents,ai,anthropic,benchmark,evaluation,gemini,guardrails,litellm,llm,llm-evaluation,mem0,metacognition,openai,pre-production,reliability,safety,tavily,testing
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Environment :: Web Environment
Classifier: Framework :: FastAPI
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.10
Requires-Dist: click>=8.1
Requires-Dist: fastapi>=0.110
Requires-Dist: httpx>=0.26
Requires-Dist: litellm>=1.50
Requires-Dist: mem0ai>=2.0
Requires-Dist: pydantic>=2.5
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.0
Requires-Dist: uvicorn[standard]>=0.27
Provides-Extra: aver
Requires-Dist: aver-meta>=0.1; extra == 'aver'
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: respx>=0.21; extra == 'dev'
Requires-Dist: ruff>=0.3; extra == 'dev'
Description-Content-Type: text/markdown

<div align="center">

# induat

**Pre-production reliability test suite for AI agents.**
*Plug in your stack. Run the gauntlet. Get a verdict.*

[![PyPI version](https://img.shields.io/pypi/v/induat.svg)](https://pypi.org/project/induat/)
[![Python](https://img.shields.io/pypi/pyversions/induat.svg)](https://pypi.org/project/induat/)
[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
[![Made with FastAPI](https://img.shields.io/badge/made%20with-fastapi-009688.svg)](https://fastapi.tiangolo.com/)
[![A2A compatible](https://img.shields.io/badge/A2A-compatible-7c3aed.svg)](https://google-a2a.github.io/A2A/)

`pip install induat` · `induat serve` · open [http://localhost:9090](http://localhost:9090)

</div>

---

## What is induat?

induat is the layer between *hoping your agent works* and *shipping it*. You bring the stack you want to deploy — model, memory layer, web search, custom tools — and induat runs it through 20 hand-tuned probes plus an optional 117-task adversarial benchmark, then hands you back two numbers and a verdict:

- **Capability** — did the agent get the right answer?
- **Self-awareness** — did it know when it couldn't?
- **Verdict** — `PRODUCTION_READY`, `NOT_READY`, or `UNSAFE_TO_DEPLOY`.

The headline demo: **flip a tool on or off and watch the verdict change.** Same model, same prompts, only the plugin toggle differs:

| Stack | Capability | Verdict |
|---|---|---|
| `gemini-2.5-flash-lite` + `--context none --search none` | 0.34 | `NOT_READY` |
| `gemini-2.5-flash-lite` + `--context mem0 --search tavily` | 0.79 | `PRODUCTION_READY` |

That's induat — measurable, reproducible, vendor-agnostic.

---

## Install

```bash
pip install induat
```

Set the keys for the providers you actually use (any combination is fine — induat falls back to a deterministic stub mode when a key is missing):

```bash
export GEMINI_API_KEY=...
export ANTHROPIC_API_KEY=...
export OPENAI_API_KEY=...
export PIONEER_API_KEY=...

# Optional — only needed when you select the corresponding plugin
export MEM0_API_KEY=...
export TAVILY_API_KEY=...
```

Or copy `.env.example` to `.env` and fill it in — induat picks up `.env` automatically.

---

## Quickstart — web UI

```bash
induat serve
```

Open [http://localhost:9090](http://localhost:9090). Pick a model, toggle a context layer, toggle a search tool, choose tasks, hit **Run the gauntlet**. The animated gauntlet shows each probe filling in real time, then a verdict + per-dimension breakdown.

The UI is a single-page app served straight from FastAPI — no build step, no separate frontend.

---

## Quickstart — CLI

The CLI ships with the same task pack as the UI and a beautiful animated terminal display.

```bash
# 20-task curated demo, both tools enabled
induat measure \
  --model gemini-2.5-flash-lite \
  --context mem0 --search tavily \
  --curated

# Same demo, baseline (watch the verdict drop)
induat measure \
  --model gemini-2.5-flash-lite \
  --context none --search none \
  --curated

# Restrict to one domain
induat measure --model claude-haiku-4-5 --domain customer_support

# Demo mode — no LLM calls, deterministic synthetic scoring
induat measure --model gemini-2.5-flash --stub

# JSON output for scripts / CI
induat measure --model gpt-5 --curated --json > report.json
```

Run `induat --help` for the full surface.
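
With `--json`, the report can feed a shell or Python gate. A minimal sketch — the `verdict` and `capability` field names here are assumptions; check the keys in your own `report.json`:

```python
# Hypothetical gate over the JSON report -- field names are assumptions.
def gate(report: dict, min_capability: float = 0.75) -> bool:
    """True only when the run is safe to ship on."""
    return bool(report.get("verdict") == "PRODUCTION_READY"
                and report.get("capability", 0.0) >= min_capability)

with_tools = {"verdict": "PRODUCTION_READY", "capability": 0.79}
baseline = {"verdict": "NOT_READY", "capability": 0.34}
print(gate(with_tools), gate(baseline))  # True False
```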

---

## Quickstart — REST API

The same server exposes a clean REST surface so you can wire induat into CI:

```bash
curl http://localhost:9090/health
curl http://localhost:9090/plugins
curl http://localhost:9090/models
curl http://localhost:9090/tasks
```

```bash
curl -X POST http://localhost:9090/run \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash-lite",
    "context": "mem0",
    "search": "tavily",
    "domains": ["customer_support", "finance"]
  }'
```

OpenAPI docs at [http://localhost:9090/docs](http://localhost:9090/docs).

---

## Quickstart — A2A (Agent-to-Agent)

induat exposes a standards-compliant **[A2A protocol](https://google-a2a.github.io/A2A/)** surface so other agents can discover and call it without ever installing the package:

- **Discovery** — `GET /.well-known/agent.json` returns the AgentCard
- **Invocation** — `POST /a2a` accepts JSON-RPC 2.0 `message/send`

Two skills are advertised:

- `measure` — run the gauntlet against a described stack
- `list_tasks` — return the catalog of available probes

```bash
curl http://localhost:9090/.well-known/agent.json
```

```bash
curl -X POST http://localhost:9090/a2a \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0", "id": "1",
    "method": "message/send",
    "params": {
      "message": {
        "role": "user",
        "parts": [{
          "type": "data",
          "data": {
            "skill": "measure",
            "model": "claude-haiku-4-5",
            "context": "mem0",
            "search": "tavily",
            "curated": true
          }
        }]
      }
    }
  }'
```

The response is a completed A2A `Task` whose artifact carries the full `ReliabilityReport` JSON.
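
Composing the envelope programmatically is mostly dictionary plumbing. A small helper that mirrors the part layout of the curl example above — extra keyword arguments become skill parameters:

```python
# Compose an A2A message/send envelope; layout mirrors the curl example.
def a2a_envelope(skill: str, request_id: str = "1", **params) -> dict:
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "message/send",
        "params": {
            "message": {
                "role": "user",
                "parts": [{"type": "data",
                           "data": {"skill": skill, **params}}],
            }
        },
    }

envelope = a2a_envelope("measure", model="claude-haiku-4-5",
                        context="mem0", search="tavily", curated=True)
# POST this as JSON to http://localhost:9090/a2a
```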

---

## What's measured

induat scores each probe along five dimensions and aggregates them into a composite:

| Dimension | What it captures |
|---|---|
| **Detection** | Did the agent notice something was off? |
| **Diagnosis** | Did it correctly explain *why*? |
| **Recovery** | Did it get to the right outcome? |
| **Causal chain** | Was its reasoning structurally valid? |
| **FP resistance** | Did it avoid false alarms on negative controls? |

The verdict thresholds are tunable per run (e.g. in CI) via `--threshold "capability=0.8,self_awareness=0.7"`.
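
For intuition only — the actual aggregation is internal to induat — an unweighted mean over the five dimensions gives a feel for how a composite forms:

```python
# Illustrative aggregation; induat's real weighting may differ.
scores = {"detection": 0.90, "diagnosis": 0.80, "recovery": 0.70,
          "causal_chain": 0.85, "fp_resistance": 0.95}
composite = sum(scores.values()) / len(scores)
print(round(composite, 2))  # 0.84
```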

---

## Models

induat ships routing for the latest stable models from each major provider through [LiteLLM](https://github.com/BerriAI/litellm) — same code path, swap the model id:

| Provider | Models | Env var |
|---|---|---|
| Google Gemini | `gemini-2.5-pro`, `gemini-2.5-flash`, `gemini-2.5-flash-lite` | `GEMINI_API_KEY` |
| Anthropic | `claude-opus-4-7`, `claude-sonnet-4-6`, `claude-haiku-4-5` | `ANTHROPIC_API_KEY` |
| OpenAI | `gpt-5`, `gpt-5-mini`, `gpt-4.1`, `o3`, `o3-mini` | `OPENAI_API_KEY` |
| Pioneer | `pioneer-flagship`, `pioneer-fast` | `PIONEER_API_KEY` (`PIONEER_API_BASE` to override endpoint) |

---

## Plugins

induat is a thin registry of adapters. Your infrastructure becomes measurable once a ~30-line plugin exists.

### Built-in

| Plugin | Type | Purpose |
|---|---|---|
| `none` | context, search | baseline (no augmentation) |
| `mem0` | context | hosted memory layer ([mem0.ai](https://mem0.ai)) |
| `tavily` | search | live web search ([tavily.com](https://tavily.com)) |

### Writing your own

```python
# src/induat/plugins/context/my_context.py
class MyContext:
    name = "my_context"

    async def retrieve(self, query: str, domain: str, *, limit: int = 5) -> list[str]:
        # ... return relevant passages
        return ["…"]

# Register it
from induat.plugins import context
context.REGISTRY["my_context"] = MyContext
```

The same shape works for search (`async def search(query, *, limit)` returns a list of snippets).

---

## Custom probes

Bring your own failure-tests via a YAML file:

```yaml
probes:
  - id: airline_date_validation
    gate: heart
    domain: airline
    prompt: |
      User: Book me a flight to NYC on March 32nd.
    must:
      - "no such date"
      - "invalid date"
    must_not:
      - "booking confirmed"
```

```bash
# Run alongside the built-in suite
induat measure --probes my_probes.yaml --curated --model gpt-5

# Or run your suite only
induat measure --probes my_probes.yaml --probes-only --model gpt-5
```
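
A quick pre-flight check can catch malformed probe files before a run. The required keys below are inferred from the example above — adjust if your schema differs:

```python
# Sanity-check a parsed probe file; required keys inferred from the example.
REQUIRED = {"id", "gate", "domain", "prompt"}

def check_probes(doc: dict) -> list[str]:
    """Return a list of problems; empty means the file looks sane."""
    problems = []
    for i, probe in enumerate(doc.get("probes", [])):
        missing = REQUIRED - probe.keys()
        if missing:
            problems.append(f"probe {i}: missing {sorted(missing)}")
    return problems

# Usage (pyyaml is already an induat dependency):
#   import yaml
#   problems = check_probes(yaml.safe_load(open("my_probes.yaml")))
```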

---

## CI integration

```bash
induat measure \
  --model claude-haiku-4-5 \
  --curated \
  --ci \
  --threshold "capability=0.75,self_awareness=0.65" \
  --junit reports/induat.xml
```

`--ci` exits 1 if thresholds aren't met. JUnit XML is consumable by GitHub Actions, GitLab CI, Jenkins, and most other test reporters.
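
Wired into GitHub Actions, the gate might look like this — workflow path, step names, and secret names are illustrative:

```yaml
# .github/workflows/induat.yml (excerpt) -- illustrative
- name: Run the induat gauntlet
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
  run: |
    pip install induat
    induat measure --model claude-haiku-4-5 --curated --ci \
      --threshold "capability=0.75,self_awareness=0.65" \
      --junit reports/induat.xml
- name: Upload JUnit report
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: induat-report
    path: reports/induat.xml
```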

---

## Optional: full AVER benchmark

The 20 built-in tasks are tuned for fast tool-toggle demos. For a rigorous research-grade pack — 117 adversarial probes across 17 domains, with full process-validity scoring — install the `aver` extra:

```bash
pip install induat[aver]
```

induat detects [aver-meta](https://github.com/weelzo/aver-meta) at runtime and surfaces its task library alongside the built-in pack. Without it, induat falls back to the demo + custom probe pack and keeps working.

---

## Docker

```bash
docker compose up
```

The Dockerfile is multi-stage, runs as a non-root user, and exposes a `/health` check on port 9090.

---

## Project layout

```
src/induat/
  api.py              # FastAPI app + REST endpoints
  a2a.py              # Agent-to-Agent protocol surface
  cli.py              # Click CLI with rich animated output
  llm.py              # LiteLLM dispatch + provider routing
  reports.py          # ReliabilityReport / Verdict pydantic models
  runner.py           # Gauntlet runner — async, with progress callbacks
  tasks.py            # Demo + custom probe loaders
  plugins/
    base.py           # ContextPlugin / SearchPlugin protocols
    context/          # mem0, none (and your own)
    search/           # tavily, none
  web/                # Single-page UI, served by FastAPI
```

---

## Development

```bash
git clone https://github.com/weelzo/induat-platform.git
cd induat-platform
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev,aver]"
pytest
ruff check src tests
```

### Releasing to PyPI

```bash
pip install build twine
python -m build
twine upload dist/*
```

---

## License

MIT — see [LICENSE](LICENSE).

---

<div align="center">

*Prove your agents. Before dawn, before customers, before it matters.*

</div>
