Metadata-Version: 2.4
Name: sir-agent
Version: 0.0.2
Summary: Single-Input-Reasoning: one LLM call, full action graph execution with evolutionary memory
Project-URL: Homepage, https://tictacguy.github.io/SIR/
Project-URL: Repository, https://github.com/tictacguy/SIR-Agent
Project-URL: Issues, https://github.com/tictacguy/SIR-Agent/issues
Author-email: "Tommaso G. Bredariol" <tommasobredariol@gmail.com>
License-Expression: AGPL-3.0-or-later
License-File: LICENSE
Keywords: action-graph,agent,dag,evolutionary-memory,llm,reasoning,single-shot
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: click>=8.0
Requires-Dist: msgpack>=1.0
Requires-Dist: pydantic>=2.0
Requires-Dist: rich>=13.0
Provides-Extra: all
Requires-Dist: anthropic>=0.40; extra == 'all'
Requires-Dist: boto3>=1.35; extra == 'all'
Requires-Dist: ollama>=0.4; extra == 'all'
Requires-Dist: openai>=1.0; extra == 'all'
Provides-Extra: bedrock
Requires-Dist: boto3>=1.35; extra == 'bedrock'
Provides-Extra: claude
Requires-Dist: anthropic>=0.40; extra == 'claude'
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.24; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Provides-Extra: gemini
Requires-Dist: openai>=1.0; extra == 'gemini'
Provides-Extra: mistral
Requires-Dist: openai>=1.0; extra == 'mistral'
Provides-Extra: ollama
Requires-Dist: ollama>=0.4; extra == 'ollama'
Provides-Extra: openai
Requires-Dist: openai>=1.0; extra == 'openai'
Provides-Extra: openrouter
Requires-Dist: openai>=1.0; extra == 'openrouter'
Provides-Extra: perplexity
Requires-Dist: openai>=1.0; extra == 'perplexity'
Provides-Extra: ui
Requires-Dist: fastapi>=0.100; extra == 'ui'
Requires-Dist: uvicorn>=0.24; extra == 'ui'
Requires-Dist: websockets>=12.0; extra == 'ui'
Description-Content-Type: text/markdown

<p align="center">
  <img src="https://raw.githubusercontent.com/tictacguy/SIR-Agent/main/docs/static/logo.png" alt="SIR Logo" width="200">
</p>

<h1 align="center">SIR — Single-Input-Reasoning</h1>

<p align="center">
  <strong>One LLM call. Full action graph. Evolutionary memory.</strong>
</p>

<p align="center">
  <a href="https://pypi.org/project/sir-agent/"><img src="https://img.shields.io/pypi/v/sir-agent" alt="PyPI"></a>
  <a href="https://pypi.org/project/sir-agent/"><img src="https://img.shields.io/pypi/pyversions/sir-agent" alt="Python"></a>
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-AGPL--3.0-blue.svg" alt="License"></a>
</p>

---

SIR is a Python SDK that delegates complex multi-step tasks to an LLM with a **single inference call**. The LLM produces the entire **Directed Acyclic Graph (DAG)** of actions in one shot, and SIR executes it locally with parallelism, fan-out, retries, conditional branching, speculative execution, and multi-path DAG branching.

Visit the [SIR website](https://tictacguy.github.io/SIR/) for an approachable introduction.

## Table of Contents

- [What Makes SIR Different](#what-makes-sir-different)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [How It Works](#how-it-works)
- [Benchmarks](#benchmarks)
- [Architecture](#architecture)
- [Tool Modes](#tool-modes)
- [Evolutionary Memory](#evolutionary-memory)
- [Advanced Features](#advanced-features)
- [Providers](#providers)
- [Configuration](#configuration)
- [CLI](#cli)
- [Web Dashboard](#web-dashboard)
- [DAG Visualization](#dag-visualization)
- [Roadmap](#roadmap)
- [License](#license)

## What Makes SIR Different

<table width="100%">
<thead>
<tr>
<th align="left" width="25%">Feature</th>
<th align="center" width="15%">ReAct</th>
<th align="center" width="15%">Plan & Execute</th>
<th align="center" width="15%">Chain-of-Tools</th>
<th align="center" width="15%"><strong>SIR</strong></th>
</tr>
</thead>
<tbody>
<tr><td>LLM calls per task</td><td align="center">N (one per step)</td><td align="center">1 + N</td><td align="center">1</td><td align="center"><strong>1 or 1+1</strong></td></tr>
<tr><td>Parallel execution</td><td align="center">No</td><td align="center">No</td><td align="center">No</td><td align="center"><strong>Full DAG</strong></td></tr>
<tr><td>Adaptive tool selection</td><td align="center">Yes (slow)</td><td align="center">Yes (slow)</td><td align="center">No, hardcoded</td><td align="center"><strong>Yes (1 call)</strong></td></tr>
<tr><td>Conditional branching</td><td align="center">Via LLM re-call</td><td align="center">Via LLM re-call</td><td align="center">No</td><td align="center"><strong>Local eval</strong></td></tr>
<tr><td>Fan-out (map-reduce)</td><td align="center">Manual</td><td align="center">Manual</td><td align="center">No</td><td align="center"><strong>Built-in</strong></td></tr>
<tr><td>Speculative execution</td><td align="center">No</td><td align="center">No</td><td align="center">No</td><td align="center"><strong>Yes</strong></td></tr>
<tr><td>DAG branching (multi-path)</td><td align="center">No</td><td align="center">No</td><td align="center">No</td><td align="center"><strong>Yes</strong></td></tr>
<tr><td>Post-execution reasoning</td><td align="center">No</td><td align="center">No</td><td align="center">No</td><td align="center"><strong>Yes (same session)</strong></td></tr>
<tr><td>Post-LLM graph optimization</td><td align="center">No</td><td align="center">No</td><td align="center">No</td><td align="center"><strong>Yes</strong></td></tr>
<tr><td>Evolutionary memory</td><td align="center">No</td><td align="center">No</td><td align="center">No</td><td align="center"><strong>dags.bin</strong></td></tr>
<tr><td>Token efficiency</td><td align="center">Low</td><td align="center">Low</td><td align="center">Medium</td><td align="center"><strong>Compressed</strong></td></tr>
<tr><td>Cost</td><td align="center">High (N calls)</td><td align="center">High (1+N)</td><td align="center">Medium (1)</td><td align="center"><strong>Minimal (1)</strong></td></tr>
</tbody>
</table>

## Installation

```bash
pip install sir-agent                # core only
pip install 'sir-agent[ollama]'      # + Ollama support
pip install 'sir-agent[openai]'      # + OpenAI support
pip install 'sir-agent[claude]'      # + Anthropic Claude support
pip install 'sir-agent[gemini]'      # + Google Gemini support
pip install 'sir-agent[bedrock]'     # + AWS Bedrock support
pip install 'sir-agent[openrouter]'  # + OpenRouter support
pip install 'sir-agent[perplexity]'  # + Perplexity support
pip install 'sir-agent[mistral]'     # + Mistral support
pip install 'sir-agent[all]'         # everything
```

Requires Python 3.10+.

## Quick Start

```python
import requests

from sir import SIR, tool

@tool
def search_web(query: str) -> str:
    """Search the web."""
    return requests.get(f"https://api.search.com?q={query}").text

@tool
def summarize(text: str) -> str:
    """Summarize text."""
    return text[:200] + "..."

@tool
def translate(text: str, lang: str) -> str:
    """Translate text."""
    return f"[{lang}] {text}"  # stub: plug in a real translation backend

sir = SIR(model="qwen2.5:14b")
result = sir.run(
    "Search latest AI news, summarize, and translate to Italian",
    tools=[search_web, summarize, translate],
)
print(result.final_result)
```

One LLM call. Full DAG. Parallel execution. Result.

## How It Works

```
sir.run(prompt, tools)
        |
        v
+----------------------------------+
| 1. Memory Lookup                 |  Semantic vector search in dags.bin
| 2. Prompt Compilation            |  Compressed tool schemas + memory
| 3. Single LLM Call               |  One inference -> full action graph
| 4. Graph Optimization            |  Dead-step elimination, dedup, dep inference
| 5. Parallel Graph Execution      |  Topological sort -> async + speculative
| 6. Reasoning Pass (if needed)    |  LLM reasons on real tool results
| 7. Evolutionary Scoring          |  Score steps, deprecate bad ones
| 8. Memory Persistence            |  Save to dags.bin
+----------------------------------+
```

## Benchmarks

### Overview

<p align="center">
  <img src="https://raw.githubusercontent.com/tictacguy/SIR-Agent/main/docs/static/plots/benchmark_overview.png" alt="Benchmark Overview" width="100%">
</p>

### SIR-Agent vs Chain-of-Tools

SIR adaptively selects only the tools needed. Chain-of-Tools uses a hardcoded pipeline with unnecessary steps.

<p align="center">
  <img src="https://raw.githubusercontent.com/tictacguy/SIR-Agent/main/docs/static/plots/sir_vs_chain.png" alt="SIR vs Chain-of-Tools" width="100%">
</p>

Benchmarked across 5 complexity levels (L1: 2 tools, L5: 11 parallel steps) using the same LLM:

<table width="100%">
<thead>
<tr>
<th align="left" width="40%">Metric</th>
<th align="center" width="30%"><strong>SIR</strong></th>
<th align="center" width="30%">Chain-of-Tools</th>
</tr>
</thead>
<tbody>
<tr><td>Avg Tool Efficiency</td><td align="center"><strong>100%</strong></td><td align="center">71%</td></tr>
<tr><td>Avg Step Efficiency</td><td align="center"><strong>94%</strong></td><td align="center">64%</td></tr>
<tr><td>Total Wasted Tools</td><td align="center"><strong>0</strong></td><td align="center">4</td></tr>
<tr><td>Total Wasted Steps</td><td align="center"><strong>0</strong></td><td align="center">13</td></tr>
<tr><td>Total Tokens</td><td align="center"><strong>5,693 (-16%)</strong></td><td align="center">6,769</td></tr>
<tr><td>Total Wall Time</td><td align="center"><strong>17s (-40%)</strong></td><td align="center">28s</td></tr>
</tbody>
</table>

## Architecture

### Graph Optimization (post-LLM)

After the LLM generates the DAG, SIR runs three compiler passes before execution:

- **Dependency inference** -- adds missing dependencies by analyzing `$sN` references in step arguments
- **Dead-step elimination** -- removes steps whose output is never referenced
- **Duplicate merge** -- merges steps calling the same tool with identical args
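The dependency-inference pass can be pictured with a short sketch that scans step arguments for `$sN` references. This is illustrative only -- the function and field names here are not SIR's internals:

```python
import re

def infer_dependencies(steps):
    """Add missing dependencies by scanning args for $sN references (sketch)."""
    for step in steps:
        deps = set(step.get("depends_on", []))
        # Find every "$sN" token anywhere in the serialized args
        for ref in re.findall(r"\$s(\d+)", repr(step.get("args", {}))):
            sid = f"s{ref}"
            if sid != step["id"]:  # never depend on yourself
                deps.add(sid)
        step["depends_on"] = sorted(deps)
    return steps
```

A step whose args contain `"$s1.result"` ends up with `s1` in its `depends_on` list even if the LLM forgot to declare it.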

### Speculative Execution

While the current layer executes, SIR speculatively launches steps from the next layer if their dependencies are already available. This reduces total wall time on deep DAGs.
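Conceptually, the scheduler first groups steps into parallel layers; speculation then lets a step in layer N+1 start as soon as its own dependencies have finished, without waiting for the rest of layer N. A minimal layering sketch using the standard library (not SIR's actual scheduler):

```python
from graphlib import TopologicalSorter

def execution_layers(deps):
    """Group steps into parallel layers via topological sort (sketch).

    `deps` maps each step id to the ids it depends on. Steps in the
    same layer have no dependency on each other and can run concurrently.
    """
    ts = TopologicalSorter(deps)
    ts.prepare()
    layers = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # everything runnable right now
        layers.append(ready)
        for node in ready:
            ts.done(node)
    return layers
```

Speculation relaxes the layer boundary: instead of waiting for a whole layer, each step is launched the moment its own `depends_on` set is satisfied.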

### Reasoning Pass

When a task requires understanding, analysis, or synthesis of tool results, SIR automatically activates a reasoning pass. After all tools execute, their results are injected back into the same conversation and the LLM produces a final reasoned answer. The LLM decides during planning whether reasoning is needed (via the `nr` flag in the DAG). For pure data pipelines (fetch, transform, translate), no reasoning pass is triggered and SIR stays at 1 LLM call.

As a fallback, SIR includes a lightweight heuristic that detects reasoning intent in the prompt. This is critical for smaller local models (7B-14B parameters) that may not reliably set the `nr` flag. The heuristic ensures that prompts like "explain", "summarize in your own words", or "what is the sentiment" always trigger the reasoning pass, regardless of model capability. This makes SIR production-ready across the full spectrum of LLMs — from local quantized models to frontier APIs.

### DAG Branching (Multi-Path)

Steps can define `alternatives` -- multiple tool strategies that race in parallel:

```json
{
  "id": "s1",
  "tool": "search",
  "args": {"query": "AI news"},
  "alternatives": [{"tool": "fetch_details", "args": {"entity": "AI"}}],
  "select": "fastest"
}
```

Strategies: `fastest` (first to succeed wins), `shortest`, `longest`.
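The `fastest` strategy amounts to racing coroutines and cancelling the losers. A sketch of how that might look with `asyncio` (illustrative, not SIR's implementation):

```python
import asyncio

async def race_fastest(coros):
    """First alternative to succeed wins; remaining tasks are cancelled (sketch)."""
    tasks = [asyncio.ensure_future(c) for c in coros]
    try:
        for fut in asyncio.as_completed(tasks):
            try:
                return await fut
            except Exception:
                continue  # this alternative failed; wait for the next to finish
        raise RuntimeError("all alternatives failed")
    finally:
        for t in tasks:
            t.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
```

`shortest` and `longest` would instead wait for all alternatives and pick a winner by comparing results.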

### Token Compression

SIR uses compressed JSON aliases to minimize token usage:

<table width="100%">
<thead>
<tr>
<th align="left" width="40%">Full key</th>
<th align="center" width="30%">Alias</th>
<th align="center" width="30%">Savings</th>
</tr>
</thead>
<tbody>
<tr><td><code>tool</code></td><td align="center"><code>t</code></td><td align="center">3 tokens</td></tr>
<tr><td><code>args</code></td><td align="center"><code>a</code></td><td align="center">3 tokens</td></tr>
<tr><td><code>depends_on</code></td><td align="center"><code>d</code></td><td align="center">9 tokens</td></tr>
<tr><td><code>condition</code></td><td align="center"><code>c</code></td><td align="center">8 tokens</td></tr>
<tr><td><code>foreach</code></td><td align="center"><code>f</code></td><td align="center">6 tokens</td></tr>
<tr><td><code>final_step</code></td><td align="center"><code>fs</code></td><td align="center">9 tokens</td></tr>
</tbody>
</table>

The parser auto-expands aliases and is fully backward-compatible with full key names.
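Alias expansion is a simple key mapping over each step. An illustrative sketch (the real parser may differ):

```python
# Compressed-key aliases from the table above
ALIASES = {"t": "tool", "a": "args", "d": "depends_on",
           "c": "condition", "f": "foreach", "fs": "final_step"}

def expand_step(step):
    """Expand compressed keys to full names; full key names pass through unchanged."""
    return {ALIASES.get(k, k): v for k, v in step.items()}
```

Because unknown keys pass through untouched, the same parser accepts both compressed and full-key DAGs.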

## Tool Modes

SIR gives developers control over how much autonomy the LLM has in selecting tools:

<table width="100%">
<thead>
<tr>
<th align="left" width="20%">Mode</th>
<th align="left" width="45%">Behavior</th>
<th align="left" width="35%">Use case</th>
</tr>
</thead>
<tbody>
<tr><td><code>adaptive</code> (default)</td><td>LLM picks the minimum tools needed</td><td>Generic prompts, many tools available</td></tr>
<tr><td><code>strict</code></td><td>ALL tools passed must be used; LLM decides order and parallelism only</td><td>Predictable pipelines</td></tr>
<tr><td><code>required</code></td><td>Tools marked <code>required=True</code> are mandatory, others optional</td><td>Mix of fixed and flexible</td></tr>
</tbody>
</table>

```python
# Adaptive -- LLM chooses
sir = SIR(tool_mode="adaptive")

# Strict -- all tools must be used
sir = SIR(tool_mode="strict")

# Required -- mark optional tools
@tool(required=False)
def cache(key: str, value: str) -> str: ...

sir = SIR(tool_mode="required")
```

## Evolutionary Memory

SIR persists every executed action graph in a binary file (`dags.bin`) using msgpack with vector embeddings for semantic retrieval.

```
Run 1: LLM generates plan -> execute -> score -> store in dags.bin
Run 2: Load prior plan -> LLM sees scores/notes -> improves plan -> update
Run 3: Step X scored 2.1 -> DEPRECATED -> LLM replaces with better alternative
Run N: Converges to optimal action graph for this task
```

Each step stores:
- **score** (0-10) -- exponential moving average
- **notes** -- LLM annotations from previous runs
- **executions** -- how many times it ran
- **deprecated** -- true if score falls below threshold after 3 or more runs
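The scoring loop can be sketched as follows. The threshold and minimum run count come from the description above; the smoothing factor `alpha` is an assumed constant for illustration, not SIR's actual value:

```python
def update_step_score(step, new_score, alpha=0.3, threshold=3.0, min_runs=3):
    """EMA score update with deprecation (sketch; `alpha` is an assumed constant)."""
    step["executions"] = step.get("executions", 0) + 1
    prev = step.get("score")
    # Exponential moving average: recent runs weigh more than old ones
    step["score"] = new_score if prev is None else alpha * new_score + (1 - alpha) * prev
    # Deprecate only after enough evidence (3+ runs below threshold)
    if step["executions"] >= min_runs and step["score"] < threshold:
        step["deprecated"] = True
    return step
```

A step that keeps scoring 2.x will cross the deprecation threshold on its third run, prompting the LLM to replace it on the next plan.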

## Advanced Features

### Conditional Branching
```json
{"id":"s3","t":"notify","a":{"msg":"$s2.result"},"d":["s2"],
 "c":{"ref":"$s2.result","op":"contains","val":"error"}}
```
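Conditions are evaluated locally against stored step results, with no extra LLM call. A sketch of the evaluator (only `contains` appears above; `eq` and `gt` are hypothetical extra operators):

```python
def should_run(condition, results):
    """Decide locally whether a conditional step executes (sketch)."""
    if condition is None:
        return True
    ref = condition["ref"]                      # e.g. "$s2.result"
    step_id = ref.lstrip("$").split(".")[0]     # -> "s2"
    value = results[step_id]
    op, val = condition["op"], condition["val"]
    if op == "contains":
        return val in str(value)
    if op == "eq":       # hypothetical extra operator
        return value == val
    if op == "gt":       # hypothetical extra operator
        return value > val
    raise ValueError(f"unknown op: {op}")
```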

### Fan-out (Map-Reduce)
```json
{"id":"s2","t":"process","a":{"item":"$item"},"d":["s1"],"f":"$s1.result"}
```

Supports both `$sN.result` references and inline arrays:
```json
{"id":"s1","t":"search","a":{"query":"$item"},"f":["topic A","topic B"]}
```
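Fan-out expansion substitutes `$item` into the step's args once per element, whether the items come from a `$sN.result` reference or an inline array. An illustrative sketch:

```python
def expand_foreach(step, results):
    """Expand a fan-out step into one sub-step per item (sketch)."""
    items = step["foreach"]
    if isinstance(items, str):                  # "$s1.result" style reference
        step_id = items.lstrip("$").split(".")[0]
        items = results[step_id]
    expanded = []
    for i, item in enumerate(items):
        # Replace the "$item" placeholder with the concrete element
        args = {k: (item if v == "$item" else v) for k, v in step["args"].items()}
        expanded.append({"id": f'{step["id"]}.{i}', "tool": step["tool"], "args": args})
    return expanded
```

The sub-steps share no dependencies on each other, so they run in parallel (map); a downstream step that references the fan-out collects all results (reduce).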

### Retry Policy
```json
{"id":"s1","t":"unreliable_api","a":{"url":"..."},"r":3}
```
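A retry policy like `"r": 3` can be pictured as a re-run loop; the exponential backoff here is an assumption for illustration, and SIR's actual retry timing may differ:

```python
import time

def run_with_retry(fn, retries=3, base_delay=0.5):
    """Re-run a failing tool up to `retries` extra times (sketch; backoff is assumed)."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise  # out of attempts -- surface the last error
            time.sleep(base_delay * 2 ** attempt)
```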

## Providers

All providers read API keys from environment variables by default. You can also pass them explicitly.

### Ollama (default)
```python
from sir.providers import OllamaProvider
sir = SIR(provider=OllamaProvider(model="qwen2.5:14b"))
```

### OpenAI
```python
from sir.providers import OpenAIProvider
sir = SIR(provider=OpenAIProvider(model="gpt-4o"))  # reads OPENAI_API_KEY
```

### Claude (Anthropic)
```python
from sir.providers import ClaudeProvider
sir = SIR(provider=ClaudeProvider(model="claude-sonnet-4-20250514"))  # reads ANTHROPIC_API_KEY
```

### Gemini (Google)
```python
from sir.providers import GeminiProvider
sir = SIR(provider=GeminiProvider(model="gemini-2.5-flash"))  # reads GEMINI_API_KEY
```

### AWS Bedrock
```python
from sir.providers import BedrockProvider
sir = SIR(provider=BedrockProvider(model="anthropic.claude-sonnet-4-20250514-v1:0"))  # reads AWS_REGION + AWS_BEARER_TOKEN_BEDROCK
```

### OpenRouter
```python
from sir.providers import OpenRouterProvider
sir = SIR(provider=OpenRouterProvider(model="openai/gpt-4o"))  # reads OPENROUTER_API_KEY
```

### Perplexity
```python
from sir.providers import PerplexityProvider
sir = SIR(provider=PerplexityProvider(model="sonar-pro"))  # reads PERPLEXITY_API_KEY
```

### Mistral
```python
from sir.providers import MistralProvider
sir = SIR(provider=MistralProvider(model="mistral-large-latest"))  # reads MISTRAL_API_KEY
```

### Custom Provider
```python
from sir.providers.llm import LLMProvider

class MyProvider(LLMProvider):
    async def generate(self, messages, **kwargs) -> str:
        return await my_custom_llm(messages)
```

## Configuration

```python
sir = SIR(
    provider=OllamaProvider(model="qwen2.5:14b"),
    embed_provider=OpenAIProvider(model="unused", embed_model="text-embedding-3-small"),
    memory_path="dags.bin",           # binary memory file
    enable_memory=True,               # toggle memory system
    enable_optimizer=True,            # toggle graph compression
    enable_speculation=True,          # toggle speculative execution
    enable_reasoning=True,            # toggle reasoning pass
    tool_mode="adaptive",             # "adaptive" | "strict" | "required"
    deprecation_threshold=3.0,        # score below this -> deprecated
    similarity_threshold=0.78,        # semantic memory match threshold
    max_tokens=4096,                  # LLM output limit
    llm_retries=2,                    # retry on LLM/parse failure
)
```

## CLI

```bash
sir run "Search AI news and summarize" -t tools.py
sir run "..." -t tools.py --stream     # live streaming
sir ui                                  # launch web dashboard
sir ui --port 8080                      # custom port
sir inspect                             # view evolutionary memory
sir clear                               # clear memory
```

## Web Dashboard

SIR includes a local web dashboard for real-time DAG visualization, execution monitoring, and memory inspection.

```bash
pip install 'sir-agent[ui]'
sir ui
```

Opens at `http://127.0.0.1:7437`. Features:

- **Playground** -- enter a prompt, select provider/model, and run. View the reasoning output, raw LLM JSON, token count, and whether the task was completed in 1 or 1+1 LLM calls.
- **Evolution explorer** -- browse all stored DAGs with an interactive canvas visualization. Inspect step scores, execution counts, deprecated steps, and score evolution over time.
- **Raw LLM output** -- view the exact JSON the LLM produced for debugging.

### Environment Variables

The dashboard reads credentials from a `.env` file in the working directory. Supported variables:

```bash
# Ollama (default, no key needed)
OLLAMA_HOST=http://localhost:11434

# OpenAI
OPENAI_API_KEY=sk-...

# Anthropic Claude
ANTHROPIC_API_KEY=sk-ant-...

# Google Gemini
GEMINI_API_KEY=...

# AWS Bedrock
AWS_REGION=us-east-1
AWS_BEARER_TOKEN_BEDROCK=...

# OpenRouter
OPENROUTER_API_KEY=sk-or-...

# Perplexity
PERPLEXITY_API_KEY=pplx-...

# Mistral
MISTRAL_API_KEY=...
```

You can also enter API keys directly in the dashboard. The chat provider and embedding provider can be configured independently -- for example, use Bedrock for chat and OpenAI for embeddings.

### Tools File

The dashboard loads tools from a Python file. Create a file with `@tool` decorated functions:

```python
# tools.py
import requests

from sir import tool

@tool
def search(query: str) -> str:
    """Search the web."""
    return requests.get(f"https://api.example.com?q={query}").text

@tool
def summarize(text: str) -> str:
    """Summarize text."""
    return text[:200]
```

Then enter `tools.py` in the Tools file field.

## DAG Visualization

The following diagram shows an example of a DAG generated by SIR from a single LLM call. Each node represents a tool invocation, and edges represent data dependencies between steps.

<p align="center">
  <img src="https://raw.githubusercontent.com/tictacguy/SIR-Agent/main/docs/static/dag.png" alt="SIR DAG Example" width="100%">
</p>

🌐 For more details visit the [SIR website](https://tictacguy.github.io/SIR/).

## Roadmap

The following features are under active development.

### Cross-Agent DAG Federation

Multiple SIR instances collaborating on a single distributed DAG. Steps can be delegated to specialized agents, with results flowing back into the parent graph. Multi-agent orchestration in a single-shot planning cycle.

### Self-Healing DAGs

When a step fails, SIR generates a targeted micro-DAG in one additional LLM call to repair only the broken branch -- without re-executing the entire graph. A natural extension of evolutionary memory applied to runtime fault recovery.

### Compile-Once, Run-Anywhere DAG Caching

Optimized DAGs are compiled into a binary executable format that no longer requires the LLM. For recurring tasks, SIR becomes a pure runtime with zero inference latency and zero token cost.

## License

AGPL-3.0 -- See [LICENSE](LICENSE) for details.
