Metadata-Version: 2.4
Name: toolfusion
Version: 0.1.0
Summary: Intelligent middleware for AI agent tool orchestration
Project-URL: Homepage, https://github.com/HetanshWaghela/ToolFusion
Project-URL: Repository, https://github.com/HetanshWaghela/ToolFusion
Project-URL: Issues, https://github.com/HetanshWaghela/ToolFusion/issues
Project-URL: Documentation, https://github.com/HetanshWaghela/ToolFusion#readme
Author: ToolFusion Contributors
License: MIT
License-File: LICENSE
Keywords: agents,asyncio,cache,llm,middleware,tooling
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Requires-Dist: model2vec>=0.6.0
Requires-Dist: numpy>=1.26
Requires-Dist: pyyaml>=6.0.1
Requires-Dist: rich>=13.7.1
Requires-Dist: typer>=0.12.3
Provides-Extra: all
Requires-Dist: faiss-cpu>=1.8.0; extra == 'all'
Requires-Dist: openai>=1.40.0; extra == 'all'
Requires-Dist: opentelemetry-api>=1.26.0; extra == 'all'
Requires-Dist: opentelemetry-sdk>=1.26.0; extra == 'all'
Requires-Dist: redis[hiredis]>=5.0.0; extra == 'all'
Requires-Dist: sentence-transformers>=3.0.0; extra == 'all'
Requires-Dist: spacy>=3.7.0; extra == 'all'
Requires-Dist: torch>=2.2.0; extra == 'all'
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.23.7; extra == 'dev'
Requires-Dist: pytest>=8.2.0; extra == 'dev'
Requires-Dist: ruff>=0.5.5; extra == 'dev'
Provides-Extra: faiss
Requires-Dist: faiss-cpu>=1.8.0; extra == 'faiss'
Provides-Extra: llm
Requires-Dist: openai>=1.40.0; extra == 'llm'
Provides-Extra: otel
Requires-Dist: opentelemetry-api>=1.26.0; extra == 'otel'
Requires-Dist: opentelemetry-sdk>=1.26.0; extra == 'otel'
Provides-Extra: redis
Requires-Dist: redis[hiredis]>=5.0.0; extra == 'redis'
Provides-Extra: spacy
Requires-Dist: spacy>=3.7.0; extra == 'spacy'
Provides-Extra: transformers
Requires-Dist: sentence-transformers>=3.0.0; extra == 'transformers'
Requires-Dist: torch>=2.2.0; extra == 'transformers'
Description-Content-Type: text/markdown

<p align="center">
  <h1 align="center">ToolFusion</h1>
  <p align="center"><strong>Stop wasting tokens. Start fusing results.</strong></p>
  <p align="center">
    <a href="https://pypi.org/project/toolfusion/"><img src="https://img.shields.io/pypi/v/toolfusion?color=blue" alt="PyPI"></a>
    <a href="https://pypi.org/project/toolfusion/"><img src="https://img.shields.io/pypi/pyversions/toolfusion" alt="Python"></a>
    <a href="https://github.com/HetanshWaghela/ToolFusion/blob/main/LICENSE"><img src="https://img.shields.io/github/license/HetanshWaghela/ToolFusion" alt="License"></a>
  </p>
</p>

---

**ToolFusion** is async-first middleware that sits between your AI agent framework and your tools. It caches and coalesces redundant tool calls, deduplicates overlapping results, resolves conflicting outputs across tools, and fuses multi-source data, so your agents spend fewer API calls, less latency, and fewer context tokens on repeated or redundant work.

## The Problem

Every production AI agent system hits the same wall:

```
Agent: "What's the weather in NYC?"
  → Tool A: calls OpenWeatherMap API          ← $0.001, 200ms
  → Tool B: calls WeatherAPI                  ← $0.001, 300ms
  → Tool A again (retry/loop): same API call  ← $0.001, 200ms   ← WASTED
  → Agent context now has 3 overlapping results with conflicting temperatures
  → LLM processes all 3 → extra tokens → higher cost → possible hallucination
```

| Problem | Impact | How ToolFusion Fixes It |
|---|---|---|
| **Duplicate tool calls** — LLMs call the same tool repeatedly with identical/similar params | Wasted API calls, latency, cost | Exact + semantic caching, request coalescing |
| **Redundant results in context** — multiple tools return overlapping info | Token waste, context pollution | Two-stage deduplication (SimHash → semantic) |
| **Conflicting tool outputs** — two tools report different values for the same fact | LLM hallucinates or picks arbitrarily | Source-weighted conflict resolution |
| **Thundering herds** — cache expiry causes simultaneous re-execution storms | Backend overload, spikes | TTL jitter + single-flight coalescing |
| **No visibility** — developers can't see what tools are doing | Silent failures, impossible debugging | Structured telemetry + result envelopes |
| **Framework lock-in** — caching techniques are framework-specific | Can't reuse across LangChain, CrewAI, OpenAI, etc. | Framework-agnostic middleware with adapters |

## Before vs After ToolFusion

```python
from toolfusion import ToolFusion

# ❌ WITHOUT ToolFusion: every call hits the API, duplicates pile up
results = []
for query in ["python async", "python async", "python asynchronous"]:
    result = await web_search(query)   # 3 API calls for ~the same query
    results.append(result)             # 3 overlapping results in context
# Agent processes all 3 → wasted tokens, possible conflicts

# ✅ WITH ToolFusion: duplicates caught, results fused
tf = ToolFusion(preset="balanced")

@tf.tool(cache_mode="semantic_ok", ttl=600)
async def web_search(query: str) -> dict:
    return await call_search_api(query)  # your existing search client

r1 = await tf.call("web_search", {"query": "python async"})        # executes
r2 = await tf.call("web_search", {"query": "python async"})        # L1 exact hit (sub-ms)
r3 = await tf.call("web_search", {"query": "python asynchronous"}) # L2 semantic hit (~1ms), if above threshold

fused = await tf.fuse([r1, r3])  # deduplicates + merges
# → 1 API call instead of 3, clean unified result, full provenance
```

## Install

```bash
pip install toolfusion
```

Optional extras for production use:

```bash
# Redis cache backend + FAISS vector search
pip install "toolfusion[redis,faiss]"

# Full stack: Redis, FAISS, spaCy NER, sentence-transformers, OpenTelemetry, LLM fusion
pip install "toolfusion[all]"
```

<details>
<summary>Available extras</summary>

| Extra | Packages | Use Case |
|---|---|---|
| `redis` | `redis[hiredis]` | Distributed cache backend + vector search |
| `faiss` | `faiss-cpu` | Fast similarity search for semantic cache |
| `spacy` | `spacy` | Entity extraction for fusion/conflict resolution |
| `transformers` | `sentence-transformers`, `torch` | Higher-accuracy embeddings (`accurate` preset) |
| `otel` | `opentelemetry-api`, `opentelemetry-sdk` | Distributed tracing and metrics |
| `llm` | `openai` | LLM-based fusion strategy |
| `all` | All of the above | Everything |
| `dev` | `pytest`, `pytest-asyncio`, `ruff` | Development/testing |

</details>

## Quick Start

### Async (recommended)

```python
import asyncio
from toolfusion import ToolFusion

async def main():
    async with ToolFusion(preset="balanced") as tf:

        @tf.tool(cache_mode="semantic_ok", ttl=300, freshness="daily")
        async def search(query: str) -> dict:
            # Your actual tool implementation
            return {"results": [f"Result for: {query}"]}

        # First call — executes the tool
        r1 = await tf.call("search", {"query": "python async patterns"})
        print(r1.cache_info.source)  # "miss"

        # Identical call — instant L1 cache hit
        r2 = await tf.call("search", {"query": "python async patterns"})
        print(r2.cache_info.source)  # "l1_cache"

        # Similar call — semantic L2 cache hit
        r3 = await tf.call("search", {"query": "python asynchronous patterns"})
        print(r3.cache_info.source)  # "l2_cache" (if similarity clears the preset threshold)

asyncio.run(main())
```

### Sync

```python
from toolfusion import ToolFusion

with ToolFusion(preset="fast") as tf:

    @tf.tool(cache_mode="exact_only", ttl=60)
    def calculate(x: int, y: int) -> int:
        return x + y

    r = calculate(2, 3)
    print(r.result)  # 5
```

### Framework Adapters

ToolFusion ships adapters for common agent frameworks:

```python
# LangChain
from toolfusion.adapters import langchain_adapter
wrapped_tools = langchain_adapter.wrap(your_langchain_tools, preset="balanced")

# OpenAI Agents SDK
from toolfusion.adapters import openai_adapter
wrapped_fn = openai_adapter.wrap(your_tool_function, preset="balanced")

# CrewAI
from toolfusion.adapters import crewai_adapter
wrapped_tools = crewai_adapter.wrap(your_crewai_tools, preset="balanced")

# AutoGen
from toolfusion.adapters import autogen_adapter
wrapped_fn = autogen_adapter.wrap(your_autogen_function, preset="balanced")

# MCP (Model Context Protocol)
from toolfusion.adapters import mcp_adapter
wrapped_server = mcp_adapter.wrap(your_mcp_server, preset="balanced")

# Haystack
from toolfusion.adapters import haystack_adapter
wrapped_component = haystack_adapter.wrap(your_component, preset="balanced")
```

## How It Works

```
Agent Tool Call
      │
      ▼
┌─────────────────────────────┐
│  Request Interceptor        │  canonical key + secret redaction
│  ┌───────────────────────┐  │
│  │  L1 Cache (Exact)     │  │  sub-ms hash lookup
│  └───────────┬───────────┘  │
│         miss │              │
│  ┌───────────────────────┐  │
│  │  Single-Flight Gate   │  │  coalesce concurrent duplicate calls
│  └───────────┬───────────┘  │
│  ┌───────────────────────┐  │
│  │  L2 Cache (Semantic)  │  │  embedding similarity search
│  └───────────┬───────────┘  │
│         miss │              │
│  ┌───────────────────────┐  │
│  │  Tool Execution       │  │  actual tool call + circuit breaker
│  └───────────┬───────────┘  │
│  ┌───────────────────────┐  │
│  │  VAAC Admission       │  │  should we cache this result?
│  └───────────┬───────────┘  │
│  ┌───────────────────────┐  │
│  │  Dedup + Fusion       │  │  remove overlaps, resolve conflicts
│  └───────────┬───────────┘  │
│              ▼              │
│  Result Envelope            │  structured output with provenance
└─────────────────────────────┘
```
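
The request interceptor's "canonical key + secret redaction" step, together with the per-tool `volatile_fields` setting, can be sketched as follows. This is a minimal illustration under assumptions, not the contents of `components/key_builder.py`: it assumes keys are SHA-256 digests of sorted-key JSON and that secrets are masked rather than dropped; `canonical_key`, `_scrub`, and `SECRET_FIELDS` are hypothetical names.

```python
import hashlib
import json
from typing import Any

# Hypothetical sketch: names and the secret list are illustrative, not ToolFusion's API.
SECRET_FIELDS = frozenset({"api_key", "token", "password"})


def _scrub(value: Any, volatile: frozenset[str]) -> Any:
    """Drop volatile fields and mask secrets, recursing into dicts and lists."""
    if isinstance(value, dict):
        cleaned = {}
        for name, item in value.items():
            if name in volatile:
                continue  # e.g. request_id, timestamp: must not affect the key
            cleaned[name] = "<redacted>" if name in SECRET_FIELDS else _scrub(item, volatile)
        return cleaned
    if isinstance(value, list):
        return [_scrub(item, volatile) for item in value]
    return value


def canonical_key(tool: str, params: dict[str, Any], volatile_fields: tuple[str, ...] = ()) -> str:
    """Order-independent SHA-256 cache key for one tool call."""
    cleaned = _scrub(params, frozenset(volatile_fields))
    # sort_keys plus fixed separators: {"a": 1, "b": 2} and {"b": 2, "a": 1} serialize identically.
    payload = json.dumps(
        {"tool": tool, "params": cleaned}, sort_keys=True, separators=(",", ":"), default=str
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


k1 = canonical_key("web_search", {"query": "python async", "request_id": "a1"}, ("request_id",))
k2 = canonical_key("web_search", {"request_id": "b2", "query": "python async"}, ("request_id",))
print(k1 == k2)  # True
```

Masking keeps raw secrets out of stored keys, but it also means calls that differ only in credentials share an entry; if results depend on the caller, a per-tenant component would need to be part of the key.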

## Configuration

### Presets

Choose a preset that controls the speed/accuracy tradeoff:

| Preset | Embedder | Semantic Threshold | Dedup | Fusion | Best For |
|---|---|---|---|---|---|
| **`fast`** | Model2Vec | 0.88 | SimHash only | Heuristic | High-throughput, cost-sensitive |
| **`balanced`** | Model2Vec | 0.92 | Hybrid (SimHash→semantic) | Heuristic | General production (default) |
| **`accurate`** | sentence-transformers | 0.95 | Hybrid, conservative | Heuristic + optional LLM | Medical, financial, research |
| **`exact_only`** | None | N/A | Exact hash only | Union | Maximum safety, latency-critical |

```python
tf = ToolFusion(preset="balanced")  # or "fast", "accurate", "exact_only"
```
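
A minimal sketch of how a semantic threshold typically gates an L2 hit, assuming cosine similarity between the new query's embedding and cached embeddings, and a brute-force scan instead of a vector index. The toy 3-d vectors stand in for embedder output; `cosine` and `semantic_lookup` are hypothetical names.

```python
import math
from typing import Optional, Sequence

# Hypothetical sketch: brute-force cosine search, not ToolFusion's vector index.
Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_lookup(query: Vector, entries: list[tuple[Vector, str]], threshold: float) -> Optional[str]:
    """Return the most similar cached value, or None if the best match is below threshold."""
    best_score, best_value = -1.0, None
    for vector, value in entries:
        score = cosine(query, vector)
        if score > best_score:
            best_score, best_value = score, value
    return best_value if best_score >= threshold else None


# Toy 3-d vectors stand in for real embeddings.
entries = [([1.0, 0.0, 0.0], "cached: python async"), ([0.0, 1.0, 0.0], "cached: rust traits")]
query = [0.95, 0.2, 0.0]  # cosine ≈ 0.979 with the first entry

print(semantic_lookup(query, entries, threshold=0.92))  # cached: python async
print(semantic_lookup(query, entries, threshold=0.99))  # None
```

Raising the threshold (for example from `balanced`'s 0.92 to `accurate`'s 0.95) lowers the hit rate in exchange for a smaller chance of reusing a result for a query that only looks similar.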

### Per-Tool Policy

Every tool can override the global preset:

```python
@tf.tool(
    cache_mode="semantic_ok",     # off | exact_only | semantic_ok | semantic_verify
    risk="medium",                # low | medium | high (high → no semantic cache)
    freshness="daily",            # static | daily | realtime | evented
    ttl=600,                      # cache TTL in seconds
    reliability_weight=0.8,       # 0.0–1.0, used in conflict resolution
    cacheable=True,               # False → never cache (for write/mutation tools)
    max_result_size=524288,       # max result size to cache, in bytes (512 KiB)
    dedup_strategy="hybrid",      # exact | simhash | minhash | semantic | hybrid | none
    volatile_fields=["request_id", "timestamp"],  # stripped before cache key
    depends_on=["other_tool"],    # invalidation cascade
)
async def my_tool(query: str) -> dict:
    ...
```
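
The problem table lists TTL jitter as one defense against thundering herds. A minimal sketch, assuming uniform ±10% jitter applied to the configured `ttl` (the percentage and the `jittered_ttl` helper are hypothetical, not ToolFusion's formula): entries written in the same burst then expire at different times instead of all at once.

```python
import random
from typing import Optional


# Hypothetical sketch: the jitter fraction is an assumption, not ToolFusion's default.
def jittered_ttl(ttl: float, jitter: float = 0.1, rng: Optional[random.Random] = None) -> float:
    """Scale ttl by a uniform random factor in [1 - jitter, 1 + jitter]."""
    if not 0.0 <= jitter < 1.0:
        raise ValueError("jitter must be in [0, 1)")
    rng = rng if rng is not None else random.Random()
    return ttl * rng.uniform(1.0 - jitter, 1.0 + jitter)


rng = random.Random(42)  # seeded only to make the demo repeatable
ttls = [jittered_ttl(600, rng=rng) for _ in range(5)]
print([round(t) for t in ttls])  # each value falls between 540 and 660
```

With `ttl=600`, expiries from one burst spread over roughly a two-minute window.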

### YAML Configuration

Generate a config file:

```bash
toolfusion init --preset balanced
```

This creates `toolfusion.yaml` with all settings. See [USAGE.md](USAGE.md) for the full config reference.

```python
tf = ToolFusion(config="toolfusion.yaml")
```

## Result Envelope

Every call returns a `ToolFusionResult` — never a raw value:

```python
result = await tf.call("my_tool", {"query": "test"})

result.result              # The actual tool output
result.cache_info.source   # "miss" | "l1_cache" | "l1_cache_stale" | "l2_cache"
result.cache_info.hit      # True/False
result.latency.total_ms    # End-to-end latency
result.latency.execute_ms  # Time spent in actual tool execution
result.sources             # Provenance: which tools contributed
result.conflicts           # Detected conflicts with resolution details
result.dedup_stats         # How many duplicates were removed
result.errors              # Per-tool errors with retryable flag
result.degraded            # True if result is partial due to failures
result.tokens_saved        # Estimated tokens saved by caching/dedup
result.metadata            # Your custom metadata
```

## Multi-Tool Fusion

When multiple tools return data for the same query, fuse them:

```python
r1 = await tf.call("weather_api_a", {"city": "NYC"})
r2 = await tf.call("weather_api_b", {"city": "NYC"})

fused = await tf.fuse([r1, r2])
print(fused.result)                          # Merged, deduplicated result
print(fused.conflicts)                       # Any conflicting values with resolution
print(fused.dedup_stats.duplicates_removed)  # Overlapping data removed
```
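
Source-weighted conflict resolution can be illustrated as a weighted vote that uses each tool's `reliability_weight` from the per-tool policy. A sketch under stated assumptions: conflicts are resolved per field, the value with the largest summed weight wins, and ties go to the first-seen value. `resolve_conflict`, the third tool, and the temperatures are hypothetical; ToolFusion's heuristic fusion may weigh other signals.

```python
from collections import defaultdict
from collections.abc import Hashable
from typing import Any


# Hypothetical sketch: a weighted vote, not ToolFusion's fusion strategy.
def resolve_conflict(observations: list[tuple[str, Hashable, float]]) -> dict[str, Any]:
    """Pick one value for a field reported by several tools.

    observations: (tool_name, reported_value, reliability_weight) tuples.
    The value with the largest summed weight wins; ties go to the first-seen value.
    """
    if not observations:
        raise ValueError("need at least one observation")
    totals: dict[Hashable, float] = defaultdict(float)
    supporters: dict[Hashable, list[str]] = defaultdict(list)
    for tool, value, weight in observations:
        totals[value] += weight
        supporters[value].append(tool)
    # max() returns the first maximal key, and dicts keep insertion order.
    winner = max(totals, key=totals.__getitem__)
    total_weight = sum(totals.values())
    return {
        "value": winner,
        "confidence": totals[winner] / total_weight if total_weight else 0.0,
        "sources": supporters[winner],
        "conflict": len(totals) > 1,
    }


resolution = resolve_conflict([
    ("weather_api_a", 21, 0.8),
    ("weather_api_b", 23, 0.6),
    ("weather_api_c", 21, 0.5),
])
print(resolution["value"], resolution["sources"])  # 21 ['weather_api_a', 'weather_api_c']
```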

## CLI

```bash
toolfusion doctor              # Check dependencies and runtime config
toolfusion init                # Generate toolfusion.yaml
toolfusion config --interactive  # Interactive config wizard
toolfusion stats               # Runtime statistics
toolfusion stats --live        # Live dashboard
toolfusion bench               # Run benchmark suite
toolfusion bench --compare     # Compare all presets
toolfusion cache inspect       # Inspect cache entries
toolfusion cache clear         # Clear all caches
toolfusion explain <key>       # Explain a specific cache entry
```

## Key Features

<details>
<summary><strong>Single-Flight Request Coalescing</strong></summary>

When 10 concurrent calls request the same tool with the same parameters, only 1 actually executes. The other 9 wait and share the result:

```python
import asyncio

# 10 concurrent identical calls → only 1 execution
results = await asyncio.gather(*[
    tf.call("slow_api", {"q": "test"}) for _ in range(10)
])
# All 10 get the same result, but the API was called only once
```

- Uses `asyncio.shield()` to prevent follower cancellation from killing the leader
- Configurable leader timeout prevents stuck requests
- Background sweeper cleans up orphaned entries
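
The coalescing behavior can be sketched in plain asyncio. This is a minimal illustration under assumptions, not ToolFusion's `engine/` implementation: the `SingleFlight` class is hypothetical and omits the leader timeout and sweeper. The leader's work runs as a task, and every caller awaits it through `asyncio.shield()`, so a cancelled caller does not cancel the shared work.

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable


class SingleFlight:
    """Hypothetical sketch: coalesce concurrent calls that share a key into one execution."""

    def __init__(self) -> None:
        self._inflight: dict[str, asyncio.Future[Any]] = {}

    async def do(self, key: str, fn: Callable[[], Awaitable[Any]]) -> Any:
        task = self._inflight.get(key)
        if task is None:
            # First caller (the leader) starts the work and registers it.
            task = asyncio.ensure_future(fn())
            self._inflight[key] = task
            task.add_done_callback(functools.partial(self._forget, key))
        # shield(): cancelling one waiting caller does not cancel the shared work.
        return await asyncio.shield(task)

    def _forget(self, key: str, task: asyncio.Future[Any]) -> None:
        # Drop the finished entry so the next call after completion executes again.
        if self._inflight.get(key) is task:
            del self._inflight[key]


async def demo() -> tuple[list[str], int]:
    flight = SingleFlight()
    executions = 0

    async def slow_api() -> str:
        nonlocal executions
        executions += 1
        await asyncio.sleep(0.05)  # simulated network latency
        return "payload"

    results = await asyncio.gather(*(flight.do("slow_api:test", slow_api) for _ in range(10)))
    return results, executions


results, executions = asyncio.run(demo())
print(executions, len(results))  # 1 10
```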

</details>

<details>
<summary><strong>Two-Stage Deduplication</strong></summary>

When multiple results overlap, ToolFusion removes redundancy in two stages:

1. **Stage 1 (Fast):** SimHash/MinHash fingerprints flag candidate near-duplicates cheaply, with a fixed-size fingerprint comparison per pair
2. **Stage 2 (Precise):** Embedding-based semantic similarity confirms which candidates are true duplicates

From each cluster of duplicates, the most informative representative is kept (longest, most entities, most recent).

</details>

<details>
<summary><strong>VAAC Cache Admission</strong></summary>

Not every result is worth caching. VAAC (Value-Aware Admission Control) uses a multi-armed bandit to decide:

- **High-latency tools** → more valuable to cache
- **Stable results** → more cacheable
- **Frequently called tools** → benefit more from caching
- **Write/mutation tools** (`cacheable=False`) → never cached (a hard rule, not left to the bandit)
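
The factors above can be combined into a simple admission score. The sketch below is a deliberately simplified stand-in, not VAAC: it replaces the multi-armed bandit with a fixed scoring rule, and the saturation points, weights, threshold, `ToolStats`, and `should_admit` are all hypothetical.

```python
from dataclasses import dataclass


# Hypothetical sketch: a fixed scoring rule standing in for VAAC's bandit.
@dataclass
class ToolStats:
    cacheable: bool        # False for write/mutation tools
    avg_latency_ms: float  # typical execution time
    calls_per_hour: float  # observed call frequency
    change_rate: float     # fraction of repeat calls whose result changed, 0.0-1.0


def admission_score(s: ToolStats) -> float:
    """Illustrative value score in [0, 1]: slow, frequent, stable tools score high."""
    latency = min(s.avg_latency_ms / 1000.0, 1.0)   # saturates at 1 s
    frequency = min(s.calls_per_hour / 100.0, 1.0)  # saturates at 100 calls/hour
    stability = 1.0 - min(max(s.change_rate, 0.0), 1.0)
    # Unstable results make caching worthless, so stability scales the whole score.
    return stability * (0.5 * latency + 0.5 * frequency)


def should_admit(s: ToolStats, threshold: float = 0.3) -> bool:
    if not s.cacheable:
        return False  # hard rule: mutation tools are never admitted
    return admission_score(s) >= threshold


print(should_admit(ToolStats(True, 800, 120, 0.1)))   # True: slow, frequent, stable
print(should_admit(ToolStats(True, 5, 10, 1.0)))      # False: result changes every call
print(should_admit(ToolStats(False, 900, 200, 0.0)))  # False: write tool
```

A bandit-based policy would instead learn from observed hits and misses which tools are worth admitting; the fixed rule here only shows how the listed signals can point in the same direction.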

</details>

<details>
<summary><strong>Stale-While-Revalidate</strong></summary>

For eligible tools, expired cache entries are served immediately while a background refresh happens:

```python
@tf.tool(freshness="daily", cache_mode="semantic_ok")
async def news_feed(topic: str) -> dict:
    ...
# After TTL expires: serves stale result instantly, refreshes in background
```
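
The stale-while-revalidate flow can be sketched with a small in-memory cache. Assumptions: a single process, a `fetch` coroutine supplied by the caller, at most one background refresh per key, an injectable clock for testing, and no upper bound on how stale a served entry may be. `SWRCache` is a hypothetical class, not ToolFusion's API.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable

Fetch = Callable[[], Awaitable[Any]]


class SWRCache:
    """Hypothetical sketch: in-memory stale-while-revalidate cache."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._entries: dict[str, tuple[Any, float]] = {}  # key -> (value, expires_at)
        self._refreshing: dict[str, asyncio.Task[None]] = {}

    async def get(self, key: str, fetch: Fetch) -> tuple[Any, str]:
        entry = self._entries.get(key)
        if entry is None:
            value = await fetch()  # cold miss: the caller has to wait
            self._store(key, value)
            return value, "miss"
        value, expires_at = entry
        if self.clock() < expires_at:
            return value, "fresh"
        if key not in self._refreshing:  # expired: start at most one background refresh
            self._refreshing[key] = asyncio.create_task(self._refresh(key, fetch))
        return value, "stale"  # serve the old value without waiting

    def _store(self, key: str, value: Any) -> None:
        self._entries[key] = (value, self.clock() + self.ttl)

    async def _refresh(self, key: str, fetch: Fetch) -> None:
        try:
            self._store(key, await fetch())
        except Exception:
            pass  # keep serving the stale value; a real system would log this
        finally:
            self._refreshing.pop(key, None)


async def demo() -> list[tuple[Any, str]]:
    now = [0.0]  # fake clock so the demo does not wait out the TTL
    cache = SWRCache(ttl=10, clock=lambda: now[0])
    version = 0

    async def fetch() -> str:
        nonlocal version
        version += 1
        return f"v{version}"

    seen = [await cache.get("news", fetch)]      # ("v1", "miss")
    now[0] = 11.0                                 # move past the TTL
    seen.append(await cache.get("news", fetch))  # ("v1", "stale"), refresh scheduled
    await asyncio.sleep(0.01)                     # let the background refresh finish
    seen.append(await cache.get("news", fetch))  # ("v2", "fresh")
    return seen


print(asyncio.run(demo()))
```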

</details>

<details>
<summary><strong>Circuit Breaker</strong></summary>

Tools that fail repeatedly are temporarily disabled:

- After 5 consecutive failures → circuit opens (tool calls rejected immediately)
- After 30s recovery timeout → circuit half-opens (one test call allowed)
- On success → circuit closes (normal operation resumes)
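
The state machine above can be sketched as a small class. The defaults mirror the numbers in the list (5 consecutive failures, 30 s recovery); the class and method names are hypothetical, the clock is injectable for testing, and ToolFusion's `execution/` module may differ, for example in how it limits half-open probes.

```python
import time
from typing import Callable


class CircuitBreaker:
    """Hypothetical sketch: closed → open → half_open → closed/open."""

    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock
        self.state = "closed"
        self.failures = 0  # consecutive failures while closed
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == "open":
            if self.clock() - self.opened_at >= self.recovery_timeout:
                self.state = "half_open"  # let exactly one probe through
                return True
            return False  # reject immediately while open
        if self.state == "half_open":
            return False  # a probe is already in flight
        return True

    def record_success(self) -> None:
        self.state = "closed"
        self.failures = 0

    def record_failure(self) -> None:
        if self.state == "half_open":
            self._open()  # failed probe: back to open
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._open()

    def _open(self) -> None:
        self.state = "open"
        self.opened_at = self.clock()
        self.failures = 0


now = [0.0]  # fake clock for the demo
breaker = CircuitBreaker(clock=lambda: now[0])
for _ in range(5):
    breaker.record_failure()
print(breaker.state, breaker.allow_request())  # open False
now[0] = 31.0  # recovery timeout elapsed
probe_allowed = breaker.allow_request()
print(probe_allowed, breaker.state)            # True half_open
breaker.record_success()
print(breaker.state)                           # closed
```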

</details>

## Project Layout

```
toolfusion/
├── core.py               # Main ToolFusion class
├── cli.py                # CLI commands (typer)
├── bench.py              # Benchmark suite
├── adapters/             # Framework adapters (LangChain, OpenAI, CrewAI, etc.)
├── cache/                # L1/L2 cache backends (memory, SQLite, Redis)
│   ├── backends/         # Cache storage implementations
│   └── vector/           # Vector index implementations (numpy, FAISS, Redis)
├── components/           # Core algorithms
│   ├── admission.py      # VAAC cache admission policy
│   ├── dedup.py          # Two-stage deduplication
│   ├── embedder.py       # Embedding providers (Model2Vec, sentence-transformers)
│   ├── fusion.py         # Cross-tool fusion + conflict resolution
│   └── key_builder.py    # Canonical cache key construction
├── config/               # Configuration system + presets
├── engine/               # Runtime, single-flight, telemetry
├── execution/            # Tool executor + circuit breaker
├── infra/                # Factories, utilities
├── orchestration/        # Pipeline orchestration logic
├── schema/               # Data models, protocols, errors
└── security/             # HMAC signing for cache integrity
```

## Docs

| Document | Description |
|---|---|
| [API Reference](docs/API.md) | Complete API documentation with all parameters |
| [Usage Guide](USAGE.md) | Practical how-to guide with examples |
| [Spec](TOOLFUSION_SPEC.md) | Full technical specification |
| [Changelog](CHANGELOG.md) | Version history |
| [Contributing](CONTRIBUTING.md) | How to contribute |

## Research

ToolFusion's design draws on published research and engineering write-ups:

| Paper / Source | Key Finding | ToolFusion Application |
|---|---|---|
| [ToolCaching (arXiv:2601.15335)](https://arxiv.org/abs/2601.15335) | VAAC: 11% higher hit ratio, 34% lower latency vs LRU/LFU | Cache admission engine |
| [GPT Semantic Cache (arXiv:2411.05276)](https://arxiv.org/abs/2411.05276) | 68.8% API call reduction, 97%+ accuracy | L2 semantic cache |
| [Model2Vec (MinishLab)](https://github.com/MinishLab/model2vec) | 500x faster embeddings, ~30MB, numpy-only | Default embedder |
| [Discord singleflight](https://discord.com/blog/how-discord-reduced-websocket-traffic-by-40-percent) | 7.6x RPS improvement | Request coalescing |

## License

[MIT](LICENSE)
