Metadata-Version: 2.4
Name: observable-agent
Version: 0.1.0
Summary: Unopinionated contract-based verification for AI agents
Author: Kavish Sathia
License: MIT
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: google-adk
Requires-Dist: ddtrace
Requires-Dist: tenacity
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Dynamic: license-file

<p align="center">
  <img src="https://raw.githubusercontent.com/kavishsathia/observable-agent/main/assets/logo.png" alt="Observable Agent" width="600"/>
</p>

<p align="center">
  <strong>Unopinionated contract-based verification for AI agents.</strong>
</p>

<p align="center">
  <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.11+-blue.svg" alt="Python 3.11+"></a>
  <a href="https://github.com/kavishsathia/observable-agent/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-MIT-green.svg" alt="License: MIT"></a>
  <a href="https://github.com/kavishsathia/observable-agent"><img src="https://img.shields.io/badge/coverage-82%25-brightgreen.svg" alt="Coverage: 82%"></a>
</p>

---

## The Problem

Deterministic software is typically checked with tools like unit tests. However, as we become more reliant on AI agents to do our work, we need a smarter and more efficient way to verify that their output is correct. To address this, the library introduces a mental model known as the Agentic Contract Framework.

Its primary function is to produce a contract with a set of commitments before the agent's execution (the contract can be hardcoded or dynamically generated by the agent itself). Each commitment on a contract has an attached verifier, which you, the developer, can set. If you deem that a commitment can be verified deterministically, you are welcome to write a function for that (like a unit test). Otherwise, you can rely on the default semantic verifier, which uses another agent to verify the correctness of the output. All evaluations are collected and synced with Datadog.

## Quick Start

```python
from observable_agent import ObservableAgent, Contract, Commitment

# Define what the agent must do
contract = Contract(commitments=[
    Commitment(
        name="no_harmful_content",
        terms="The agent must not produce harmful or offensive content"
    ),
    Commitment(
        name="stay_on_topic",
        terms="The agent must only discuss topics related to the user's query"
    )
])

# Create the agent (wraps Google ADK)
agent = ObservableAgent(
    name="my_agent",
    model="gemini-2.0-flash",
    instruction="You are a helpful assistant.",
    description="A helpful assistant",
    contract=contract,
    on_implementation_complete=lambda verifier: print(verifier.verify())
)
```

## Progressive Hardening

Using this library is very much a process of continuous exploration: you observe your agents, determine their failure modes, and progressively "harden" your rules. If you discover that your agent commonly makes a certain mistake, you can create a commitment not to make that mistake and add a deterministic verifier to catch it 100% of the time. I would personally recommend starting with the default semantic verifier and understanding the failure modes of your agent in your domain first!
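As a rough sketch of what one hardening step might look like, suppose you observed the agent occasionally calling a destructive tool. The `ToolCall` and `Execution` dataclasses below are hypothetical stand-ins that only mirror names from the architecture diagram, and the `(execution, terms)` signature with a boolean return value is an assumption. Check the library source for the actual types before copying this.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins that mirror names from the architecture diagram;
# the real classes in observable_agent may have different fields.
@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)


@dataclass
class Execution:
    tool_calls: list[ToolCall] = field(default_factory=list)


FORBIDDEN_TOOLS = {"delete_file", "drop_table"}  # illustrative tool names


def no_destructive_tools(execution: Execution, terms: str) -> bool:
    """Return True (pass) only if no forbidden tool was called."""
    return all(call.name not in FORBIDDEN_TOOLS for call in execution.tool_calls)


safe = Execution(tool_calls=[ToolCall("search_docs")])
unsafe = Execution(tool_calls=[ToolCall("search_docs"), ToolCall("delete_file")])
print(no_destructive_tools(safe, "never call destructive tools"))
print(no_destructive_tools(unsafe, "never call destructive tools"))
```

A function like this would presumably be attached to a commitment through its verifier field, so that the known failure mode is caught without an LLM call.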

## Architecture

```mermaid
classDiagram
    ObservableAgent --> Contract
    ObservableAgent --> Execution
    Contract --> Commitment
    Commitment --> Verifier
    Commitment --> SemanticVerifier

    class ObservableAgent {
        +name: str
        +model: str
        +instruction: str
        +contract: Contract
    }

    class Contract {
        +commitments: List~Commitment~
        +verify(execution) List~VerificationResult~
    }

    class Commitment {
        +name: str
        +terms: str
        +verifier: Callable
        +semantic_sampling_rate: float
        +verify(execution) VerificationResult
    }

    class Execution {
        +tool_calls: List~ToolCall~
        +format_tool_calls() str
    }

    class Verifier {
        <<deterministic>>
        +verify(execution, terms) Result
    }

    class SemanticVerifier {
        <<LLM-based>>
        +verify(execution, terms) Result
    }
```

## Key Concepts

The main contribution of this library is the Contract class. A contract stores many commitments. Think of a commitment as an expectation of what the agent is supposed to deliver, and of a contract as a set of such expectations. It's like a freelancer contract, but with your AI agent.

You build a contract by first defining it and then adding commitments to it. A commitment can hold its own verifier. My approach is to be as critical as possible towards the output of the AI agent. If your verifier returns a violation, the agent is taken to have failed to deliver what it committed to. If your verifier returns a pass, the agent is taken to have passed a deterministic test case, but it could still have other failure modes that we do not know of (after all, using this library is a process of exploration). In this case, we run the execution against a semantic verifier to check for these unknown failure modes.

If you are confident that the deterministic test case accounts for all failure modes, you can set semantic_sampling_rate to 0, meaning none of the agent executions for that particular commitment will be put through semantic verification (your deterministic verification will still run). If you are more cost-conscious, you can set it to some number between 0 and 1 (the lower the number, the fewer semantic verifications are run and the lower the cost). If it is 1, the semantic verifier will always run when (1) your verifier doesn't exist or (2) your verifier returned a pass.

The ObservableAgent function is a wrapper that returns the Agent class from Google ADK. The key differences are the callbacks and the contract. You pass in a contract to the agent, and for the callbacks, you can define an on_tool_call callback and an on_implementation_complete callback (more coming soon!). The ObservableAgent doesn't automatically verify the commitments. Instead, it provides you with a verifier as an argument to the on_implementation_complete callback, where you can call the .verify() method to start verifying the agent execution. This lets you run the verification as a background process or right after the execution, whichever is best for your context.
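The decision flow described above can be sketched roughly as follows. This is not the library's implementation: `should_run_semantic` is a hypothetical helper, and drawing one uniform random number per execution is an assumption about how the sampling rate is applied.

```python
import random
from typing import Callable, Optional


def should_run_semantic(
    deterministic_passed: Optional[bool],
    semantic_sampling_rate: float,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Decide whether the semantic verifier runs for one execution.

    deterministic_passed is None when the commitment has no custom verifier.
    """
    if deterministic_passed is False:
        # A deterministic violation is final; no semantic check is needed.
        return False
    # No verifier, or the verifier passed: sample at the configured rate.
    # rng() yields a value in [0.0, 1.0), so 0 never runs and 1 always runs.
    return rng() < semantic_sampling_rate
```

With a rate of 0.25, for example, roughly one in four passing executions would also go through the semantic verifier.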
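One possible way to run verification in the background is to hand the verifier to a thread pool from inside the callback. The sketch below uses only the standard library. `FakeVerifier` is a hypothetical stand-in that exposes just the .verify() method, and its return value is made up for illustration, since the real result type is not shown here.

```python
from concurrent.futures import Future, ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)
pending: list[Future] = []


def verify_in_background(verifier) -> None:
    """Candidate on_implementation_complete callback: schedule verification and return."""
    pending.append(executor.submit(verifier.verify))


class FakeVerifier:
    """Hypothetical stand-in exposing only the .verify() method."""

    def verify(self):
        return ["stay_on_topic: pass"]  # illustrative result shape


verify_in_background(FakeVerifier())
# Block only at the point where the results are actually needed.
results = [future.result() for future in pending]
executor.shutdown()
```

Passing `verify_in_background` instead of the lambda from the Quick Start would let the agent return immediately, while the lambda verifies right after the execution completes.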

## Installation

```bash
pip install observable-agent
```

Set your environment variables:

```bash
# Required for the agent
export GEMINI_API_KEY=your_gemini_api_key

# Optional: only needed for Datadog observability
export DD_LLMOBS_ENABLED=1
export DD_LLMOBS_ML_APP=your_app_name
export DD_LLMOBS_AGENTLESS_ENABLED=1
export DD_SITE=us5.datadoghq.com
export DD_API_KEY=your_datadog_api_key
export DD_ENV=development
export DD_SERVICE=observable-agent
```
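A small preflight check can catch missing variables before the agent starts. The helper below is a hypothetical sketch, not part of the library, and the split between required and optional variables simply follows the comments in the block above.

```python
import os

REQUIRED_VARS = ["GEMINI_API_KEY"]
# Assumed to matter only when Datadog observability is wanted.
DATADOG_VARS = [
    "DD_LLMOBS_ENABLED",
    "DD_LLMOBS_ML_APP",
    "DD_LLMOBS_AGENTLESS_ENABLED",
    "DD_SITE",
    "DD_API_KEY",
]


def missing_vars(names, env=os.environ):
    """Return the names that are unset or empty in the given environment mapping."""
    return [name for name in names if not env.get(name)]


if missing_vars(REQUIRED_VARS):
    print("Missing required variables:", missing_vars(REQUIRED_VARS))
if missing_vars(DATADOG_VARS):
    print("Datadog reporting may be incomplete, unset:", missing_vars(DATADOG_VARS))
```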
