Metadata-Version: 2.4
Name: agents-builder
Version: 4.0.0
Summary: 
Author: jalal
Author-email: jalalkhaldi3@gmail.com
Requires-Python: >=3.12,<3.14
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: arize-phoenix-client (>=2.6.0,<3.0.0)
Requires-Dist: arize-phoenix-otel (==0.13.1)
Requires-Dist: langchain (>=1.1.0,<2.0.0)
Requires-Dist: langchain-mcp-adapters (>=0.2.1,<0.3.0)
Requires-Dist: langchain-ollama (>=1.0.0,<2.0.0)
Requires-Dist: langchain-openrouter (>=0.0.2,<0.0.3)
Requires-Dist: openevals (>=0.2.0,<1.0.0)
Requires-Dist: pydantic (>=2.12.5,<3.0.0)
Requires-Dist: pydantic-settings (>=2.12.0,<3.0.0)
Description-Content-Type: text/markdown

# agents-builder

`agents-builder` is a small Python framework for building retrieval-aware agents on top of `LangGraph`, `LangChain`, and MCP tools.

The package gives you a few focused primitives:

- A base `Agent` that owns graph construction, LLM factories, and MCP tool loading.
- Config-driven runtime objects via Pydantic settings models.
- Reusable `Prompt`, `LLMNode`, `ToolBasedNode`, and `RouterNode` building blocks.
- An agent evaluation API for running scenarios, scoring trajectories, and annotating traces.
- Lightweight support for multiple LLM backends such as Ollama and OpenRouter.

The design is intentionally narrow: compose agents from small graph nodes, keep configuration explicit, and avoid framework-heavy abstractions that hide behavior.

## What This Project Contains

Core package layout:

```text
src/agents_builder/
├── __init__.py          # Base Agent
├── constants.py         # Shared typing aliases and config path
├── exceptions.py        # Domain-specific exceptions
├── mixins.py            # Config loading helpers
├── settings.py          # Pydantic settings models
├── utils.py             # Dynamic loading, MCP caching, message helpers
├── eval/
│   ├── __init__.py      # AgentEvalRunner, TraceAnnotator, assertions, scoring
│   ├── annotator.py     # Phoenix trace annotation integration
│   ├── assertion.py     # Reusable trajectory assertions
│   └── trajectory.py    # EvalScenario and AgentTrajectory models
├── llm/
│   ├── __init__.py
│   ├── factory.py       # LLMFactory and role-based model lookup
│   └── llm.py           # OllamaLLM / OpenRouterLLM implementations
└── langgraph/
    ├── __init__.py      # Prompt, Node, LLMNode, DeletionStrategy
    ├── nodes.py         # Ready-made retrieval/answering node types
    └── states.py        # Typed graph state
```

Tests are split by intent:

- `tests/unit`: isolated behavior for factories, nodes, mixins, utils, and errors.
- `tests/integration`: config loading and caching behavior.
- `tests/e2e`: a full retrieval-answer graph wired together end to end.

## Installation

This project uses `uv` and supports Python `3.12` and `3.13`.

```bash
uv sync --group dev --all-extras
```

Common local commands:

```bash
make lint
make type-check
make test
make test-cov
make ci
```

## Core Concepts

### 1. Agent

Subclass `agents_builder.Agent` and implement `build_graph()`.

The base class already handles:

- config storage
- lazy graph compilation
- LLM factory initialization
- MCP tool loading with caching
- graph export via `push()`
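The subclassing contract and the lazy-compilation behavior can be pictured with a dependency-free sketch. `SketchAgent` and `EchoAgent` are hypothetical stand-ins written for this illustration, not the real `agents_builder.Agent`; only the `build_graph()` hook and the `graph` attribute are taken from this README, and the "graph" here is a plain callable rather than a compiled LangGraph object.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Optional


class SketchAgent(ABC):
    """Hypothetical stand-in for agents_builder.Agent (not the real class)."""

    def __init__(self, config: dict[str, Any]) -> None:
        self.config = config  # config storage
        self._graph: Optional[Callable[[str], str]] = None
        self.build_calls = 0  # present only to make laziness observable

    @abstractmethod
    def build_graph(self) -> Callable[[str], str]:
        """Subclasses return the graph (a plain callable in this sketch)."""

    @property
    def graph(self) -> Callable[[str], str]:
        # Build on first access, then reuse the cached result.
        if self._graph is None:
            self.build_calls += 1
            self._graph = self.build_graph()
        return self._graph


class EchoAgent(SketchAgent):
    def build_graph(self) -> Callable[[str], str]:
        prefix = self.config["prefix"]
        return lambda query: f"{prefix}{query}"


agent = EchoAgent({"prefix": "echo: "})
first = agent.graph("hello")
second = agent.graph("again")  # reuses the graph built on first access
```

In this sketch nothing is built in `__init__`, so constructing an agent stays cheap and `build_graph()` runs at most once per instance.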

### 2. Settings-first runtime objects

Most runtime classes follow the same pattern:

- define a Pydantic settings model
- inherit from `FromConfigMixin[...]`
- construct the runtime object from validated config

This keeps runtime behavior strongly tied to validated configuration instead of ad hoc dictionaries.
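A rough sketch of that pattern, under loud assumptions: a stdlib dataclass stands in for a Pydantic settings model so the example runs without dependencies, and `RetrySettings`, `FromConfigSketch`, and `Retrier` are hypothetical names, not part of the package. The real `FromConfigMixin` may differ in signature and behavior.

```python
from dataclasses import dataclass
from typing import Any, ClassVar, Generic, TypeVar

S = TypeVar("S")


@dataclass(frozen=True)
class RetrySettings:
    """Hypothetical settings model; a dataclass stands in for Pydantic."""

    max_attempts: int = 3

    def __post_init__(self) -> None:
        # Validation happens once, when the settings object is created.
        if not isinstance(self.max_attempts, int) or self.max_attempts < 1:
            raise ValueError("max_attempts must be a positive integer")


class FromConfigSketch(Generic[S]):
    """Hypothetical mixin: validate raw config, then build the runtime object."""

    settings_cls: ClassVar[type]

    def __init__(self, settings: S) -> None:
        self.settings = settings

    @classmethod
    def from_config(cls, raw: dict[str, Any]):
        # The raw dictionary never reaches the runtime object unvalidated.
        return cls(cls.settings_cls(**raw))


class Retrier(FromConfigSketch[RetrySettings]):
    settings_cls = RetrySettings


retrier = Retrier.from_config({"max_attempts": 5})

try:
    Retrier.from_config({"max_attempts": 0})
    rejected = False
except ValueError:
    rejected = True
```

The point of the shape is that invalid configuration fails at construction time, before any runtime behavior depends on it.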

### 3. Role-based LLM selection

`LLMFactory` maps roles like `query`, `grader`, or `answer` to specific model configs.

That makes it easy to:

- use different models for different steps
- swap providers without changing graph code
- keep model selection in config rather than inside node logic
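As an illustration of the idea only, role-based lookup can be reduced to a mapping from role names to model configs. `ModelConfig` and `RoleLookup` are hypothetical stand-ins, not the real `LLMFactory` API, and the model names are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical model config: provider name plus model identifier."""

    provider: str
    model: str


class RoleLookup:
    """Hypothetical stand-in for role-based lookup (not the real LLMFactory)."""

    def __init__(self, roles: dict[str, ModelConfig]) -> None:
        self._roles = dict(roles)

    def for_role(self, role: str) -> ModelConfig:
        try:
            return self._roles[role]
        except KeyError:
            raise KeyError(f"no model configured for role {role!r}") from None


# Placeholder model names, chosen only for illustration.
lookup = RoleLookup(
    {
        "query": ModelConfig(provider="ollama", model="small-model"),
        "grader": ModelConfig(provider="ollama", model="small-model"),
        "answer": ModelConfig(provider="openrouter", model="large-model"),
    }
)

answer_model = lookup.for_role("answer")
```

Graph code in this sketch only ever asks for a role, so changing the provider behind `answer` is a config edit rather than a code change.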

### 4. LangGraph node building blocks

The package exposes low-level but reusable graph parts:

- `Prompt`: render a system prompt from state
- `SchemaBasedPrompt`: prompt plus structured output schema
- `LLMNode`: invoke an LLM against rendered messages
- `ToolBasedNode`: base for MCP-backed tool nodes
- `RouterNode`: return route keys for conditional edges
- `DeletionStrategy`: trim message history in a controlled way
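Two of these ideas, routing by key and trimming history, can be shown with plain functions over a dictionary state. This is a conceptual sketch: `SketchState`, `route_on_documents`, `keep_last`, and the route keys `"retrieve"` and `"answer"` are hypothetical and do not reflect the actual `RouterNode` or `DeletionStrategy` interfaces.

```python
from typing import TypedDict


class SketchState(TypedDict):
    """Hypothetical graph state; the package's typed state lives in states.py."""

    messages: list[str]
    documents: list[str]


def route_on_documents(state: SketchState) -> str:
    """Router-style node: returns a route key, not a state update."""
    return "answer" if state["documents"] else "retrieve"


def keep_last(state: SketchState, n: int) -> list[str]:
    """Trimming in the spirit of a deletion strategy: keep the newest n messages."""
    return state["messages"][-n:] if n > 0 else []


state: SketchState = {"messages": ["m1", "m2", "m3"], "documents": []}

route_before = route_on_documents(state)  # no documents yet
state["documents"].append("doc-1")
route_after = route_on_documents(state)   # documents are now available
trimmed = keep_last(state, 2)
```

A conditional edge would then map each returned key to the next node, which keeps branching logic out of the nodes that do the work.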

### 5. Agent evaluation API

Use `agents_builder.eval` to run repeatable evaluation scenarios against a compiled agent graph.

The core flow is:

- define an `EvalScenario` with the user query and metadata
- run the graph with `AgentEvalRunner`
- convert graph messages into an `AgentTrajectory`
- score the trajectory with `SuccessAssertion` and `EfficiencyAssertion` checks
- emit trace annotations through a `TraceAnnotator`

```python
from agents_builder.eval import (
    AgentTrajectory,
    EfficiencyAssertion,
    AgentEvalRunner,
    EvalScenario,
    SuccessAssertion,
    TraceAnnotator,
    TrajectoryScorer,
)


class RequiresAnswer(SuccessAssertion):
    def check(self, trajectory: AgentTrajectory) -> bool:
        passed = bool(trajectory.steps) and bool(trajectory.steps[-1].action.content)
        self.logger.annotate(
            annotation_name="answered",
            annotator_kind="CODE",
            label="pass" if passed else "fail",
            score=1.0 if passed else 0.0,
            explanation=trajectory.serialize(),
        )
        if not passed:
            raise AssertionError("agent did not produce a final answer")
        return passed


class LimitsStepCount(EfficiencyAssertion):
    def __init__(self, logger: TraceAnnotator, max_steps: int) -> None:
        super().__init__(logger)
        self.max_steps = max_steps

    def check(self, trajectory: AgentTrajectory) -> bool:
        passed = len(trajectory.steps) <= self.max_steps
        self.logger.annotate(
            annotation_name="step_count",
            annotator_kind="CODE",
            label="pass" if passed else "fail",
            score=1.0 if passed else 0.0,
            metadata={"max_steps": self.max_steps, "actual_steps": len(trajectory.steps)},
        )
        return passed


scenario = EvalScenario(
    name="contract lookup",
    query="Find the renewal terms",
    category="retrieval",
    tier="smoke",
    metadata={"dataset": "contracts"},
)

logger = ...
runner = AgentEvalRunner(graph=agent.graph, logger=logger)
scorer = TrajectoryScorer().success(RequiresAnswer(logger)).expect(LimitsStepCount(logger, max_steps=3))
# `await` needs an async context, for example the body of an `async def`.
trajectory = await runner.run(scenario, scorer)
```

For LLM-as-judge checks, use `LLMJudgeAssertion` from `agents_builder.eval.assertion`. Its prompt must contain exactly `{query}` and `{trajectory}` placeholders, and each criterion is emitted as an LLM trace annotation.
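To check a judge prompt before passing it in, a small stdlib helper can list its placeholders. This is a sketch that assumes the prompt uses `str.format`-style fields; `placeholder_names` and `has_exact_placeholders` are hypothetical helpers, not part of `agents_builder`, and the package's own validation may work differently.

```python
from string import Formatter

REQUIRED_FIELDS = {"query", "trajectory"}


def placeholder_names(template: str) -> set[str]:
    """Collect the named replacement fields in a str.format-style template."""
    return {
        name
        for _, name, _, _ in Formatter().parse(template)
        if name is not None
    }


def has_exact_placeholders(template: str) -> bool:
    """True only when the template uses exactly {query} and {trajectory}."""
    return placeholder_names(template) == REQUIRED_FIELDS


good = "Question: {query}\nSteps: {trajectory}\nWas the answer grounded?"
bad = "Question: {query}\nContext: {context}"

good_ok = has_exact_placeholders(good)
bad_ok = has_exact_placeholders(bad)
```

Here `good` passes, while `bad` fails because it lacks `{trajectory}` and adds an unexpected `{context}` field.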

Phoenix-backed annotation is available through `PhoenixTraceAnnotator` with `PhoenixTraceAnnotatorSettings`:

```python
from agents_builder.eval.annotator import PhoenixTraceAnnotator
from agents_builder.settings import PhoenixTraceAnnotatorSettings

logger = PhoenixTraceAnnotator(
    PhoenixTraceAnnotatorSettings(
        module_path="agents_builder.eval.annotator.PhoenixTraceAnnotator",
        base_url="http://localhost:6006",
        project_name="agents-builder",
    )
)
```

## Configuration

Configuration is built around Pydantic settings models in `src/agents_builder/settings.py`.

Important settings types:

- `AgentSettings`
- `LLMFactorySettings`
- `OllamaLLMSettings`
- `OpenRouterLLMSettings`
- `MCPServerSettings`
- `TraceAnnotatorSettings`
- `PhoenixTraceAnnotatorSettings`

The package-level default YAML path is:

```text
/config/config.yaml
```

You can also construct objects directly from Python with `.from_config(...)` or load from YAML with `.from_yaml(...)`.

## Development Workflow

Quality checks already configured in the project:

- `ruff` for linting and formatting
- `ty` for type checking
- `pytest` for unit, integration, and e2e tests
- `bandit` for security checks

Recommended local loop:

```bash
make lint
make type-check
make test
```

Before merging broader changes:

```bash
make ci
```

## Design Principles

This repository works best when changes stay aligned with a few constraints:

- Prefer explicit graph nodes over magic orchestration layers.
- Keep configuration typed and validated.
- Use small runtime classes with narrow responsibilities.
- Let tests describe behavior at the node and graph level.

If you are contributing code or using AI coding agents in this repository, read `AGENTS.md` next.

