Metadata-Version: 2.4
Name: omd
Version: 0.3.0
Summary: One More Dimension — lightweight LLM agent framework with tool-calling, streaming, skills, and file tools
Project-URL: Homepage, https://github.com/mrYush/omd
Project-URL: Repository, https://github.com/mrYush/omd
Project-URL: Issues, https://github.com/mrYush/omd/issues
Project-URL: Changelog, https://github.com/mrYush/omd/blob/main/CHANGELOG.md
Author: mrYush
License: MIT
License-File: LICENSE
Keywords: agent,async,llm,openai,pydantic,react,skills,streaming,tool-calling
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: pydantic>=2.12
Requires-Dist: requests>=2.32
Provides-Extra: dev
Requires-Dist: build>=1.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.11; extra == 'dev'
Requires-Dist: twine>=5.0; extra == 'dev'
Provides-Extra: machinery
Requires-Dist: minio-client; extra == 'machinery'
Requires-Dist: pillow>=10; extra == 'machinery'
Provides-Extra: openai
Requires-Dist: openai>=1.0.0; extra == 'openai'
Description-Content-Type: text/markdown

# OMD — One More Dimension

A lightweight Python framework for building LLM agents with tool-calling support.
OMD provides a unified OpenAI-compatible client, a ReAct-style agentic loop,
streaming event emission, and automatic JSON schema generation from plain Python functions.

## Features

- **Unified LLM client** — `BaseClient` abstraction for any OpenAI-compatible
  HTTP API; swap providers by implementing a single method.
- **OpenAI SDK client** — `OpenAISDKClient` uses the official `openai` Python SDK
  with custom `base_url` support for Perplexity, Together, Azure, and other
  OpenAI-compatible endpoints.
- **ReAct agent** — `BaseAgent` runs an iterative tool-calling loop with
  configurable iteration limits and automatic final-answer forcing.
- **Streaming agents** — `StreamingAgent` and `AsyncStreamingAgent` yield
  fine-grained events (content deltas, tool calls, results) allowing real-time
  streaming to clients even when the LLM provider doesn't natively support it.
- **Terminating agent** — `TerminatingAgent` exits the loop as soon as a
  designated "submit" tool is called, useful for structured output workflows.
- **Tool system** — `Tool.from_callable()` converts a typed Python function
  (with Sphinx-style docstring) into an OpenAI tool definition automatically.
  `tool_from_schema()` creates tools from raw OpenAI schemas supporting
  `enum`, nullable types, and nested objects.
- **Async tool bridge** — `AsyncToolBridge` wraps synchronous tools for use
  in async streaming agents without blocking the event loop.
- **Skills** — drop a `SKILL.md` into a directory and the agent can pick it up
  automatically (LLM router) or load it on demand via the `load_skill` tool.
- **File tools** — line-precise `read_fragment` / `read_lines` /
  `insert_lines` / `delete_lines` with optional workspace-root sandbox.
- **Machinery integrations** (optional) — ready-made functions for
  Stability AI image generation, Gemini reference-guided images, and
  Veo video generation.
- **Pydantic models** — `Model` and `Tool` are Pydantic v2 models with
  strict validation and clean serialization.

## Installation

From PyPI (once published):

```bash
pip install omd
# or
uv add omd
```

From source:

```bash
git clone https://github.com/mrYush/omd.git
cd omd
uv pip install -e .
```

## Quick Start

### 1. Call an LLM

```python
from omd.clients import ApiBarClient, Model

client = ApiBarClient(
    model=Model(
        name="gpt-4",
        url="https://api.openai.com/v1/chat/completions",
        api_token="sk-...",
    )
)

# or, if MODEL_NAME / MODEL_URL / MODEL_API_TOKEN are set in the environment:
client = ApiBarClient(model=Model.from_env())

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

response = client.call_model(messages=messages)
```

### 2. Define Tools from Functions

`Tool.from_callable()` extracts the function name, docstring, and type hints
to build an OpenAI-compatible JSON schema:

```python
from omd.tools import Tool


def search(query: str, top_k: int = 5) -> list[str]:
    """Search the web for relevant information.

    :param query: Search query.
    :param top_k: Number of results to return.
    :returns: List of search results.
    """
    return [f"result for {query}"]


tool = Tool.from_callable(search)
```
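
To illustrate what that extraction involves, here is a minimal standalone sketch that derives a comparable schema using only the standard library. The `sketch_tool_schema` helper and its type mapping are hypothetical and are not OMD's implementation; the exact output of `Tool.from_callable()` may differ.

```python
import inspect
import re
from typing import get_type_hints

# Assumed mapping from Python annotations to JSON schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def sketch_tool_schema(func):
    """Build an OpenAI-style tool definition from a typed function (illustrative only)."""
    hints = get_type_hints(func)
    doc = inspect.getdoc(func) or ""
    # Sphinx-style ":param name: text" lines become per-parameter descriptions.
    param_docs = dict(re.findall(r":param (\w+):\s*(.+)", doc))
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {
            "type": _JSON_TYPES.get(hints.get(name), "string"),
            "description": param_docs.get(name, ""),
        }
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value means the argument is required
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": doc.split("\n")[0],
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }


def search(query: str, top_k: int = 5) -> list[str]:
    """Search the web for relevant information.

    :param query: Search query.
    :param top_k: Number of results to return.
    :returns: List of search results.
    """
    return [f"result for {query}"]


schema = sketch_tool_schema(search)
```

In this sketch, `top_k` is left out of `required` because it has a default value, while `query` is required.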

### 3. Run an Agent

`BaseAgent` implements a ReAct-style agentic loop — it sends messages to the
LLM, executes any requested tool calls, appends the results, and repeats until
the model produces a final text answer or the iteration limit is reached.

```python
from omd.agents import BaseAgent
from omd.clients import ApiBarClient, Model
from omd.tools import Tool


def search(query: str) -> str:
    """Search the web for information.

    :param query: Search query.
    :returns: Search results.
    """
    return f"Results for: {query}"


def create_image(prompt: str) -> str:
    """Generate an image from a text description.

    :param prompt: Image description.
    :returns: Path to generated image.
    """
    return f"/images/{prompt.replace(' ', '_')}.png"


client = ApiBarClient(
    model=Model(
        name="gpt-4",
        url="https://api.openai.com/v1/chat/completions",
        api_token="sk-...",
    )
)

agent = BaseAgent(
    client=client,
    tools=[
        Tool.from_callable(search),
        Tool.from_callable(create_image),
    ],
    tool_choice="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Search for 'Python async' and create an image of a snake"},
]

new_messages = agent.run(messages=messages, attempts_limit=10)
```

### 4. Manual Tool-Calling

If you need lower-level control, pass raw tool definitions directly to the
client:

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "search",
            "description": "Search the web.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
]

response = client.call_model(
    messages=messages,
    tools=tools,
    tool_choice="auto",
    parallel_tool_calls=True,
)
```
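
Handling the requested tool calls is then up to you. The sketch below assumes the assistant message follows the standard OpenAI chat-completions shape, where each `tool_calls` entry carries a JSON-encoded `arguments` string. The return type of `call_model` is not documented here, so `assistant_message` is a hand-written stand-in and `dispatch_tool_calls` is a hypothetical helper.

```python
import json


def search(query: str) -> str:
    return f"Results for: {query}"


REGISTRY = {"search": search}  # maps tool names to local callables


def dispatch_tool_calls(assistant_message: dict) -> list[dict]:
    """Run each requested tool and return `tool` role messages with the results."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        func = REGISTRY[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        results.append(
            {"role": "tool", "tool_call_id": call["id"], "content": str(func(**args))}
        )
    return results


# Hand-written stand-in for an assistant message that requests one tool call.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "search", "arguments": '{"query": "Python async"}'},
        }
    ],
}

tool_messages = dispatch_tool_calls(assistant_message)
```

The resulting `tool` messages would be appended to the conversation before the next model call.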

## Agent Architecture

`BaseAgent` implements the **ReAct** (Reasoning + Acting) pattern:

```
┌─────────────────────────────────────────────────────────────┐
│                       BaseAgent.run()                       │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│  1. Send messages + tools → LLM                             │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
                    ┌─────────────────┐
                    │  LLM response   │
                    └─────────────────┘
                              │
              ┌───────────────┴───────────────┐
              │                               │
              ▼                               ▼
    ┌──────────────────┐           ┌──────────────────┐
    │  Has tool_calls  │           │  Final answer    │
    └──────────────────┘           └──────────────────┘
              │                               │
              ▼                               ▼
┌──────────────────────────┐       ┌──────────────────┐
│ 2. Execute each tool     │       │ Return new       │
│    tool.invoke(args)     │       │ messages         │
└──────────────────────────┘       └──────────────────┘
              │
              ▼
┌──────────────────────────┐
│ 3. Append results        │
│    to messages           │
└──────────────────────────┘
              │
              ▼
┌──────────────────────────┐
│ 4. Loop back to step 1   │
└──────────────────────────┘
```

| Component | Role |
|-----------|------|
| `BaseClient` | Abstract HTTP call to an LLM API |
| `Tool` | Python function wrapper with JSON schema |
| `BaseAgent` | Agentic loop orchestrator |
| `messages` | Conversation history (mutated in-place) |

The `attempts_limit` parameter controls the maximum number of iterations.
On the final iteration, tools are disabled (`tools=None`) to force the LLM
to produce a text answer and prevent infinite tool-call loops.
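
The effect of withholding tools on the last iteration can be sketched with a stand-in model. Everything below is illustrative: `run_loop` and `fake_llm` are hypothetical, the tool-call shape is simplified, and none of it is `BaseAgent`'s actual code.

```python
def run_loop(call_llm, tools, messages, attempts_limit):
    """Illustrative loop: `call_llm(messages, tools)` returns an assistant message dict."""
    for attempt in range(1, attempts_limit + 1):
        # On the last allowed iteration, withhold tools so the model must answer in text.
        active_tools = None if attempt == attempts_limit else tools
        reply = call_llm(messages, active_tools)
        messages.append(reply)
        if not reply.get("tool_calls"):
            return messages  # final text answer
        for call in reply["tool_calls"]:
            result = tools[call["name"]](**call["args"])
            messages.append({"role": "tool", "content": str(result)})
    return messages


def fake_llm(messages, tools):
    """Stand-in model that keeps requesting a tool for as long as tools are offered."""
    if tools:
        return {"role": "assistant", "tool_calls": [{"name": "echo", "args": {"text": "hi"}}]}
    return {"role": "assistant", "content": "final answer"}


history = run_loop(
    fake_llm,
    {"echo": lambda text: text},
    [{"role": "user", "content": "go"}],
    attempts_limit=3,
)
```

With `attempts_limit=3`, the stand-in model calls the tool twice and is then forced to answer on the third iteration.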

## Streaming Agents

`StreamingAgent` and `AsyncStreamingAgent` emit fine-grained events during
the tool-calling loop, allowing real-time streaming to clients even when
the LLM provider doesn't natively support streaming.

```python
import asyncio

from openai import AsyncOpenAI

from omd.agents import AsyncStreamingAgent
from omd.tools import AsyncToolBridge

# Wrap synchronous callables (defined elsewhere) for use in the async agent
bridge = AsyncToolBridge()
async_tools = bridge.wrap_many({
    "search": search_function,
    "calculate": calculate_function,
})

# Create async OpenAI client
client = AsyncOpenAI(api_key="sk-...", base_url="https://api.openai.com/v1")

agent = AsyncStreamingAgent(
    client=client,
    tools=async_tools,
    # `tools` is assumed to be a list of `Tool` objects for the same callables
    tool_schemas=[tool.model_dump() for tool in tools],
    system_prompt="You are a helpful assistant.",
    model="gpt-4o",
)


# Stream events; `async for` must run inside a coroutine
async def main():
    async for event in agent.run_stream(messages):
        print(event)


asyncio.run(main())
```

### Streaming Events

| Event | Description |
|-------|-------------|
| `ContentDeltaEvent` | Incremental text or reasoning content |
| `ToolCallStartEvent` | Tool call initiated by LLM |
| `ToolResultEvent` | Tool execution result |
| `DoneEvent` | Stream completed |
| `ErrorEvent` | Error occurred |

### SSE Formatting

Format events for Open WebUI or other SSE-compatible clients:

```python
from omd.streaming import OwuiSseFormatter

formatter = OwuiSseFormatter(max_tool_result_chars=2000)


async def sse_events(agent, messages):
    """Async generator; `yield` is only valid inside a function."""
    async for event in agent.run_stream(messages):
        sse_line = formatter.format(event)
        if sse_line:
            yield f"data: {sse_line}\n\n"

    # Final terminator
    yield f"data: {formatter.finalize()}\n\n"
```
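
For reference, the SSE wire format itself is simple: each event is one or more `data:` lines followed by a blank line. The standalone sketch below shows that framing. The `{"delta": ...}` payload and the `[DONE]` terminator are assumptions made for illustration and are not the format produced by `OwuiSseFormatter`.

```python
import json


def sse_frame(data: str) -> str:
    """Wrap a payload in Server-Sent Events framing; a blank line ends each event."""
    # A payload containing newlines must be split across several `data:` lines.
    return "".join(f"data: {line}\n" for line in data.split("\n")) + "\n"


def sse_stream(chunks):
    """Yield one frame per text chunk, then a terminator frame."""
    for text in chunks:
        yield sse_frame(json.dumps({"delta": text}))
    yield sse_frame("[DONE]")  # assumed terminator, common in OpenAI-compatible streams


frames = list(sse_stream(["Hel", "lo"]))
```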

## OpenAI SDK Client

`OpenAISDKClient` uses the official `openai` Python SDK, providing
first-class support for `base_url` customization needed for Perplexity,
Together, Azure, and other OpenAI-compatible endpoints.

```python
from omd.agents import BaseAgent
from omd.clients import OpenAISDKClient, Model

client = OpenAISDKClient(
    model=Model(
        name="gpt-4o",
        url="https://api.openai.com/v1",  # base_url without /chat/completions
        api_token="sk-...",
    ),
    temperature=0.1,
)

# Use with any agent
agent = BaseAgent(client=client, tools=[...])
```

Install with the `openai` extra:

```bash
pip install "omd[openai]"
```

## Terminating Agent

`TerminatingAgent` exits the loop as soon as a designated "submit" tool
is called, useful for workflows that always end with a structured tool call.

```python
from omd.agents import TerminatingAgent, SubmittedSignal
from omd.tools import tool_from_schema

# Define a submit tool that raises SubmittedSignal
SUBMIT_SCHEMA = {
    "type": "function",
    "function": {
        "name": "submit_answer",
        "parameters": {
            "type": "object",
            "properties": {
                "answer": {"type": "string"},
                "confidence": {"type": "number"},
            },
            "required": ["answer"],
        },
    },
}

def submit_answer(**kwargs):
    raise SubmittedSignal(kwargs)

submit_tool = tool_from_schema(SUBMIT_SCHEMA, submit_answer)

# Create terminating agent
agent = TerminatingAgent(
    client=client,
    tools=[search_tool, submit_tool],
)

agent.run(messages)

# Access the submitted result
if agent.submission:
    print(f"Answer: {agent.submission['answer']}")
```
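
The underlying pattern is an exception used as a control-flow signal. The standalone sketch below shows the idea with a local `Submitted` class and a hypothetical `run_calls` helper; it is not how `TerminatingAgent` or `SubmittedSignal` are implemented.

```python
class Submitted(Exception):
    """Local stand-in for a terminating signal; carries the submitted payload."""

    def __init__(self, payload: dict):
        super().__init__("submitted")
        self.payload = payload


def submit_answer(**kwargs):
    raise Submitted(kwargs)


def run_calls(tool_calls, tools):
    """Execute tool calls in order and stop at the first one that raises `Submitted`."""
    executed = []
    for name, args in tool_calls:
        try:
            tools[name](**args)
        except Submitted as signal:
            return executed, signal.payload  # leave the loop immediately
        executed.append(name)
    return executed, None


tools = {"search": lambda query: f"Results for: {query}", "submit_answer": submit_answer}
calls = [
    ("search", {"query": "capital of France"}),
    ("submit_answer", {"answer": "Paris", "confidence": 0.9}),
    ("search", {"query": "never reached"}),
]
executed, submission = run_calls(calls, tools)
```

The third call never runs because the submit tool ends the loop first.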

## Advanced Tools

### Tools from Raw Schema

`tool_from_schema` creates a `Tool` from a raw OpenAI schema, supporting
advanced features like `enum`, nullable types, and nested objects:

```python
from omd.tools import tool_from_schema

SCHEMA = {
    "type": "function",
    "function": {
        "name": "search_products",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "category": {
                    "type": "string",
                    "enum": ["electronics", "books", "clothing"],
                },
                "max_price": {
                    "type": ["number", "null"],  # Nullable
                },
            },
            "required": ["query"],
        },
    },
}

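def search_products(**kwargs):
    """Placeholder implementation; replace with a real product search."""
    return []


tool = tool_from_schema(SCHEMA, search_products)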
```
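
To make the `enum` and nullable semantics concrete, the sketch below checks candidate arguments against the same `parameters` object using only the standard library. `check_args` is a hypothetical helper covering just these features; it says nothing about what validation, if any, `tool_from_schema` performs.

```python
_PY_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "null": type(None),
}


def _matches(value, type_name: str) -> bool:
    # bool is a subclass of int in Python, but JSON booleans are not numbers.
    if isinstance(value, bool) and type_name in ("number", "integer"):
        return False
    return isinstance(value, _PY_TYPES[type_name])


def check_args(parameters: dict, args: dict) -> list[str]:
    """Return a list of problems found in `args`; an empty list means no problems."""
    problems = [f"missing required: {n}" for n in parameters.get("required", []) if n not in args]
    for name, value in args.items():
        spec = parameters["properties"].get(name)
        if spec is None:
            problems.append(f"unknown argument: {name}")
            continue
        declared = spec.get("type", [])
        # A nullable field lists several types, e.g. ["number", "null"].
        type_names = declared if isinstance(declared, list) else [declared]
        if type_names and not any(_matches(value, t) for t in type_names):
            problems.append(f"wrong type: {name}")
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"not in enum: {name}")
    return problems


PARAMETERS = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "category": {"type": "string", "enum": ["electronics", "books", "clothing"]},
        "max_price": {"type": ["number", "null"]},
    },
    "required": ["query"],
}

ok = check_args(PARAMETERS, {"query": "laptop", "category": "books", "max_price": None})
bad = check_args(PARAMETERS, {"category": "toys", "max_price": "cheap"})
```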

### Async Tool Bridge

Wrap synchronous tools for async streaming agents:

```python
from omd.tools import AsyncToolBridge

bridge = AsyncToolBridge()

# Wrap single tool
async_search = bridge.wrap("search", sync_search)

# Wrap multiple tools
async_tools = bridge.wrap_many({
    "search": sync_search,
    "calculate": sync_calculate,
    "fetch_data": sync_fetch,
})
```
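
One common way to keep a blocking call off the event loop is `asyncio.to_thread`. The sketch below shows that general technique with a hypothetical `wrap_sync` helper; whether `AsyncToolBridge` uses threads, an executor, or something else is not stated here.

```python
import asyncio
import time


def wrap_sync(func):
    """Return an async wrapper that runs `func` in a worker thread."""

    async def wrapper(*args, **kwargs):
        # asyncio.to_thread keeps the event loop free while the blocking call runs.
        return await asyncio.to_thread(func, *args, **kwargs)

    return wrapper


def slow_search(query: str) -> str:
    time.sleep(0.05)  # simulate blocking I/O
    return f"Results for: {query}"


async def main():
    async_search = wrap_sync(slow_search)
    # Both calls are scheduled together and run in worker threads.
    return await asyncio.gather(async_search("a"), async_search("b"))


results = asyncio.run(main())
```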

## Skills

A **skill** is a folder containing a `SKILL.md` file with YAML-like
front-matter and a free-form Markdown body. The body explains how a particular
role uses the available tools; it is added to the system prompt only when
the skill is selected.

### SKILL.md format

```markdown
---
name: file-editor
description: Edit text files by line range. Use when the user asks to read fragments or modify specific lines.
recommended_tools: [read_fragment, read_lines, insert_lines, delete_lines]
version: 0.1.0
---
# File Editor

Workflow:
1. Read the target range first.
2. Insert / delete with line-precise tools.
3. Re-read to verify.
```
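
A file in this format can be split into metadata and body with a few lines of Python. The `parse_skill` function below is a hypothetical sketch that handles only flat `key: value` pairs and inline lists; it is not the parser in `omd.skills.parser`, which may accept more (or different) syntax.

```python
def parse_skill(text: str) -> tuple[dict, str]:
    """Split a SKILL.md string into (metadata, body)."""
    lines = text.strip().split("\n")
    if lines[0].strip() != "---":
        return {}, text  # no front-matter block
    end = lines.index("---", 1)  # position of the closing delimiter
    meta = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            # Inline list such as `[read_fragment, read_lines]`.
            value = [item.strip() for item in value[1:-1].split(",") if item.strip()]
        meta[key.strip()] = value
    return meta, "\n".join(lines[end + 1:]).strip()


SAMPLE = """---
name: file-editor
description: Edit text files by line range.
recommended_tools: [read_fragment, read_lines]
version: 0.1.0
---
# File Editor

Workflow:
1. Read the target range first.
"""

meta, body = parse_skill(SAMPLE)
```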

### Discovery

`SkillRegistry` discovers skills from three sources:

1. Built-in skills shipped with the package (`omd/skills/builtin/`):
   `file-editor`, `code-reviewer`, `test-writer`. Disable via
   `include_builtin=False`.
2. Directories listed in the `OMD_SKILLS_PATH` environment variable
   (`os.pathsep`-separated). Disable via `read_env=False`.
3. Directories passed to the `dirs=` constructor argument (highest precedence).
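
The precedence rule can be pictured as a dictionary merge in which later sources override earlier ones by skill name. The helpers below are hypothetical and only model the behavior described above; the relative order of built-in and environment sources is an assumption taken from the list order, and `SkillRegistry` itself may work differently.

```python
import os


def split_skill_path(raw: str) -> list[str]:
    """Split an OMD_SKILLS_PATH-style value on the platform path separator."""
    return [part for part in raw.split(os.pathsep) if part]


def merge_sources(builtin: dict, from_env: dict, from_dirs: dict) -> dict:
    """Merge name -> location maps; later sources override earlier ones."""
    merged = {}
    for source in (builtin, from_env, from_dirs):  # lowest to highest precedence
        merged.update(source)
    return merged


paths = split_skill_path(os.pathsep.join(["/opt/skills", "", "/home/me/skills"]))
resolved = merge_sources(
    {"file-editor": "builtin", "code-reviewer": "builtin"},
    {"file-editor": "env"},
    {"file-editor": "dirs"},
)
```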

### Hybrid selection

The agent can use skills in two complementary ways:

- **Auto-router** — a small LLM call before the main loop picks the most
  relevant skills based on the user's request and `name + description`
  metadata; their bodies are prepended to the system prompt.
- **On-demand tools** — `list_skills` and `load_skill` tools let the agent
  itself fetch a skill body mid-conversation when the router missed it.

```python
from omd.agents import BaseAgent
from omd.clients import ApiBarClient, Model
from omd.skills import SkillRegistry, SkillRouter
from omd.tools import make_file_tools

client = ApiBarClient(model=Model.from_env())
registry = SkillRegistry(dirs=["./my-skills"])

agent = BaseAgent(
    client=client,
    tools=make_file_tools(workspace_root="./project"),
    skills=registry,
    skill_router=SkillRouter(client, max_skills=2),
    auto_skill_routing=True,    # default — pre-select skills via router
    expose_skill_tools=True,    # default — also expose list_skills/load_skill
)

agent.run(
    messages=[{"role": "user", "content": "Refactor utils.py and add tests"}],
    attempts_limit=10,
)
```

## File Tools

`omd.tools.make_file_tools()` returns a ready-to-use set of line-precise
file manipulation tools:

| Tool | Purpose |
|------|---------|
| `read_fragment(path, offset=1, limit=200)` | Read up to `limit` lines starting at `offset` |
| `read_lines(path, start, end)` | Read the inclusive 1-indexed line range |
| `insert_lines(path, line, content)` | Insert `content` *before* the given line |
| `delete_lines(path, start, end)` | Delete the inclusive 1-indexed range |

All line numbers are 1-indexed and the responses include `LINE|content`
prefixes so the model never has to count lines manually. Pass
`workspace_root=Path("./project")` to refuse any path that resolves outside
that directory; omit it for an unrestricted toolset.

```python
from omd.tools import make_file_tools

tools = make_file_tools(workspace_root="./project")
```

The standalone callables (`omd.tools.read_fragment`, `read_lines`,
`insert_lines`, `delete_lines`) are also exported for direct use without an
agent.
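
Both ideas, numbered output and a workspace sandbox, are easy to sketch with the standard library. The helpers below are hypothetical illustrations: the exact prefix formatting and the error raised by OMD's file tools are not specified here and may differ.

```python
import tempfile
from pathlib import Path


def number_lines(text: str, start: int = 1) -> str:
    """Prefix each line with its 1-indexed number in `LINE|content` form."""
    return "\n".join(f"{n}|{line}" for n, line in enumerate(text.splitlines(), start))


def resolve_inside(root: str, candidate: str) -> Path:
    """Resolve `candidate` against `root` and reject anything that escapes it."""
    base = Path(root).resolve()
    target = (base / candidate).resolve()
    if not target.is_relative_to(base):  # Path.is_relative_to needs Python 3.9+
        raise PermissionError(f"{candidate} is outside {base}")
    return target


numbered = number_lines("def f():\n    return 1")

with tempfile.TemporaryDirectory() as root:
    inside_ok = resolve_inside(root, "src/app.py").is_relative_to(Path(root).resolve())
    try:
        resolve_inside(root, "../secrets.txt")
        escape_blocked = False
    except PermissionError:
        escape_blocked = True
```

Resolving the path before the check matters: it is what catches `..` segments and symlinks that point outside the root.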

## Machinery (Optional Integrations)

The `omd.machinery` subpackage provides ready-made functions for external
media generation services. These are **optional** — the core agent/client/tool
system works without them.

### Stability AI — Image Generation

```python
from omd.machinery import generate_image

url = generate_image(
    prompt="A serene mountain landscape at sunrise",
    aspect_ratio="16:9",
    output_format="png",
)
```

Image-to-image with strength control:

```python
url = generate_image(
    prompt="Transform into a watercolor painting",
    image_path="/path/to/input.png",
    strength=0.5,
)
```

### Stability AI — Background Replacement

```python
from omd.machinery import replace_background

url = replace_background(
    subject_image_path="/path/to/person.png",
    background_prompt="modern office with large windows and city view",
    light_source_direction="right",
)

url = replace_background(
    subject_image_path="/path/to/person.png",
    background_reference_path="/path/to/beach_bg.jpg",
    preserve_original_subject=0.9,
)
```

### Gemini — Reference-Guided Image Generation

```python
from omd.machinery import generate_image_gemini

url = generate_image_gemini(
    prompt="A portrait in the style of the reference",
    style_reference_image="/path/to/reference.png",
)
```

### Veo — Video Generation

`generate_video_veo` produces short videos (4-8 seconds) with native audio via
the Google Veo API. It supports text-to-video and image-to-video.

Under the hood the function:

1. Submits a `predictLongRunning` request to the Gemini API (via api-bar)
2. Polls the operation every 10 seconds until completion
3. Downloads the generated MP4 video
4. Uploads it to MinIO and returns a presigned URL

```python
from omd.machinery import generate_video_veo

# Text-to-video
url = generate_video_veo(
    prompt="A drone shot of a sunset over the ocean, cinematic, warm tones",
)

# Portrait video, Full HD
url = generate_video_veo(
    prompt='Close-up of a barista pouring latte art. She says "Almost perfect".',
    aspect_ratio="9:16",
    resolution="1080p",
    duration_seconds="8",
)

# Image-to-video (first frame from an image)
url = generate_video_veo(
    prompt="The cat slowly opens its eyes and stretches",
    reference_image="https://example.com/sleeping_cat.png",
)
```
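
The polling step in the list above follows a generic poll-with-timeout pattern, sketched below. `poll_until_done` is a hypothetical helper and not the code in `omd.machinery.veo`; the `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.

```python
import time


def poll_until_done(check, interval_s=10.0, timeout_s=600.0, sleep=time.sleep, clock=time.monotonic):
    """Call `check()` until it returns a non-None result or `timeout_s` would be exceeded."""
    deadline = clock() + timeout_s
    while True:
        result = check()
        if result is not None:
            return result
        if clock() + interval_s > deadline:
            raise TimeoutError(f"operation did not finish within {timeout_s} seconds")
        sleep(interval_s)


# Simulated operation that completes on the third status check.
states = iter([None, None, "https://example.com/video.mp4"])
waits = []
video_url = poll_until_done(lambda: next(states), sleep=waits.append)
```

Here `waits` records two 10-second pauses before the simulated operation completes.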

#### Veo Model Selection

| Model | When to use |
|-------|-------------|
| `veo-3.0-fast-generate-001` | **Default.** Veo 3.0, cost-effective |
| `veo-3.0-generate-001` | Veo 3.0, baseline quality |
| `veo-3.1-generate-preview` | Preview 3.1: highest quality |
| `veo-3.1-fast-generate-preview` | Preview 3.1: faster than full 3.1 |
| `veo-3.1-lite-generate-preview` | Preview 3.1: lightweight/cheapest in the 3.1 line |

Override per call with `model="..."` or change the default via the
`_VEO_DEFAULT_MODEL` constant in `omd.machinery.veo`.

#### Veo Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `prompt` | *(required)* | Text description. Use quotes for dialogue, explicit words for sounds |
| `reference_image` | `None` | Starting frame: file path, URL, or data URL |
| `aspect_ratio` | `"16:9"` | `"16:9"` (landscape) or `"9:16"` (portrait) |
| `resolution` | `"720p"` | `"720p"`, `"1080p"`, or `"4k"` |
| `duration_seconds` | `"8"` | `"4"`, `"6"`, or `"8"` (1080p/4k require `"8"`) |
| `model` | `"veo-3.0-fast-generate-001"` | See model table above |
| `timeout_s` | `600.0` | Max wait time for generation (seconds) |

#### Prompt Tips

- **Composition**: subject, action, style, camera motion, ambiance
- **Dialogue**: `A man says "Let's go!" and grabs his coat`
- **Sound**: `thunder rumbling, rain hitting the window`
- **Style**: `cinematic`, `anime`, `stop-motion`, `film noir`
- **Camera**: `dolly shot`, `aerial view`, `close-up`, `POV shot`

## Configuration

### Environment Variables

| Variable | Description | Required |
|----------|-------------|----------|
| `MODEL_URL` | Chat completions endpoint URL | Yes (for client) |
| `MODEL_NAME` | Model name (e.g. `gpt-4`, `llama-3`) | Yes (for client) |
| `MODEL_API_TOKEN` | API key for authorization | Yes (for client) |
| `STABILITY_API_KEY` | Stability AI API key | For `omd.machinery.sd` |
| `STABILITY_BASE_URL` | Stability AI base URL | For `omd.machinery.sd` |
| `API_BAR_TOKEN` | Api-bar gateway token | For `omd.machinery.gemini` / `veo` |
| `MINIO_*` | MinIO connection settings | For media upload (`image_utils`, `veo`) |
| `MINIO_PRESIGNED_URL_EXPIRES_IN` | Presigned URL TTL in seconds (default: 3600) | No |

## Project Structure

```
src/omd/
├── __init__.py
├── agents/
│   ├── __init__.py
│   ├── base.py             # BaseAgent — ReAct-style agentic loop with skills
│   ├── streaming.py        # StreamingAgent, AsyncStreamingAgent
│   └── terminating.py      # TerminatingAgent, SubmittedSignal
├── clients/
│   ├── __init__.py
│   ├── base.py             # BaseClient — abstract interface
│   ├── apibar.py           # ApiBarClient — raw HTTP implementation
│   ├── openai_sdk.py       # OpenAISDKClient — official SDK wrapper
│   └── data_models.py      # Model — Pydantic client config (with from_env)
├── streaming/              # Streaming utilities
│   ├── __init__.py
│   ├── events.py           # StreamEvent dataclasses
│   └── sse_formatter.py    # SSE formatting for Open WebUI
├── tools/
│   ├── __init__.py
│   ├── data_models.py      # Tool — auto-generated JSON schemas from functions
│   ├── file_tools.py       # read_fragment / read_lines / insert / delete
│   ├── raw_schema.py       # tool_from_schema for raw OpenAI schemas
│   └── async_bridge.py     # AsyncToolBridge for sync→async wrapping
├── skills/
│   ├── __init__.py
│   ├── data_models.py      # Skill — Pydantic skill record
│   ├── parser.py           # SKILL.md front-matter parser
│   ├── registry.py         # SkillRegistry — discovery and lookup
│   ├── router.py           # SkillRouter — LLM-driven selection
│   ├── tools.py            # list_skills / load_skill agent tools
│   └── builtin/            # file-editor, code-reviewer, test-writer
├── machinery/              # Optional: Stability AI / Gemini / Veo
├── prompts/
│   └── base.py             # PromptBase, ListPrompt — system prompt builders
└── utils/
    └── logging_utils.py    # truncate_text, serialize_for_log
```

`Model.from_env()` reads `MODEL_NAME`, `MODEL_URL`, and `MODEL_API_TOKEN`
lazily, so `import omd` does not require any environment variables.
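
Reading the environment lazily means the lookup happens inside a method call rather than at module import. The sketch below shows the pattern with a hypothetical `ModelConfig` dataclass; it is not OMD's `Model` class, and the error raised for missing variables is an assumption.

```python
import os
from dataclasses import dataclass

_REQUIRED = ("MODEL_NAME", "MODEL_URL", "MODEL_API_TOKEN")


@dataclass
class ModelConfig:
    """Hypothetical stand-in for a client config; not OMD's `Model` class."""

    name: str
    url: str
    api_token: str

    @classmethod
    def from_env(cls, environ=None):
        # The environment is read here, at call time, not when the module is imported.
        env = os.environ if environ is None else environ
        missing = [key for key in _REQUIRED if not env.get(key)]
        if missing:
            raise KeyError(f"missing environment variables: {', '.join(missing)}")
        return cls(name=env["MODEL_NAME"], url=env["MODEL_URL"], api_token=env["MODEL_API_TOKEN"])


config = ModelConfig.from_env(
    {
        "MODEL_NAME": "gpt-4",
        "MODEL_URL": "https://api.openai.com/v1/chat/completions",
        "MODEL_API_TOKEN": "sk-...",
    }
)
```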

## Development

### Setup

```bash
git clone https://github.com/mrYush/omd.git
cd omd
uv sync --all-extras
```

### Running Tests

```bash
uv run pytest
```

### Linting

```bash
uv run ruff check src/ tests/
uv run ruff format --check src/ tests/
```

## Roadmap

- Payload normalization across providers (`max_tokens` vs `max_completion_tokens`, etc.)
- Legacy tool-calling schema support (`functions` / `function_call`)
- Retry logic with exponential backoff
- Native streaming support in BaseClient implementations
- Additional client implementations (Anthropic, Google AI, etc.)
- Richer skill metadata (capabilities, dependencies, examples) and a CLI to
  scaffold/validate skill packages
- Multi-agent orchestration and communication patterns

## License

[MIT](LICENSE) — Copyright (c) 2026 mrYush
