Metadata-Version: 2.4
Name: omd
Version: 0.2.0
Summary: One More Dimension — lightweight LLM agent framework with tool-calling, skills, and file tools
Project-URL: Homepage, https://github.com/mrYush/omd
Project-URL: Repository, https://github.com/mrYush/omd
Project-URL: Issues, https://github.com/mrYush/omd/issues
Project-URL: Changelog, https://github.com/mrYush/omd/blob/main/CHANGELOG.md
Author: mrYush
License: MIT
License-File: LICENSE
Keywords: agent,llm,openai,pydantic,react,skills,tool-calling
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: pydantic>=2.12
Requires-Dist: requests>=2.32
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.11; extra == 'dev'
Provides-Extra: machinery
Requires-Dist: minio-client; extra == 'machinery'
Requires-Dist: pillow>=10; extra == 'machinery'
Description-Content-Type: text/markdown

# OMD — One More Dimension

A lightweight Python framework for building LLM agents with tool-calling support.
OMD provides a unified OpenAI-compatible client, a ReAct-style agentic loop, and
automatic JSON schema generation from plain Python functions.

## Features

- **Unified LLM client** — `BaseClient` abstraction for any OpenAI-compatible
  HTTP API; swap providers by implementing a single method.
- **ReAct agent** — `BaseAgent` runs an iterative tool-calling loop with
  configurable iteration limits and automatic final-answer forcing.
- **Tool system** — `Tool.from_callable()` converts a typed Python function
  (with Sphinx-style docstring) into an OpenAI tool definition automatically.
- **Skills** — drop a `SKILL.md` into a directory and the agent can pick it up
  automatically (LLM router) or load it on demand via the `load_skill` tool.
- **File tools** — line-precise `read_fragment` / `read_lines` /
  `insert_lines` / `delete_lines` with optional workspace-root sandbox.
- **Machinery integrations** (optional) — ready-made functions for
  Stability AI image generation, Gemini reference-guided images, and
  Veo video generation.
- **Pydantic models** — `Model` and `Tool` are Pydantic v2 models with
  strict validation and clean serialization.

## Installation

From PyPI (once published):

```bash
pip install omd
# or
uv add omd
```

From source:

```bash
git clone https://github.com/mrYush/omd.git
cd omd
uv pip install -e .
```

## Quick Start

### 1. Call an LLM

```python
from omd.clients import ApiBarClient, Model

client = ApiBarClient(
    model=Model(
        name="gpt-4",
        url="https://api.openai.com/v1/chat/completions",
        api_token="sk-...",
    )
)

# or, if MODEL_NAME / MODEL_URL / MODEL_API_TOKEN are set in the environment:
client = ApiBarClient(model=Model.from_env())

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

response = client.call_model(messages=messages)
```

### 2. Define Tools from Functions

`Tool.from_callable()` extracts the function name, docstring, and type hints
to build an OpenAI-compatible JSON schema:

```python
from omd.tools import Tool


def search(query: str, top_k: int = 5) -> list[str]:
    """Search the web for relevant information.

    :param query: Search query.
    :param top_k: Number of results to return.
    :returns: List of search results.
    """
    return [f"result for {query}"]


tool = Tool.from_callable(search)
```
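
To make the mapping concrete, here is a minimal sketch of how a typed function with a Sphinx-style docstring could be turned into an OpenAI-style tool definition. It is not the `Tool.from_callable()` implementation: `sketch_tool_schema` and `_JSON_TYPES` are hypothetical names, and the type mapping covers only a few primitives.

```python
import inspect
import re
from typing import Any, Callable, get_type_hints

# Hypothetical mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def sketch_tool_schema(func: Callable[..., Any]) -> dict[str, Any]:
    """Build an OpenAI-style tool definition from a typed function (illustrative only)."""
    doc = inspect.getdoc(func) or ""
    # The first docstring line becomes the tool description.
    description = doc.split("\n", 1)[0]
    # Sphinx-style ":param name: text" entries become parameter descriptions.
    param_docs = dict(re.findall(r":param (\w+):\s*(.+)", doc))
    hints = get_type_hints(func)

    properties: dict[str, Any] = {}
    required: list[str] = []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {
            # Unknown or missing annotations fall back to "string" in this sketch.
            "type": _JSON_TYPES.get(hints.get(name), "string"),
            "description": param_docs.get(name, ""),
        }
        # Parameters without a default value are marked as required.
        if param.default is inspect.Parameter.empty:
            required.append(name)

    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def search(query: str, top_k: int = 5) -> list[str]:
    """Search the web for relevant information.

    :param query: Search query.
    :param top_k: Number of results to return.
    :returns: List of search results.
    """
    return [f"result for {query}"]


schema = sketch_tool_schema(search)
```

In this sketch `query` ends up in `required` because it has no default, while `top_k` is optional and typed as `integer`. The schema the library actually emits may differ in detail.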

### 3. Run an Agent

`BaseAgent` implements a ReAct-style agentic loop — it sends messages to the
LLM, executes any requested tool calls, appends the results, and repeats until
the model produces a final text answer or the iteration limit is reached.

```python
from omd.agents import BaseAgent
from omd.clients import ApiBarClient, Model
from omd.tools import Tool


def search(query: str) -> str:
    """Search the web for information.

    :param query: Search query.
    :returns: Search results.
    """
    return f"Results for: {query}"


def create_image(prompt: str) -> str:
    """Generate an image from a text description.

    :param prompt: Image description.
    :returns: Path to generated image.
    """
    return f"/images/{prompt.replace(' ', '_')}.png"


client = ApiBarClient(
    model=Model(
        name="gpt-4",
        url="https://api.openai.com/v1/chat/completions",
        api_token="sk-...",
    )
)

agent = BaseAgent(
    client=client,
    tools=[
        Tool.from_callable(search),
        Tool.from_callable(create_image),
    ],
    tool_choice="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Search for 'Python async' and create an image of a snake"},
]

new_messages = agent.run(messages=messages, attempts_limit=10)
```

### 4. Manual Tool-Calling

If you need lower-level control, pass raw tool definitions directly to the
client:

```python
# Reuses `client` and `messages` from the earlier examples.
tools = [
    {
        "type": "function",
        "function": {
            "name": "search",
            "description": "Search the web.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
]

response = client.call_model(
    messages=messages,
    tools=tools,
    tool_choice="auto",
    parallel_tool_calls=True,
)
```
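
With manual tool-calling you also execute the requested tools yourself. The sketch below assumes you have already extracted an assistant message in the OpenAI chat-completions format, where `tool_calls[].function.arguments` is a JSON-encoded string. The exact shape of the value returned by `call_model` is not documented here, so check it before relying on this. `dispatch_tool_calls` and the hand-written `assistant_message` are hypothetical.

```python
import json
from typing import Any, Callable


def dispatch_tool_calls(
    assistant_message: dict[str, Any],
    registry: dict[str, Callable[..., Any]],
) -> list[dict[str, Any]]:
    """Run each requested tool and return `role: tool` messages (illustrative only)."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        name = call["function"]["name"]
        # In the OpenAI format, arguments arrive as a JSON-encoded string.
        args = json.loads(call["function"]["arguments"] or "{}")
        try:
            output = registry[name](**args)
        except Exception as exc:  # report the failure back to the model
            output = f"Tool error: {exc!r}"
        results.append(
            {"role": "tool", "tool_call_id": call["id"], "content": str(output)}
        )
    return results


# Hand-written stand-in for an assistant message; no API call is made here.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "search", "arguments": '{"query": "Python async"}'},
        }
    ],
}

tool_messages = dispatch_tool_calls(
    assistant_message, {"search": lambda query: f"Results for: {query}"}
)
```

Appending `assistant_message` and then `tool_messages` to the conversation before the next `call_model` call lets the model see the tool output.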

## Agent Architecture

`BaseAgent` implements the **ReAct** (Reasoning + Acting) pattern:

```
┌─────────────────────────────────────────────────────────────┐
│                       BaseAgent.run()                       │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│  1. Send messages + tools → LLM                             │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
                    ┌─────────────────┐
                    │  LLM response   │
                    └─────────────────┘
                              │
              ┌───────────────┴───────────────┐
              │                               │
              ▼                               ▼
    ┌──────────────────┐           ┌──────────────────┐
    │  Has tool_calls  │           │  Final answer    │
    └──────────────────┘           └──────────────────┘
              │                               │
              ▼                               ▼
┌──────────────────────────┐       ┌──────────────────┐
│ 2. Execute each tool     │       │ Return new       │
│    tool.invoke(args)     │       │ messages         │
└──────────────────────────┘       └──────────────────┘
              │
              ▼
┌──────────────────────────┐
│ 3. Append results        │
│    to messages           │
└──────────────────────────┘
              │
              ▼
┌──────────────────────────┐
│ 4. Loop back to step 1   │
└──────────────────────────┘
```

| Component | Role |
|-----------|------|
| `BaseClient` | Abstract HTTP call to an LLM API |
| `Tool` | Python function wrapper with JSON schema |
| `BaseAgent` | Agentic loop orchestrator |
| `messages` | Conversation history (mutated in-place) |

The `attempts_limit` parameter caps the number of loop iterations.
On the final iteration, tools are disabled (`tools=None`), which forces the LLM
to produce a text answer and prevents infinite tool-call loops.

## Skills

A **skill** is a folder containing a `SKILL.md` file with YAML-like
front-matter and a free-form Markdown body. The body explains how a particular
role uses the available tools — it is appended to the system prompt only when
the skill is selected.

### SKILL.md format

```markdown
---
name: file-editor
description: Edit text files by line range. Use when the user asks to read fragments or modify specific lines.
recommended_tools: [read_fragment, read_lines, insert_lines, delete_lines]
version: 0.1.0
---
# File Editor

Workflow:
1. Read the target range first.
2. Insert / delete with line-precise tools.
3. Re-read to verify.
```
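
As a rough illustration of the format, the sketch below splits a `SKILL.md` document into front-matter metadata and body. It is not the parser in `omd/skills/parser.py`: `parse_skill_md` is a hypothetical helper that handles only flat `key: value` pairs and inline `[a, b]` lists.

```python
def parse_skill_md(text: str) -> tuple[dict[str, object], str]:
    """Split SKILL.md text into (metadata, body). Illustrative only."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front-matter block
    try:
        end = next(
            i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"
        )
    except StopIteration:
        return {}, text  # unterminated front-matter: treat everything as body

    meta: dict[str, object] = {}
    for line in lines[1:end]:
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not "key: value"
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            # Inline list such as "[a, b, c]".
            meta[key.strip()] = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        else:
            meta[key.strip()] = value
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


sample = """---
name: file-editor
description: Edit text files by line range.
recommended_tools: [read_fragment, read_lines]
version: 0.1.0
---
# File Editor

Workflow:
1. Read the target range first.
"""

meta, body = parse_skill_md(sample)
```

Here `meta["recommended_tools"]` becomes a Python list and `body` holds the Markdown that would be added to the system prompt.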

### Discovery

`SkillRegistry` discovers skills from three sources:

1. Built-in skills shipped with the package (`omd/skills/builtin/`):
   `file-editor`, `code-reviewer`, `test-writer`. Disable via
   `include_builtin=False`.
2. Directories listed in the `OMD_SKILLS_PATH` environment variable
   (`os.pathsep`-separated). Disable via `read_env=False`.
3. Directories passed to the `dirs=` constructor argument (highest precedence).

### Hybrid selection

The agent can use skills in two complementary ways:

- **Auto-router** — a small LLM call before the main loop picks the most
  relevant skills based on the user's request and `name + description`
  metadata; their bodies are prepended to the system prompt.
- **On-demand tools** — `list_skills` and `load_skill` tools let the agent
  itself fetch a skill body mid-conversation when the router missed it.

```python
from omd.agents import BaseAgent
from omd.clients import ApiBarClient, Model
from omd.skills import SkillRegistry, SkillRouter
from omd.tools import make_file_tools

client = ApiBarClient(model=Model.from_env())
registry = SkillRegistry(dirs=["./my-skills"])

agent = BaseAgent(
    client=client,
    tools=make_file_tools(workspace_root="./project"),
    skills=registry,
    skill_router=SkillRouter(client, max_skills=2),
    auto_skill_routing=True,    # default — pre-select skills via router
    expose_skill_tools=True,    # default — also expose list_skills/load_skill
)

agent.run(
    messages=[{"role": "user", "content": "Refactor utils.py and add tests"}],
    attempts_limit=10,
)
```

## File tools

`omd.tools.make_file_tools()` returns a ready-to-use set of line-precise
file manipulation tools:

| Tool | Purpose |
|------|---------|
| `read_fragment(path, offset=1, limit=200)` | Read up to `limit` lines starting at `offset` |
| `read_lines(path, start, end)` | Read the inclusive 1-indexed line range |
| `insert_lines(path, line, content)` | Insert `content` *before* the given line |
| `delete_lines(path, start, end)` | Delete the inclusive 1-indexed range |

All line numbers are 1-indexed, and responses include `LINE|content` prefixes so
the model never has to count lines manually. Pass
`workspace_root=Path("./project")` to refuse any path that resolves outside
that directory; omit it for an unrestricted toolset.

```python
from omd.tools import make_file_tools

tools = make_file_tools(workspace_root="./project")
```

The standalone callables (`omd.tools.read_fragment`, `read_lines`,
`insert_lines`, `delete_lines`) are also exported for direct use without an
agent.
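
To illustrate the 1-indexed semantics and the `LINE|content` prefix, here is a small in-memory sketch. `format_fragment` and `insert_before` are hypothetical stand-ins that operate on strings rather than files, and the exact prefix layout used by the library (for example, padding) is an assumption.

```python
def format_fragment(text: str, offset: int = 1, limit: int = 200) -> str:
    """Return up to `limit` lines from 1-indexed `offset`, each prefixed with its number."""
    if offset < 1:
        raise ValueError("offset is 1-indexed and must be >= 1")
    lines = text.splitlines()
    selected = lines[offset - 1 : offset - 1 + limit]
    return "\n".join(
        f"{number}|{line}" for number, line in enumerate(selected, start=offset)
    )


def insert_before(text: str, line: int, content: str) -> str:
    """Insert `content` before the 1-indexed `line`."""
    lines = text.splitlines()
    # Assigning to an empty slice inserts without replacing existing lines.
    lines[line - 1 : line - 1] = content.splitlines()
    return "\n".join(lines)


source = "alpha\nbeta\ngamma\ndelta"
fragment = format_fragment(source, offset=2, limit=2)
updated = insert_before(source, 2, "NEW")
```

With this input, `fragment` contains the lines `2|beta` and `3|gamma`, and `updated` has `NEW` as its second line.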

## Machinery (Optional Integrations)

The `omd.machinery` subpackage provides ready-made functions for external
media generation services. These are **optional** — the core agent/client/tool
system works without them.

### Stability AI — Image Generation

```python
from omd.machinery import generate_image

url = generate_image(
    prompt="A serene mountain landscape at sunrise",
    aspect_ratio="16:9",
    output_format="png",
)
```

Image-to-image with strength control:

```python
url = generate_image(
    prompt="Transform into a watercolor painting",
    image_path="/path/to/input.png",
    strength=0.5,
)
```

### Stability AI — Background Replacement

```python
from omd.machinery import replace_background

url = replace_background(
    subject_image_path="/path/to/person.png",
    background_prompt="modern office with large windows and city view",
    light_source_direction="right",
)

url = replace_background(
    subject_image_path="/path/to/person.png",
    background_reference_path="/path/to/beach_bg.jpg",
    preserve_original_subject=0.9,
)
```

### Gemini — Reference-Guided Image Generation

```python
from omd.machinery import generate_image_gemini

url = generate_image_gemini(
    prompt="A portrait in the style of the reference",
    style_reference_image="/path/to/reference.png",
)
```

### Veo — Video Generation

`generate_video_veo` produces short videos (4-8 seconds) with native audio via
the Google Veo API. It supports both text-to-video and image-to-video generation.

Under the hood the function:

1. Submits a `predictLongRunning` request to the Gemini API (via api-bar)
2. Polls the operation every 10 seconds until completion
3. Downloads the generated MP4 video
4. Uploads it to MinIO and returns a presigned URL

```python
from omd.machinery import generate_video_veo

# Text-to-video
url = generate_video_veo(
    prompt="A drone shot of a sunset over the ocean, cinematic, warm tones",
)

# Portrait video, Full HD
url = generate_video_veo(
    prompt='Close-up of a barista pouring latte art. She says "Almost perfect".',
    aspect_ratio="9:16",
    resolution="1080p",
    duration_seconds="8",
)

# Image-to-video (first frame from an image)
url = generate_video_veo(
    prompt="The cat slowly opens its eyes and stretches",
    reference_image="https://example.com/sleeping_cat.png",
)
```
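
The polling step described above can be sketched as a generic loop with a deadline. This is an illustration, not the code in `omd.machinery.veo`: `poll_until_done` is a hypothetical helper, the `done` flag follows the usual shape of a long-running operation resource, and the fetch function is injected so the example runs without network access.

```python
import time
from typing import Any, Callable


def poll_until_done(
    fetch_operation: Callable[[], dict[str, Any]],
    *,
    interval_s: float = 10.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, Any]:
    """Poll until the operation reports done=True or the timeout would be exceeded."""
    deadline = clock() + timeout_s
    while True:
        operation = fetch_operation()
        if operation.get("done"):
            return operation
        # Stop early if the next wait would run past the deadline.
        if clock() + interval_s > deadline:
            raise TimeoutError(f"operation not finished after {timeout_s} s")
        sleep(interval_s)


# Stand-in operation that completes on the third poll.
states = iter(
    [{"done": False}, {"done": False}, {"done": True, "response": "video-bytes"}]
)
slept: list[float] = []
final = poll_until_done(lambda: next(states), sleep=slept.append)
```

Injecting `sleep` and `clock` keeps the loop testable; in real use the defaults wait 10 seconds between polls, matching the interval mentioned above.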

#### Veo Model Selection

| Model | When to use |
|-------|-------------|
| `veo-3.0-fast-generate-001` | **Default.** Veo 3.0, cost-effective |
| `veo-3.0-generate-001` | Veo 3.0, baseline quality |
| `veo-3.1-generate-preview` | Preview 3.1: highest quality |
| `veo-3.1-fast-generate-preview` | Preview 3.1: faster than full 3.1 |
| `veo-3.1-lite-generate-preview` | Preview 3.1: lightweight/cheapest in the 3.1 line |

Override per call with `model="..."` or change the default via the
`_VEO_DEFAULT_MODEL` constant in `omd.machinery.veo`.

#### Veo Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `prompt` | *(required)* | Text description. Use quotes for dialogue, explicit words for sounds |
| `reference_image` | `None` | Starting frame: file path, URL, or data URL |
| `aspect_ratio` | `"16:9"` | `"16:9"` (landscape) or `"9:16"` (portrait) |
| `resolution` | `"720p"` | `"720p"`, `"1080p"`, or `"4k"` |
| `duration_seconds` | `"8"` | `"4"`, `"6"`, or `"8"` (1080p/4k require `"8"`) |
| `model` | `"veo-3.0-fast-generate-001"` | See model table above |
| `timeout_s` | `600.0` | Max wait time for generation (seconds) |
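
The constraints in the table can be checked before a request is submitted. The sketch below is a hypothetical pre-flight validator, `check_veo_params`, built only from the values listed above; whether `generate_video_veo` performs a similar check itself is not stated here.

```python
_ASPECT_RATIOS = {"16:9", "9:16"}
_RESOLUTIONS = {"720p", "1080p", "4k"}
_DURATIONS = {"4", "6", "8"}


def check_veo_params(
    aspect_ratio: str = "16:9",
    resolution: str = "720p",
    duration_seconds: str = "8",
) -> None:
    """Raise ValueError for values or combinations the table rules out."""
    if aspect_ratio not in _ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(_ASPECT_RATIOS)}")
    if resolution not in _RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(_RESOLUTIONS)}")
    if duration_seconds not in _DURATIONS:
        raise ValueError(f"duration_seconds must be one of {sorted(_DURATIONS)}")
    # Per the table, 1080p and 4k are only available for 8-second clips.
    if resolution in {"1080p", "4k"} and duration_seconds != "8":
        raise ValueError(f'{resolution} requires duration_seconds="8"')


# A valid portrait, Full HD, 8-second request passes silently.
check_veo_params(aspect_ratio="9:16", resolution="1080p", duration_seconds="8")

try:
    check_veo_params(resolution="1080p", duration_seconds="4")
    rejected = False
except ValueError:
    rejected = True
```

Note that `duration_seconds` is a string in the examples above, so the validator compares strings rather than integers.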

#### Prompt Tips

- **Composition**: subject, action, style, camera motion, ambiance
- **Dialogue**: `A man says "Let's go!" and grabs his coat`
- **Sound**: `thunder rumbling, rain hitting the window`
- **Style**: `cinematic`, `anime`, `stop-motion`, `film noir`
- **Camera**: `dolly shot`, `aerial view`, `close-up`, `POV shot`

## Configuration

### Environment Variables

| Variable | Description | Required |
|----------|-------------|----------|
| `MODEL_URL` | Chat completions endpoint URL | Yes (for client) |
| `MODEL_NAME` | Model name (e.g. `gpt-4`, `llama-3`) | Yes (for client) |
| `MODEL_API_TOKEN` | API key for authorization | Yes (for client) |
| `STABILITY_API_KEY` | Stability AI API key | For `omd.machinery.sd` |
| `STABILITY_BASE_URL` | Stability AI base URL | For `omd.machinery.sd` |
| `API_BAR_TOKEN` | Api-bar gateway token | For `omd.machinery.gemini` / `veo` |
| `MINIO_*` | MinIO connection settings | For media upload (`image_utils`, `veo`) |
| `MINIO_PRESIGNED_URL_EXPIRES_IN` | Presigned URL TTL in seconds (default: 3600) | No |

## Project Structure

```
src/omd/
├── __init__.py
├── agents/
│   ├── __init__.py
│   └── base.py             # BaseAgent — ReAct-style agentic loop with skills
├── clients/
│   ├── __init__.py
│   ├── base.py             # BaseClient — abstract interface
│   ├── apibar.py           # ApiBarClient — OpenAI-compatible implementation
│   └── data_models.py      # Model — Pydantic client config (with from_env)
├── tools/
│   ├── __init__.py
│   ├── data_models.py      # Tool — auto-generated JSON schemas from functions
│   └── file_tools.py       # read_fragment / read_lines / insert_lines / delete_lines
├── skills/
│   ├── __init__.py
│   ├── data_models.py      # Skill — Pydantic skill record
│   ├── parser.py           # SKILL.md front-matter parser
│   ├── registry.py         # SkillRegistry — discovery and lookup
│   ├── router.py           # SkillRouter — LLM-driven selection
│   ├── tools.py            # list_skills / load_skill agent tools
│   └── builtin/            # file-editor, code-reviewer, test-writer
├── machinery/              # Optional: Stability AI / Gemini / Veo
├── prompts/
│   └── base.py             # PromptBase, ListPrompt — system prompt builders
└── utils/
    └── logging_utils.py    # truncate_text, serialize_for_log
```

`Model.from_env()` reads `MODEL_NAME`, `MODEL_URL`, and `MODEL_API_TOKEN`
lazily, at call time, so `import omd` does not require any environment variables.

## Development

### Setup

```bash
git clone https://github.com/mrYush/omd.git
cd omd
uv sync --all-extras
```

### Running Tests

```bash
uv run pytest
```

### Linting

```bash
uv run ruff check src/ tests/
uv run ruff format --check src/ tests/
```

## Roadmap

- Payload normalization across providers (`max_tokens` vs `max_completion_tokens`, etc.)
- Legacy tool-calling schema support (`functions` / `function_call`)
- Retry logic with exponential backoff
- Streaming responses
- Additional client implementations (Anthropic, Google AI, etc.)
- Richer skill metadata (capabilities, dependencies, examples) and a CLI to
  scaffold/validate skill packages

## License

[MIT](LICENSE) — Copyright (c) 2026 mrYush
