Metadata-Version: 2.4
Name: gnosari-engine
Version: 0.3.0
Summary: A powerful framework for orchestrating multi-agent teams using Large Language Models. Create intelligent AI agent swarms that collaborate through streaming delegation and dynamic tool discovery.
License-File: LICENSE
Author: daviunx
Author-email: david.marsa@neomanex.com
Requires-Python: >=3.12,<=3.14
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Dist: PyJWT (>=2.8.0,<3.0.0)
Requires-Dist: PyYAML (>=6.0,<7.0)
Requires-Dist: aiohttp (>=3.9.0,<4.0.0)
Requires-Dist: aiosqlite (>=0.21.0,<0.22.0)
Requires-Dist: asyncpg (>=0.30.0,<0.31.0)
Requires-Dist: boto3 (>=1.35.0,<2.0.0)
Requires-Dist: celery[redis] (>=5.5.0,<6.0.0)
Requires-Dist: claude-agent-sdk (>=0.1.6,<0.2.0)
Requires-Dist: click (>=8.0.0,<9.0.0)
Requires-Dist: fastmcp (>=2.11.3,<3.0.0)
Requires-Dist: flower (>=2.0.0,<3.0.0)
Requires-Dist: gnosisllm-knowledge (>=0.4.0,<0.5.0)
Requires-Dist: greenlet (>=3.2.4,<4.0.0)
Requires-Dist: httpx (>=0.28.1,<0.29.0)
Requires-Dist: instructor (>=1.0.0,<2.0.0)
Requires-Dist: lxml (>=6.0.1,<7.0.0)
Requires-Dist: openai (>=1.100.1,<2.0.0)
Requires-Dist: openai-agents (>=0.2.10,<0.3.0)
Requires-Dist: opensearch-py (>=3.1.0,<4.0.0)
Requires-Dist: psutil (>=6.1.0,<7.0.0)
Requires-Dist: pymysql (>=1.1.0,<2.0.0)
Requires-Dist: pytest-asyncio (>=1.1.0,<2.0.0)
Requires-Dist: python-dotenv (>=1.0.0,<2.0.0)
Requires-Dist: pytube (>=15.0.0,<16.0.0)
Requires-Dist: redis (>=5.2.0,<6.0.0)
Requires-Dist: requests (>=2.31.0,<3.0.0)
Requires-Dist: requests-aws4auth (>=1.3.0,<2.0.0)
Requires-Dist: rich (>=13.0.0,<14.0.0)
Requires-Dist: sqlalchemy (>=2.0.0,<3.0.0)
Requires-Dist: tiktoken (>=0.5.0,<1.0.0)
Requires-Dist: websockets (>=15.0.1,<16.0.0)
Requires-Dist: youtube-transcript-api (>=1.2.2,<2.0.0)
Description-Content-Type: text/markdown

<div align="center">
  <img src="docs/static/img/logo.png" alt="Gnosari Logo" width="200">

  # Gnosari Engine

  **Core Agent Orchestration Library**

  📚 **[Documentation](https://docs.gnosari.com)**
</div>

The Gnosari Engine is the core orchestration layer for AI agent teams. It provides configuration-driven agent management, tool integration, knowledge retrieval (RAG), and multi-agent coordination.

---

## Table of Contents

1. [Quick Start](#quick-start)
2. [Architecture Overview](#architecture-overview)
3. [Providers](#providers)
4. [YAML Configuration Reference](#yaml-configuration-reference)
5. [CLI Commands](#cli-commands)
6. [Tools System](#tools-system)
7. [Knowledge System (RAG)](#knowledge-system-rag)
8. [Sessions](#sessions)
9. [Structured Output](#structured-output)
10. [Development](#development)
11. [Non-Active Fields Reference](#non-active-fields-reference)

---

## Quick Start

### Prerequisites

- **Python 3.12+**
- **Poetry** for dependency management
- **API Keys** for LLM providers

### Installation

```bash
# Clone and install
git clone https://github.com/neomanex/gnosari-engine.git
cd gnosari-engine
poetry install

# Set up environment
cp .env.example .env
# Edit .env with your API keys
```

### Your First Team

Create `my-team.yaml`:

```yaml
id: "my-team"
name: "My First Team"

agents:
  - id: "coordinator"
    is_orchestrator: true
    model: "gpt-4o"
    instructions: |
      You coordinate tasks and delegate to specialists.
    delegations:
      - target_agent_id: "writer"
        instructions: "Delegate writing tasks"

  - id: "writer"
    model: "gpt-4o"
    instructions: |
      You create clear, engaging content.
```

### Run Your Team

```bash
# Run entire team
poetry run gnosari run my-team.yaml -m "Write a blog post about AI"

# Run specific agent
poetry run gnosari run my-team.yaml -m "Hello" -a coordinator

# With streaming
poetry run gnosari run my-team.yaml -m "Hello" --stream
```

---

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                        CLI / Library                             │
│  gnosari run | gnosari push | gnosari knowledge | gnosari view  │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Configuration Service                         │
│         Parse → Validate → Build → Resolve → Index              │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Domain Objects                              │
│            Team, Agent, Tool, Knowledge, Trait                  │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                     GnosariRunner                                │
│              Provider autodiscovery + execution                  │
└─────────────────────────────────────────────────────────────────┘
                              │
              ┌───────────────┴───────────────┐
              ▼                               ▼
┌─────────────────────────┐     ┌─────────────────────────┐
│    OpenAI Provider      │     │    Claude Provider      │
│   (OpenAI Agents SDK)   │     │  (Claude Agent SDK)     │
│    Default provider     │     │  Best for dev/complex   │
└─────────────────────────┘     └─────────────────────────┘
```

### Directory Structure

```
src/gnosari_engine/
├── cli/                    # CLI commands and services
├── config/                 # Configuration loading pipeline
├── factories/              # Domain object factories
├── knowledge/              # Knowledge loading service (uses gnosisllm-knowledge library)
│   ├── services/           # KnowledgeLoaderService
│   └── streaming.py        # Event types for progress reporting
├── prompts/                # Prompt building utilities
├── queue/                  # Celery async task queue (ready, not active)
├── runners/                # Provider strategies (OpenAI, Claude)
├── schemas/                # Pydantic domain models
├── services/               # Business logic services
├── sessions/               # Session persistence
└── tools/                  # Tool implementations (builtin + MCP)
    └── builtin/            # KnowledgeQueryTool, MemoryTool, etc.
```

### Agent Execution Flow

```
┌──────────────────────────────────────────────────────────────────┐
│  1. AgentRun Created                                             │
│     - agent: Agent configuration                                 │
│     - team: Team configuration                                   │
│     - message: User input                                        │
│     - context: ExecutionContext (stream, debug, etc.)           │
│     - metadata: AgentRunMetadata (session_id, account_id, etc.) │
└──────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────────────┐
│  2. Prompt Building                                              │
│     AgentPromptBuilder.build_agent_prompt()                      │
│     - Base instructions + Team context + Traits                  │
│     - Knowledge sources + Available tools + Handoffs/delegations │
└──────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────────────┐
│  3. OpenAI Agent Creation (OpenAIAgentFactory)                  │
│     Agent(name, instructions, model, tools, handoffs)           │
└──────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────────────┐
│  4. Execution                                                    │
│     Runner.run(agent, input, session, context=agent_run)        │
└──────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────────────┐
│  5. Tool Execution                                               │
│     ctx.context IS AgentRun → tools access full context         │
└──────────────────────────────────────────────────────────────────┘
```

---

## Providers

### OpenAI Provider (Default)

- Uses **OpenAI Agents SDK** (`agents` package)
- General-purpose orchestration
- Native support for handoffs, delegations (as tools), MCP servers
- Session persistence via database

### Claude Provider

- Uses **Claude Agent SDK**
- Best suited for development and complex tasks
- Session integration not yet implemented

### Delegation vs Handoffs

| Aspect | Delegations | Handoffs |
|--------|-------------|----------|
| Implementation | `.as_tool()` - Agent becomes a tool | Native `Agent.handoffs` array |
| Context | **Stateless** - doesn't pass conversation context | **Stateful** - transfers full control with context |
| Use Case | Sub-tasks, specialized operations | Complete workflow transfer |
| Return | Returns to calling agent | Does not return automatically |

---

## YAML Configuration Reference

### Team Configuration

```yaml
# Required
id: "my-team"                    # Team identifier (unique)

# Optional
name: "My Team"                  # Display name (auto-generated from id if not set)
description: "Team description"  # Optional description
version: "1.0.0"                # Configuration version
tags: ["production", "sales"]   # Team tags for organization

# Team-level configuration
config:
  max_turns: 25                 # Maximum conversation turns
  timeout: 600                  # Execution timeout in seconds
  debug: false                  # Enable debug mode

# Components
agents: [...]                   # Required - at least one agent
tools: [...]                    # Team-level tools
knowledge: [...]                # Team-level knowledge bases
traits: [...]                   # Team-level traits
```

### Agent Configuration

```yaml
# Required
id: "ceo"                       # Agent identifier (unique within team)
instructions: |                 # Agent base instructions (min 10 chars)
  You are the CEO agent responsible for...

# Optional - Identity
name: "CEO Agent"               # Display name (auto-generated from id)
description: "Chief agent"      # Agent description

# Optional - Model Configuration
model: "gpt-4o"                 # LLM model (default: gpt-4o or env OPENAI_MODEL)

# Optional - Behavior
is_orchestrator: true           # Exactly ONE agent must be orchestrator
max_turns: 10                   # Override team max_turns
debug: false                    # Enable debug mode for this agent

# Optional - Components (reference IDs or inline definitions)
tools: ["api_tool", "mcp_server"]
knowledge: ["company_docs"]
traits: ["helpful", "professional"]

# Optional - Agent Relations
handoffs:                       # Agents to transfer control to
  - target_agent_id: "specialist"
    condition: "When technical expertise needed"
    message: "Transferring to specialist"

delegations:                    # Agents to delegate tasks to
  - target_agent_id: "researcher"
    instructions: "Research the topic thoroughly"

# Optional - Memory
memory:
  content: "Previous context..."

# Optional - Structured Output
structured_output:
  example:
    task_name: "Implement feature"
    priority: "high"
    estimated_hours: 8
  strict: true
```

### Tool Configuration

#### Built-in Tool

```yaml
id: "knowledge_query"
name: "Knowledge Query"
description: "Query knowledge"
module: "gnosari_engine.tools.builtin.knowledge_query"
class_name: "KnowledgeQueryTool"
args:
  some_config: "value"
```

#### MCP Tool (HTTP/SSE)

```yaml
id: "my_mcp_server"
name: "My MCP Server"
url: "http://localhost:8000/mcp"
connection_type: "streamable_http"  # "sse" | "streamable_http" | "stdio"
headers:
  Authorization: "Bearer ${API_KEY}"
timeout: 30
```

#### MCP Tool (Stdio)

```yaml
id: "local_mcp"
name: "Local MCP"
command: "python"
connection_type: "stdio"
args:
  - "-m"
  - "my_mcp_server"
```

### Knowledge Configuration

```yaml
id: "company_docs"
name: "Company Documentation"
description: "Internal docs"
type: "website"                 # website | sitemap | youtube | pdf | text | csv | json | directory
data:
  - "https://docs.company.com"
  - "https://wiki.company.com"
config:
  provider: "opensearch"
  loader_config:
    chunk_size: 6000
    chunk_overlap: 200
```

### Trait Configuration

```yaml
id: "helpful"
name: "Helpful"
description: "Always helpful"
instructions: |
  Always be helpful and supportive.
  Provide clear explanations.
```

### Environment Variable Substitution

Use `${VAR_NAME}` or `${VAR_NAME:default_value}` in YAML:

```yaml
tools:
  - id: "api_tool"
    url: "${API_URL:http://localhost:8000}"
    headers:
      Authorization: "Bearer ${API_KEY}"
```
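
To illustrate the expansion rule, here is a minimal, self-contained sketch of `${VAR}` / `${VAR:default}` substitution. This is an illustration only, not the engine's actual parser; in particular, how the engine treats an unset variable with no default (left as-is here) may differ.

```python
import os
import re

# Matches ${VAR_NAME} and ${VAR_NAME:default_value}
_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::([^}]*))?\}")

def substitute_env(text: str) -> str:
    """Replace ${VAR} / ${VAR:default} references with environment values."""
    def _replace(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        value = os.environ.get(name)
        if value is not None:
            return value
        if default is not None:
            return default
        return match.group(0)  # unset, no default: left untouched in this sketch

    return _PATTERN.sub(_replace, text)

print(substitute_env("url: ${API_URL:http://localhost:8000}"))
# With API_URL unset, prints: url: http://localhost:8000
```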

---

## CLI Commands

### `gnosari run`

Run a team or specific agent.

```bash
# Run entire team
gnosari run teams/my_team.yaml -m "Hello team!"

# Run specific agent
gnosari run teams/my_team.yaml -m "Hello agent!" -a ceo

# With session persistence
gnosari run teams/my_team.yaml -m "Continue discussion" -s session-001

# Options
--stream/--no-stream         # Streaming output (default: enabled)
--debug/--no-debug          # Debug mode
--provider [openai|claude]  # LLM provider (default: openai)
--database-url URL          # Database for sessions
```

### `gnosari view`

View team configuration details.

```bash
gnosari view teams/my_team.yaml              # Tree format (default)
gnosari view teams/my_team.yaml --format json   # JSON format
gnosari view teams/my_team.yaml --format table  # Table format
gnosari view teams/my_team.yaml --format chart  # Flow chart
```

### `gnosari push`

Push team configuration to Gnosari API.

```bash
gnosari push teams/my_team.yaml --api-url https://api.gnosari.com
```

### `gnosari knowledge`

Knowledge CLI provided by `gnosisllm-knowledge` library.

```bash
gnosari knowledge --help              # Show all commands
gnosari knowledge setup               # Setup OpenSearch with ML model
gnosari knowledge load <url>          # Load and index content
gnosari knowledge search "query"      # Search indexed content
gnosari knowledge info                # Show configuration
```

### `gnosari task run`

Execute a task by ID from the database.

```bash
gnosari task run teams/my_team.yaml --task-id 123
gnosari task run teams/my_team.yaml -t 123 --async  # Async via queue
```

### `gnosari start`

Start the queue worker (for async task execution).

```bash
gnosari start
gnosari start --concurrency 8
gnosari start --queue priority_queue
```

---

## Tools System

### Built-in Tools

| Tool | Description | Auto-Injected |
|------|-------------|---------------|
| `KnowledgeQueryTool` | Query knowledge bases (multi-KB, search modes) | Yes, when agent has knowledge |
| `MemoryTool` | Store and recall agent memories | No |
| `GnosariDatabaseTasksTool` | CRUD operations for tasks | No |
| `CodingAgentTool` | Code execution | No |

### Tool Factory

The `AutoDiscoveryToolFactory` automatically discovers tools:

1. Looks up tool by `module` + `class_name`
2. Creates provider-specific wrapper (OpenAI, Claude)
3. Initializes with `args` from configuration
4. Passes `agent_run` for context access
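
Step 1 amounts to dynamic import resolution. The following is a hypothetical sketch of that lookup (the real `AutoDiscoveryToolFactory` also handles provider wrapping, `args` initialization, and `agent_run` injection):

```python
import importlib

def resolve_tool_class(module: str, class_name: str) -> type:
    """Resolve a tool class from `module` + `class_name` config fields."""
    mod = importlib.import_module(module)
    try:
        return getattr(mod, class_name)
    except AttributeError as exc:
        raise ImportError(f"{class_name} not found in {module}") from exc

# e.g. resolve_tool_class("gnosari_engine.tools.builtin.knowledge_query",
#                         "KnowledgeQueryTool")
```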

### MCP Integration

MCP (Model Context Protocol) is the preferred way to add custom tools:

```yaml
tools:
  - id: "external_api"
    url: "http://localhost:8000/mcp"
    connection_type: "streamable_http"  # Recommended
    headers:
      Authorization: "Bearer ${TOKEN}"
    timeout: 30
```

**Connection Types:**
- `sse`: Server-Sent Events (legacy)
- `streamable_http`: HTTP with streaming (recommended)
- `stdio`: Local subprocess

### MCP Server Lifecycle

MCP servers require task-isolated lifecycle management:

```python
async def _isolated_producer():
    # mcp_servers, result, and event_queue are provided by the enclosing
    # task scope; each task gets its own isolated server connections
    connected_servers = []
    try:
        for server in mcp_servers:
            await server.connect()
            connected_servers.append(server)
        async for event in result.stream_events():
            await event_queue.put(event)
    finally:
        # Clean up only the servers that actually connected, even on failure
        for server in connected_servers:
            await server.cleanup()
```

---

## Knowledge System (RAG)

### Architecture

The knowledge system uses the **gnosisllm-knowledge** library:

- **Unified API**: Single `Knowledge` class for search and indexing
- **Multi-Knowledge Query**: Query multiple KBs in a single call
- **Multiple Search Modes**: hybrid, semantic, keyword, agentic
- **Multi-Tenancy**: Per-account index isolation via `index_name`

### OpenSearch Integration

Knowledge bases use OpenSearch with hybrid search:

- **Keyword search**: BM25 algorithm
- **Semantic search**: OpenAI embeddings (1536 dimensions)
- **Hybrid scoring**: Combination of both (default mode)
- **Agentic search**: AI-powered reasoning with citations
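
To show the idea behind hybrid scoring, here is an illustrative weighted combination of min-max-normalized keyword and semantic scores. This is a conceptual sketch only; OpenSearch's actual normalization and weighting pipeline (and the library's defaults) differ.

```python
def hybrid_scores(bm25: dict[str, float], semantic: dict[str, float],
                  keyword_weight: float = 0.3) -> dict[str, float]:
    """Combine keyword and semantic scores after min-max normalization."""
    def normalize(scores: dict[str, float]) -> dict[str, float]:
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero for uniform scores
        return {doc: (s - lo) / span for doc, s in scores.items()}

    nb, ns = normalize(bm25), normalize(semantic)
    return {doc: keyword_weight * nb.get(doc, 0.0)
                 + (1 - keyword_weight) * ns.get(doc, 0.0)
            for doc in set(nb) | set(ns)}
```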

### Data Loaders

| Type | Description | Data Format |
|------|-------------|-------------|
| `website` | Scrape web pages | URLs |
| `sitemap` | Parse sitemap and scrape | Sitemap URL |
| `discovery` | Crawl and discover URLs | Starting URL |
| `youtube` | Extract transcripts | YouTube URLs |
| `pdf` | Parse PDF documents | File paths or URLs |
| `text` | Plain text files | File paths |
| `csv` | CSV data | File paths |
| `json` | JSON data | File paths |
| `directory` | All files in directory | Directory path |

### Auto-Injection

When an agent has `knowledge` configured, the engine automatically:
1. Adds `knowledge_query` tool to the agent
2. Includes knowledge sources in the system prompt
3. Provides query instructions to the agent
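
For example, this agent can query `company_docs` without declaring any tools explicitly:

```yaml
agents:
  - id: "support"
    instructions: "Answer questions using company documentation."
    knowledge: ["company_docs"]   # knowledge_query tool is injected automatically
```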

---

## Sessions

### GnosariSession

Sessions persist conversation history:

- **Provider**: `openai_database` (SQL-based)
- **Storage**: PostgreSQL, MySQL, or SQLite
- **Tracking**: session_id, account_id, team_id, agent_id

### Session Configuration

```bash
# Environment variables
SESSION_PROVIDER=file|database|gnosari_api
SESSION_DATABASE_URL=postgresql+asyncpg://user:pass@host:port/db

# Usage
gnosari run team.yaml -m "Hello" -s session-001
gnosari run team.yaml -m "What did I say?" -s session-001  # Remembers context
```

---

## Structured Output

Structured output enables agents to return responses in a defined schema format.

### Configuration Modes

| Mode | Fields | Description |
|------|--------|-------------|
| **Example Only** | `example` | Schema auto-inferred from example types |
| **Schema Only** | `schema` | Explicit JSON Schema definition |
| **Both** | `schema` + `example` | Full control with LLM guidance (recommended) |

### Example Configuration

```yaml
agents:
  - id: "task_analyzer"
    instructions: "Analyze tasks and return structured data"
    is_orchestrator: true
    structured_output:
      example:
        task_name: "Implement feature X"
        priority: "high"
        estimated_hours: 8
        tags: ["backend", "api"]
      strict: true
```

### Type Inference

| YAML Value | Inferred Type |
|------------|---------------|
| `"string"` | `string` |
| `123` | `integer` |
| `12.5` | `number` |
| `true/false` | `boolean` |
| `["a", "b"]` | `array` |
| `{key: val}` | `object` (recursive) |
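
The table above can be sketched as a small recursive function. This is my illustration of the inference rules, not the engine's actual implementation (which may differ in details such as `required` handling):

```python
def infer_schema(value) -> dict:
    """Infer a JSON Schema fragment from an example value."""
    if isinstance(value, bool):   # check bool before int: bool subclasses int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if isinstance(value, list):
        return {"type": "array",
                "items": infer_schema(value[0]) if value else {}}
    if isinstance(value, dict):
        return {"type": "object",
                "properties": {k: infer_schema(v) for k, v in value.items()}}
    raise TypeError(f"unsupported example value: {value!r}")

print(infer_schema({"priority": "high", "estimated_hours": 8}))
```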

---

## Development

### Testing

```bash
poetry run pytest                          # Run all tests
poetry run pytest tests/test_specific.py   # Run specific test
poetry run pytest --cov=gnosari_engine     # With coverage
```

### Environment Setup

Required environment variables (see `.env.example`):

```bash
OPENAI_API_KEY=your-key
ANTHROPIC_API_KEY=your-key      # If using Claude
GNOSARI_API_KEY=your-key        # For pushing teams
OPENSEARCH_HOST=localhost       # For knowledge system
OPENSEARCH_PORT=9200
```

### Key Files

| File | Purpose |
|------|---------|
| `runners/gnosari_runner.py` | Main entry point, provider autodiscovery |
| `config/configuration_service.py` | YAML → Domain Objects pipeline |
| `prompts/agent_prompt_builder.py` | System prompt construction |
| `schemas/domain/*.py` | Team, Agent, Tool, Knowledge models |
| `tools/factory.py` | AutoDiscoveryToolFactory |

---

## Non-Active Fields Reference

The following fields are defined in the schema but **not actively used** in execution:

### Agent Fields

| Field | Status | Notes |
|-------|--------|-------|
| `temperature` | **Loaded but NOT passed to OpenAI** | Bug/oversight |
| `reasoning_effort` | Not used | Defined but not implemented |
| `role` | Display + learning only | Not used in execution |
| `listen` | Not implemented | Event system planned |
| `trigger` | Not implemented | Event system planned |

### Delegation Fields

| Field | Status | Notes |
|-------|--------|-------|
| `mode` | Display only | Always uses sync via `.as_tool()` |
| `timeout` | Not used | Delegation timeout not implemented |
| `retry_attempts` | Not used | Retry logic not implemented |

### Tool Fields

| Field | Status | Notes |
|-------|--------|-------|
| `rate_limit` | Not implemented | Rate limiting not implemented |
| `enable_caching` | Not implemented | Response caching not implemented |
| `retry_attempts` | Not implemented | Retry logic not implemented |

### Team Fields

| Field | Status | Notes |
|-------|--------|-------|
| `config.log_level` | Not used | Logging configured via CLI |
| `overrides` | Loaded, not applied | Override system not implemented |
| `account_id` | Display/push only | Not used in execution |

---

## Contributing

We welcome contributions! Please see our contributing guidelines.

## License

Creative Commons Attribution 4.0 International License

---

**Maintenance:** Update this file when architecture, CLI, configuration options, or significant patterns change.

