Metadata-Version: 2.4
Name: pepsico-document-agents
Version: 0.1.0
Summary: Agent-based extraction framework for transforming parsed documents into structured planogram data
Author-email: PepsiCo <tech@pepsico.com>
Maintainer-email: PepsiCo <tech@pepsico.com>
License: MIT
Project-URL: Homepage, https://github.com/pepsico/document-agents
Project-URL: Documentation, https://github.com/pepsico/document-agents#readme
Project-URL: Repository, https://github.com/pepsico/document-agents.git
Project-URL: Issues, https://github.com/pepsico/document-agents/issues
Keywords: document-processing,planogram-extraction,agent-framework,llm-agents,structured-data
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: plano-core>=0.1.0
Requires-Dist: openai>=1.30.0
Requires-Dist: tiktoken>=0.7.0
Requires-Dist: jinja2>=3.1.0
Requires-Dist: pydantic>=2.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.18.0; extra == "anthropic"
Provides-Extra: google
Requires-Dist: google-generativeai>=0.3.0; extra == "google"

# document-agents

A modular, agent-based extraction framework for transforming parsed document content into structured planogram data. The library provides an architecture for building multi-step extraction pipelines with LLM-powered agents.

## Overview

`document-agents` implements a modular agent framework where each agent performs a single extraction or transformation task. Agents communicate through a shared mutable context and are orchestrated through configurable processing chains. The library is designed for enterprise-scale document processing workloads.

## Architecture

The library follows SOLID principles and a small set of supporting design patterns:

- **Single Responsibility Principle** - Each agent performs exactly one extraction or transformation task
- **Adapter Pattern** - Normalizes provider responses to shared models
- **Factory Pattern** - Creates agent instances by name
- **Dependency Injection** - LLM clients and configurations injected into agents
- **Async-first** - Non-blocking operations for high throughput
- **Protocol-based Interfaces** - Type-safe contracts for agents and LLM clients
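
The following is a minimal, self-contained sketch of how protocol-based interfaces, dependency injection, and a name-based factory can fit together. It does not use the library itself: `CompletionClient`, `EchoClient`, `UppercaseAgent`, `AGENT_REGISTRY`, and `create_agent` are hypothetical names chosen for illustration, and the real `document_agents` interfaces may look different.

```python
import asyncio
from typing import Callable, Protocol


class CompletionClient(Protocol):
    """Hypothetical contract: any object with an async `complete` method qualifies."""

    async def complete(self, prompt: str) -> str: ...


class EchoClient:
    """Stand-in client for tests; satisfies CompletionClient structurally."""

    async def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class UppercaseAgent:
    """Toy agent with one responsibility; the client is injected, not created here."""

    def __init__(self, client: CompletionClient) -> None:
        self._client = client

    async def run(self, text: str) -> str:
        reply = await self._client.complete(text)
        return reply.upper()


# Hypothetical registry mapping an agent name to a callable that builds it.
AGENT_REGISTRY: dict[str, Callable[[CompletionClient], UppercaseAgent]] = {
    "uppercase": UppercaseAgent,
}


def create_agent(name: str, client: CompletionClient) -> UppercaseAgent:
    """Factory: look up the builder by name and inject the client."""
    try:
        builder = AGENT_REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown agent: {name!r}") from None
    return builder(client)


agent = create_agent("uppercase", EchoClient())
result = asyncio.run(agent.run("shelf 1"))
print(result)  # ECHO: SHELF 1
```

Because the agent depends only on the protocol, a test can pass in a stub client like `EchoClient` without touching any network API.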

## Installation

```bash
pip install pepsico-document-agents
```

### Optional Dependencies

```bash
# For Anthropic provider
pip install "pepsico-document-agents[anthropic]"

# For Google provider
pip install "pepsico-document-agents[google]"

# For development
pip install "pepsico-document-agents[dev]"
```

## Quick Start

```python
import asyncio
from document_agents import (
    AgentChain,
    AgentConfig,
    AgentContext,
    LLMClient,
    LLMConfig,
    ProcessingMode,
    ShelfReconstructionAgent,
    ProductExtractionAgent,
    SchemaMappingAgent,
)

async def main():
    # Configure LLM client
    llm_config = LLMConfig(
        provider="openai",
        model="gpt-4",
        api_key="your-api-key",
    )
    llm = LLMClient(llm_config)
    
    # Configure agents
    agent_config = AgentConfig(
        retries=3,
        timeout_seconds=120.0,
    )
    
    # Create agents
    agents = {
        "shelf_reconstruction": ShelfReconstructionAgent(llm, agent_config),
        "product_extraction": ProductExtractionAgent(llm, agent_config),
        "schema_mapping": SchemaMappingAgent(llm, agent_config),
    }
    
    # Create chain
    chain = AgentChain(agents)
    
    # Create context with parsed document data
    context = AgentContext(document_id="doc-001")
    context.parse_results = [...]  # Your parsed document results
    
    # Execute chain
    await chain.execute(context, ProcessingMode.STANDARD)
    
    # Get final result
    planogram = context.final_result
    print(f"Extracted {planogram.total_products} products across {planogram.total_shelves} shelves")

asyncio.run(main())
```

## Configuration

### LLM Configuration

```python
from document_agents import LLMConfig

llm_config = LLMConfig(
    provider="openai",  # openai, anthropic, google, llamaapi
    model="gpt-4",
    api_key="your-api-key",
    temperature=0.0,
    max_tokens=4096,
    timeout_seconds=120.0,
    max_retries=3,
    base_url=None,  # Optional custom base URL
)
```

### Agent Configuration

```python
from document_agents import AgentConfig

agent_config = AgentConfig(
    retries=3,
    timeout_seconds=120.0,
    prompt_version="v1",
    enable_logging=True,
    enable_telemetry=True,
    log_prompts=False,
    log_responses=False,
)
```
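
The `retries` and `timeout_seconds` settings suggest a retry loop with a per-attempt timeout. The sketch below illustrates that idea with the standard library only; it is not the library's implementation. `call_with_retries` and `flaky` are hypothetical names, and the sketch assumes that `retries` counts extra attempts after the first and that the timeout applies to each attempt separately.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def call_with_retries(
    operation: Callable[[], Awaitable[T]],
    retries: int,
    timeout_seconds: float,
) -> T:
    """Run `operation`, allowing up to `retries` extra attempts after the first."""
    if retries < 0:
        raise ValueError("retries must be non-negative")
    last_error: Exception | None = None
    for _attempt in range(retries + 1):
        try:
            # Each attempt gets its own timeout budget (an assumption of this sketch).
            return await asyncio.wait_for(operation(), timeout=timeout_seconds)
        except Exception as exc:  # also covers a timeout raised by wait_for
            last_error = exc
    assert last_error is not None
    raise last_error


# Demo: an operation that fails twice and then succeeds.
calls = {"count": 0}


async def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


outcome = asyncio.run(call_with_retries(flaky, retries=3, timeout_seconds=1.0))
print(outcome, calls["count"])  # ok 3
```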

## Processing Modes

The library supports three processing modes. Each mode runs a different sequence of agents, matched to the complexity of the document:

### SIMPLE Mode

For simple documents with minimal structure:

- Product Extraction
- Schema Mapping

### STANDARD Mode

For typical planogram documents:

- Shelf Reconstruction
- Product Extraction
- Schema Mapping

### COMPLEX Mode

For complex multi-page documents with conflicts:

- Shelf Reconstruction
- Product Extraction
- Cross Reference
- Conflict Resolution
- Schema Mapping
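
The three sequences above can be written down as a plain lookup table. This is a sketch for illustration, not the library's internal representation: `Mode`, `CHAINS`, and `missing_agents` are hypothetical names, and the `cross_reference` and `conflict_resolution` keys are assumed by analogy with the agent keys used in the Quick Start.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical stand-in for ProcessingMode."""

    SIMPLE = "simple"
    STANDARD = "standard"
    COMPLEX = "complex"


# Agent names per mode, in the order listed in this section.
CHAINS: dict[Mode, list[str]] = {
    Mode.SIMPLE: ["product_extraction", "schema_mapping"],
    Mode.STANDARD: ["shelf_reconstruction", "product_extraction", "schema_mapping"],
    Mode.COMPLEX: [
        "shelf_reconstruction",
        "product_extraction",
        "cross_reference",
        "conflict_resolution",
        "schema_mapping",
    ],
}


def missing_agents(mode: Mode, registered: set[str]) -> list[str]:
    """Return the agents a mode needs that are not registered, in chain order."""
    return [name for name in CHAINS[mode] if name not in registered]


quick_start_agents = {"shelf_reconstruction", "product_extraction", "schema_mapping"}
print(missing_agents(Mode.STANDARD, quick_start_agents))  # []
print(missing_agents(Mode.COMPLEX, quick_start_agents))
# ['cross_reference', 'conflict_resolution']
```

A check like this shows why the three agents registered in the Quick Start are enough for STANDARD mode but not for COMPLEX mode.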

## Agents

### Shelf Reconstruction Agent

Identifies shelves, determines numbering, detects bays/sections, and detects shelf continuations across pages.

```python
from document_agents import ShelfReconstructionAgent

agent = ShelfReconstructionAgent(llm, agent_config)
```

### Product Extraction Agent

Extracts product names, UPCs, facings, shelf assignments, and position information from tables, markdown, and inline references.

```python
from document_agents import ProductExtractionAgent

agent = ProductExtractionAgent(llm, agent_config)
```

### Cross Reference Agent

Identifies duplicate products, performs UPC lookups, links related tables, and reconciles data across pages.

```python
from document_agents import CrossReferenceAgent

agent = CrossReferenceAgent(llm, agent_config)
```

### Conflict Resolution Agent

Detects conflicting values, resolves conflicts, and preserves audit trails.

```python
from document_agents import ConflictResolutionAgent

agent = ConflictResolutionAgent(llm, agent_config)
```

### Schema Mapping Agent

Maps extracted data to the structured planogram schema with the proper hierarchy.

```python
from document_agents import SchemaMappingAgent, ProcessingMode

agent = SchemaMappingAgent(llm, agent_config, ProcessingMode.STANDARD)
```

## Agent Chain

`AgentChain` orchestrates agent execution in configurable sequences:

```python
from document_agents import AgentChain, ProcessingMode

chain = AgentChain(agents)

# Execute with specific processing mode
await chain.execute(context, ProcessingMode.STANDARD)

# Get chain definition
simple_chain = chain.get_chain(ProcessingMode.SIMPLE)
```
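
Conceptually, a chain awaits each step in order while every step reads from and writes to the same context object. The sketch below shows that pattern in isolation; `Context`, `Step`, `find_shelves`, `count_shelves`, and `run_chain` are hypothetical names, and the real `AgentChain` may add retries, logging, and mode selection on top.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable


@dataclass
class Context:
    """Hypothetical shared mutable context passed through every step."""

    document_id: str
    data: dict[str, Any] = field(default_factory=dict)


Step = Callable[[Context], Awaitable[None]]


async def find_shelves(context: Context) -> None:
    context.data["shelves"] = ["shelf-1", "shelf-2"]


async def count_shelves(context: Context) -> None:
    # Depends on what the previous step wrote into the shared context.
    context.data["total_shelves"] = len(context.data["shelves"])


async def run_chain(steps: list[Step], context: Context) -> Context:
    """Await each step in order; later steps see earlier steps' writes."""
    for step in steps:
        await step(context)
    return context


ctx = asyncio.run(run_chain([find_shelves, count_shelves], Context("doc-001")))
print(ctx.data["total_shelves"])  # 2
```

Order matters in this design: running `count_shelves` before `find_shelves` would raise a `KeyError`, which is why each mode fixes its own sequence.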

## Custom Agents

Create custom agents by extending `BaseAgent`:

```python
from document_agents import BaseAgent, AgentContext, AgentConfig, LLMClient

class CustomAgent(BaseAgent):
    def __init__(self, llm: LLMClient, config: AgentConfig):
        super().__init__(
            llm=llm,
            prompt_template="custom.txt",
            config=config,
            agent_name="custom_agent",
        )
    
    def _output_schema(self):
        return {
            "type": "object",
            "properties": {
                "result": {"type": "string"}
            }
        }
    
    def _update_context(self, context: AgentContext, parsed):
        context.metadata["custom_result"] = parsed
```
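
The two overridden methods read like hooks in a template-method design, where the base class owns the workflow and subclasses supply the schema and the context update. The sketch below illustrates that pattern only; `SketchBaseAgent`, `SketchCustomAgent`, and `handle_response` are hypothetical, the context is reduced to a plain dictionary, and the key check is a simplified stand-in for real schema validation.

```python
import json
from abc import ABC, abstractmethod
from typing import Any


class SketchBaseAgent(ABC):
    """Hypothetical base class: owns the workflow, defers two steps to hooks."""

    @abstractmethod
    def _output_schema(self) -> dict[str, Any]:
        """Describe the JSON object the agent expects back."""

    @abstractmethod
    def _update_context(self, context: dict[str, Any], parsed: dict[str, Any]) -> None:
        """Store the parsed result in the shared context."""

    def handle_response(self, context: dict[str, Any], raw_response: str) -> None:
        parsed = json.loads(raw_response)
        if not isinstance(parsed, dict):
            raise ValueError("expected a JSON object")
        # Simplified check: reject keys the schema does not declare.
        allowed = set(self._output_schema().get("properties", {}))
        unexpected = sorted(set(parsed) - allowed)
        if unexpected:
            raise ValueError(f"unexpected keys: {unexpected}")
        self._update_context(context, parsed)


class SketchCustomAgent(SketchBaseAgent):
    def _output_schema(self) -> dict[str, Any]:
        return {"type": "object", "properties": {"result": {"type": "string"}}}

    def _update_context(self, context: dict[str, Any], parsed: dict[str, Any]) -> None:
        context["custom_result"] = parsed


shared_context: dict[str, Any] = {}
SketchCustomAgent().handle_response(shared_context, '{"result": "ok"}')
print(shared_context)  # {'custom_result': {'result': 'ok'}}
```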

## Error Handling

The library provides a comprehensive exception hierarchy:

```python
from document_agents import (
    DocumentAgentsError,
    AgentExecutionError,
    LLMError,
    PromptRenderingError,
    SchemaValidationError,
)

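try:
    await chain.execute(context)
except AgentExecutionError as e:
    # Raised when a specific agent in the chain fails
    print(f"Agent {e.agent_name} failed: {e.message}")
    # .get() avoids a KeyError if the key was not recorded
    print(f"Execution time: {e.metadata.get('execution_time_ms')}ms")
except LLMError as e:
    # Raised when the LLM provider call fails
    print(f"LLM error: {e.message}")
    print(f"Provider: {e.metadata.get('provider')}")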
```

## Telemetry

Each agent run is logged on the context with its execution time and token usage:

```python
# Get total execution time
total_time = context.get_total_execution_time_ms()

# Get total token usage
input_tokens, output_tokens = context.get_total_tokens()

# Access individual agent logs
for log in context.agent_logs:
    print(f"{log.agent_name}: {log.execution_time_ms}ms")
    print(f"  Input tokens: {log.input_tokens}")
    print(f"  Output tokens: {log.output_tokens}")
```
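
Totals like these are simple sums over the per-agent log entries. The sketch below shows the arithmetic with a hypothetical `StepLog` record and hypothetical helper functions; the field names mirror the attributes used above, but the sample values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepLog:
    """Hypothetical per-agent log entry."""

    agent_name: str
    execution_time_ms: float
    input_tokens: int
    output_tokens: int


def total_time_ms(logs: list[StepLog]) -> float:
    """Sum the execution time of every logged step."""
    return sum(log.execution_time_ms for log in logs)


def total_tokens(logs: list[StepLog]) -> tuple[int, int]:
    """Return (input, output) token totals across all logged steps."""
    return (
        sum(log.input_tokens for log in logs),
        sum(log.output_tokens for log in logs),
    )


# Illustrative values only.
sample_logs = [
    StepLog("shelf_reconstruction", 1200.0, 900, 150),
    StepLog("product_extraction", 800.0, 1100, 400),
]

print(total_time_ms(sample_logs))  # 2000.0
print(total_tokens(sample_logs))   # (2000, 550)
```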

## LLM Providers

### Supported Providers

- **OpenAI** - GPT-4, GPT-3.5 Turbo
- **Anthropic** - Claude 3 Opus, Claude 3 Sonnet
- **Google** - Gemini Pro
- **LlamaAPI** - OpenAI-compatible Llama models

### Provider Configuration

```python
# OpenAI
llm_config = LLMConfig(
    provider="openai",
    model="gpt-4",
    api_key="sk-...",
)

# Anthropic
llm_config = LLMConfig(
    provider="anthropic",
    model="claude-3-opus-20240229",
    api_key="sk-ant-...",
)

# Google
llm_config = LLMConfig(
    provider="google",
    model="gemini-pro",
    api_key="AI...",
)

# LlamaAPI
llm_config = LLMConfig(
    provider="llamaapi",
    model="llama-2-70b-chat",
    api_key="your-api-key",
)
```
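
Each provider returns a differently shaped response, which is why an adapter layer that normalizes them to one model is useful. The sketch below works on trimmed-down dictionaries shaped like OpenAI Chat Completions and Anthropic Messages responses. `NormalizedResponse`, `from_openai`, `from_anthropic`, `ADAPTERS`, and `normalize` are hypothetical names, and the library's own shared models may differ.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class NormalizedResponse:
    """Hypothetical shared model that every adapter produces."""

    text: str
    input_tokens: int
    output_tokens: int


def from_openai(payload: dict[str, Any]) -> NormalizedResponse:
    # Chat Completions: text under choices[0].message.content
    usage = payload["usage"]
    return NormalizedResponse(
        text=payload["choices"][0]["message"]["content"],
        input_tokens=usage["prompt_tokens"],
        output_tokens=usage["completion_tokens"],
    )


def from_anthropic(payload: dict[str, Any]) -> NormalizedResponse:
    # Messages API: text under content[0].text
    usage = payload["usage"]
    return NormalizedResponse(
        text=payload["content"][0]["text"],
        input_tokens=usage["input_tokens"],
        output_tokens=usage["output_tokens"],
    )


ADAPTERS: dict[str, Callable[[dict[str, Any]], NormalizedResponse]] = {
    "openai": from_openai,
    "anthropic": from_anthropic,
}


def normalize(provider: str, payload: dict[str, Any]) -> NormalizedResponse:
    """Pick the adapter for `provider` and convert the raw payload."""
    if provider not in ADAPTERS:
        raise ValueError(f"no adapter for provider: {provider!r}")
    return ADAPTERS[provider](payload)


openai_payload = {
    "choices": [{"message": {"content": "2 shelves"}}],
    "usage": {"prompt_tokens": 10, "completion_tokens": 3},
}
anthropic_payload = {
    "content": [{"type": "text", "text": "2 shelves"}],
    "usage": {"input_tokens": 10, "output_tokens": 3},
}

print(normalize("openai", openai_payload) == normalize("anthropic", anthropic_payload))
# True
```

Downstream code then depends only on the shared model, so adding a provider means adding one adapter function and one registry entry.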

## Development

### Running Tests

```bash
# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov=document_agents
```

### Code Style

```bash
# Format code
black document_agents

# Lint code
ruff check document_agents

# Type check
mypy document_agents
```

## Design Principles

1. **Single Responsibility** - Each agent performs one task
2. **Dependency Injection** - All dependencies injected via constructors
3. **Protocol-based** - Type-safe interfaces using Protocol
4. **Async-first** - All operations are async for performance
5. **Type-safe** - Full type hints with Pydantic validation
6. **Extensible** - Easy to add new agents and providers
7. **Testable** - Mock-friendly design with clear interfaces

## Dependencies

- `plano-core` - Shared interfaces and models
- `openai>=1.30` - OpenAI API client
- `tiktoken>=0.7` - Token counting
- `jinja2>=3.1` - Prompt templating
- `pydantic>=2.0` - Data validation

### Optional

- `anthropic>=0.18` - Anthropic API client
- `google-generativeai>=0.3` - Google API client

## License

MIT

## Support

For issues, questions, or contributions, please visit the project repository.
