Metadata-Version: 2.4
Name: narrative-ai-framework
Version: 0.3.0
Summary: AI-powered voice diary framework: STT, TTS, LLM, RAG, and voice-agent engines
Author: Narrative AI Team
License: MIT
Keywords: ai,voice,diary,stt,tts,llm,rag,arabic
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: PyYAML>=6.0.1
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: pydantic>=2.5.0
Requires-Dist: aiohttp>=3.9.0
Requires-Dist: aiofiles>=23.2.1
Requires-Dist: requests>=2.28.0
Requires-Dist: shortuuid>=1.0.11
Requires-Dist: pyngrok>=7.0.0
Requires-Dist: nest-asyncio>=1.5.8
Requires-Dist: sympy>=1.12
Provides-Extra: stt
Requires-Dist: soundfile>=0.12.1; extra == "stt"
Requires-Dist: scipy>=1.11.0; extra == "stt"
Requires-Dist: webrtcvad>=2.0.10; extra == "stt"
Requires-Dist: numpy>=1.24.0; extra == "stt"
Requires-Dist: elevenlabs>=0.2.0; extra == "stt"
Requires-Dist: yt-dlp>=2023.11.0; extra == "stt"
Requires-Dist: pydub>=0.25.1; extra == "stt"
Requires-Dist: transformers>=4.36.0; extra == "stt"
Requires-Dist: accelerate>=0.25.0; extra == "stt"
Requires-Dist: torch>=2.1.0; extra == "stt"
Requires-Dist: ctranslate2>=4.0.0; extra == "stt"
Requires-Dist: faster-whisper>=1.0.0; extra == "stt"
Provides-Extra: tts
Requires-Dist: aiohttp>=3.9.0; extra == "tts"
Requires-Dist: numpy>=1.24.0; extra == "tts"
Provides-Extra: ocr
Requires-Dist: opencv-python>=4.8.0; extra == "ocr"
Requires-Dist: scikit-image>=0.21.0; extra == "ocr"
Requires-Dist: pdf2image>=1.16.3; extra == "ocr"
Requires-Dist: python-docx>=1.1.0; extra == "ocr"
Requires-Dist: einops>=0.6.1; extra == "ocr"
Requires-Dist: torch>=2.0.1; extra == "ocr"
Requires-Dist: torchvision>=0.15.2; extra == "ocr"
Requires-Dist: transformers>=4.45.0; extra == "ocr"
Requires-Dist: accelerate>=0.26.0; extra == "ocr"
Requires-Dist: qwen-vl-utils>=0.0.4; extra == "ocr"
Requires-Dist: timm>=0.9.2; extra == "ocr"
Requires-Dist: basicsr>=1.4.2; extra == "ocr"
Requires-Dist: realesrgan>=0.3.0; extra == "ocr"
Provides-Extra: llm
Requires-Dist: google-generativeai>=0.3.0; extra == "llm"
Requires-Dist: google-genai>=0.3.0; extra == "llm"
Requires-Dist: openai>=1.3.0; extra == "llm"
Requires-Dist: anthropic>=0.18.0; extra == "llm"
Requires-Dist: tiktoken>=0.5.0; extra == "llm"
Provides-Extra: voice
Requires-Dist: livekit>=0.11.0; extra == "voice"
Requires-Dist: livekit-api>=0.4.0; extra == "voice"
Requires-Dist: livekit-agents>=0.7.0; extra == "voice"
Requires-Dist: livekit-plugins-silero>=0.6.0; extra == "voice"
Requires-Dist: livekit-plugins-elevenlabs>=1.3.0; extra == "voice"
Requires-Dist: livekit-plugins-turn-detector>=1.3.0; extra == "voice"
Requires-Dist: livekit-plugins-noise-cancellation>=0.2.0; extra == "voice"
Requires-Dist: sounddevice>=0.5.0; extra == "voice"
Provides-Extra: db
Requires-Dist: SQLAlchemy>=2.0.0; extra == "db"
Requires-Dist: asyncpg>=0.29.0; extra == "db"
Requires-Dist: psycopg2-binary>=2.9.0; extra == "db"
Requires-Dist: alembic>=1.13.0; extra == "db"
Requires-Dist: redis>=5.0.0; extra == "db"
Provides-Extra: security
Requires-Dist: redis>=5.0.0; extra == "security"
Requires-Dist: SQLAlchemy>=2.0.0; extra == "security"
Requires-Dist: cryptography>=41.0.0; extra == "security"
Requires-Dist: PyJWT>=2.8.0; extra == "security"
Requires-Dist: bcrypt>=4.0.0; extra == "security"
Provides-Extra: api
Requires-Dist: fastapi>=0.109.0; extra == "api"
Requires-Dist: uvicorn[standard]>=0.27.0; extra == "api"
Requires-Dist: python-multipart>=0.0.6; extra == "api"
Requires-Dist: email-validator>=2.1.0; extra == "api"
Provides-Extra: rag
Requires-Dist: sentence-transformers>=2.2.2; extra == "rag"
Requires-Dist: FlagEmbedding>=1.3.5; extra == "rag"
Requires-Dist: pillow>=10.0.0; extra == "rag"
Requires-Dist: psutil>=5.9.0; extra == "rag"
Requires-Dist: unstructured[all-docs]>=0.10.0; extra == "rag"
Requires-Dist: python-magic>=0.4.27; extra == "rag"
Requires-Dist: pytesseract>=0.3.10; extra == "rag"
Requires-Dist: pgvector>=0.2.5; extra == "rag"
Requires-Dist: qdrant-client>=1.7.0; extra == "rag"
Provides-Extra: web
Requires-Dist: ddgs>=9.0.0; extra == "web"
Provides-Extra: vlm
Requires-Dist: pillow>=10.0.0; extra == "vlm"
Requires-Dist: numpy>=1.24.0; extra == "vlm"
Requires-Dist: ollama>=0.1.0; extra == "vlm"
Provides-Extra: all
Requires-Dist: soundfile>=0.12.1; extra == "all"
Requires-Dist: scipy>=1.11.0; extra == "all"
Requires-Dist: webrtcvad>=2.0.10; extra == "all"
Requires-Dist: numpy>=1.24.0; extra == "all"
Requires-Dist: elevenlabs>=0.2.0; extra == "all"
Requires-Dist: yt-dlp>=2023.11.0; extra == "all"
Requires-Dist: pydub>=0.25.1; extra == "all"
Requires-Dist: transformers>=4.36.0; extra == "all"
Requires-Dist: accelerate>=0.25.0; extra == "all"
Requires-Dist: torch>=2.1.0; extra == "all"
Requires-Dist: ctranslate2>=4.0.0; extra == "all"
Requires-Dist: faster-whisper>=1.0.0; extra == "all"
Requires-Dist: aiohttp>=3.9.0; extra == "all"
Requires-Dist: google-generativeai>=0.3.0; extra == "all"
Requires-Dist: google-genai>=0.3.0; extra == "all"
Requires-Dist: openai>=1.3.0; extra == "all"
Requires-Dist: anthropic>=0.18.0; extra == "all"
Requires-Dist: tiktoken>=0.5.0; extra == "all"
Requires-Dist: livekit>=0.11.0; extra == "all"
Requires-Dist: livekit-api>=0.4.0; extra == "all"
Requires-Dist: livekit-agents>=0.7.0; extra == "all"
Requires-Dist: livekit-plugins-silero>=0.6.0; extra == "all"
Requires-Dist: livekit-plugins-elevenlabs>=1.3.0; extra == "all"
Requires-Dist: livekit-plugins-turn-detector>=1.3.0; extra == "all"
Requires-Dist: livekit-plugins-noise-cancellation>=0.2.0; extra == "all"
Requires-Dist: sounddevice>=0.5.0; extra == "all"
Requires-Dist: SQLAlchemy>=2.0.0; extra == "all"
Requires-Dist: asyncpg>=0.29.0; extra == "all"
Requires-Dist: psycopg2-binary>=2.9.0; extra == "all"
Requires-Dist: alembic>=1.13.0; extra == "all"
Requires-Dist: redis>=5.0.0; extra == "all"
Requires-Dist: cryptography>=41.0.0; extra == "all"
Requires-Dist: PyJWT>=2.8.0; extra == "all"
Requires-Dist: bcrypt>=4.0.0; extra == "all"
Requires-Dist: fastapi>=0.109.0; extra == "all"
Requires-Dist: uvicorn[standard]>=0.27.0; extra == "all"
Requires-Dist: python-multipart>=0.0.6; extra == "all"
Requires-Dist: email-validator>=2.1.0; extra == "all"
Requires-Dist: ddgs>=9.0.0; extra == "all"
Requires-Dist: opencv-python>=4.8.0; extra == "all"
Requires-Dist: scikit-image>=0.21.0; extra == "all"
Requires-Dist: pdf2image>=1.16.3; extra == "all"
Requires-Dist: python-docx>=1.1.0; extra == "all"
Requires-Dist: einops>=0.6.1; extra == "all"
Requires-Dist: qwen-vl-utils>=0.0.4; extra == "all"
Requires-Dist: pgvector>=0.2.5; extra == "all"
Requires-Dist: qdrant-client>=1.7.0; extra == "all"
Requires-Dist: ollama>=0.1.0; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=7.4.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23.0; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Requires-Dist: httpx>=0.25.0; extra == "dev"
Requires-Dist: mypy>=1.7.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Dynamic: license-file

# Narrative AI SDK (v0.3.0)

---

## 🔑 LLM Engine (`nai.llm`)

### `generate()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Primary high-level interface for asynchronous text generation.<br>• Supports multiple state-of-the-art providers (OpenAI, Gemini, Anthropic).<br>• Handles prompt formatting, model routing, and error management automatically.<br>• Provides structured `LLMResponse` objects containing usage metadata and finish reasons. | `prompt` (str), `model` (str), `max_tokens` (int) | `LLMResponse` |
```python
import narrative_ai as nai
import asyncio
llm = nai.llm

async def main():
    # Set the API key and specify the provider
    llm.set_api_key("sk-...", provider="openai")
    # Call generate with the prompt string
    response = await llm.generate(prompt="Hello", model="gpt-4")
    # Print the resulting text
    print(response.text)

asyncio.run(main())
```

### `generate_stream()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Facilitates real-time, token-by-token text generation streaming.<br>• Optimized for chat-like interfaces requiring immediate visual feedback.<br>• Uses asynchronous iterators to reduce peak memory usage and transmission latency.<br>• Automatically manages chunk reassembly and partial response handling. | `prompt` (str), `model` (str) | `AsyncIterator` |
```python
import narrative_ai as nai
import asyncio
llm = nai.llm

async def main():
    llm.set_api_key("key", provider="openai")
    async for chunk in llm.generate_stream(prompt="Hi"):
        print(chunk, end="", flush=True)

asyncio.run(main())
```

### `set_api_key()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Configures the global authentication credentials for a specific provider.<br>• Allows for dynamic provider switching at runtime without re-initializing the engine.<br>• Validates key format and presence before making any network requests.<br>• Securely stores credentials within the active engine session. | `api_key` (str), `provider` (str) | `None` |
```python
import narrative_ai as nai
nai.llm.set_api_key("key", provider="openai")
```

### `set_llm_provider()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Sets the active LLM engine globally across the entire framework session.<br>• Enables seamless transitions between models like GPT-4 and Claude 3.<br>• Updates internal routing logic to point subsequent calls to the new provider.<br>• Ensures that model-specific parameters are correctly mapped during switching. | `provider` (str) | `None` |
```python
import narrative_ai as nai
nai.llm.set_llm_provider("gemini")
```

### `set_service_url()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Overrides the default provider endpoint with a custom base URL.<br>• Critical for connecting to private LLM proxies or local inference servers.<br>• Supports custom port numbers and protocol specifications (HTTP/HTTPS).<br>• Persists until changed or the engine session is terminated. | `url` (str) | `None` |
```python
import narrative_ai as nai
nai.llm.set_service_url("https://my-proxy.com/v1")
```

### `get_engine()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Provides access to the underlying low-level `LLMEngine` implementation.<br>• Useful for developers needing to access internal state or raw driver methods.<br>• Returns the active engine singleton for the current environment.<br>• Bypasses high-level SDK abstractions for advanced configuration needs. | `None` | `LLMEngine` |
```python
import narrative_ai as nai
engine = nai.llm.get_engine()
```

### `LLMClient` (Class)
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Creates an isolated, stateful client for multi-user or multi-tenant scenarios.<br>• Tracks independent conversation history and session-specific configurations.<br>• Prevents global configuration leakage between different application contexts.<br>• Ideal for backend services serving multiple distinct API consumers. | `user_id` (str), `tenant_id` (str) | `LLMClient` |
```python
import narrative_ai as nai
client = nai.llm.LLMClient(user_id="user_123")
```

---

## 🎙️ STT Engine (`nai.stt`)

### `transcribe()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Converts local audio files into highly accurate text transcripts.<br>• Supports various file formats (MP3, WAV, AAC, OGG) and sample rates.<br>• Optimized for large file processing with automatic segmentation and cleanup.<br>• Returns detailed `STTResult` including confidence scores and word-level timestamps. | `audio_path` (str), `language` (str) | `STTResult` |
```python
import narrative_ai as nai
import asyncio
stt = nai.stt

async def main():
    stt.set_api_key("key", provider="elevenlabs")
    res = await stt.transcribe(audio_path="file.mp3")
    print(res.text)

asyncio.run(main())
```

### `stream_transcribe()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Enables live, low-latency audio transcription from an asynchronous byte stream.<br>• Processes audio chunks incrementally to provide real-time textual feedback.<br>• Perfect for voice-controlled applications and live captioning systems.<br>• Manages buffer sizing and network backpressure automatically for stability. | `audio_stream` | `AsyncIterator` |
```python
import narrative_ai as nai
import asyncio
stt = nai.stt

async def audio_chunks():
    # Placeholder source: yield raw audio bytes from a file, mic, or socket
    with open("a.wav", "rb") as f:
        while chunk := f.read(4096):
            yield chunk

async def main():
    stt.set_api_key("key", provider="elevenlabs")
    async for result in stt.stream_transcribe(audio_chunks()):
        print(result.text)

asyncio.run(main())
```

### `set_api_key()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Configures STT-specific authentication credentials for providers like ElevenLabs.<br>• Allows for dynamic credential updates without stopping active processing.<br>• Validates provider availability within the current environment setup.<br>• Securely injects headers into the internal HTTP client session. | `key` (str), `provider` (str) | `None` |
```python
import narrative_ai as nai
nai.stt.set_api_key("key", provider="elevenlabs")
```

### `set_stt_provider()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Switches the active STT engine globally across the framework.<br>• Supports switching between cloud-based and local (Whisper) models.<br>• Automatically reconfigures the input processor to match the new engine's requirements.<br>• Validates model compatibility for the requested language and quality level. | `provider` (str) | `None` |
```python
import narrative_ai as nai
nai.stt.set_stt_provider("whisper")
```

### `get_engine()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Accesses the raw `STTEngine` object for low-level audio manipulation.<br>• Allows developers to adjust underlying VAD (Voice Activity Detection) settings.<br>• Useful for debugging audio ingestion and model-specific parameters.<br>• Provides direct access to provider-specific client libraries if needed. | `None` | `STTEngine` |
```python
import narrative_ai as nai
engine = nai.stt.get_engine()
```

### `STTClient` (Class)
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Generates a stateful STT client instance for isolated session management.<br>• Maintains independent audio buffers and transcription states for different users.<br>• Prevents cross-contamination of audio data in multi-threaded environments.<br>• Supports user-level configuration for language and model preferences. | `user_id` (str) | `STTClient` |
```python
import narrative_ai as nai
client = nai.stt.STTClient(user_id="user_123")
```

---

## 🔊 TTS Engine (`nai.tts`)

### `synthesize()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Transforms raw text into high-fidelity, natural-sounding human speech.<br>• Automatically saves the resulting audio to a temporary or specified local path.<br>• Supports a wide range of expressive voice profiles and emotion settings.<br>• Returns the absolute file path for immediate playback or file system management. | `text` (str), `voice` (str) | `str` (Path) |
```python
import narrative_ai as nai
import asyncio
tts = nai.tts

async def main():
    tts.set_api_key("key", provider="openai")
    path = await tts.synthesize(text="Hello")
    print(path)

asyncio.run(main())
```

### `stream_synthesize()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Generates an asynchronous byte stream of synthesized audio data.<br>• Allows for "play-as-you-synthesize" capabilities to minimize user wait times.<br>• Optimized for large text blocks by streaming chunks as they are generated.<br>• Compatible with real-time audio playback libraries and WebSocket transmission. | `text` (str), `voice` (str) | `AsyncIterator` |
```python
import narrative_ai as nai
import asyncio
tts = nai.tts

async def main():
    tts.set_api_key("key", provider="openai")
    async for chunk in tts.stream_synthesize(text="Hi"):
        print(len(chunk))

asyncio.run(main())
```

### `set_api_key()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Configures authentication for TTS providers such as OpenAI or ElevenLabs.<br>• Dynamically updates provider credentials for the current active engine.<br>• Verifies that the provider is supported by the installed optional dependencies.<br>• Ensures secure transmission of API keys during synthesis requests. | `key` (str), `provider` (str) | `None` |
```python
import narrative_ai as nai
nai.tts.set_api_key("key", provider="openai")
```

### `set_tts_provider()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Globally changes the Text-to-Speech engine for the framework session.<br>• Enables switching between different quality and cost tiers (e.g., HD vs Standard).<br>• Updates internal voice maps to reflect the available voices of the new provider.<br>• Ensures consistent output formats across different synthesis engines. | `provider` (str) | `None` |
```python
import narrative_ai as nai
nai.tts.set_tts_provider("elevenlabs")
```

### `get_engine()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Provides access to the underlying `TTSEngine` instance for direct control.<br>• Allows for fine-tuning of audio sample rates, bit rates, and output formats.<br>• Useful for advanced developers needing to bypass the high-level synthesis API.<br>• Returns the singleton instance currently managing TTS operations. | `None` | `TTSEngine` |
```python
import narrative_ai as nai
engine = nai.tts.get_engine()
```

### `TTSClient` (Class)
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Creates an isolated session client for specific Text-to-Speech tasks.<br>• Maintains independent voice settings and synthesis history per client instance.<br>• Prevents global configuration changes from affecting specific synthesis workflows.<br>• Ideal for applications requiring simultaneous synthesis with different voices. | `user_id` (str) | `TTSClient` |
```python
import narrative_ai as nai
client = nai.tts.TTSClient(user_id="user_123")
```

---

## 📚 RAG Engine (`nai.rag`)

### `remember()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Indexes a `StructuredDocument` into the semantic vector store for future recall.<br>• Automatically generates high-dimensional embeddings using the configured provider.<br>• Persists document metadata alongside vectors for filtered retrieval operations.<br>• Returns a boolean success indicator after confirming storage in the database. | `document` (Doc), `doc_id` (str) | `bool` |
```python
import narrative_ai as nai
import asyncio
rag = nai.rag

async def main():
    doc = await nai.input_processor.process("f.pdf")
    await rag.remember(document=doc, doc_id="id1")

asyncio.run(main())
```

### `recall()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Performs a semantic similarity search across all stored documents.<br>• Returns a `RichContext` object containing the most relevant text snippets.<br>• Automatically ranks results based on vector distance (Cosine/Euclidean).<br>• Essential for building grounding context for LLM-based RAG applications. | `query` (str), `top_k` (int) | `RichContext` |
```python
import narrative_ai as nai
import asyncio
rag = nai.rag

async def main():
    res = await rag.recall(query="query")
    print(res.formatted_text)

asyncio.run(main())
```

### `forget()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Permanently deletes a specific document and its vectors from memory.<br>• Uses the provided `doc_id` to locate and remove all associated records.<br>• Ensures that outdated or sensitive information is cleared from the vector store.<br>• Returns success status once the record is confirmed as deleted. | `doc_id` (str) | `bool` |
```python
import narrative_ai as nai
import asyncio
rag = nai.rag

async def main():
    await rag.forget(doc_id="id1")

asyncio.run(main())
```

### `clear_memory()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Completely wipes the entire vector database and associated metadata.<br>• Critical for resetting agent memory or clearing tenant data during cleanup.<br>• Irreversible action that removes all indexed documents in the current store.<br>• Returns success status once the operation is completed and verified. | `None` | `bool` |
```python
import narrative_ai as nai
import asyncio
rag = nai.rag

async def main():
    await rag.clear_memory()

asyncio.run(main())
```

### `set_api_key()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Configures the authentication key for embedding generation services.<br>• Supports providers such as OpenAI, Cohere, or local HuggingFace models.<br>• Essential for authorizing vectorization requests during `remember` and `recall`.<br>• Validates provider availability before setting the global key state. | `key` (str), `provider` (str) | `None` |
```python
import narrative_ai as nai
nai.rag.set_api_key("key", provider="openai")
```

### `get_manager()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Provides access to the `MemoryManager` instance for advanced database control.<br>• Allows developers to perform raw vector queries and database maintenance.<br>• Useful for checking database health, connection status, and record counts.<br>• Returns the active manager singleton used by the RAG engine. | `None` | `MemoryManager` |
```python
import narrative_ai as nai
mgr = nai.rag.get_manager()
```

### `RAGClient` (Class)
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Creates an isolated, stateful RAG client for multi-user knowledge isolation.<br>• Maintains separate vector collections or namespaces per client instance.<br>• Prevents data leakage between different users in the same application.<br>• Supports client-specific embedding and retrieval configurations. | `user_id` (str) | `RAGClient` |
```python
import narrative_ai as nai
client = nai.rag.RAGClient(user_id="user_123")
```

---

## 👁️ OCR Engine (`nai.ocr`)

### `process_image()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Extracts printed or handwritten text from local image files (JPG, PNG).<br>• Uses computer vision models to detect text blocks and preserve reading order.<br>• Returns a structured `OCRResult` containing raw text and confidence data.<br>• Automatically handles image pre-processing (denoising, grayscale) for better accuracy. | `image_path` (str) | `OCRResult` |
```python
import narrative_ai as nai
import asyncio
ocr = nai.ocr

async def main():
    res = await ocr.process_image(image_path="i.jpg")
    print(res.text)

asyncio.run(main())
```

### `process_pdf()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Performs high-quality text extraction from PDF documents.<br>• Handles both searchable PDFs and scanned image-based PDF files.<br>• Returns structured text content while attempting to maintain document layout.<br>• Optimized for large, multi-page document processing with progress tracking. | `pdf_path` (str) | `OCRResult` |
```python
import narrative_ai as nai
import asyncio
ocr = nai.ocr

async def main():
    res = await ocr.process_pdf(pdf_path="d.pdf")
    print(res.text)

asyncio.run(main())
```

### `set_service_url()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Configures a custom endpoint for OCR processing services.<br>• Essential for using self-hosted OCR engines or private enterprise APIs.<br>• Updates the internal HTTP client to route all OCR requests to the new URL.<br>• Persists across the current engine session until modified. | `url` (str) | `None` |
```python
import narrative_ai as nai
nai.ocr.set_service_url("https://my-ocr.com")
```

### `set_ocr_provider()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Switches the active OCR engine provider globally (e.g., Tesseract to Google Vision).<br>• Automatically adjusts internal processing logic to match the new engine's API.<br>• Validates that required system dependencies are installed for the new provider.<br>• Ensures consistent output formats across different vision models. | `provider` (str) | `None` |
```python
import narrative_ai as nai
nai.ocr.set_ocr_provider("google_vision")
```

### `get_pipeline()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Accesses the raw `OCRPipeline` for custom image transformation control.<br>• Allows developers to insert custom pre-processing or post-processing steps.<br>• Useful for debugging complex OCR failures or adjusting model thresholds.<br>• Provides direct access to the active pipeline object for advanced usage. | `None` | `OCRPipeline` |
```python
import narrative_ai as nai
pipeline = nai.ocr.get_pipeline()
```

### `OCRClient` (Class)
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Generates a stateful OCR client instance for isolated processing tasks.<br>• Maintains independent configurations and processing histories per instance.<br>• Prevents global settings from affecting specific document extraction jobs.<br>• Ideal for applications running concurrent OCR tasks with different requirements. | `user_id` (str) | `OCRClient` |
```python
import narrative_ai as nai
client = nai.ocr.OCRClient(user_id="user_123")
```

---

## 🛠️ Input Processor (`nai.input_processor`)

### `process()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • The primary intelligent gateway for all multimodal data ingestion.<br>• Auto-detects input types (Audio, PDF, Image, URL) using file signatures.<br>• Orchestrates internal routing to STT, OCR, or Web engines based on type.<br>• Returns a unified `StructuredDocument` for consistent downstream usage. | `source` (Any) | `StructuredDocument` |
```python
import narrative_ai as nai
import asyncio
ip = nai.input_processor

async def main():
    doc = await ip.process(source="file.mp3")
    print(doc.text)

asyncio.run(main())
```

### `process_batch()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Facilitates high-performance, concurrent processing of multiple file sources.<br>• Automatically manages thread/process pools to maximize ingestion speed.<br>• Returns a list of `StructuredDocument` objects corresponding to input order.<br>• Optimized for large-scale data ingestion and initial repository indexing. | `sources` (List) | `List[Doc]` |
```python
import narrative_ai as nai
import asyncio
ip = nai.input_processor

async def main():
    docs = await ip.process_batch(sources=["f1.jpg", "f2.pdf"])

asyncio.run(main())
```

### `process_audio()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Specifically routes audio files directly to the STT engine for transcription.<br>• Bypasses type-detection logic for faster processing when format is known.<br>• Validates audio file integrity before attempting transcription.<br>• Returns a document containing the transcribed text and audio metadata. | `path` (str) | `Doc` |
```python
import narrative_ai as nai
import asyncio
ip = nai.input_processor

async def main():
    doc = await ip.process_audio(path="a.wav")

asyncio.run(main())
```

### `process_document()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Explicitly handles PDF or Office document files using the OCR engine.<br>• Ensures that document-specific layout logic is applied during extraction.<br>• Bypasses auto-detection for predictable routing in document-only pipelines.<br>• Returns a document with extracted text and original structure preservation. | `path` (str) | `Doc` |
```python
import narrative_ai as nai
import asyncio
ip = nai.input_processor

async def main():
    doc = await ip.process_document(path="d.pdf")

asyncio.run(main())
```

### `process_image()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Directly routes image files to the OCR or Vision engines for analysis.<br>• Optimized for photo-based text extraction and visual data ingestion.<br>• Validates image format and resolution before starting the extraction job.<br>• Returns a document containing text findings and image metadata. | `path` (str) | `Doc` |
```python
import narrative_ai as nai
import asyncio
ip = nai.input_processor

async def main():
    doc = await ip.process_image(path="i.jpg")

asyncio.run(main())
```

### `process_url()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Scrapes and processes content from a public web URL or direct link.<br>• Automatically strips HTML boilerplate (ads, navbars) to extract core text.<br>• Integrates with the Web Intel engine for deep scraping and analysis.<br>• Returns a document containing cleaned web content and source URL. | `url` (str) | `Doc` |
```python
import narrative_ai as nai
import asyncio
ip = nai.input_processor

async def main():
    doc = await ip.process_url(url="https://...")

asyncio.run(main())
```

### `InputClient` (Class)
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Generates a stateful client for managing isolated data ingestion workflows.<br>• Maintains independent processing logs and engine configurations per user.<br>• Critical for server-side applications handling multiple concurrent file uploads.<br>• Supports user-level overrides for routing and engine preferences. | `user_id` (str) | `InputClient` |
```python
import narrative_ai as nai
client = nai.input_processor.InputClient(user_id="user_123")
```

---

## 🤖 Voice Mode (`nai.voice_mode`)

### `start_agent()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Launches the high-performance conversational AI worker loop.<br>• Orchestrates the full interaction cycle: VAD -> STT -> LLM -> TTS.<br>• Connects the worker to the configured LiveKit room for real-time interaction.<br>• Manages agent memory and system prompt injection during the session. | `None` | `None` |
```python
import narrative_ai as nai
voice = nai.voice_mode
voice.start_agent()
```

### `set_livekit_config()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Configures essential connection details for the LiveKit signaling server.<br>• Securely stores URL, API Key, and Secret for authenticated worker access.<br>• Validates connection parameters before attempting to launch the agent.<br>• Essential for cloud-based or local deployments of real-time voice agents. | `url` (str), `api_key` (str), `api_secret` (str) | `None` |
```python
import narrative_ai as nai
nai.voice_mode.set_livekit_config(url="...", api_key="...", api_secret="...")
```

### `set_agent_name()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Sets the displayed name and internal ID for the conversational agent.<br>• Used for identity management within the LiveKit UI and metadata streams.<br>• Allows for customizing the agent's persona in multi-agent environments.<br>• Persists until explicitly changed or the framework session ends. | `name` (str) | `None` |
```python
import narrative_ai as nai
nai.voice_mode.set_agent_name("Jarvis")
```

### `VoiceClient` (Class)
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Creates an isolated session client for managing specific voice agent instances.<br>• Allows for running multiple distinct agents with different personas simultaneously.<br>• Maintains independent session logs and LiveKit connection configurations.<br>• Supports per-user customization of voice models and STT sensitivity. | `user_id` (str) | `VoiceClient` |
```python
import narrative_ai as nai
client = nai.voice_mode.VoiceClient(user_id="user_1")
```

---

## 🔍 Web Intelligence (`nai.web_intel`)

### `search()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Performs a live, real-time web search to retrieve the latest global information.<br>• Automatically filters results for quality and relevance to the provided query.<br>• Returns a `WebResult` containing titles, snippets, and source URLs for verification.<br>• Essential for grounding AI agents in current events and real-time data. | `query` (str) | `WebResult` |
```python
import narrative_ai as nai
import asyncio
web = nai.web_intel

async def main():
    web.set_api_key("key")
    res = await web.search(query="AI News")
    print(res)

asyncio.run(main())
```
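The description above says each result carries a title, snippet, and source URL. A small helper for rendering such results as a markdown list; the dict keys (and the `res.results` attribute in the commented line) are an assumed shape, since the exact `WebResult` fields are not specified here.

```python
def format_results(results):
    """Render search hits as a markdown bullet list.
    Each item is assumed to expose 'title', 'snippet', and 'url' keys."""
    lines = []
    for r in results:
        lines.append(f"- [{r['title']}]({r['url']}): {r['snippet']}")
    return "\n".join(lines)

# print(format_results(res.results))  # hypothetical attribute on WebResult
```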

### `research()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Conducts deep, automated research on a complex topic across multiple sources.<br>• Synthesizes findings into a coherent, cited markdown report for the user.<br>• Automatically generates follow-up queries to explore sub-topics in depth.<br>• Returns a comprehensive summary that acts as a ready-to-use research document. | `topic` (str) | `str` |
```python
import narrative_ai as nai
import asyncio
web = nai.web_intel

async def main():
    report = await web.research(topic="Topic")
    print(report)

asyncio.run(main())
```
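Since `research()` returns a ready-to-use markdown document, a common next step is persisting it to disk. A minimal sketch; the filename slug convention here is our own, not part of the framework.

```python
from pathlib import Path

def save_report(report: str, topic: str, out_dir: str = "reports") -> Path:
    """Write a markdown research report to <out_dir>/<topic-slug>.md."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    slug = "-".join(topic.lower().split())  # e.g. "AI News" -> "ai-news"
    dest = folder / f"{slug}.md"
    dest.write_text(report, encoding="utf-8")
    return dest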

### `set_api_key()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Configures the authentication key for web search and scraping providers.<br>• Supports integration with services like Tavily, DuckDuckGo, or custom proxies.<br>• Ensures that all outgoing web requests are correctly authorized.<br>• Persists global credentials for the entire active search session. | `api_key` (str) | `None` |
```python
import narrative_ai as nai
nai.web_intel.set_api_key("key")
```

### `get_engine()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Provides access to the underlying `WebIntelEngine` for raw scraping control.<br>• Allows developers to adjust search depth, result counts, and scraper settings.<br>• Useful for advanced research tasks that require bypassing high-level API limits.<br>• Returns the active search singleton instance for the current environment. | `None` | `WebIntelEngine` |
```python
import narrative_ai as nai
engine = nai.web_intel.get_engine()
```

### `WebIntelClient` (Class)
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Generates a stateful client for managing isolated web search and research tasks.<br>• Maintains independent search histories and per-user result filters.<br>• Critical for multi-user platforms requiring privacy and isolated research contexts.<br>• Supports client-level configuration for search depth and source white-listing. | `user_id` (str) | `WebIntelClient` |
```python
import narrative_ai as nai
client = nai.web_intel.WebIntelClient(user_id="user_1")
```
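The isolation this class promises matters most when several users search concurrently. The pattern can be sketched with a stand-in class (not the real `WebIntelClient` API) to show that each client keeps its own history:

```python
import asyncio

class _StubClient:
    """Stand-in for a per-user client; the real WebIntelClient API may differ."""
    def __init__(self, user_id: str):
        self.user_id = user_id
        self.history: list[str] = []

    async def search(self, query: str) -> str:
        self.history.append(query)  # each instance records only its own queries
        return f"results for {query!r}"

async def run_all():
    clients = [_StubClient(f"user-{i}") for i in range(3)]
    await asyncio.gather(*(c.search("AI news") for c in clients))
    return [c.history for c in clients]

histories = asyncio.run(run_all())
```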

---

## 🎨 VLM Engine (`nai.vlm`)

### `analyze_image()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Performs complex visual reasoning and description using Vision-Language Models.<br>• Can answer specific questions about image content or provide holistic summaries.<br>• Supports multimodal models like GPT-4V, Gemini Vision, and local Qwen-VL models.<br>• Returns a `VLMResponse` containing the text findings and model metadata. | `image` (Any), `prompt` (str) | `VLMResponse` |
```python
import narrative_ai as nai
import asyncio
vlm = nai.vlm

async def main():
    vlm.set_api_key("key")
    res = await vlm.analyze_image(image="i.jpg", prompt="Describe")
    print(res)

asyncio.run(main())
```
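Vision APIs commonly accept images as base64-encoded strings rather than file paths; whether `analyze_image()` performs that conversion internally is not stated here. A hedged helper for producing a data URL from raw bytes:

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL (a common VLM input format)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```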

### `chat_with_image()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Enables a multi-turn conversational interface centered around a visual source.<br>• Maintains context of previous messages to allow follow-up questions about the image.<br>• Optimized for interactive visual discovery and debugging tasks.<br>• Automatically manages image re-injection into the conversation history. | `image` (Any), `history` (List) | `VLMResponse` |
```python
import narrative_ai as nai
import asyncio
vlm = nai.vlm

async def main():
    res = await vlm.chat_with_image(image="i.jpg", history=[])
    print(res)

asyncio.run(main())
```
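Multi-turn use hinges on the shape of `history`. Assuming the role/content message format common to LLM chat APIs (an assumption — the exact schema is not documented here), a small helper keeps turns well-formed:

```python
def add_turn(history: list, role: str, content: str) -> list:
    """Append one chat turn, enforcing the usual chat-API role names."""
    if role not in ("user", "assistant", "system"):
        raise ValueError(f"unexpected role: {role}")
    history.append({"role": role, "content": content})
    return history

history = add_turn([], "user", "What is in this image?")
# res = await vlm.chat_with_image(image="i.jpg", history=history)
```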

### `set_api_key()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Configures the global authentication key for multimodal vision providers.<br>• Supports dynamic switching between different vision API providers at runtime.<br>• Validates that the provider supports visual reasoning for the current key tier.<br>• Securely stores the key for use in all subsequent VLM requests. | `api_key` (str) | `None` |
```python
import narrative_ai as nai
nai.vlm.set_api_key("key")
```

### `get_processor()`
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Accesses the raw `VLMProcessor` instance for low-level image transformation control.<br>• Allows developers to adjust image resizing, encoding, and patching parameters.<br>• Useful for optimizing vision model performance on high-resolution images.<br>• Provides direct access to the underlying multimodal processing pipeline. | `None` | `VLMProcessor` |
```python
import narrative_ai as nai
proc = nai.vlm.get_processor()
```

### `VLMClient` (Class)
| Detailed Description (Main Points) | Inputs | Outputs |
| :--- | :--- | :--- |
| • Creates an isolated session client for managing specific vision reasoning tasks.<br>• Maintains independent image chat histories and client-specific model settings.<br>• Prevents global configuration changes from affecting specific VLM workflows.<br>• Ideal for applications handling concurrent image analysis from multiple users. | `user_id` (str) | `VLMClient` |
```python
import narrative_ai as nai
client = nai.vlm.VLMClient(user_id="user_1")
```
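Applications serving many concurrent users typically construct one client per user and reuse it. A caching sketch, with a `factory` parameter standing in for the real constructor so the pattern stays framework-agnostic:

```python
_clients: dict = {}

def client_for(user_id: str, factory=dict):
    """Return a cached per-user client, creating it on first use.
    `factory` stands in for a constructor such as nai.vlm.VLMClient."""
    if user_id not in _clients:
        _clients[user_id] = factory(user_id=user_id)
    return _clients[user_id]
```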

---

## License
MIT License.
