Metadata-Version: 2.4
Name: agentkit-sdk
Version: 0.3.0
Summary: Developer SDK for building personalized voice AI assistants
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: fastapi>=0.115.0
Requires-Dist: uvicorn[standard]>=0.32.0
Requires-Dist: websockets>=14.0
Requires-Dist: pydantic>=2.10.0
Requires-Dist: pydantic-settings>=2.6.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: pyyaml>=6.0.0
Requires-Dist: click>=8.1.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: google-genai>=1.0.0
Requires-Dist: openai>=1.50.0
Requires-Dist: qdrant-client>=1.12.0
Requires-Dist: numpy>=1.26.0
Requires-Dist: tiktoken>=0.7.0
Requires-Dist: aiofiles>=24.0.0
Requires-Dist: tenacity>=9.0.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.24.0; extra == "dev"
Requires-Dist: ruff>=0.8.0; extra == "dev"
Requires-Dist: mypy>=1.13.0; extra == "dev"

# AgentKit

A developer SDK for building personalized voice AI assistants with memory, learning, and mobile APK generation.

**AgentKit is to personal AI assistants what Firebase is to apps** — a complete backend + mobile shell that you configure once and deploy in under 30 minutes.

## Features

- **Voice Pipeline** — Streaming STT → LLM → TTS with <500ms first-audio latency
- **Memory System** — Markdown (simple) or Qdrant vector (semantic search)
- **Learning Engine** — Detects corrections, learns from mistakes, makes proactive recommendations
- **Multi-provider** — Sarvam/Deepgram (STT), Gemini/OpenAI (LLM), Sarvam/ElevenLabs (TTS)
- **Mobile Shell** — React Native app with VoiceOrb interface, builds to Android APK
- **CLI** — `init`, `serve`, `build`, `deploy` commands for the full lifecycle

## Quick Start

```bash
# Install
pip install agentkit-sdk

# Create a new agent project (interactive)
agentkit init my-agent

# Enter the project
cd my-agent

# Add your API keys
# Edit .env and fill in the required keys

# Start the server
agentkit serve
```

The server starts at `http://localhost:8000` with:
- **Playground**: `http://localhost:8000/playground` — browser-based test UI
- **WebSocket**: `ws://localhost:8000/ws/voice` — real-time voice/text endpoint
- **Health**: `http://localhost:8000/health` — server status check
- **REST**: `POST /api/chat` — text chat endpoint
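The REST endpoint can be exercised with `httpx` (already a dependency of this package). Note the `{"text": ...}` request body below is an assumption for illustration, not confirmed by this README — check your running server's schema for the exact field names:

```python
def build_chat_payload(text: str) -> dict:
    """Assumed request body for POST /api/chat (hypothetical shape)."""
    return {"text": text}

def send_chat(text: str, base_url: str = "http://localhost:8000") -> dict:
    """POST one text message to the chat endpoint and return the JSON reply."""
    import httpx  # declared dependency of agentkit-sdk

    resp = httpx.post(f"{base_url}/api/chat", json=build_chat_payload(text), timeout=30.0)
    resp.raise_for_status()
    return resp.json()

# With `agentkit serve` running:
# print(send_chat("Hello, what's the weather?"))
```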

## CLI Commands

| Command | Description |
|---------|-------------|
| `agentkit init <name>` | Interactive project setup — picks providers, generates config |
| `agentkit serve` | Start FastAPI server with playground |
| `agentkit serve --validate-only` | Validate the config and exit without starting the server |
| `agentkit build android` | Build Android APK from AgentShell template |
| `agentkit deploy --platform railway` | Deploy to Railway |
| `agentkit deploy --platform render` | Deploy to Render |
| `agentkit deploy --platform docker` | Generate Dockerfile |

## Configuration

`agent.config.yaml` — generated by `agentkit init`:

```yaml
agent:
  name: my-agent
  persona: "You are a helpful personal assistant..."
  language: hinglish  # english / hindi / hinglish

voice:
  enabled: true
  stt:
    provider: sarvam    # sarvam / deepgram
    api_key: ${SARVAM_API_KEY}
  tts:
    provider: sarvam    # sarvam / elevenlabs
    voice: meera
    api_key: ${SARVAM_API_KEY}

llm:
  provider: gemini      # gemini / openai
  model: gemini-2.0-flash
  api_key: ${GEMINI_API_KEY}
  temperature: 0.7

memory:
  type: markdown        # markdown / vector
  backend: local        # local / qdrant
  episodic_window: 20
  semantic_top_k: 5

learning:
  enabled: true
  correction_detection: true
  implicit_feedback: true
  profile_extraction: true

deployment:
  type: self-host
  port: 8000
```

API keys are referenced as `${VAR_NAME}` and resolved from your `.env` file at startup.
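The expansion behaves roughly like the sketch below. This is an illustration, not the SDK's actual implementation — in particular, substituting an empty string for an unset variable is an assumption here; the real loader may raise instead:

```python
import os
import re

def resolve_env_refs(value: str) -> str:
    # Replace each ${VAR_NAME} with its value from the environment
    # (populated from .env at startup). Unset vars become "" in this sketch.
    return re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), value)

os.environ["GEMINI_API_KEY"] = "sk-example"
print(resolve_env_refs("${GEMINI_API_KEY}"))  # sk-example
```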

## Custom Providers

Every provider slot (STT, LLM, TTS, Memory) is pluggable. Use a built-in name or a dotted import path to your own class:

```yaml
# Built-in provider
llm:
  provider: gemini

# Custom provider — any class that extends BaseLLM
llm:
  provider: my_package.llm.OllamaLLM
  api_key: ${OLLAMA_API_KEY}
  model: llama3
  base_url: http://localhost:11434
```

Your custom class must extend the appropriate base class (`BaseSTT`, `BaseLLM`, `BaseTTS`, or `BaseMemory`). All config keys under the provider section are passed as constructor kwargs automatically.

**Writing a custom LLM provider:**

```python
# my_package/llm.py
from agentkit.providers.llm.base import BaseLLM, Message

class OllamaLLM(BaseLLM):
    def __init__(self, api_key: str, model: str = "llama3", base_url: str = "http://localhost:11434", **kwargs):
        self.api_key = api_key
        self.model = model
        self.base_url = base_url

    async def chat_stream(self, messages, system, memory_context=""):
        # Your streaming implementation
        ...

    async def chat(self, messages, system, memory_context=""):
        # Your non-streaming implementation
        ...

    async def close(self):
        pass
```

**Registering at runtime** (alternative to dotted paths):

```python
from agentkit.providers import registry
from my_package.llm import OllamaLLM

registry.register("llm", "ollama", OllamaLLM)
# Now you can use provider: ollama in config
```

**During `agentkit init`**, select "custom" when prompted for a provider to enter your class path interactively.

| Category | Base Class | Built-in Providers |
|----------|-----------|-------------------|
| STT | `BaseSTT` | `sarvam`, `deepgram` |
| LLM | `BaseLLM` | `gemini`, `openai` |
| TTS | `BaseTTS` | `sarvam`, `elevenlabs` |
| Memory | `BaseMemory` | `markdown`, `vector` |

## API Keys

Add these to your `.env` file based on your chosen providers:

| Provider | Variable | Get it at |
|----------|----------|-----------|
| Sarvam AI | `SARVAM_API_KEY` | [sarvam.ai](https://www.sarvam.ai) |
| Gemini | `GEMINI_API_KEY` | [aistudio.google.com](https://aistudio.google.com) |
| OpenAI | `OPENAI_API_KEY` | [platform.openai.com](https://platform.openai.com) |
| Deepgram | `DEEPGRAM_API_KEY` | [deepgram.com](https://deepgram.com) |
| ElevenLabs | `ELEVENLABS_API_KEY` | [elevenlabs.io](https://elevenlabs.io) |
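
For example, a minimal `.env` for the default Sarvam + Gemini setup (values are placeholders):

```bash
# .env — loaded at startup; referenced from agent.config.yaml as ${VAR_NAME}
SARVAM_API_KEY=your-sarvam-key
GEMINI_API_KEY=your-gemini-key
```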

## WebSocket Protocol

Connect to `ws://localhost:8000/ws/voice` and exchange JSON messages:

**Send text:**
```json
{"type": "text", "text": "Hello, what's the weather?"}
```

**Send audio:**
```json
{"type": "audio", "data": [/* byte array */]}
```

**Receive responses:**
```json
{"type": "audio", "data": "base64-encoded-audio"}
{"type": "text", "text": "The assistant's text response"}
{"type": "done"}
```
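A minimal text-only client can be sketched with the `websockets` package (already a dependency of this SDK). The sketch sends one text frame and collects streamed text until the `done` frame; audio frames from the server are simply ignored here:

```python
import json

def encode_text_frame(text: str) -> str:
    """Build the JSON frame the /ws/voice endpoint expects for text input."""
    return json.dumps({"type": "text", "text": text})

async def chat(text: str, url: str = "ws://localhost:8000/ws/voice") -> str:
    """Send one text turn and concatenate streamed text until `done`."""
    import websockets  # declared dependency of agentkit-sdk

    parts: list[str] = []
    async with websockets.connect(url) as ws:
        await ws.send(encode_text_frame(text))
        while True:
            frame = json.loads(await ws.recv())
            if frame["type"] == "text":
                parts.append(frame["text"])
            elif frame["type"] == "done":
                break
            # frames with type "audio" are ignored in this text-only sketch
    return "".join(parts)

# With `agentkit serve` running:
# import asyncio
# print(asyncio.run(chat("Hello, what's the weather?")))
```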

## Architecture

```
agentkit init → agent.config.yaml + .env
                        ↓
agentkit serve → FastAPI server
                  ├── /ws/voice (WebSocket)
                  ├── /api/chat (REST)
                  ├── /playground (browser UI)
                  └── /health
                        ↓
              STT → LLM → TTS (streaming pipeline)
                ↕         ↕
            Memory     Learning
          (md/vector)  (corrections)
```

## Development

```bash
# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

# Lint
ruff check src/
```

## License

MIT
