Metadata-Version: 2.4
Name: medha-archai
Version: 0.2.0
Summary: Semantic Memory for AI Text-to-Query systems
Project-URL: Homepage, https://github.com/ArchAI-Labs/medha
Project-URL: Repository, https://github.com/ArchAI-Labs/medha
Project-URL: Issues, https://github.com/ArchAI-Labs/medha/issues
Author-email: Nicola Procopio <nicola.procopio@acsoftware.it>
License: Apache-2.0
License-File: LICENSE
Keywords: llm,semantic-cache,text-to-cypher,text-to-sql,vector-search
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: pydantic-settings<3,>=2.0
Requires-Dist: pydantic<3,>=2.0
Requires-Dist: qdrant-client<2,>=1.9
Provides-Extra: all
Requires-Dist: asyncpg<1,>=0.29; extra == 'all'
Requires-Dist: fastembed<1,>=0.3; extra == 'all'
Requires-Dist: gliner<1,>=0.2; extra == 'all'
Requires-Dist: openai<2,>=1.0; extra == 'all'
Requires-Dist: pgvector<1,>=0.3; extra == 'all'
Requires-Dist: rapidfuzz<4,>=3.0; extra == 'all'
Requires-Dist: redis[asyncio]<6,>=5.0; extra == 'all'
Requires-Dist: spacy<4,>=3.7; extra == 'all'
Provides-Extra: dev
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: pre-commit>=3.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Provides-Extra: fastembed
Requires-Dist: fastembed<1,>=0.3; extra == 'fastembed'
Provides-Extra: fuzzy
Requires-Dist: rapidfuzz<4,>=3.0; extra == 'fuzzy'
Provides-Extra: gliner
Requires-Dist: gliner<1,>=0.2; extra == 'gliner'
Provides-Extra: nlp
Requires-Dist: spacy<4,>=3.7; extra == 'nlp'
Provides-Extra: openai
Requires-Dist: openai<2,>=1.0; extra == 'openai'
Provides-Extra: pgvector
Requires-Dist: asyncpg<1,>=0.29; extra == 'pgvector'
Requires-Dist: pgvector<1,>=0.3; extra == 'pgvector'
Provides-Extra: redis
Requires-Dist: redis[asyncio]<6,>=5.0; extra == 'redis'
Description-Content-Type: text/markdown

# Medha

[![PyPI Downloads](https://static.pepy.tech/personalized-badge/medha-archai?period=total&units=INTERNATIONAL_SYSTEM&left_color=BLACK&right_color=BLUE&left_text=downloads)](https://pepy.tech/projects/medha-archai)

<br>

![medha_logo](https://raw.githubusercontent.com/ArchAI-Labs/medha/refs/heads/main/img/medha_logo.png)

<br>

## Semantic Memory for AI Data Agents

> ***Reduce LLM latency and costs by caching Text-to-Query generations (SQL, Cypher, GraphQL) with semantic understanding.***

---

## What is Medha?

**Medha** is an asynchronous, high-performance semantic cache library designed specifically for **Text-to-Query** systems.

Unlike traditional key-value caches that require exact string matches, Medha understands that *"Show me the top 5 users"* and *"List the first five users"* are the same question. It intercepts these questions and returns previously generated database queries (SQL, Cypher, etc.), bypassing the expensive and slow LLM generation step.

### Why Medha?
* **100x Faster:** Return cached queries in milliseconds vs. seconds for LLM generation.
* **Cost Efficient:** Reduce API calls to OpenAI/Anthropic by 40-60%.
* **Agnostic:** Works with **SQL**, **Cypher** (Neo4j), **GraphQL**, or any text-based query language.
* **Async Native:** Built on `asyncio` for high-concurrency API backends.
* **Pluggable:** Swap embedders (FastEmbed, OpenAI) and vector backends independently.

---

## The "Waterfall" Architecture

Medha uses a multi-tier search strategy to maximize cache hits. If a tier misses, the search cascades to the next:

1.  **Tier 0: L1 Memory (LRU)**
    * *Speed:* < 1ms
    * Exact hash match for identical, repeated questions.
2.  **Tier 1: Template Matching (Intent)**
    * *Speed:* ~10ms
    * Recognizes patterns like *"Show employees in {department}"*. Extracts parameters and injects them into a cached query template.
3.  **Tier 2 + 3: Exact Vector Match & Semantic Similarity** *(run in parallel)*
    * *Speed:* ~25ms (concurrent, not sequential)
    * The exact tier uses a high threshold (≥ 0.99); the semantic tier a lower one (≥ 0.90). Both vector queries are fired concurrently via `asyncio.gather` and the best result wins (see the sketch after this list).
4.  **Tier 4: Fuzzy Fallback**
    * *Speed:* Variable
    * Handles typos and minor string variations using Levenshtein distance.
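
A minimal sketch of the parallel tier, assuming a hypothetical `vector_search` coroutine that returns the best cached query above a threshold (or `None`). Medha's actual internals may differ:

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical helper: queries the vector backend and returns the best
# cached query scoring above `threshold`, or None on a miss.
VectorSearch = Callable[[str, float], Awaitable[Optional[str]]]

async def search_tiers_2_and_3(
    question: str, vector_search: VectorSearch
) -> Optional[str]:
    # Fire both vector queries at once instead of sequentially.
    exact, semantic = await asyncio.gather(
        vector_search(question, 0.99),  # Tier 2: near-identical vectors
        vector_search(question, 0.90),  # Tier 3: same meaning, looser bound
    )
    # Prefer the stricter exact match; otherwise take the semantic hit.
    return exact if exact is not None else semantic
```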

---

## Installation

### Core (minimal)

```bash
pip install medha-archai
```

Core dependencies: `pydantic`, `pydantic-settings`, `qdrant-client`.

### With an embedding provider

```bash
# Local embeddings with FastEmbed (recommended for getting started)
pip install "medha-archai[fastembed]"

# OpenAI embeddings
pip install "medha-archai[openai]"
```

### With optional extras

```bash
# Fuzzy matching (Tier 4 - Levenshtein distance)
pip install "medha-archai[fuzzy]"

# spaCy NLP for parameter extraction (pre-trained, fixed entity types, ~15 MB model)
pip install "medha-archai[nlp]"
python -m spacy download en_core_web_sm

# GLiNER NLP for zero-shot parameter extraction (uses param names as labels, ~500 MB model)
pip install "medha-archai[gliner]"

# Distributed L1 cache (Redis — for multi-instance deployments)
pip install "medha-archai[redis]"

# With pgvector backend
pip install "medha-archai[pgvector]"

# All optional dependencies
pip install "medha-archai[all]"
```

### Install from source

```bash
# From GitHub
pip install "medha-archai[all] @ git+https://github.com/ArchAI-Labs/medha.git"

# Development install
git clone https://github.com/ArchAI-Labs/medha.git
cd medha
pip install -e ".[dev,all]"
```

---

## Quick Start

```python
import asyncio
from medha import Medha
from medha.embeddings.fastembed_adapter import FastEmbedAdapter

async def main():
    embedder = FastEmbedAdapter()
    cache = Medha(collection_name="text2sql_cache", embedder=embedder)

    async with cache:
        question = "How many users are active?"

        # 1. Search the cache
        hit = await cache.search(question)

        if hit.strategy.value != "no_match":
            print(f"Cache Hit! Strategy: {hit.strategy.value}")
            print(f"Query: {hit.generated_query}")
            print(f"Confidence: {hit.confidence:.2f}")
        else:
            print("Cache Miss. Calling LLM...")
            generated_sql = "SELECT count(*) FROM users WHERE status = 'active';"

            # 2. Store the result for next time
            await cache.store(
                question=question,
                generated_query=generated_sql,
            )
            print("Stored in cache.")

if __name__ == "__main__":
    asyncio.run(main())
```

---

## Choosing a Backend

| Backend | Install | Persistence | Best For |
|---------|---------|-------------|----------|
| `qdrant` (default) | `pip install medha-archai` | Yes (Docker/Cloud) | Production, large datasets |
| `memory` | `pip install medha-archai` | No | Testing, development, CI |
| `pgvector` | `pip install medha-archai[pgvector]` | Yes (PostgreSQL) | Teams already using PostgreSQL |

### InMemory Backend (zero dependencies)

```python
from medha import Medha, Settings
from medha.embeddings.fastembed_adapter import FastEmbedAdapter

embedder = FastEmbedAdapter()
settings = Settings(backend_type="memory")

async with Medha(collection_name="my_cache", embedder=embedder, settings=settings) as m:
    await m.store("How many users?", "SELECT COUNT(*) FROM users")
    hit = await m.search("Count of users")
    print(hit.generated_query)
```

### PostgreSQL + pgvector Backend

```python
from medha import Medha, Settings
from medha.embeddings.fastembed_adapter import FastEmbedAdapter

embedder = FastEmbedAdapter()
settings = Settings(
    backend_type="pgvector",
    pg_dsn="postgresql://user:password@localhost:5432/mydb",
)

async with Medha(collection_name="my_cache", embedder=embedder, settings=settings) as m:
    await m.store("How many users?", "SELECT COUNT(*) FROM users")
    hit = await m.search("Count of users")
    print(hit.generated_query)
```

Or via environment variables:

```bash
export MEDHA_BACKEND_TYPE=pgvector
export MEDHA_PG_DSN=postgresql://user:password@localhost:5432/mydb
```

---

## Configuration Examples

Medha is highly configurable. Below are examples covering every major use case.

### Basic: Zero-Dependency In-Memory Setup

The simplest setup, perfect for development, testing, and CI. No external services needed.

```python
import asyncio
from medha import Medha, Settings
from medha.embeddings.fastembed_adapter import FastEmbedAdapter

async def main():
    # backend_type="memory" — pure-Python backend, zero external dependencies
    settings = Settings(backend_type="memory")
    embedder = FastEmbedAdapter()

    async with Medha(
        collection_name="dev_cache",
        embedder=embedder,
        settings=settings,
    ) as cache:
        await cache.store("List all users", "SELECT * FROM users;")
        hit = await cache.search("Show me all the users")
        print(hit.generated_query)  # SELECT * FROM users;

asyncio.run(main())
```

### Qdrant Docker (Local Persistence)

For persistent caching across restarts using a local Qdrant instance.

```bash
# Start Qdrant first
docker run -p 6333:6333 qdrant/qdrant
```

```python
import asyncio
from medha import Medha, Settings
from medha.embeddings.fastembed_adapter import FastEmbedAdapter

async def main():
    settings = Settings(
        backend_type="qdrant",
        qdrant_mode="docker",
        qdrant_host="localhost",
        qdrant_port=6333,
    )
    embedder = FastEmbedAdapter()

    async with Medha(
        collection_name="persistent_cache",
        embedder=embedder,
        settings=settings,
    ) as cache:
        await cache.store(
            "Total revenue last quarter",
            "SELECT SUM(amount) FROM orders WHERE date >= '2024-10-01';",
        )
        hit = await cache.search("What was last quarter's revenue?")
        print(f"{hit.strategy.value}: {hit.generated_query}")

asyncio.run(main())
```

### Qdrant Cloud (Production)

For production deployments using Qdrant Cloud with API key authentication.

```python
import asyncio
from medha import Medha, Settings
from medha.embeddings.openai_adapter import OpenAIAdapter

async def main():
    settings = Settings(
        backend_type="qdrant",
        qdrant_mode="cloud",
        qdrant_url="https://your-cluster.cloud.qdrant.io",
        qdrant_api_key="your-qdrant-api-key",  # stored as SecretStr, never logged
    )
    embedder = OpenAIAdapter(
        model_name="text-embedding-3-small",
        api_key="sk-your-openai-key",
    )

    async with Medha(
        collection_name="production_cache",
        embedder=embedder,
        settings=settings,
    ) as cache:
        await cache.store(
            "Get all pending orders",
            "SELECT * FROM orders WHERE status = 'pending';",
        )
        hit = await cache.search("Show pending orders")
        print(f"Confidence: {hit.confidence:.2f}")

asyncio.run(main())
```

### Environment Variable Configuration

All settings can be configured via environment variables with the `MEDHA_` prefix. No code changes needed.

```bash
# .env or shell exports
export MEDHA_QDRANT_MODE=docker
export MEDHA_QDRANT_HOST=qdrant.internal.company.com
export MEDHA_QDRANT_PORT=6333
export MEDHA_SCORE_THRESHOLD_SEMANTIC=0.85
export MEDHA_SCORE_THRESHOLD_EXACT=0.98
export MEDHA_L1_CACHE_MAX_SIZE=5000
export MEDHA_QUERY_LANGUAGE=sql
export MEDHA_ENABLE_QUANTIZATION=true
export MEDHA_ON_DISK=false
export MEDHA_TEMPLATE_FILE=/etc/medha/templates.json
```

```python
import asyncio
from medha import Medha, Settings
from medha.embeddings.fastembed_adapter import FastEmbedAdapter

async def main():
    # Settings automatically loads from MEDHA_* environment variables
    settings = Settings()
    embedder = FastEmbedAdapter()

    async with Medha(
        collection_name="my_cache",
        embedder=embedder,
        settings=settings,
    ) as cache:
        hit = await cache.search("Show me all employees")
        print(hit.strategy.value)

asyncio.run(main())
```

---

## Embedding Providers

### FastEmbed (Local, No API Key)

Runs entirely locally using ONNX Runtime. No API key, no network calls, no costs.

```python
from medha.embeddings.fastembed_adapter import FastEmbedAdapter

# Default model (384 dimensions, fast and lightweight)
embedder = FastEmbedAdapter()

# Higher quality model
embedder = FastEmbedAdapter(
    model_name="BAAI/bge-base-en-v1.5",
    max_length=512,
)

# Custom cache directory for model files
embedder = FastEmbedAdapter(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    cache_dir="/opt/models/fastembed",
)
```

### OpenAI Embeddings

Uses OpenAI's embedding API. Requires an API key (via parameter or `OPENAI_API_KEY` env var).

```python
from medha.embeddings.openai_adapter import OpenAIAdapter

# Default: text-embedding-3-small (1536 dimensions)
embedder = OpenAIAdapter(api_key="sk-your-key")

# High-quality large model (3072 dimensions)
embedder = OpenAIAdapter(
    model_name="text-embedding-3-large",
    api_key="sk-your-key",
)

# With custom dimensions (only supported by text-embedding-3-* models)
embedder = OpenAIAdapter(
    model_name="text-embedding-3-small",
    dimensions=512,
    api_key="sk-your-key",
)

# API key from environment variable (OPENAI_API_KEY)
embedder = OpenAIAdapter()
```

### Custom Embedder

Implement the `BaseEmbedder` interface to use any embedding provider.

```python
from medha.interfaces import BaseEmbedder
from typing import List

class MyCustomEmbedder(BaseEmbedder):
    @property
    def dimension(self) -> int:
        return 768

    @property
    def model_name(self) -> str:
        return "my-custom-model"

    async def aembed(self, text: str) -> List[float]:
        # Your embedding logic here
        ...

    async def aembed_batch(self, texts: List[str]) -> List[List[float]]:
        # Your batch embedding logic here
        ...

embedder = MyCustomEmbedder()
```
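
For wiring tests without a real model, a deterministic toy embedder is enough. A sketch (its vectors carry no semantic meaning, so only identical questions will match):

```python
import hashlib
from typing import List

from medha.interfaces import BaseEmbedder

class HashEmbedder(BaseEmbedder):
    """Toy embedder for tests: deterministic but NOT semantically meaningful."""

    @property
    def dimension(self) -> int:
        return 32

    @property
    def model_name(self) -> str:
        return "toy-hash-embedder"

    async def aembed(self, text: str) -> List[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Map the 32 digest bytes onto floats in [0, 1).
        return [b / 255.0 for b in digest[: self.dimension]]

    async def aembed_batch(self, texts: List[str]) -> List[List[float]]:
        return [await self.aembed(t) for t in texts]
```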

---

## Search Threshold Tuning

Fine-tune how aggressively Medha matches questions at each tier.

### Strict Matching (High Precision)

Only return cache hits when very confident. Minimizes false positives.

```python
from medha import Settings

settings = Settings(
    score_threshold_exact=0.995,     # Near-identical vectors only
    score_threshold_semantic=0.95,   # Very close meaning only
    score_threshold_template=0.90,   # Template must be a strong match
    score_threshold_fuzzy=95.0,      # Almost no typos allowed
)
```

### Relaxed Matching (High Recall)

Return more cache hits, accepting slightly lower confidence. Reduces LLM calls.

```python
from medha import Settings

settings = Settings(
    score_threshold_exact=0.97,
    score_threshold_semantic=0.82,
    score_threshold_template=0.75,
    score_threshold_fuzzy=75.0,
)
```

### Disable Specific Tiers

```python
from medha import Settings

# Disable L1 in-memory cache (always hit the vector store)
settings = Settings(l1_cache_max_size=0)

# Fuzzy matching is automatically disabled if rapidfuzz is not installed
# To install: pip install "medha-archai[fuzzy]"
```

---

## Cache Warming

Pre-populate the cache from a file before serving traffic. Supports both **JSON array** and **JSONL** formats.

```jsonc
// warm_queries.jsonl  — one entry per line
{"question": "How many users are active?", "generated_query": "SELECT COUNT(*) FROM users WHERE status = 'active';"}
{"question": "Total revenue this month", "generated_query": "SELECT SUM(amount) FROM orders WHERE date >= DATE_TRUNC('month', NOW());", "response_summary": "Monthly revenue total"}
```
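
The same entries can also be provided as a plain JSON array:

```json
[
  {"question": "How many users are active?", "generated_query": "SELECT COUNT(*) FROM users WHERE status = 'active';"},
  {"question": "Total revenue this month", "generated_query": "SELECT SUM(amount) FROM orders WHERE date >= DATE_TRUNC('month', NOW());", "response_summary": "Monthly revenue total"}
]
```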

```python
import asyncio
from medha import Medha
from medha.embeddings.fastembed_adapter import FastEmbedAdapter

async def main():
    async with Medha(
        collection_name="my_cache",
        embedder=FastEmbedAdapter(),
    ) as cache:
        # Load from JSONL (also accepts JSON array files)
        loaded = await cache.warm_from_file("warm_queries.jsonl")
        print(f"Warmed {loaded} entries")

        # Sync variant
        # loaded = cache.warm_from_file_sync("warm_queries.json")

        print(cache.stats["warm_loaded"])  # 2

asyncio.run(main())
```

* **Required keys per entry:** `question`, `generated_query`
* **Optional keys:** `response_summary`, `template_id`

Internally calls `store_batch()` — a single embedding round-trip for all entries.

---

## Security Settings

Medha 0.2.0 adds three settings to defend against common attack vectors when
Medha is exposed to untrusted input.

### Input Length Guard — `max_question_length`

Prevents denial-of-service via oversized question strings. When a question
exceeds the limit, `search()` returns a hit whose strategy is
`SearchStrategy.ERROR`, while `store()` raises `ValueError`.

```python
settings = Settings(max_question_length=2048)  # default: 8192
```
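
A sketch of handling both failure modes, assuming the setting above and an open `async with Medha(...) as cache:` block like the earlier examples:

```python
oversized = "x" * 10_000  # longer than max_question_length=2048

hit = await cache.search(oversized)
if hit.strategy.value == "error":  # search() degrades to an ERROR hit
    print("Question rejected by the length guard")

try:
    await cache.store(oversized, "SELECT 1;")
except ValueError:  # store() fails loudly instead of caching
    print("Oversized question was not stored")
```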

### File Size Limit — `max_file_size_mb`

`warm_from_file()` and `load_templates_from_file()` reject files larger than
this limit *before* reading them.

```python
settings = Settings(max_file_size_mb=50)  # default: 100 MB
```

### Path Traversal Protection — `allowed_file_dir`

When set, `warm_from_file()` and `load_templates_from_file()` reject any path
that resolves outside the specified directory.

```python
settings = Settings(allowed_file_dir="/app/data")
# warm_from_file("/app/data/../etc/passwd") → ValueError
```

---

## Distributed L1 Cache (Redis)

By default Medha's L1 cache is in-process. With multiple service instances (horizontal scaling) each process has its own isolated cache. Use `RedisL1Cache` to share the L1 cache across instances.

```bash
pip install "medha-archai[redis]"
```

```python
from medha import Medha
from medha.l1_cache.redis_adapter import RedisL1Cache
from medha.embeddings.fastembed_adapter import FastEmbedAdapter

# Shared L1 cache — all instances see the same hits
redis_l1 = RedisL1Cache(
    url="redis://redis.internal:6379/0",
    prefix="myapp:medha:l1",   # namespace to avoid key collisions
    ttl=3600,                   # 1-hour TTL per entry (optional)
)

async with Medha(
    collection_name="prod_cache",
    embedder=FastEmbedAdapter(),
    l1_backend=redis_l1,
) as cache:
    await cache.store("How many users?", "SELECT COUNT(*) FROM users;")
    hit = await cache.search("How many users?")
    print(hit.strategy.value)  # l1_cache (served from Redis)
```

> **Redis eviction:** Configure `maxmemory-policy allkeys-lru` on the Redis server for automatic LRU eviction when memory is full.
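
For example, with standard `redis-cli` commands (the 256 MB cap is just an illustration):

```bash
redis-cli CONFIG SET maxmemory 256mb
redis-cli CONFIG SET maxmemory-policy allkeys-lru
```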

### Custom L1 Backend

Implement `L1CacheBackend` to use any fast store (Memcached, DynamoDB DAX, etc.):

```python
from typing import Dict, Optional

from medha.interfaces.l1_cache import L1CacheBackend
from medha.types import CacheHit

class MyL1Cache(L1CacheBackend):
    """Minimal dict-backed example; swap the dict for Memcached, DAX, etc."""

    def __init__(self) -> None:
        self._store: Dict[str, CacheHit] = {}

    async def get(self, key: str) -> Optional[CacheHit]:
        return self._store.get(key)

    async def set(self, key: str, value: CacheHit) -> None:
        self._store[key] = value

    async def clear(self) -> None:
        self._store.clear()

    @property
    def size(self) -> int:
        return len(self._store)

cache = Medha(..., l1_backend=MyL1Cache())
```

---

## Persistent Embedding Cache

By default the embedding cache is in-memory and lost on restart. Set `embedding_cache_path` to persist it across sessions — useful when the same questions recur between deployments.

```bash
export MEDHA_EMBEDDING_CACHE_PATH=/var/cache/medha/embeddings.json
```

```python
from medha import Medha, Settings
from medha.embeddings.fastembed_adapter import FastEmbedAdapter

settings = Settings(
    backend_type="qdrant",
    qdrant_mode="docker",
    embedding_cache_path="/var/cache/medha/embeddings.json",
)

async with Medha(
    collection_name="my_cache",
    embedder=FastEmbedAdapter(),
    settings=settings,
) as cache:
    # On start(): embeddings loaded from disk (if file exists)
    await cache.store("show active users", "SELECT * FROM users WHERE active = true;")
    # On close(): embeddings saved to disk automatically
```

No extra dependencies — uses stdlib `json`.

---

## Template Matching

Templates allow Medha to recognize parameterized patterns and generate queries dynamically without an LLM call.

### Define Templates in Code

```python
import asyncio
from medha import Medha, QueryTemplate
from medha.embeddings.fastembed_adapter import FastEmbedAdapter

templates = [
    QueryTemplate(
        intent="top_n_entities",
        template_text="Show top {count} {entity}",
        query_template="SELECT * FROM {entity} ORDER BY id LIMIT {count}",
        parameters=["count", "entity"],
        priority=1,
        aliases=["List first {count} {entity}", "Get {count} {entity}"],
        parameter_patterns={
            "count": r"\b(\d+)\b",
            "entity": r"\b(users|orders|products|employees)\b",
        },
    ),
    QueryTemplate(
        intent="filter_by_status",
        template_text="Show {entity} with status {status}",
        query_template="SELECT * FROM {entity} WHERE status = '{status}'",
        parameters=["entity", "status"],
        priority=1,
        parameter_patterns={
            "entity": r"\b(users|orders|products)\b",
            "status": r"\b(active|inactive|pending|completed)\b",
        },
    ),
    QueryTemplate(
        intent="count_by_group",
        template_text="Count {entity} by {group}",
        query_template="SELECT {group}, COUNT(*) FROM {entity} GROUP BY {group}",
        parameters=["entity", "group"],
        priority=2,
        parameter_patterns={
            "entity": r"\b(users|orders|products|employees)\b",
            "group": r"\b(department|status|category|region)\b",
        },
    ),
]

async def main():
    embedder = FastEmbedAdapter()

    async with Medha(
        collection_name="template_demo",
        embedder=embedder,
        templates=templates,
    ) as cache:
        # Template matching with parameter extraction
        hit = await cache.search("Show top 10 users")
        print(f"Strategy: {hit.strategy.value}")
        # template_match
        print(f"Query: {hit.generated_query}")
        # SELECT * FROM users ORDER BY id LIMIT 10

        hit = await cache.search("Show orders with status pending")
        print(f"Query: {hit.generated_query}")
        # SELECT * FROM orders WHERE status = 'pending'

asyncio.run(main())
```

### Load Templates from a JSON File

```json
[
    {
        "intent": "top_n_entities",
        "template_text": "Show top {count} {entity}",
        "query_template": "SELECT * FROM {entity} ORDER BY id LIMIT {count}",
        "parameters": ["count", "entity"],
        "priority": 1,
        "aliases": ["List first {count} {entity}"],
        "parameter_patterns": {
            "count": "\\b(\\d+)\\b",
            "entity": "\\b(users|orders|products)\\b"
        }
    }
]
```

```python
from medha import Medha, Settings
from medha.embeddings.fastembed_adapter import FastEmbedAdapter

settings = Settings(template_file="templates.json")

cache = Medha(
    collection_name="my_cache",
    embedder=FastEmbedAdapter(),
    settings=settings,
)
# Templates are loaded automatically during cache.start()
```

### Load Templates at Runtime

```python
async with Medha(
    collection_name="my_cache",
    embedder=FastEmbedAdapter(),
) as cache:
    await cache.load_templates_from_file("templates.json")
    # or
    await cache.load_templates([QueryTemplate(...), QueryTemplate(...)])
```

---

## Parameter Extraction (NER)

Template matching requires extracting parameter values (e.g. `{department}`, `{person}`) from the user's question. `ParameterExtractor` applies a cascading strategy:

1. **Regex** — patterns defined in `template.parameter_patterns` (fastest, most precise)
2. **GLiNER** — zero-shot NER, uses `template.parameters` directly as entity labels
3. **spaCy** — pre-trained NER with a fixed label set mapped to parameter names
4. **Heuristics** — numbers and capitalized words as last resort

### spaCy (pre-trained, fixed labels)

spaCy recognizes standard entity types (`PERSON`, `ORG`, `CARDINAL`) and maps them to parameter names.

```python
from medha.utils.nlp import ParameterExtractor

ext = ParameterExtractor(use_spacy=True)
print(ext.spacy_available)  # True if en_core_web_sm is installed
```

### GLiNER (zero-shot, arbitrary labels)

GLiNER receives `template.parameters` directly as entity labels — no mapping table needed.
It excels with domain-specific entities that spaCy cannot recognize without custom training.

```python
from medha.utils.nlp import ParameterExtractor

# Default model: urchade/gliner_medium-v2.1
ext = ParameterExtractor(use_gliner=True)

# Lighter variant (~250 MB)
ext = ParameterExtractor(use_gliner=True, gliner_model="urchade/gliner_small-v2.1")

print(ext.gliner_available)  # True if gliner package is installed
```

### Both enabled (recommended for mixed template sets)

```python
from medha.utils.nlp import ParameterExtractor
from medha.types import QueryTemplate

ext = ParameterExtractor(use_spacy=True, use_gliner=True)

template = QueryTemplate(
    intent="org_project_issues",
    template_text="Show open issues for {org} on project {project}",
    query_template="SELECT * FROM issues WHERE org='{org}' AND project='{project}' AND status='open'",
    parameters=["org", "project"],
    # No regex needed — GLiNER resolves both from the param names directly
)

params = ext.extract("Show open issues for Acme Corp on project Apollo", template)
# {"org": "Acme Corp", "project": "Apollo"}

query = ext.render_query(template, params)
# SELECT * FROM issues WHERE org='Acme Corp' AND project='Apollo' AND status='open'
```

| Scenario | Recommended backend |
|---|---|
| Numeric or enum parameters | Regex only (`use_spacy=False, use_gliner=False`) |
| Standard entities (person, org, number) | spaCy (`use_spacy=True`) |
| Domain-specific or unpredictable param names | GLiNER (`use_gliner=True`) |
| Mixed templates in the same app | Both enabled — cascade handles it |
| Edge / resource-constrained deployment | Regex + heuristics only |

Both backends fall back gracefully if the package is not installed.

---

## Batch Operations

Efficiently store many question-query pairs at once.

```python
import asyncio
from medha import Medha
from medha.embeddings.fastembed_adapter import FastEmbedAdapter

entries = [
    {
        "question": "How many users are there?",
        "generated_query": "SELECT COUNT(*) FROM users;",
    },
    {
        "question": "List all active orders",
        "generated_query": "SELECT * FROM orders WHERE status = 'active';",
    },
    {
        "question": "Average order value",
        "generated_query": "SELECT AVG(amount) FROM orders;",
        "response_summary": "Returns the mean order amount.",
    },
    {
        "question": "Top 5 customers by spend",
        "generated_query": "SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id ORDER BY total DESC LIMIT 5;",
    },
]

async def main():
    embedder = FastEmbedAdapter()

    async with Medha(
        collection_name="batch_demo",
        embedder=embedder,
    ) as cache:
        success = await cache.store_batch(entries)
        print(f"Batch stored: {success}")

        # Verify
        hit = await cache.search("How many users exist?")
        print(f"{hit.strategy.value}: {hit.generated_query}")
        # semantic_match: SELECT COUNT(*) FROM users;

asyncio.run(main())
```

---

## Synchronous Usage

Medha provides sync wrappers for call sites that are not `async` (scripts, notebooks, legacy code).

```python
import asyncio

from medha import Medha
from medha.embeddings.fastembed_adapter import FastEmbedAdapter

# Initialize
embedder = FastEmbedAdapter()
cache = Medha(collection_name="sync_demo", embedder=embedder)

# Must call start manually (no async context manager)
asyncio.run(cache.start())

# Sync search and store
cache.store_sync("List all products", "SELECT * FROM products;")
hit = cache.search_sync("Show me all products")
print(f"{hit.strategy.value}: {hit.generated_query}")

# Warm from file synchronously
loaded = cache.warm_from_file_sync("warm_queries.jsonl")

# Clear caches synchronously
cache.clear_caches_sync()

# Clean up
asyncio.run(cache.close())
```

---

## Query Language Examples

Medha is query-language agnostic. The examples below cover SQL, Cypher, and GraphQL.

### SQL (Text-to-SQL)

```python
from medha import Medha, Settings
from medha.embeddings.fastembed_adapter import FastEmbedAdapter

settings = Settings(query_language="sql")

async with Medha(
    collection_name="text2sql",
    embedder=FastEmbedAdapter(),
    settings=settings,
) as cache:
    await cache.store(
        "What are the top 10 products by revenue?",
        "SELECT p.name, SUM(o.amount) as revenue FROM products p JOIN orders o ON p.id = o.product_id GROUP BY p.name ORDER BY revenue DESC LIMIT 10;",
    )
```

### Cypher (Text-to-Cypher for Neo4j)

```python
settings = Settings(query_language="cypher")

async with Medha(
    collection_name="text2cypher",
    embedder=FastEmbedAdapter(),
    settings=settings,
) as cache:
    await cache.store(
        "Find friends of Alice",
        "MATCH (a:Person {name: 'Alice'})-[:FRIEND]->(f:Person) RETURN f.name",
    )
    await cache.store(
        "Shortest path between Alice and Bob",
        "MATCH p = shortestPath((a:Person {name: 'Alice'})-[*]-(b:Person {name: 'Bob'})) RETURN p",
    )
```

### GraphQL

```python
settings = Settings(query_language="graphql")

async with Medha(
    collection_name="text2graphql",
    embedder=FastEmbedAdapter(),
    settings=settings,
) as cache:
    await cache.store(
        "Get user profile with posts",
        '{ user(id: "123") { name email posts { title createdAt } } }',
    )
```

---

## Qdrant Performance Tuning

### HNSW Index Tuning

Adjust the HNSW index parameters for your workload.

```python
from medha import Settings

# High-throughput production (more memory, faster search)
settings = Settings(
    hnsw_m=32,                # More edges per node (default: 16)
    hnsw_ef_construct=200,    # Deeper construction search (default: 100)
)

# Low-memory / edge deployment
settings = Settings(
    hnsw_m=8,
    hnsw_ef_construct=50,
)
```

### Quantization

Reduce memory usage while maintaining search quality.

```python
from medha import Settings

# Scalar quantization (default, ~4x memory reduction)
settings = Settings(
    enable_quantization=True,
    quantization_type="scalar",
    quantization_rescore=True,        # Re-score with original vectors
    quantization_always_ram=True,     # Keep quantized vectors in RAM
)

# Binary quantization (best for high-dimensional embeddings >= 512d)
settings = Settings(
    enable_quantization=True,
    quantization_type="binary",
    quantization_oversampling=2.0,    # Fetch 2x candidates before re-scoring
)

# No quantization (maximum accuracy, more memory)
settings = Settings(enable_quantization=False)
```

### On-Disk Storage

Store original vectors on disk to save RAM. Useful for large caches.

```python
settings = Settings(
    qdrant_mode="docker",
    on_disk=True,                     # Vectors stored on disk
    enable_quantization=True,         # Quantized copies in RAM for speed
    quantization_always_ram=True,
)
```

### Batch Size Tuning

Control how many entries are upserted per Qdrant API call.

```python
# Large batch inserts (reduce API overhead)
settings = Settings(batch_size=500)

# Small batches (lower memory per call)
settings = Settings(batch_size=50)
```

---

## Cache Monitoring

Track cache performance and hit rates at runtime.

```python
import asyncio
from medha import Medha
from medha.embeddings.fastembed_adapter import FastEmbedAdapter

async def main():
    embedder = FastEmbedAdapter()

    async with Medha(
        collection_name="monitored_cache",
        embedder=embedder,
    ) as cache:
        # Populate cache
        await cache.store("Count all users", "SELECT COUNT(*) FROM users;")
        await cache.store("List departments", "SELECT DISTINCT department FROM employees;")

        # Run some searches
        await cache.search("How many users are there?")
        await cache.search("Show all departments")
        await cache.search("Something completely unrelated")

        # Check stats
        stats = cache.stats
        print(f"Total requests:  {stats['total_requests']}")
        print(f"Hit rate:        {stats['hit_rate']:.1f}%")
        print(f"L1 hits:         {stats['by_strategy']['l1_hits']}")
        print(f"Semantic hits:   {stats['by_strategy']['semantic_hits']}")
        print(f"Misses:          {stats['by_strategy']['misses']}")
        print(f"Total stored:    {stats['total_stored']}")
        print(f"Warm loaded:     {stats['warm_loaded']}")
        print(f"Templates:       {stats['templates_loaded']}")

        # Per-tier latency breakdown (ms)
        for tier, data in stats["tier_latencies_ms"].items():
            print(f"  {tier:12s}  avg={data['avg_ms']:.2f}ms  calls={data['calls']}")

        # Example output:
        # Total requests:  3
        # Hit rate:        66.7%
        # ...
        #   l1_cache      avg=0.01ms  calls=3
        #   template      avg=1.20ms  calls=3
        #   exact         avg=18.40ms calls=1
        #   semantic      avg=18.40ms calls=1
        #   fuzzy         avg=0.00ms  calls=0

asyncio.run(main())
```

---

## Logging

Configure Medha's logging for debugging and monitoring.

```python
from medha import setup_logging

# Basic: INFO level to console
setup_logging(level="INFO")

# Debug mode: see every tier of the waterfall search
setup_logging(level="DEBUG")

# Log to file + console with different levels
setup_logging(
    level="DEBUG",
    log_file="/var/log/medha/cache.log",
    console_level="WARNING",
)

# Custom format
setup_logging(
    level="INFO",
    fmt="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
    date_fmt="%Y-%m-%d %H:%M:%S",
)
```

---

## Full Production Example

A complete configuration combining all features for a production Text-to-SQL system.

```python
import asyncio
from medha import Medha, Settings, QueryTemplate, setup_logging
from medha.embeddings.openai_adapter import OpenAIAdapter

# Configure logging
setup_logging(level="INFO", log_file="medha.log")

# Production settings
settings = Settings(
    # Qdrant Cloud
    backend_type="qdrant",
    qdrant_mode="cloud",
    qdrant_url="https://your-cluster.cloud.qdrant.io",
    qdrant_api_key="your-api-key",  # stored as SecretStr, never logged

    # Query language
    query_language="sql",

    # Tuned thresholds
    score_threshold_exact=0.99,
    score_threshold_semantic=0.88,
    score_threshold_template=0.82,
    score_threshold_fuzzy=80.0,

    # L1 cache
    l1_cache_max_size=5000,

    # HNSW tuning
    hnsw_m=32,
    hnsw_ef_construct=200,

    # Quantization
    enable_quantization=True,
    quantization_type="scalar",
    quantization_rescore=True,
    quantization_always_ram=True,

    # Batch operations
    batch_size=200,

    # Templates from file
    template_file="production_templates.json",

    # Persist embedding cache across restarts
    embedding_cache_path="/var/cache/medha/embeddings.json",

    # Security
    max_question_length=8192,          # reject oversized questions (DoS guard)
    allowed_file_dir="/app/data",      # restrict warm_from_file() to this dir
    max_file_size_mb=100,              # reject files larger than 100 MB
)

# OpenAI embeddings
embedder = OpenAIAdapter(
    model_name="text-embedding-3-small",
    api_key="sk-your-key",
)

# Pre-defined templates
templates = [
    QueryTemplate(
        intent="employee_lookup",
        template_text="Find employees in {department}",
        query_template="SELECT * FROM employees WHERE department = '{department}'",
        parameters=["department"],
        priority=1,
        aliases=[
            "Show {department} employees",
            "Who works in {department}",
            "List {department} team",
        ],
        parameter_patterns={
            "department": r"\b(engineering|sales|marketing|hr|finance|ops)\b",
        },
    ),
]

async def main():
    from medha.l1_cache.redis_adapter import RedisL1Cache

    async with Medha(
        collection_name="production_text2sql",
        embedder=embedder,
        settings=settings,
        templates=templates,
        # Shared L1 cache across all service instances
        l1_backend=RedisL1Cache(url="redis://redis.internal:6379/0", ttl=3600),
    ) as cache:
        # Pre-warm cache from a curated file of known queries
        await cache.warm_from_file("common_queries.jsonl")

        # Or inline with store_batch for dynamic queries
        await cache.store_batch([
            {
                "question": "How many active users?",
                "generated_query": "SELECT COUNT(*) FROM users WHERE status = 'active';",
                "response_summary": "Count of active users",
            },
            {
                "question": "Total revenue this month",
                "generated_query": "SELECT SUM(amount) FROM orders WHERE date >= DATE_TRUNC('month', NOW());",
            },
            {
                "question": "Top customers by order count",
                "generated_query": "SELECT customer_id, COUNT(*) as n FROM orders GROUP BY customer_id ORDER BY n DESC LIMIT 10;",
            },
        ])

        # Search with full waterfall
        hit = await cache.search("Find employees in engineering")
        print(f"Strategy: {hit.strategy.value}")
        print(f"Query: {hit.generated_query}")
        print(f"Confidence: {hit.confidence:.3f}")

        # Monitor performance
        print(cache.stats)

asyncio.run(main())
```

---

## API Reference Summary

### Core

| Class / Method | Description |
|---|---|
| `Medha` | Core cache class with waterfall search |
| `Medha.search(question)` | Waterfall search → `CacheHit` |
| `Medha.store(question, query, ...)` | Store a question-query pair |
| `Medha.store_batch(entries)` | Bulk store (single embedding round-trip) |
| `Medha.warm_from_file(path)` | Pre-populate cache from JSON / JSONL file |
| `Medha.load_templates(templates)` | Load `QueryTemplate` list at runtime |
| `Medha.load_templates_from_file(path)` | Load templates from JSON file |
| `Medha.stats` | Dict with hit rates, per-tier latencies, stored/warm counts |
| `Medha.clear_caches()` | Clear L1 + embedding caches (async) |
| `Medha.search_sync` / `store_sync` / `warm_from_file_sync` / `clear_caches_sync` | Sync wrappers |

### Configuration & Types

| Class | Description |
|---|---|
| `Settings` | Pydantic configuration with env var support (`MEDHA_` prefix) |
| `CacheHit` | Search result: `generated_query`, `confidence`, `strategy` |
| `QueryTemplate` | Parameterized question-to-query template |
| `CacheEntry` | Stored cache entry with vector and metadata |
| `CacheResult` | Backend search result with score |
| `SearchStrategy` | Enum: `l1_cache`, `template_match`, `exact_match`, `semantic_match`, `fuzzy_match`, `no_match`, `error` |

### Interfaces & Backends

| Class | Description |
|---|---|
| `BaseEmbedder` | Abstract interface for embedding providers |
| `L1CacheBackend` | Abstract interface for L1 cache backends |
| `VectorStorageBackend` | Abstract interface for vector storage backends |
| `FastEmbedAdapter` | Local embeddings via FastEmbed (ONNX) |
| `OpenAIAdapter` | OpenAI embedding API adapter |
| `QdrantBackend` | Qdrant vector storage (memory / docker / cloud) |
| `InMemoryBackend` | Pure-Python in-process backend, zero deps (`backend_type="memory"`) |
| `PgVectorBackend` | PostgreSQL + pgvector backend (`pip install medha-archai[pgvector]`) |
| `InMemoryL1Cache` | Default in-process LRU L1 cache |
| `RedisL1Cache` | Redis-backed L1 cache (`pip install medha-archai[redis]`) |

### Utilities

| Function | Description |
|---|---|
| `setup_logging()` | Configure the `medha` logger |
| `ParameterExtractor` | NER-based parameter extractor (regex → GLiNER → spaCy → heuristics) |

---

## Roadmap

* [x] Redis L1 Cache backend (`RedisL1Cache`, `pip install medha-archai[redis]`).
* [x] Cache warming from JSON / JSONL file (`warm_from_file`).
* [x] Per-tier latency stats (`tier_latencies_ms` in `cache.stats`).
* [x] Persistent embedding cache (`MEDHA_EMBEDDING_CACHE_PATH`).
* [x] Parallel execution of Tier 2 (exact) and Tier 3 (semantic).
* [x] `InMemoryBackend` — pure-Python vector backend, zero external deps.
* [x] `PgVectorBackend` — PostgreSQL + pgvector backend.
* [x] `backend_type` setting for declarative backend selection.
* [x] Security hardening: `max_question_length`, `max_file_size_mb`, `allowed_file_dir`, `qdrant_api_key` as `SecretStr`, PostgreSQL identifier validation.
* [ ] Feedback loop — mark a cache hit as correct/incorrect.

---

## Contributing

We welcome contributions! Please see `CONTRIBUTING.md` for details on how to set up the dev environment and run tests.

## License

This project is licensed under the Apache License 2.0. See the [LICENSE](https://github.com/ArchAI-Labs/medha/blob/main/LICENSE) file for details.

---

*Built with ❤️ by **[ArchAI Labs](https://github.com/ArchAI-Labs)***
