Metadata-Version: 2.4
Name: hydradb-sdk
Version: 0.0.1
Summary: The official Python SDK for the Hydra DB (hydradb.com)
Author-email: Nishkarsh Srivastava <nishkarsh@hydradb.com>
License: Copyright (c) 2026 Hydra DB
        
        All Rights Reserved.
        
        PROPRIETARY AND CONFIDENTIAL
        
        This software is the proprietary and confidential property of AGI Context, INC ("the Company").
        Permission is hereby granted to users to install and use this software as part of the Hydra DB service, subject to the terms and conditions of the service agreement entered into with the Company.
        
        You may not, without the express written permission of the Company:
        
        1. Copy, modify, or create derivative works of the software.
        2. Distribute, sell, rent, lease, sublicense, or otherwise transfer the software to any third party.
        3. Reverse engineer, decompile, or disassemble the software, except and only to the extent that such activity is expressly permitted by applicable law notwithstanding this limitation.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
Project-URL: Homepage, https://hydradb.com/
Project-URL: Documentation, https://docs.hydradb.com/
Keywords: hydradb-sdk,hydradb,ai,sdk,api,generative ai,rag,db
Classifier: Development Status :: 5 - Production/Stable
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: httpx>=0.24
Requires-Dist: pydantic<3,>=1.10
Dynamic: license-file

# Hydra DB Python SDK - [hydradb.com](https://www.hydradb.com/)

The official Python SDK for the Hydra DB platform. Build powerful, context-aware AI applications in your Python applications.

**Hydra DB** is your plug-and-play memory infrastructure. It powers intelligent, context-aware retrieval for any AI app or agent. Whether you're building a customer support bot, research copilot, or internal knowledge assistant.

[Learn more about the SDK from our docs](https://docs.hydradb.com)

## Core features

* **Dynamic retrieval and querying** that always retrieves the most relevant context
* **Built-in long-term memory** that evolves with every user interaction
* **Personalization hooks** for user preferences, intent, and history
* **Raw embeddings support** for bring-your-own vector workflows
* **Developer-first SDK** with full type safety and IDE autocompletion

## Getting started

### Installation

```bash
pip install hydra-db-python
```

### Client setup

Both synchronous and asynchronous clients are available. Use `AsyncHydraDB` for async/await patterns and `HydraDB` for traditional synchronous workflows. Both expose the exact same set of methods.

```python
import os
from hydra_db import HydraDB, AsyncHydraDB

api_key = os.environ["HYDRA_DB_API_KEY"]

# Sync client
client = HydraDB(token=api_key)

# Async client
async_client = AsyncHydraDB(token=api_key)
```

---

## Tenant Management

A `tenant` is a single isolated database. Within it you can create further isolated collections called `sub-tenants`. [Learn more](https://docs.hydradb.com/essentials/multi-tenant)

### Create a Tenant

```python
response = client.tenant.create(tenant_id="my-company")
```

You can also create a tenant optimised for raw vector embeddings:

```python
response = client.tenant.create(
    tenant_id="my-embeddings-tenant",
    is_embeddings_tenant=True,
    embeddings_dimension=1536,
)
```

### Get Sub-Tenant IDs

```python
sub_tenants = client.tenant.get_sub_tenant_ids(tenant_id="my-company")
# sub_tenants.sub_tenant_ids -> list of sub-tenant ID strings
```

### Get Infrastructure Status

Check whether the tenant's underlying infrastructure is ready:

```python
status = client.tenant.get_infra_status(tenant_id="my-company")
```

### Monitor Tenant Stats

```python
stats = client.tenant.monitor(tenant_id="my-company")
```

### Delete a Tenant

> **Warning:** This is irreversible and permanently removes all data.

```python
client.tenant.delete_tenant(tenant_id="my-company")
```

---

## Index Your Data

### Upload Knowledge (Files)

Upload documents to make them retrievable via natural language search.

```python
with open("report.pdf", "rb") as f:
    result = client.upload.knowledge(
        tenant_id="my-company",
        sub_tenant_id="my-sub-tenant",
        files=[("report.pdf", f, "application/pdf")],
        upsert=True,
    )
    # result.results[0].source_id -> ID you can use later
```

You can attach metadata to each file. Pass it as a **JSON string** — each object corresponds to the file at the same index:

```python
import json

file_metadata = json.dumps([
    {
        "id": "doc_a",
        "tenant_metadata": {"dept": "sales"},
        "document_metadata": {"author": "Alice"},
    },
    {
        "id": "doc_b",
        "tenant_metadata": {"dept": "marketing"},
        "document_metadata": {"author": "Bob"},
        "relations": {
            "cortex_source_ids": ["doc_a"],
            "properties": {"relation": "same_upload_batch"},
        },
    },
])

with open("a.pdf", "rb") as f1, open("b.pdf", "rb") as f2:
    result = client.upload.knowledge(
        tenant_id="my-company",
        sub_tenant_id="my-sub-tenant",
        files=[
            ("a.pdf", f1, "application/pdf"),
            ("b.pdf", f2, "application/pdf"),
        ],
        file_metadata=file_metadata,
        upsert=True,
    )
```

### Verify Processing Status

After uploading, check when files have finished indexing:

```python
status = client.upload.verify_processing(
    tenant_id="my-company",
    sub_tenant_id="my-sub-tenant",
    file_ids=["source-id-1", "source-id-2"],
)
# status.statuses[0].indexing_status -> "queued" | "processing" | "completed" | "errored"
```

### Add Memories

Index free-form text, markdown content, or conversation pairs as searchable memories.

**Plain text:**

```python
result = client.upload.add_memory(
    tenant_id="my-company",
    sub_tenant_id="my-sub-tenant",
    upsert=True,
    memories=[
        {
            "text": "User prefers detailed explanations and dark mode",
            "infer": True,
            "user_name": "John",
        }
    ],
)
```

**Markdown:**

```python
result = client.upload.add_memory(
    tenant_id="my-company",
    sub_tenant_id="my-sub-tenant",
    upsert=True,
    memories=[
        {
            "text": "# Meeting Notes\n\n## Key Points\n- Budget approved\n- Launch date: Q2",
            "is_markdown": True,
            "infer": False,
            "title": "Meeting Notes",
        }
    ],
)
```

**User–assistant conversation pairs:**

```python
result = client.upload.add_memory(
    tenant_id="my-company",
    sub_tenant_id="my-sub-tenant",
    upsert=True,
    memories=[
        {
            "user_assistant_pairs": [
                {"user": "What are my preferences?", "assistant": "You prefer dark mode and detailed explanations."},
                {"user": "How do I like my reports?", "assistant": "You prefer weekly summary reports with charts."},
            ],
            "infer": True,
            "user_name": "John",
            "custom_instructions": "Extract user preferences",
        }
    ],
)
```

### Delete a Memory

```python
client.upload.delete_memory(
    tenant_id="my-company",
    sub_tenant_id="my-sub-tenant",
    memory_id="memory-source-id",
)
```

---

## Search & Retrieval

### Full Recall

Hybrid semantic + keyword search across both knowledge and memories:

```python
results = client.recall.full_recall(
    tenant_id="my-company",
    sub_tenant_id="my-sub-tenant",
    query="Which mode does the user prefer?",
    alpha=0.8,        # 1.0 = pure semantic, 0.0 = pure keyword
    recency_bias=0,   # 0.0 = no bias, 1.0 = strongly prefer recent
    max_results=10,
)
# results.chunks -> list of VectorStoreChunk
# results.sources -> list of SourceInfo
```

### Recall Preferences

Search only user memory/preference data:

```python
preferences = client.recall.recall_preferences(
    tenant_id="my-company",
    sub_tenant_id="my-sub-tenant",
    query="dark mode preference",
    max_results=5,
)
```

### Boolean Recall

Exact keyword / phrase / boolean search (BM25):

```python
results = client.recall.boolean_recall(
    tenant_id="my-company",
    sub_tenant_id="my-sub-tenant",
    query="dark mode",
    operator="phrase",      # "or" | "and" | "phrase"
    max_results=10,
    search_mode="memories", # "sources" | "memories"
)
```

### Q&A (LLM-powered answer)

Ask a question and get a grounded answer generated by an LLM over your indexed content:

```python
answer = client.recall.qna(
    tenant_id="my-company",
    sub_tenant_id="my-sub-tenant",
    question="What is the user's preferred reporting format?",
    mode="fast",            # "fast" | "thinking"
    search_mode="memories",
    max_chunks=6,
)
```

You can optionally choose the LLM provider and model:

```python
answer = client.recall.qna(
    tenant_id="my-company",
    sub_tenant_id="my-sub-tenant",
    question="Summarise the budget decisions from the meeting notes.",
    mode="thinking",
    search_mode="sources",
    max_chunks=10,
    llm_provider="anthropic",
    model="claude-sonnet-4-6",
    temperature=0.2,
    max_tokens=1024,
)
```

---

## Fetch & Inspect Data

### List All Data

List sources (knowledge) or memories with optional filtering and pagination:

```python
# List knowledge sources
sources = client.fetch.list_data(
    tenant_id="my-company",
    sub_tenant_id="my-sub-tenant",
    kind="knowledge",
    page=1,
    page_size=50,
)

# List user memories
memories = client.fetch.list_data(
    tenant_id="my-company",
    sub_tenant_id="my-sub-tenant",
    kind="memories",
)
```

Filter by metadata:

```python
filtered = client.fetch.list_data(
    tenant_id="my-company",
    sub_tenant_id="my-sub-tenant",
    kind="knowledge",
    filters={"tenant_metadata": {"dept": "sales"}},
)
```

### Fetch Source Content

Retrieve the full content of a specific source by its ID:

```python
source = client.fetch.content(
    tenant_id="my-company",
    sub_tenant_id="my-sub-tenant",
    source_id="your-source-id",
    mode="content",  # "content" | "url" | "both"
)
```

### Fetch Graph Relations

Retrieve the graph relations (linked sources) for a given source:

```python
relations = client.fetch.graph_relations_by_source_id(
    tenant_id="my-company",
    sub_tenant_id="my-sub-tenant",
    source_id="your-source-id",
    is_memory=False,
    limit=10,
)
```

---

## Delete Data

Delete one or more sources (knowledge or memories) by their IDs:

```python
client.data.delete(
    tenant_id="my-company",
    sub_tenant_id="my-sub-tenant",
    ids=["source-id-1", "source-id-2"],
)
```

---

## API Key Management

> **Note:** This endpoint requires a dashboard session token (obtained via your Hydra DB dashboard login), not a standard API key.

```python
new_key = client.key.create_api_key(
    owner="service-account@myapp.com",
    scopes=["query"],
    env="live",
    prefix="sk",
)
# new_key.full_api_key -> the actual key (only shown once)
```

---

## Raw Embeddings

Use Hydra DB as a vector store with your own embeddings.

> **Note:** Raw embeddings require a tenant created with `is_embeddings_tenant=True` and a fixed `embeddings_dimension`. A standard knowledge tenant does not support raw embedding operations.

### Insert Embeddings

```python
client.embeddings.insert(
    tenant_id="my-embeddings-tenant",
    sub_tenant_id="my-sub-tenant",
    upsert=True,
    embeddings=[
        {
            "source_id": "my-doc-001",
            "metadata": {"category": "finance", "year": 2024},
            "embeddings": [
                {"chunk_id": "my-doc-001-chunk-0", "embedding": [0.1, 0.2, 0.3]},  # 1536 dims
                {"chunk_id": "my-doc-001-chunk-1", "embedding": [0.4, 0.5, 0.6]},
            ],
        }
    ],
)
```

### Search by Vector

```python
results = client.embeddings.search(
    tenant_id="my-embeddings-tenant",
    sub_tenant_id="my-sub-tenant",
    query_embedding=[0.1, 0.2, 0.3],  # 1536 dims
    limit=10,
)
```

### Filter Embeddings

```python
# By source
by_source = client.embeddings.filter(
    tenant_id="my-embeddings-tenant",
    sub_tenant_id="my-sub-tenant",
    source_id="my-doc-001",
    limit=50,
)

# By chunk IDs
by_chunks = client.embeddings.filter(
    tenant_id="my-embeddings-tenant",
    sub_tenant_id="my-sub-tenant",
    chunk_ids=["my-doc-001-chunk-0", "my-doc-001-chunk-1"],
)
```

### Delete Embeddings

```python
# Delete all embeddings for a source
client.embeddings.delete(
    tenant_id="my-embeddings-tenant",
    sub_tenant_id="my-sub-tenant",
    source_id="my-doc-001",
)

# Delete specific chunks
client.embeddings.delete(
    tenant_id="my-embeddings-tenant",
    sub_tenant_id="my-sub-tenant",
    chunk_ids=["my-doc-001-chunk-0"],
)
```

---

## Async Usage

Every method has an async equivalent on `AsyncHydraDB`. Method names and parameters are identical:

```python
import asyncio
from hydra_db import AsyncHydraDB

async_client = AsyncHydraDB(token="your-api-key")

async def main():
    result = await async_client.recall.full_recall(
        tenant_id="my-company",
        sub_tenant_id="my-sub-tenant",
        query="Which mode does the user prefer?",
        alpha=0.8,
        max_results=10,
    )
    print(result.chunks)

asyncio.run(main())
```

---

## SDK Method Reference

| Method | Description |
|---|---|
| `client.tenant.create` | Create a new tenant (standard or embeddings) |
| `client.tenant.get_sub_tenant_ids` | List all sub-tenant IDs within a tenant |
| `client.tenant.get_infra_status` | Check tenant infrastructure readiness |
| `client.tenant.monitor` | Get tenant usage and stats |
| `client.tenant.delete_tenant` | Permanently delete a tenant and all its data |
| `client.upload.knowledge` | Upload files to the knowledge base |
| `client.upload.verify_processing` | Poll indexing status of uploaded files |
| `client.upload.add_memory` | Index text, markdown, or conversation pairs as memories |
| `client.upload.delete_memory` | Delete a specific memory by ID |
| `client.recall.full_recall` | Hybrid semantic + keyword search |
| `client.recall.recall_preferences` | Search user memory / preference data only |
| `client.recall.boolean_recall` | Exact keyword / phrase / boolean search |
| `client.recall.qna` | LLM-powered question answering over indexed content |
| `client.fetch.list_data` | List all knowledge sources or memories |
| `client.fetch.content` | Fetch full content of a source by ID |
| `client.fetch.graph_relations_by_source_id` | Fetch graph relations for a source |
| `client.data.delete` | Delete sources or memories by ID |
| `client.key.create_api_key` | Create a scoped API key *(requires dashboard session token)* |
| `client.embeddings.insert` | Store raw vector embeddings *(requires embeddings tenant)* |
| `client.embeddings.search` | Vector similarity search |
| `client.embeddings.filter` | Retrieve embeddings by source or chunk IDs |
| `client.embeddings.delete` | Delete embeddings by source or chunk IDs |

> **Method Mapping:** `client.<group>.<method>` mirrors `api.hydradb.com/<group>/<method>`
>
> For example: `client.upload.knowledge()` → `POST /ingestion/upload_knowledge`

---

## Type Safety & IDE Support

The SDK provides exact type parity with the API:

- **Request parameters** — every field (required, optional, type, validation) is reflected in method signatures
- **Response objects** — return types are fully typed Pydantic models matching the API JSON schema
- **Nested objects** — complex parameters and responses preserve their full structure

Your IDE will automatically provide autocompletion, type-checking, inline documentation, and compile-time validation. Just hit **Cmd+Space / Ctrl+Space**.

---

## Links

- **Homepage:** [hydradb.com](https://www.hydradb.com/)
- **Documentation:** [docs.hydradb.com](https://docs.hydradb.com/)
- **API Reference:** [docs.hydradb.com/api-reference/introduction](https://docs.hydradb.com/api-reference/introduction)

## Support

If you have any questions or need help, reach out at [founders@hydradb.com](mailto:founders@usecortex.ai).
