Metadata-Version: 2.4
Name: langchain-seahorse
Version: 0.4.1
Summary: LangChain VectorStore integration for Seahorse API Gateway
Author-email: Seahorse Team <support@dnotitia.com>
License: MIT
License-File: LICENSE
Keywords: embeddings,langchain,seahorse,vector-database,vectorstore
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: <4.0,>=3.10
Requires-Dist: httpx>=0.27.0
Requires-Dist: langchain-core>=0.3.81
Requires-Dist: pydantic>=2.0.0
Provides-Extra: dev
Requires-Dist: black>=26.3.1; extra == 'dev'
Requires-Dist: mypy>=1.8.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
Requires-Dist: pytest-mock>=3.12.0; extra == 'dev'
Requires-Dist: pytest>=9.0.3; extra == 'dev'
Requires-Dist: python-dotenv>=1.2.2; extra == 'dev'
Requires-Dist: ruff>=0.3.0; extra == 'dev'
Provides-Extra: ollama
Requires-Dist: langchain-core>=1.2.5; extra == 'ollama'
Requires-Dist: langchain-ollama>=0.1.0; extra == 'ollama'
Requires-Dist: langchain>=0.3.0; extra == 'ollama'
Provides-Extra: test
Requires-Dist: python-dotenv>=1.2.2; extra == 'test'
Description-Content-Type: text/markdown

# LangChain Seahorse VectorStore

[![Python Version](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/downloads/)
[![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)

LangChain VectorStore integration for the Seahorse API Gateway. Seahorse is a high-performance vector database for semantic search and RAG applications.

## Features

- **LangChain Compatible**: Full implementation of LangChain VectorStore interface
- **Schema-Aware Column Resolution**: Dense and sparse vector columns are auto-resolved from `GET /v2/data/schema`
- **Hybrid Search**: Dense, Sparse, and Hybrid (RRF) search modes
- **Dual Embedding Support**: Use Seahorse's built-in embeddings or bring your own (OpenAI, Cohere, etc.)
- **Metadata Filtering**: Filter search results by metadata
- **Batch Processing**: Efficient handling of large datasets (auto-batched; max 50 rows/request, max 32KB text/row)
- **Indexing & Health Monitoring**: `get_indexed_row_count()` returns a typed `IndexedRowCount` model for tracking index build progress; `health()` provides a drop-in liveness probe
- **Type-Safe**: Complete type hints for Python 3.10+
- **Well-Tested**: Comprehensive unit and integration tests
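
The SDK applies the batch limits listed above automatically, so the following is only a minimal sketch of what those limits imply if you want to pre-check a dataset before uploading it. The names `find_oversized` and `iter_batches` are hypothetical helpers, not part of the package, and the sketch assumes "32KB" means 32 * 1024 bytes of UTF-8 text, which this README does not specify.

```python
from typing import Iterator, List, Sequence

# Limits quoted in the feature list; the exact byte accounting is an assumption.
MAX_ROWS_PER_REQUEST = 50
MAX_TEXT_BYTES = 32 * 1024  # assumes "32KB" means 32 * 1024 UTF-8 bytes


def find_oversized(texts: Sequence[str]) -> List[int]:
    """Return the indexes of texts whose UTF-8 size exceeds the per-row limit."""
    return [
        i for i, text in enumerate(texts)
        if len(text.encode("utf-8")) > MAX_TEXT_BYTES
    ]


def iter_batches(
    texts: Sequence[str], size: int = MAX_ROWS_PER_REQUEST
) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` texts, preserving order."""
    for start in range(0, len(texts), size):
        yield list(texts[start:start + size])
```

Running `find_oversized` first lets you split or truncate long rows yourself instead of discovering the limit through an API error.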

## Installation

```bash
# Using pip
pip install langchain-seahorse

# Using uv (recommended)
uv add langchain-seahorse
```

## Quick Start

### Basic Usage with Built-in Embeddings

```python
from seahorse_vector_store import SeahorseVectorStore

# Initialize vectorstore
vectorstore = SeahorseVectorStore(
    api_key="your-seahorse-api-key",
    base_url="https://your-table-uuid.api.seahorse.dnotitia.ai",
)

# Add documents
ids = vectorstore.add_texts(
    texts=[
        "Machine learning is a subset of AI.",
        "Deep learning uses neural networks.",
    ],
    metadatas=[
        {"source": "doc1.pdf", "page": 1},
        {"source": "doc2.pdf", "page": 5},
    ]
)

# Search
docs = vectorstore.similarity_search(
    query="What is machine learning?",
    k=2
)

for doc in docs:
    print(doc.page_content)
    print(doc.metadata)
```

### Using External Embeddings

```python
from seahorse_vector_store import SeahorseVectorStore
from langchain_openai import OpenAIEmbeddings

vectorstore = SeahorseVectorStore(
    api_key="your-seahorse-api-key",
    base_url="https://your-table-uuid.api.seahorse.dnotitia.ai",
    embedding=OpenAIEmbeddings(api_key="your-openai-key"),
    use_builtin_embedding=False,
)

# Use as normal...
```

### Hybrid Search (Dense + Sparse)

```python
from seahorse_vector_store import SeahorseVectorStore, SearchMode

vectorstore = SeahorseVectorStore(
    api_key="your-api-key",
    base_url="https://your-table-uuid.api.seahorse.dnotitia.ai",
)

# Default: Hybrid search (Dense + Sparse with RRF fusion)
docs = vectorstore.similarity_search("machine learning", k=5)

# Pure Dense search
docs = vectorstore.similarity_search(
    "machine learning", k=5, retrieval_mode=SearchMode.DENSE
)

# Pure Sparse search (BM25-based)
docs = vectorstore.similarity_search(
    "machine learning", k=5, retrieval_mode=SearchMode.SPARSE
)
```

### Metadata Filtering

```python
# Search with metadata filter
docs = vectorstore.similarity_search(
    query="neural networks",
    k=5,
    filter={"source": "doc1.pdf", "page": 1}
)
```

### Indexing Status & Health Check

```python
# Per-index indexing progress (typed model). Top-level counts are
# writer-based; ``stats.readable`` adds a reader-node view (segment dedup
# + ``row_count - deleted_row_count`` saturating).
stats = vectorstore.get_indexed_row_count()
print(stats.total_row_count)
for idx in stats.indexed_counts:
    print(f"{idx.index_name} ({idx.index_type}): {idx.indexed_row_count}")

# Skip the reader-node ``readable`` view when only writer counts are needed
stats = vectorstore.get_indexed_row_count(readable=False)

# Lightweight liveness probe — True on 200 OK, False on any SeahorseAPIError
if not vectorstore.health():
    raise RuntimeError("Seahorse backend is unreachable")
```
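
If you need to block until indexing has caught up after a bulk insert, a polling loop over `get_indexed_row_count()` is one option. The sketch below is hedged: `wait_until_indexed` is a hypothetical helper, and it assumes an index is "caught up" once its `indexed_row_count` reaches `total_row_count`, which may not match how Seahorse defines completion. It relies only on the attributes shown in the example above.

```python
import time
from typing import Any, Callable


def wait_until_indexed(
    vectorstore: Any,
    timeout: float = 300.0,
    poll_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every index reports at least `total_row_count` rows.

    Returns True once all indexes have caught up, False if `timeout`
    seconds elapse first. An empty `indexed_counts` list counts as done.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Writer-based counts are enough here, so skip the readable view.
        stats = vectorstore.get_indexed_row_count(readable=False)
        # Assumption: an index is complete when it has indexed every row.
        if all(
            idx.indexed_row_count >= stats.total_row_count
            for idx in stats.indexed_counts
        ):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_interval)
```

The `sleep` parameter exists so tests can inject a no-op instead of waiting in real time.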

## Configuration

### Environment Variables

You can set API credentials via environment variables:

```bash
export SEAHORSE_API_KEY="your-api-key"
export SEAHORSE_BASE_URL="https://your-table-uuid.api.seahorse.dnotitia.ai"
```

Then use them in your code:

```python
import os
from seahorse_vector_store import SeahorseVectorStore

vectorstore = SeahorseVectorStore(
    api_key=os.environ["SEAHORSE_API_KEY"],
    base_url=os.environ["SEAHORSE_BASE_URL"],
)
```
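
Indexing `os.environ` directly raises a bare `KeyError` when a variable is missing. A small wrapper can report every missing variable at once. This is a sketch only: `load_seahorse_settings` is a hypothetical helper, not something the package provides.

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED_VARS = ("SEAHORSE_API_KEY", "SEAHORSE_BASE_URL")


def load_seahorse_settings(
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Read both variables and fail early if either is missing or empty."""
    source = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not source.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
        )
    # Keys match the constructor keyword arguments used in this README.
    return {
        "api_key": source["SEAHORSE_API_KEY"],
        "base_url": source["SEAHORSE_BASE_URL"],
    }
```

The result can then be unpacked into the constructor, for example `SeahorseVectorStore(**load_seahorse_settings())`.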

### Advanced Options

```python
vectorstore = SeahorseVectorStore(
    api_key="your-api-key",
    base_url="https://your-table-uuid.api.seahorse.dnotitia.ai",
    use_builtin_embedding=True,  # Use Seahorse embeddings
    # dense_column / sparse_column are optional explicit overrides.
    # If omitted, the SDK resolves them from GET /v2/data/schema.
)
```

### Primary Key Behavior

Seahorse uses mandatory content-hash primary keys.

- `add_texts()` and `from_texts()` always return IDs generated from Seahorse PK rules.
- Caller-provided custom IDs, including LangChain `Document.id`, are not persisted as the stored row ID.
- Use the returned `ids` from insert operations as the source of truth for later delete workflows.
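
Because custom IDs are not persisted, an application that tracks documents by its own identifiers needs to keep a mapping to the IDs Seahorse returns. A minimal sketch of that bookkeeping follows. `add_and_map_ids` is a hypothetical helper, and it assumes `add_texts()` returns one ID per input text in input order (the usual LangChain convention), which this README does not state explicitly.

```python
from typing import Any, Dict, List, Sequence


def add_and_map_ids(
    vectorstore: Any, app_ids: Sequence[str], texts: Sequence[str]
) -> Dict[str, str]:
    """Insert texts and return {application ID: Seahorse-generated ID}."""
    if len(app_ids) != len(texts):
        raise ValueError("app_ids and texts must have the same length")
    stored_ids: List[str] = vectorstore.add_texts(texts=list(texts))
    # Assumption: one returned ID per input text, in input order.
    if len(stored_ids) != len(texts):
        raise RuntimeError("add_texts returned an unexpected number of IDs")
    return dict(zip(app_ids, stored_ids))
```

A later delete would then look up the stored ID, for example `vectorstore.delete(ids=[id_map["invoice-42"]])`, where `"invoice-42"` stands in for one of your own identifiers.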

## API Reference

### SeahorseVectorStore

Main class for interacting with Seahorse as a vector store.

#### Synchronous Methods

- `add_texts(texts, metadatas=None, **kwargs)` - Add texts to the vector store
- `similarity_search(query, k=4, filter=None, **kwargs)` - Search for similar documents
- `similarity_search_with_score(query, k=4, filter=None, **kwargs)` - Search with distance scores
- `similarity_search_by_vector(embedding, k=4, filter=None, **kwargs)` - Search by vector
- `similarity_search_by_vector_with_score(embedding, k=4, filter=None, **kwargs)` - Search by vector with scores
- `delete(ids=None, **kwargs)` - Delete documents by IDs
- `from_texts(texts, embedding=None, metadatas=None, **kwargs)` - Create vectorstore from texts
- `get_indexed_row_count(readable=True)` - Per-index indexed row counts as `IndexedRowCount`
- `health()` - Lightweight liveness probe (returns `bool`)

#### Async Methods

- `aadd_texts(texts, metadatas=None, **kwargs)` - Add texts asynchronously
- `asimilarity_search(query, k=4, filter=None, **kwargs)` - Search asynchronously
- `asimilarity_search_with_score(query, k=4, filter=None, **kwargs)` - Search with scores asynchronously
- `asimilarity_search_by_vector(embedding, k=4, filter=None, **kwargs)` - Search by vector asynchronously
- `asimilarity_search_by_vector_with_score(embedding, k=4, filter=None, **kwargs)` - Search by vector with scores asynchronously
- `adelete(ids=None, **kwargs)` - Delete documents asynchronously
- `aget_indexed_row_count(readable=True)` - Per-index indexed row counts (async)
- `ahealth()` - Async liveness probe
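
The async methods are most useful when several requests can be in flight at once. The sketch below runs multiple queries concurrently with `asyncio.gather`. `search_many` and its `max_concurrency` cap are hypothetical, not part of the package; the only library call it uses is `asimilarity_search(query, k=...)` as listed above.

```python
import asyncio
from typing import Any, Dict, List, Sequence


async def search_many(
    vectorstore: Any,
    queries: Sequence[str],
    k: int = 4,
    max_concurrency: int = 8,
) -> Dict[str, List[Any]]:
    """Run one asimilarity_search per query, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def search_one(query: str) -> List[Any]:
        async with semaphore:
            return await vectorstore.asimilarity_search(query, k=k)

    # gather preserves input order, so results line up with queries.
    results = await asyncio.gather(*(search_one(q) for q in queries))
    return dict(zip(queries, results))
```

Call it from synchronous code with `asyncio.run(search_many(vectorstore, ["query one", "query two"]))`. Capping concurrency is a conservative choice, since this README does not document any rate limits.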

#### Search Modes

- `SearchMode.HYBRID` (default) - Dense + Sparse with RRF fusion
- `SearchMode.DENSE` - Pure dense vector search
- `SearchMode.SPARSE` - Pure sparse (BM25) search

#### Not Supported

- `max_marginal_relevance_search()` - ⚠️ MMR search is not supported by Seahorse API


## Testing

### Setup for Integration Tests

Create a `.env` file in the project root with your Seahorse credentials:

```bash
# Copy the example file
cp .env.example .env

# Edit .env and add your credentials
SEAHORSE_API_KEY=your-api-key
SEAHORSE_BASE_URL=https://your-table-uuid.api.seahorse.dnotitia.ai
```

### Running Tests

```bash
# Run unit tests
uv run pytest tests/unit/

# Run basic integration tests (requires .env file with API credentials)
uv run pytest tests/integration/ \
  --ignore=tests/integration/test_ollama_embeddings.py \
  --ignore=tests/integration/test_rag_pipeline.py

# Run all tests with coverage
uv run pytest --cov=seahorse_vector_store --cov-report=term-missing

# Skip integration tests
uv run pytest -m "not integration"
```

### Running Ollama Integration Tests (Optional)

For advanced tests using Ollama LLM and embeddings:

```bash
# 1. Install Ollama dependencies (Python 3.10+ required)
uv pip install langchain langchain-ollama

# 2. Start Ollama server
ollama serve

# 3. Download models
ollama pull qwen3-embedding:8b  # For embeddings
ollama pull qwen3:8b             # For RAG

# 4. Run Ollama tests
uv run pytest tests/integration/test_ollama_embeddings.py -v
uv run pytest tests/integration/test_rag_pipeline.py -v

# 5. Run all integration tests (including Ollama)
uv run pytest tests/integration/ -v
```

**Note**: Ollama tests will automatically skip if Ollama is not available or required models are not installed.

## Examples

See the `examples/` directory for complete examples:

- `basic_usage.py` - Basic vectorstore operations
- `async_usage.py` - Async/await operations for better performance
- `rag_pipeline.py` - Building a RAG (Retrieval-Augmented Generation) pipeline
- `metadata_filtering.py` - Advanced metadata filtering techniques
- `external_embeddings.py` - Using external embeddings (OpenAI, Cohere, etc.)

## Documentation

- [API Reference](docs/API_REFERENCE.md) - Complete API documentation
- [Tutorial](./TUTORIAL.md)

## Requirements

- Python 3.10+
- langchain-core >= 0.3.81
- httpx >= 0.27.0
- pydantic >= 2.0.0

## License

MIT License - see [LICENSE](LICENSE) file for details.

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## Support

- **Console**: [Seahorse Console](https://console.seahorse.dnotitia.ai)

## Links

- [LangChain Documentation](https://python.langchain.com/)
- [Seahorse Cloud](https://console.seahorse.dnotitia.ai)

## Development Status

This package is in **Alpha** stage, matching its `Development Status :: 3 - Alpha` classifier. APIs are still stabilizing.

Current version: **0.4.1**

---

Made by the Seahorse Team
