Metadata-Version: 2.4
Name: mistralai-search-toolkit-plugins-vespa
Version: 0.0.8
Summary: Vespa integration for mistralai-search-toolkit
Author-email: Mistral AI <support@mistral.ai>
License: Apache-2.0
License-File: LICENSE
Keywords: ai,information-retrieval,mistral,search,search-engine,vespa
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.12
Requires-Dist: docker>=7.1.0
Requires-Dist: httpx<1.0.0,>=0.27.0
Requires-Dist: jinja2>=3.1.0
Requires-Dist: kubernetes>=28.0.0
Requires-Dist: mistralai-search-toolkit
Requires-Dist: pydantic-settings>=2.11.0
Requires-Dist: pyvespa<=1.1.2
Requires-Dist: pyyaml<7,>=6.0.1
Requires-Dist: requests>=2.32.0
Requires-Dist: semver>=3.0.4
Requires-Dist: structlog<26,>=24
Requires-Dist: typer>=0.24.1
Provides-Extra: testkit
Requires-Dist: pytest>=7.0; extra == 'testkit'
Description-Content-Type: text/markdown

# Vespa Plugin for Search Toolkit

Vespa integration plugin for [`mistralai-search-toolkit`](https://pypi.org/project/mistralai-search-toolkit/).

This plugin implements a Vespa search backend for the Search Toolkit, supporting vector, keyword, and hybrid search.

## Installation

```bash
pip install mistralai-search-toolkit-plugins-vespa
```

Or as an optional dependency of the core package:

```bash
pip install mistralai-search-toolkit[vespa]
```

## Quick Start

### 1. Bootstrap Your Application

Create the application structure with an initial migration:

```bash
uv run mistral-vespa generate-migration --app-dir ./vespa_app initial_schema
```

This creates the `./vespa_app/` directory and generates a migration file. Fill it with your schema definition:

```python
from mistralai.search.toolkit.plugins.vespa.app.schemas.app import SearchMode
from mistralai.search.toolkit.plugins.vespa.migration import VespaMigration, create_default_schema, set_app_name

class InitialSchema(VespaMigration):
    def migrate(self) -> None:
        set_app_name("articles")
        create_default_schema(
            name="articles",
            mode=SearchMode.INDEX,
            embedding_dimensions=1024,  # Adjust based on your embedder
            schema_version=1,
        )
```

### 2. Start a Local Vespa Instance

```bash
uv run mistral-vespa local up --query-port 18080 --config-port 19171 --name vespa-dev
```
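
Before deploying, it can help to confirm that the container has finished starting. The following is a minimal sketch, not part of the plugin: it polls Vespa's standard `/state/v1/health` endpoint and assumes that `--config-port 19171` exposes the config server on `localhost:19171`. The helper names `is_up` and `wait_until_up` are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request


def is_up(payload: dict) -> bool:
    """Return True when a /state/v1/health payload reports status code "up"."""
    return payload.get("status", {}).get("code") == "up"


def wait_until_up(url: str, timeout_s: float = 120.0, interval_s: float = 2.0) -> bool:
    """Poll a health URL until it reports "up" or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if is_up(json.load(response)):
                    return True
        except (urllib.error.URLError, OSError, ValueError):
            # Not reachable yet, or the body is not valid JSON: retry.
            pass
        time.sleep(interval_s)
    return False
```

For example, `wait_until_up("http://localhost:19171/state/v1/health")` returns `True` once the config server answers, assuming the port mapping above.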

### 3. Deploy Your Application

Run the migrations against the local instance, using the ports chosen in step 2:

```bash
uv run mistral-vespa migrate \
  --app-dir ./vespa_app \
  --config-server http://localhost:19171 \
  --query-port 18080
```

This generates the `vespa_app` Python module that you can now import.

### 4. Index Documents

```python
import os
from mistralai.search.toolkit.ingestion.pipelines import Pipeline
from mistralai.search.toolkit.ingestion.loaders import FilesystemFileLoader
from mistralai.search.toolkit.ingestion.text_splitters import CharacterTextSplitter
from mistralai.search.toolkit.embedders import MistralEmbedder, MODEL_1024_EMBEDDING
from mistralai.client import Mistral
from mistralai.search.toolkit.plugins.vespa import VespaClientConfig
from vespa_app import app

# Setup
mistral_client = Mistral(api_key=os.environ.get("MISTRAL_API_KEY"))
vespa_config = VespaClientConfig(
    endpoint=os.environ.get("VESPA_ENDPOINT", "http://localhost:18080"),
)
collection_name = "articles"

# Connect to Vespa
vector_store = app.get_search_index(vespa_config, collection_name=collection_name)

# Index documents
pipeline = Pipeline(
    loader=FilesystemFileLoader(),
    text_splitter=CharacterTextSplitter(chunk_size=512),
    embedder=MistralEmbedder(client=mistral_client, model_name=MODEL_1024_EMBEDDING),
    stores=vector_store,
)

# Top-level await: run this inside an async function, a notebook, or the asyncio REPL
num_chunks = await pipeline.run(documents=["doc1.pdf", "doc2.pdf"])
```
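
The snippet above uses a top-level `await`, which works in a notebook or in `python -m asyncio` but not in a plain script. A minimal sketch of the script form follows. `StubPipeline` is a hypothetical stand-in so the example runs without Vespa or an API key; in real code you would pass the `pipeline` object built above.

```python
import asyncio


class StubPipeline:
    """Hypothetical stand-in for the toolkit's Pipeline, used only for this sketch."""

    async def run(self, documents: list[str]) -> int:
        await asyncio.sleep(0)  # simulate asynchronous work
        return len(documents)


async def main(pipeline, documents: list[str]) -> int:
    # Await the pipeline inside a coroutine instead of at module level.
    return await pipeline.run(documents=documents)


# asyncio.run creates an event loop, runs main() to completion, and closes the loop.
num_chunks = asyncio.run(main(StubPipeline(), ["doc1.pdf", "doc2.pdf"]))
print(f"Indexed {num_chunks} chunks")
```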

### 5. Search

```python
from mistralai.search.toolkit.embedders import MistralEmbedder, MODEL_1024_EMBEDDING
from mistralai.search.toolkit.retrieval import QueryEngine
from mistralai.search.toolkit.retrieval.retrievers import VectorRetriever

# Setup search (reuses mistral_client and vector_store from the indexing step)
embedder = MistralEmbedder(client=mistral_client, model_name=MODEL_1024_EMBEDDING)
query_engine = QueryEngine(
    retriever=[VectorRetriever(client=vector_store, embedder=embedder)],
)

# Search documents (top-level await: run inside an async context)
results = await query_engine.search(query="What is machine learning?", top_k=10)

# Display results
for result in results.results:
    print(f"Score: {result.score}")
    print(f"Content: {result.content}\n")
```

## Configuration

### Quick Setup

Use `app.get_search_index()` for the common case where a single endpoint serves both query and feed APIs:

```python
import os
from mistralai.search.toolkit.plugins.vespa import VespaClientConfig
from vespa_app import app

vespa_config = VespaClientConfig(
    endpoint=os.environ.get("VESPA_ENDPOINT", "http://localhost:18080"),
)
vector_store = app.get_search_index(vespa_config, collection_name="articles")
```

### Advanced Setup

Use separate query and feed endpoints for production deployments:

```python
import os
from mistralai.search.toolkit.plugins.vespa import VespaClientConfig
from vespa_app import app

client_config = VespaClientConfig(
    query_endpoint=os.environ.get("VESPA_QUERY_ENDPOINT", "https://query.vespa.example.com"),
    feed_endpoint=os.environ.get("VESPA_FEED_ENDPOINT", "https://feed.vespa.example.com"),
)
vector_store = app.get_search_index(
    client_config=client_config,
    collection_name="articles",
)
```
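
If the same code has to run against both a local instance and a split production deployment, one option is to choose the keyword arguments from the environment. The sketch below is an illustration, not plugin API: `vespa_config_kwargs` is a hypothetical helper, and it assumes `VespaClientConfig` accepts either `endpoint` or the `query_endpoint`/`feed_endpoint` pair, as in the two examples above.

```python
import os
from collections.abc import Mapping


def vespa_config_kwargs(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Pick single-endpoint or split-endpoint settings from environment variables."""
    query = env.get("VESPA_QUERY_ENDPOINT")
    feed = env.get("VESPA_FEED_ENDPOINT")
    if query and feed:
        # Separate query and feed endpoints.
        return {"query_endpoint": query, "feed_endpoint": feed}
    if query or feed:
        # Only one of the pair is set: fail early rather than guess the other.
        raise ValueError("Set both VESPA_QUERY_ENDPOINT and VESPA_FEED_ENDPOINT, or neither")
    # Single endpoint serving both APIs, defaulting to the local instance.
    return {"endpoint": env.get("VESPA_ENDPOINT", "http://localhost:18080")}
```

The result can then be unpacked into the config, for example `VespaClientConfig(**vespa_config_kwargs())`.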

## License

This plugin is licensed under the Apache License 2.0.

## Support

For issues related to the Search Toolkit, refer to the [Search Toolkit documentation](https://pypi.org/project/mistralai-search-toolkit/).

For Vespa-specific questions, visit [Vespa documentation](https://docs.vespa.ai/).
