Metadata-Version: 2.4
Name: mistralai-search-toolkit-storage-azure
Version: 0.0.6
Summary: Azure Blob ObjectStorage plugin for mistralai-search-toolkit
Author-email: Mistral AI <support@mistral.ai>
License: Apache-2.0
License-File: LICENSE
Keywords: ai,azure,blob-storage,mistral,plugin,search
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: <3.15,>=3.12
Requires-Dist: azure-identity<2.0.0,>=1.23.0
Requires-Dist: azure-storage-blob[aio]<12.29.0,>=12.28.0
Requires-Dist: mistralai-search-toolkit
Description-Content-Type: text/markdown

# Azure Blob Storage Plugin for Search Toolkit

Azure Blob Storage backend for [`mistralai-search-toolkit`](https://pypi.org/project/mistralai-search-toolkit/).

This plugin implements the Search Toolkit's `ObjectStorage` interface, enabling the ingestion pipeline to load files directly from Azure Blob Storage.

## Installation

```bash
pip install mistralai-search-toolkit-storage-azure
```

Or as an optional dependency of the core package:

```bash
pip install "mistralai-search-toolkit[storage-azure]"
```

## Quick Start: Load Files from Azure in an Ingestion Pipeline

### 1. Upload a File to Azure Blob Storage

```python
import asyncio
from mistralai.search.toolkit.plugins.storage.azure import AzureBlobObjectStorage

async def upload_file():
    storage = AzureBlobObjectStorage(
        container_name="documents",
        account_name="your-account",
    )

    # Upload a file
    with open("document.pdf", "rb") as f:
        data = f.read()

    await storage.put(key="documents/document.pdf", data=data)

asyncio.run(upload_file())
```
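When several files need to be uploaded, the single `put` call above can be repeated concurrently. The following is a minimal sketch, not part of the plugin: `upload_files` and its `prefix` parameter are hypothetical names, and the only assumption about `storage` is that it exposes the `await storage.put(key=..., data=...)` call shown above.

```python
import asyncio
from pathlib import Path


async def upload_files(storage, paths, prefix="documents"):
    """Upload local files concurrently and return the blob keys that were used.

    `storage` is assumed to be any object with an async
    `put(key=..., data=...)` method, such as AzureBlobObjectStorage.
    """

    async def upload_one(path: Path) -> str:
        key = f"{prefix}/{path.name}"
        # Read the file in a worker thread so large files do not block the event loop.
        data = await asyncio.to_thread(path.read_bytes)
        await storage.put(key=key, data=data)
        return key

    # gather() preserves the order of `paths` in its result list.
    return await asyncio.gather(*(upload_one(Path(p)) for p in paths))
```

The returned keys can then be passed to the ingestion pipeline as its list of documents.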

### 2. Load Files from Azure in the Ingestion Pipeline

```python
import asyncio
import os
from mistralai.search.toolkit.ingestion.loaders import FileLoader
from mistralai.search.toolkit.ingestion.pipelines import Pipeline
from mistralai.search.toolkit.ingestion.text_splitters import CharacterTextSplitter
from mistralai.search.toolkit.embedders import MistralEmbedder, MODEL_1024_EMBEDDING
from mistralai.client import Mistral
from mistralai.search.toolkit.plugins.storage.azure import AzureBlobObjectStorage
from mistralai.search.toolkit.plugins.vespa import VespaClientConfig
from vespa_app import app  # assumed to be your project's own Vespa application module

async def ingest_from_azure():
    # Create Azure storage factory
    def azure_storage_factory():
        return AzureBlobObjectStorage(
            container_name="documents",
            account_name="your-account",
        )

    # Create FileLoader backed by Azure
    file_loader = FileLoader(storage_factory=azure_storage_factory)

    # Create ingestion pipeline
    mistral_client = Mistral(api_key=os.environ.get("MISTRAL_API_KEY"))
    vespa_config = VespaClientConfig(
        endpoint=os.environ.get("VESPA_ENDPOINT", "http://localhost:8080"),
    )
    vector_store = app.get_search_index(vespa_config, collection_name="articles")

    pipeline = Pipeline(
        loader=file_loader,
        text_splitter=CharacterTextSplitter(chunk_size=512),
        embedder=MistralEmbedder(client=mistral_client, model_name=MODEL_1024_EMBEDDING),
        stores=vector_store,
    )

    # Ingest documents from Azure
    num_chunks = await pipeline.run(documents=[
        "documents/document1.pdf",
        "documents/document2.pdf",
    ])

    print(f"Indexed {num_chunks} chunks")

asyncio.run(ingest_from_azure())
```
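In the example above, `storage_factory` is a zero-argument callable that returns a storage instance. If the same pattern is needed for several containers, one option is to build the factory with `functools.partial`. This is only a sketch: `make_storage_factory` is a hypothetical helper, not part of the toolkit.

```python
from functools import partial


def make_storage_factory(storage_cls, **kwargs):
    """Return a zero-argument callable that builds a new storage instance per call.

    `storage_cls` would typically be AzureBlobObjectStorage, and `kwargs` the
    constructor arguments shown in the Configuration section.
    """
    return partial(storage_cls, **kwargs)


# Usage, assuming the plugin is installed:
#   factory = make_storage_factory(
#       AzureBlobObjectStorage,
#       container_name="documents",
#       account_name="your-account",
#   )
#   file_loader = FileLoader(storage_factory=factory)
```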

## Configuration

### Basic Setup

```python
storage = AzureBlobObjectStorage(
    container_name="documents",
    account_name="your-account",
)
```

### Using Connection String

```python
storage = AzureBlobObjectStorage(
    container_name="documents",
    connection_string="DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...",
)
```

### Using Account Key

```python
storage = AzureBlobObjectStorage(
    container_name="documents",
    account_name="your-account",
    account_key="your-key",
)
```

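### Using Managed Identity

`DefaultAzureCredential` tries several credential sources in turn, including environment variables, managed identity, and an Azure CLI login, so the same code also works on a developer machine.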

```python
from azure.identity.aio import DefaultAzureCredential

storage = AzureBlobObjectStorage(
    container_name="documents",
    account_name="your-account",
    credential=DefaultAzureCredential(),
)
```
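To switch between these setups without editing code, the constructor arguments can be chosen from environment variables. The sketch below is illustrative only: `storage_kwargs_from_env` is a hypothetical helper, and the variable names `AZURE_STORAGE_CONNECTION_STRING`, `AZURE_STORAGE_ACCOUNT`, and `AZURE_STORAGE_KEY` are assumptions. The plugin itself is not claimed to read them.

```python
import os
from collections.abc import Mapping


def storage_kwargs_from_env(
    container_name: str,
    env: Mapping[str, str] = os.environ,
) -> dict[str, str]:
    """Pick constructor keyword arguments from (assumed) environment variables.

    Precedence: connection string, then account name plus key,
    then account name alone.
    """
    kwargs = {"container_name": container_name}

    connection_string = env.get("AZURE_STORAGE_CONNECTION_STRING")
    if connection_string:
        kwargs["connection_string"] = connection_string
        return kwargs

    account_name = env.get("AZURE_STORAGE_ACCOUNT")
    if not account_name:
        raise ValueError(
            "Set AZURE_STORAGE_CONNECTION_STRING or AZURE_STORAGE_ACCOUNT"
        )
    kwargs["account_name"] = account_name

    account_key = env.get("AZURE_STORAGE_KEY")
    if account_key:
        kwargs["account_key"] = account_key
    return kwargs


# Usage, assuming the plugin is installed:
#   storage = AzureBlobObjectStorage(**storage_kwargs_from_env("documents"))
```

A `credential` object such as `DefaultAzureCredential()` cannot come from a plain string, so it would still be passed explicitly alongside these arguments.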

## Local Development

For local testing, use [Azurite](https://github.com/Azure/Azurite), the open-source Azure Storage emulator:

```bash
docker run -p 10000:10000 mcr.microsoft.com/azure-storage/azurite azurite-blob --blobHost 0.0.0.0
```

Configure the storage to use the local emulator:

```python
storage = AzureBlobObjectStorage(
    container_name="documents",
    connection_string="DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=<key>;BlobEndpoint=http://127.0.0.1:10000/devstoreaccount1/;",
)
```
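The emulator connection string is a sequence of `Name=value;` pairs, so it can be assembled from its parts instead of being hard-coded. The following is a rough sketch: `azurite_connection_string` is a hypothetical helper, and its defaults simply mirror the values in the example above. The account key is taken as a parameter and left for the caller to supply.

```python
def azurite_connection_string(
    account_key: str,
    account_name: str = "devstoreaccount1",
    host: str = "127.0.0.1",
    port: int = 10000,
) -> str:
    """Build a connection string that points at a local Azurite blob endpoint."""
    parts = {
        "DefaultEndpointsProtocol": "http",
        "AccountName": account_name,
        "AccountKey": account_key,
        # Azurite serves each account under a path segment on the blob host.
        "BlobEndpoint": f"http://{host}:{port}/{account_name}/",
    }
    # Each pair is terminated with ";" to match the format shown above.
    return "".join(f"{name}={value};" for name, value in parts.items())
```

Changing `port` keeps the string in sync with a non-default `-p` mapping in the `docker run` command.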

## License

This plugin is licensed under the Apache License 2.0.

## Support

For Search Toolkit issues, refer to the [Search Toolkit documentation](https://pypi.org/project/mistralai-search-toolkit/).

For Azure Blob Storage documentation, visit [Azure Blob Storage Docs](https://docs.microsoft.com/en-us/azure/storage/blobs/).
