Metadata-Version: 2.4
Name: haystack-pixeltable
Version: 0.1.0
Summary: Haystack Document Store and Retriever backed by Pixeltable multimodal data infrastructure.
Author-email: Pixeltable <contact@pixeltable.com>
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/pixeltable/haystack-pixeltable
Project-URL: Repository, https://github.com/pixeltable/haystack-pixeltable
Project-URL: Documentation, https://docs.pixeltable.com/
Project-URL: Issues, https://github.com/pixeltable/haystack-pixeltable/issues
Project-URL: Discord, https://discord.gg/QPyqFYx2UN
Keywords: haystack,pixeltable,document-store,retriever,multimodal,embeddings,rag
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: haystack-ai>=2.6.0
Requires-Dist: pixeltable>=0.2.28
Requires-Dist: numpy
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: ruff>=0.4; extra == "dev"
Dynamic: license-file

# haystack-pixeltable

[![PyPI](https://img.shields.io/pypi/v/haystack-pixeltable)](https://pypi.org/project/haystack-pixeltable/)
[![CI](https://github.com/pixeltable/haystack-pixeltable/actions/workflows/ci.yml/badge.svg)](https://github.com/pixeltable/haystack-pixeltable/actions/workflows/ci.yml)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)

Haystack Document Store and Retriever backed by [Pixeltable](https://pixeltable.com/), a persistent, versioned, multimodal data infrastructure for AI applications.

## Installation

```bash
pip install haystack-pixeltable
```

## Quick Start

### Document Store

```python
from haystack import Document
from haystack_pixeltable import PixeltableDocumentStore

store = PixeltableDocumentStore(
    table_name="myproject.docs",
    embedding_dimension=1536,
)

# Write documents ([...] stands in for a real 1536-dimensional embedding)
store.write_documents([
    Document(
        content="Pixeltable is multimodal data infrastructure.",
        meta={"category": "docs"},
        embedding=[...],
    ),
    Document(
        content="Haystack is a framework for building RAG pipelines.",
        meta={"category": "docs"},
        embedding=[...],
    ),
])

# Filter documents on the metadata written above
results = store.filter_documents(
    filters={"field": "meta.category", "operator": "==", "value": "docs"}
)

# Count
print(store.count_documents())
```

### Retriever (Similarity Search)

```python
from haystack_pixeltable import PixeltableDocumentStore, PixeltableRetriever

store = PixeltableDocumentStore(
    table_name="myproject.docs",
    embedding_dimension=1536,
)
retriever = PixeltableRetriever(document_store=store, top_k=5)

# Search by embedding vector
result = retriever.run(query_embedding=[0.1, 0.2, ...])
for doc in result["documents"]:
    print(f"{doc.content} (score: {doc.score:.3f})")
```
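
Because the store is created with a fixed `embedding_dimension`, a query vector of a different length is unlikely to give meaningful results. The sketch below shows one way to guard against that before calling the retriever. `check_embedding` is a hypothetical helper written for this example and is not part of `haystack_pixeltable`; how the package itself reacts to a mismatched vector is not covered here.

```python
import math
from collections.abc import Sequence


def check_embedding(vector: Sequence[float], expected_dim: int) -> list[float]:
    """Return the vector as a list of floats, or raise if it is unusable.

    Hypothetical helper for illustration; not part of haystack_pixeltable.
    """
    if len(vector) != expected_dim:
        raise ValueError(
            f"expected {expected_dim} dimensions, got {len(vector)}"
        )
    values = [float(x) for x in vector]
    if not all(math.isfinite(x) for x in values):
        raise ValueError("embedding contains NaN or infinite values")
    return values


# A zero vector of the right length, used here only as a placeholder.
query = check_embedding([0.0] * 1536, expected_dim=1536)

# With a store and retriever set up as above, the checked vector
# would then be passed on:
# result = retriever.run(query_embedding=query)
```

Running the check on the client side keeps a dimension mismatch from surfacing later as a harder-to-read storage or search error.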

### In a Haystack Pipeline

```python
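from haystack import Document, Pipeline
from haystack.components.embedders import (
    SentenceTransformersDocumentEmbedder,
    SentenceTransformersTextEmbedder,
)
from haystack.components.writers import DocumentWriter
from haystack_pixeltable import PixeltableDocumentStore, PixeltableRetriever

# all-MiniLM-L6-v2 produces 384-dimensional vectors, matching embedding_dimension below.
# Haystack's default Sentence Transformers model (all-mpnet-base-v2) produces 768.
MODEL = "sentence-transformers/all-MiniLM-L6-v2"

store = PixeltableDocumentStore(
    table_name="rag.knowledge",
    embedding_dimension=384,
)

# Indexing pipeline: embed documents, then write them to the store
indexing = Pipeline()
indexing.add_component("embedder", SentenceTransformersDocumentEmbedder(model=MODEL))
indexing.add_component("writer", DocumentWriter(document_store=store))
indexing.connect("embedder.documents", "writer.documents")
indexing.run(
    {"embedder": {"documents": [Document(content="Pixeltable is multimodal data infrastructure.")]}}
)

# Query pipeline: embed the query text, then retrieve the nearest documents
query = Pipeline()
query.add_component("embedder", SentenceTransformersTextEmbedder(model=MODEL))
query.add_component("retriever", PixeltableRetriever(document_store=store, top_k=5))
query.connect("embedder.embedding", "retriever.query_embedding")
result = query.run({"embedder": {"text": "What is Pixeltable?"}})
print(result["retriever"]["documents"])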
```

## Filtering

The Document Store supports the [Haystack filter specification](https://docs.haystack.deepset.ai/docs/metadata-filtering):

```python
# Simple equality
store.filter_documents(filters={"field": "meta.category", "operator": "==", "value": "science"})

# Comparison operators: ==, !=, >, >=, <, <=
store.filter_documents(filters={"field": "meta.score", "operator": ">", "value": 0.8})

# Compound AND
store.filter_documents(filters={
    "operator": "AND",
    "conditions": [
        {"field": "meta.category", "operator": "==", "value": "science"},
        {"field": "meta.score", "operator": ">", "value": 0.5},
    ],
})

# Compound OR
store.filter_documents(filters={
    "operator": "OR",
    "conditions": [
        {"field": "meta.source", "operator": "==", "value": "arxiv"},
        {"field": "meta.source", "operator": "==", "value": "pubmed"},
    ],
})
```
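
Nested filter dictionaries become hard to read as they grow, so it can help to build them with small functions. The sketch below uses only the dictionary shapes shown above. The helper names `condition`, `all_of`, and `any_of` are hypothetical and not part of `haystack_pixeltable` or Haystack. Nesting an OR inside an AND is allowed by the Haystack filter specification, but whether this store handles nested compounds is an assumption here.

```python
from typing import Any

Filter = dict[str, Any]

# The comparison operators listed in the examples above.
_COMPARISONS = {"==", "!=", ">", ">=", "<", "<="}


def condition(field: str, operator: str, value: Any) -> Filter:
    """Build one comparison, e.g. condition("meta.score", ">", 0.8)."""
    if operator not in _COMPARISONS:
        raise ValueError(f"unsupported comparison operator: {operator!r}")
    return {"field": field, "operator": operator, "value": value}


def all_of(*conditions: Filter) -> Filter:
    """Combine filters so that every one of them must match."""
    return {"operator": "AND", "conditions": list(conditions)}


def any_of(*conditions: Filter) -> Filter:
    """Combine filters so that at least one of them must match."""
    return {"operator": "OR", "conditions": list(conditions)}


# Science documents that come from either arxiv or pubmed.
filters = all_of(
    condition("meta.category", "==", "science"),
    any_of(
        condition("meta.source", "==", "arxiv"),
        condition("meta.source", "==", "pubmed"),
    ),
)

# The result is a plain dict and would be passed to the store as usual:
# store.filter_documents(filters=filters)
```

Since the helpers return ordinary dictionaries, they can be mixed freely with hand-written filters.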

## Pixeltable Escape Hatch: `.table`

The `.table` property gives direct access to the underlying Pixeltable table for operations beyond the Haystack interface:

```python
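import pixeltable.functions.openai as openai  # needs the openai package and OPENAI_API_KEY
from haystack_pixeltable import PixeltableDocumentStore

store = PixeltableDocumentStore(table_name="myproject.docs", embedding_dimension=1536)
t = store.table

# Add a computed column (the stored value is the full chat completion response)
t.add_computed_column(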
    summary=openai.chat_completions(
        messages=[{"role": "user", "content": t.content}],
        model="gpt-4o-mini",
    )
)

# Use arbitrary Pixeltable queries
results = t.where(t.meta["category"] == "science").select(t.content, t.summary).collect()

# Version history
print(t.count(version=-1))  # row count at previous version
```

## Why Pixeltable?

| Feature | Pixeltable | Chroma | Qdrant | pgvector |
|---------|-----------|--------|--------|----------|
| Persistent storage | Built-in | Opt-in | Opt-in | Built-in |
| Computed columns | Native | No | No | No |
| Version history | Native | No | No | No |
| Multimodal types | Image, Video, Audio, Document | Text only | Text only | Text only |
| Metadata filtering | JSON + SQL predicates | Limited | Rich | SQL |
| Embedding auto-compute | Via computed columns | Manual | Manual | Manual |

## Development

```bash
pip install -e ".[dev]"
pytest tests/ -v
ruff check . && ruff format --check .
```

## License

Apache 2.0
