Metadata-Version: 2.4
Name: dewey-haystack
Version: 0.1.0
Summary: Haystack integration for Dewey — document store, retriever, and research component
Author-email: Dewey <hi@meetdewey.com>
License-Expression: MIT
Project-URL: Homepage, https://meetdewey.com
Project-URL: Repository, https://github.com/meetdewey/dewey-haystack
Keywords: haystack,dewey,rag,retrieval,document-store,llm
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: meetdewey>=1.0
Requires-Dist: haystack-ai>=2.0
Provides-Extra: dev
Requires-Dist: pytest>=7; extra == "dev"
Requires-Dist: pytest-mock>=3; extra == "dev"

# dewey-haystack

[![CI](https://github.com/meetdewey/dewey-haystack/actions/workflows/ci.yml/badge.svg)](https://github.com/meetdewey/dewey-haystack/actions/workflows/ci.yml)

[Haystack](https://haystack.deepset.ai/) integration for [Dewey](https://meetdewey.com) — document store, retriever, and research component.

## Installation

```bash
pip install dewey-haystack
```

## Components

### DeweyDocumentStore

Haystack DocumentStore backed by a Dewey collection. Handles document upload and deletion; Dewey manages chunking and embeddings automatically.

```python
from haystack_integrations.document_stores.dewey import DeweyDocumentStore
from haystack.utils import Secret

store = DeweyDocumentStore(
    api_key=Secret.from_env_var("DEWEY_API_KEY"),
    collection_id="3f7a1b2c-...",
)
```

Upload Haystack Documents:

```python
from haystack import Document

store.write_documents([
    Document(content="Neural networks learn via backpropagation.", meta={"source": "ml.txt"}),
    Document(content="Transformers use self-attention mechanisms."),
])
```
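To upload a folder of text files, the documents can be built in a loop. This is a minimal sketch that assumes plain `.txt` files. `load_text_records` is a hypothetical helper and is not part of this package. Each record maps onto Haystack's `Document(content=..., meta=...)` and reuses the `source` meta key from the example above.

```python
from pathlib import Path


def load_text_records(folder):
    """Hypothetical helper: read every .txt file in `folder` into a dict
    with `content` and `meta` keys, ready for Document(**record)."""
    records = []
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        if not text.strip():
            continue  # skip empty files rather than uploading blank documents
        records.append({"content": text, "meta": {"source": path.name}})
    return records
```

Usage: `store.write_documents([Document(**r) for r in load_text_records("docs/")])`.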

### DeweyRetriever

Drop-in Haystack retriever backed by Dewey's hybrid semantic + BM25 search.
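This README does not document how Dewey combines its semantic and BM25 results. To build intuition, the sketch below shows reciprocal rank fusion (RRF), which is one common way to merge two ranked lists. It illustrates the general technique only and is not Dewey's implementation. The `k=60` constant is the conventional RRF default.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of IDs into one ranking.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in
    (rank is 1-based), so items ranked highly by either retriever rise.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


semantic = ["c", "a", "b"]  # e.g. nearest neighbours by embedding
bm25 = ["a", "d", "c"]      # e.g. keyword matches
print(reciprocal_rank_fusion([semantic, bm25]))  # → ['a', 'c', 'd', 'b']
```

Document `a` ranks first because it places well in both lists. By contrast, `c` is first for semantic search but only third for BM25.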

```python
from haystack import Pipeline
from haystack_integrations.document_stores.dewey import DeweyDocumentStore
from haystack_integrations.components.retrievers.dewey import DeweyRetriever
from haystack.utils import Secret

store = DeweyDocumentStore(
    api_key=Secret.from_env_var("DEWEY_API_KEY"),
    collection_id="3f7a1b2c-...",
)

pipeline = Pipeline()
pipeline.add_component("retriever", DeweyRetriever(document_store=store, top_k=8))

result = pipeline.run({"retriever": {"query": "What are the key findings?"}})
for doc in result["retriever"]["documents"]:
    print(f"[{doc.meta['filename']}] {doc.content}")
```

Each returned `Document` carries citation metadata:

| Field | Description |
|---|---|
| `score` | Relevance score (0–1) |
| `document_id` | Dewey document ID |
| `filename` | Original filename |
| `section_id` | Section ID |
| `section_title` | Section heading |
| `section_level` | Heading depth (1 = top-level) |

**RAG pipeline with an LLM:**

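```python
from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack_integrations.components.retrievers.dewey import DeweyRetriever

# Reuses `store` from the example above.
# OpenAIGenerator reads OPENAI_API_KEY from the environment by default.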

prompt_template = """
Answer the question using only the provided context.

Context:
{% for doc in documents %}
- {{ doc.content }}
{% endfor %}

Question: {{ query }}
"""

pipeline = Pipeline()
pipeline.add_component("retriever", DeweyRetriever(document_store=store, top_k=5))
pipeline.add_component("prompt", PromptBuilder(template=prompt_template))
pipeline.add_component("llm", OpenAIGenerator(model="gpt-4o-mini"))

pipeline.connect("retriever.documents", "prompt.documents")
pipeline.connect("prompt.prompt", "llm.prompt")

query = "What were the main findings?"
result = pipeline.run({
    "retriever": {"query": query},
    "prompt": {"query": query},
})
print(result["llm"]["replies"][0])
```

### DeweyResearchComponent

A Haystack component that runs Dewey's full agentic research loop — searching, reading, and synthesising across multiple documents — and returns a grounded Markdown answer with cited sources.

Use this as a drop-in replacement for an LLM generator when you want Dewey to handle both retrieval *and* generation.

```python
from haystack import Pipeline
from haystack_integrations.components.retrievers.dewey import DeweyResearchComponent
from haystack.utils import Secret

pipeline = Pipeline()
pipeline.add_component(
    "research",
    DeweyResearchComponent(
        api_key=Secret.from_env_var("DEWEY_API_KEY"),
        collection_id="3f7a1b2c-...",
        depth="balanced",
    ),
)

result = pipeline.run({"research": {"query": "What were the key findings across all studies?"}})
print(result["research"]["answer"])

for source in result["research"]["sources"]:
    print(f"  [{source.meta['filename']}] {source.content[:80]}...")
```
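Several cited chunks can come from the same file. To build a bibliography, you can collapse `sources` into a deduplicated list of files. In the sketch below, `cited_files` is a hypothetical helper. It assumes only that each source exposes a `meta` dict with a `filename` key, as in the example above.

```python
from types import SimpleNamespace


def cited_files(sources):
    """Hypothetical helper: unique filenames, in first-cited order."""
    names = (source.meta.get("filename") for source in sources)
    # dict.fromkeys keeps insertion order while dropping duplicates.
    return list(dict.fromkeys(name for name in names if name))


# Stand-ins for Haystack Documents, which also expose a `.meta` dict.
demo = [
    SimpleNamespace(meta={"filename": "study-b.pdf"}),
    SimpleNamespace(meta={"filename": "study-a.pdf"}),
    SimpleNamespace(meta={"filename": "study-b.pdf"}),
]
print(cited_files(demo))  # → ['study-b.pdf', 'study-a.pdf']
```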

**Outputs:**

| Key | Type | Description |
|---|---|---|
| `answer` | `str` | Synthesised Markdown answer |
| `sources` | `list[Document]` | Source chunks cited by the answer |

**Research depths:**

| Depth | Speed | Tools | Requires BYOK |
|---|---|---|---|
| `quick` | fast | basic search | no |
| `balanced` | fast | basic search | no |
| `deep` | slower | full tool suite | yes |
| `exhaustive` | slowest | full tool suite | yes |

`deep` and `exhaustive` require a Dewey Pro plan and a bring-your-own-key (BYOK) model-provider API key configured on your project.
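You can catch a misconfigured depth before a run by mirroring the table in a small check. The sketch below is a hypothetical helper and is not part of this package. It encodes only the BYOK column above, so it cannot see your plan or project settings. The `has_byok_key` flag is an assumed caller-supplied value, and Dewey itself remains the authority on whether a request is allowed.

```python
# Mirrors the "Requires BYOK" column of the research-depth table.
DEPTH_REQUIRES_BYOK = {
    "quick": False,
    "balanced": False,
    "deep": True,
    "exhaustive": True,
}


def check_depth(depth, has_byok_key):
    """Hypothetical pre-flight check: raise ValueError for an unknown depth
    or for a BYOK-only depth when no BYOK key is configured."""
    if depth not in DEPTH_REQUIRES_BYOK:
        valid = ", ".join(DEPTH_REQUIRES_BYOK)
        raise ValueError(f"unknown depth {depth!r}; expected one of: {valid}")
    if DEPTH_REQUIRES_BYOK[depth] and not has_byok_key:
        raise ValueError(f"depth {depth!r} requires a BYOK API key on your project")
    return depth
```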

**With a custom model:**

```python
DeweyResearchComponent(
    api_key=Secret.from_env_var("DEWEY_API_KEY"),
    collection_id="3f7a1b2c-...",
    depth="deep",
    model="claude-sonnet-4-6",  # requires Anthropic BYOK key on your project
)
```

## Requirements

- Python 3.9+
- `meetdewey >= 1.0`
- `haystack-ai >= 2.0`

## Development

```bash
pip install -e ".[dev]"
pytest
```
