Metadata-Version: 2.4
Name: private-me-haystack
Version: 0.2.0
Summary: Haystack integration for xLink identity-based authentication (RAG pipelines, 603× faster M2M auth)
Author: Private.Me Contributors
License: SEE LICENSE IN LICENSE.md
Project-URL: Homepage, https://private.me
Project-URL: Documentation, https://private.me/docs/haystack
Project-URL: Repository, https://github.com/private-me/platform
Project-URL: Issues, https://github.com/private-me/platform/issues
Keywords: haystack,rag,retrieval-augmented-generation,xlink,identity,authentication,ai-agents,m2m,zero-config,enterprise
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Security :: Cryptography
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: private-me-aci-core>=0.3.1
Requires-Dist: private-me-shared>=0.1.1
Requires-Dist: private-me-ux-helpers>=0.1.1
Requires-Dist: haystack-ai>=2.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"

# Private.Me xLink for Haystack (Python)

**Identity-based authentication for Haystack RAG pipelines.**

Python bindings for `@private.me/haystack`. Build Haystack RAG pipelines with cryptographic identity instead of API keys. Zero-config setup, enterprise-grade access control, no cascading failures.

## Installation

### Prerequisites

This package requires both the Node.js module and Python bindings:

```bash
# 1. Install Node.js package
npm install @private.me/haystack haystack

# 2. Install Python bindings
pip install private-me-haystack
```

**Requirements:**
- Python 3.9+
- Node.js 18+ (for backend)
- npm (for Node.js package installation)
- Haystack 2.x

## Quick Start

```python
import asyncio

from private_me import haystack_xlink


async def main():
    # Create RAG pipeline (generates identity automatically)
    rag = await haystack_xlink.create_rag({'verbose': True})

    # Add documents with access control
    doc = haystack_xlink.create_document(
        content='Sensitive information',
        owner_did=rag.get_did(),
        access_level='confidential'  # public | internal | confidential
    )
    await rag.add_document(doc)

    # Retrieve with identity-based access (here the pipeline queries as itself)
    results = await rag.retrieve('query', rag.get_did())
    # Returns only documents the requestor has access to
    print(f"Found {len(results)} documents")


asyncio.run(main())
```

**That's it.** No API keys, no OAuth, no configuration files.

Snippets in this README that use a bare `await` are meant to run inside an `async` function (or an asyncio-aware REPL such as `python -m asyncio`).

## API Reference

### `create_rag(config: Dict[str, Any] = None) -> HaystackXLinkRAG`

Create RAG pipeline with identity-based authentication.

**Parameters:**
- `config` (Dict[str, Any], optional): Configuration options
  - `name` (str): Pipeline name
  - `verbose` (bool): Enable logging (default: False)
  - `enforce_access_control` (bool): Enable access control checks (default: True)

**Returns:**
- `HaystackXLinkRAG`: RAG pipeline instance

**Example:**
```python
rag = await haystack_xlink.create_rag({
    'name': 'my-rag',
    'verbose': True,
    'enforce_access_control': True
})
```

### `create_document(content: str, owner_did: str, access_level: str = 'public', metadata: Dict[str, Any] = None) -> Dict[str, Any]`

Create document with access control metadata.

**Parameters:**
- `content` (str): Document content
- `owner_did` (str): Owner's decentralized identifier
- `access_level` (str): Access level - `public`, `internal`, or `confidential` (default: `public`)
- `metadata` (Dict[str, Any], optional): Additional metadata
  - `title` (str): Document title
  - `category` (str): Document category
  - `authorized_dids` (List[str]): List of DIDs with access to confidential documents

**Returns:**
- `Dict[str, Any]`: Document object

**Example:**
```python
doc = haystack_xlink.create_document(
    content='Customer data: 50K users, $2.5M MRR',
    owner_did=rag.get_did(),
    access_level='confidential',
    metadata={
        'title': 'Q4 Revenue Report',
        'category': 'financial',
        'authorized_dids': ['did:key:z6Mkcollaborator...']
    }
)
```

### `HaystackXLinkRAG.get_did() -> str`

Get pipeline's decentralized identifier.

**Returns:**
- `str`: DID (e.g., `did:key:z6Mk...`)

**Example:**
```python
did = rag.get_did()
print(f"RAG Identity: {did}")
```

### `HaystackXLinkRAG.add_document(doc: Dict[str, Any]) -> str`

Add document to index.

**Parameters:**
- `doc` (Dict[str, Any]): Document created via `create_document()`

**Returns:**
- `str`: Document ID

**Raises:**
- `RuntimeError`: If Node.js backend fails
- `ValueError`: If document is invalid

**Example:**
```python
doc_id = await rag.add_document(doc)
print(f"Document added: {doc_id}")
```

### `HaystackXLinkRAG.retrieve(query: str, requestor_did: str, limit: int = 10) -> List[Dict[str, Any]]`

Retrieve documents matching query with access control.

**Parameters:**
- `query` (str): Search query
- `requestor_did` (str): Requestor's DID for access control
- `limit` (int): Maximum number of results (default: 10)

**Returns:**
- `List[Dict[str, Any]]`: List of matching documents (filtered by access control)

**Raises:**
- `RuntimeError`: If Node.js backend fails

**Example:**
```python
results = await rag.retrieve('budget', engineering_did, limit=5)
for doc in results:
    print(f"- {doc['meta']['title']}")
```

### `HaystackXLinkRAG.list_documents() -> List[Dict[str, Any]]`

List every document in the index. This call bypasses access control, so reserve it for admin and monitoring use.

**Returns:**
- `List[Dict[str, Any]]`: All documents

**Example:**
```python
all_docs = rag.list_documents()
print(f"Total documents: {len(all_docs)}")
```

### `HaystackXLinkRAG.clear() -> None`

Clear all documents from index.

**Example:**
```python
await rag.clear()
```

### `HaystackXLinkRAG.export_identity() -> bytes`

Export the pipeline's identity so it can be restored later. The returned bytes are a private key, so store them as a secret.

**Returns:**
- `bytes`: PKCS8 private key

**Example:**
```python
identity = await rag.export_identity()
with open('rag-identity.key', 'wb') as f:
    f.write(identity)
```
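
Because the exported bytes are a private key, it may be worth creating the key file with owner-only permissions instead of relying on the default mode of `open()`. The sketch below is one way to do that on POSIX systems; `save_identity` is a hypothetical helper, not part of this package.

```python
import os


def save_identity(path: str, key_bytes: bytes) -> None:
    """Write key bytes to a new file created with mode 0o600 (hypothetical helper)."""
    # O_EXCL makes the call fail instead of overwriting an existing key file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(key_bytes)
```

With this helper, `save_identity('rag-identity.key', identity)` would replace the `open(..., 'wb')` block above. On Windows the mode argument has limited effect, so treat this as a POSIX-oriented sketch.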

### `from_identity(pkcs8: bytes, config: Dict[str, Any] = None) -> HaystackXLinkRAG`

Create RAG from existing identity.

**Parameters:**
- `pkcs8` (bytes): PKCS8 private key
- `config` (Dict[str, Any], optional): Configuration options

**Returns:**
- `HaystackXLinkRAG`: RAG pipeline with restored identity

**Example:**
```python
with open('rag-identity.key', 'rb') as f:
    identity = f.read()

restored = await haystack_xlink.from_identity(identity)
print(restored.get_did())  # Same DID as original
```

## Usage Examples

### Basic RAG Pipeline

```python
from private_me import haystack_xlink

# Create pipeline
rag = await haystack_xlink.create_rag({
    'name': 'my-rag',
    'enforce_access_control': True
})

print(f"RAG Identity: {rag.get_did()}")

# Add document
doc = haystack_xlink.create_document(
    content='Customer data: 50K users, $2.5M MRR',
    owner_did=rag.get_did(),
    access_level='confidential'
)

await rag.add_document(doc)

# Retrieve documents
results = await rag.retrieve('users', rag.get_did())
print(f"Found {len(results)} documents")
for result in results:
    # This document was added without a title, so fall back to a placeholder
    print(f"- {result['meta'].get('title', '(untitled)')}")
```

### Multi-Agent Document Sharing

```python
from private_me import haystack_xlink

# Create two agents
finance = await haystack_xlink.create_rag({'name': 'finance'})
engineering = await haystack_xlink.create_rag({'name': 'engineering'})

# Finance adds confidential document with authorized access
doc = haystack_xlink.create_document(
    content='Budget allocation for Q4',
    owner_did=finance.get_did(),
    access_level='confidential',
    metadata={
        'authorized_dids': [engineering.get_did()]  # Grant access
    }
)
await finance.add_document(doc)

# Engineering can access the shared document
results = await finance.retrieve('budget', engineering.get_did())
print(f"Found {len(results)} shared documents")
```

### Identity Persistence

```python
from private_me import haystack_xlink

# Export identity
rag = await haystack_xlink.create_rag()
identity = await rag.export_identity()

with open('rag-identity.key', 'wb') as f:
    f.write(identity)

# Restore RAG with same identity
with open('rag-identity.key', 'rb') as f:
    identity_bytes = f.read()

restored = await haystack_xlink.from_identity(identity_bytes)
print(restored.get_did() == rag.get_did())  # True
```

### Access Control Enforcement

```python
from private_me import haystack_xlink

rag = await haystack_xlink.create_rag({'enforce_access_control': True})
owner_did = rag.get_did()
other_user_did = 'did:key:z6Mkother_user'

# Add confidential document
doc = haystack_xlink.create_document('Secret data', owner_did, 'confidential')
await rag.add_document(doc)

# Owner can retrieve
owner_results = await rag.retrieve('secret', owner_did)
print(len(owner_results))  # 1 (found)

# Other user cannot retrieve
other_results = await rag.retrieve('secret', other_user_did)
print(len(other_results))  # 0 (access denied)
```

## Three-Tier Access Control

| Level | Accessible To | Use Case |
|-------|---------------|----------|
| **public** | Everyone | Company handbook, privacy policy |
| **internal** | Team members | API docs, technical specs |
| **confidential** | Owner + authorized | Financial reports, customer data |
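
The table can be read as a small decision function. The sketch below is an illustration of those three rules, not the package's actual enforcement code. It assumes a document dict with top-level `owner_did` and `access_level` keys and a `meta` dict holding `authorized_dids`, and it models "team members" as an explicit `team_dids` collection, which is a hypothetical parameter since this README does not say how team membership is determined.

```python
from typing import Any, Dict, Iterable


def can_access(
    doc: Dict[str, Any],
    requestor_did: str,
    team_dids: Iterable[str] = (),  # hypothetical: how teams are defined is an assumption
) -> bool:
    """Return True if requestor_did may read doc under the three-tier model."""
    level = doc.get("access_level")
    is_owner = requestor_did == doc.get("owner_did")

    if level == "public":
        return True  # everyone
    if level == "internal":
        return is_owner or requestor_did in set(team_dids)
    if level == "confidential":
        authorized = doc.get("meta", {}).get("authorized_dids", [])
        return is_owner or requestor_did in authorized
    return False  # unknown or missing level: fail closed
```

Failing closed on an unrecognized level is a design choice of this sketch: a typo in `access_level` then hides a document rather than exposing it.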

## Architecture

This package uses a **wrapper pattern**:

```
Python App → Python Bindings → Node.js Backend → Haystack RAG
```

1. **Python layer**: Provides Pythonic API (`create_rag()`, `retrieve()`)
2. **Node.js backend**: Handles cryptographic operations (Ed25519, DID generation)
3. **Haystack integration**: Document indexing, retrieval, access control enforcement
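
To make the wrapper pattern concrete, here is a minimal sketch of how a Python layer might forward a call to a backend process and surface failures as `RuntimeError`. The transport (one JSON request on stdin, one JSON response on stdout) and the names `call_backend` and `ECHO_BACKEND` are assumptions for illustration; this README does not document how the real bindings talk to Node.js. A Python child process stands in for the Node.js backend so the sketch runs without Node installed.

```python
import json
import subprocess
import sys
from typing import Any, Dict, List

# Stand-in backend (assumption): echoes the requested method name back.
# In the real package this role is played by the Node.js module.
ECHO_BACKEND = [
    sys.executable,
    "-c",
    "import sys, json; req = json.load(sys.stdin); "
    "json.dump({'ok': True, 'result': req['method']}, sys.stdout)",
]


def call_backend(command: List[str], method: str, params: Dict[str, Any]) -> Any:
    """Send one JSON request to a child process and return its result."""
    request = json.dumps({"method": method, "params": params})
    proc = subprocess.run(
        command, input=request, capture_output=True, text=True, timeout=30
    )
    if proc.returncode != 0:
        # Mirror the error shape shown in the Troubleshooting section.
        raise RuntimeError(f"Node.js backend error: {proc.stderr.strip()}")
    response = json.loads(proc.stdout)
    if not response.get("ok"):
        raise RuntimeError(f"Node.js backend error: {response.get('error')}")
    return response["result"]
```

Calling `call_backend(ECHO_BACKEND, "get_did", {})` returns the string `"get_did"` from the stand-in. A real backend would return the DID itself.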

## Troubleshooting

### "Node.js module not found"

**Error:**
```
RuntimeError: Node.js module @private.me/haystack not found
```

**Solution:**
```bash
# Install Node.js package first
npm install @private.me/haystack haystack

# Verify installation
ls node_modules/@private.me/haystack
```

### "Node.js backend error"

**Error:**
```
RuntimeError: Node.js backend error: <message>
```

**Solution:**
- Check Node.js version (requires 18+): `node --version`
- Verify Node.js package installed: `npm list @private.me/haystack`
- Check Haystack installed: `npm list haystack`
- Review Node.js error message for details

### Access denied errors

**Symptom** (no exception is raised; `retrieve()` returns an empty list):
```
Empty results despite matching documents
```

**Solution:**
- Verify `requestor_did` has access (check `access_level` and `authorized_dids`)
- For confidential documents: add requestor DID to `authorized_dids` list
- For internal documents: ensure requestor is a team member
- Check `enforce_access_control` is True

### Invalid document errors

**Error:**
```
ValueError: Invalid document format
```

**Solution:**
- Use `create_document()` helper function
- Verify `owner_did` is a valid DID (starts with `did:key:`)
- Check `access_level` is one of: `public`, `internal`, `confidential`
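
The last two checks can be run locally before calling `add_document()`. The helper below is a hypothetical pre-flight check that mirrors this checklist. It is not part of the package, and the backend may apply stricter validation than this.

```python
from typing import List

VALID_LEVELS = {"public", "internal", "confidential"}


def document_problems(owner_did: str, access_level: str) -> List[str]:
    """Return human-readable problems; an empty list means both checks pass."""
    problems = []
    if not owner_did.startswith("did:key:"):
        problems.append(f"owner_did does not start with 'did:key:': {owner_did!r}")
    if access_level not in VALID_LEVELS:
        problems.append(f"access_level is not one of {sorted(VALID_LEVELS)}: {access_level!r}")
    return problems
```

Returning a list rather than raising on the first failure lets a caller report every problem at once.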

## Development

### Running Tests

```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest -v

# Run with coverage
pytest -v --cov=private_me --cov-report=html
```

### Building

```bash
# Build wheel (requires `pip install build`; invoking setup.py directly is deprecated)
python -m build --wheel

# Validate build
bash validate-build.sh
```

## Support

- **Documentation**: https://private.me/docs/haystack
- **White Paper**: https://private.me/docs/haystack.html
- **Email**: contact@private.me
- **GitHub**: https://github.com/xail-io/xail

## License

Proprietary - See LICENSE.md

---

**Questions?** Visit [private.me/docs/haystack](https://private.me/docs/haystack) for complete documentation.
