Metadata-Version: 2.4
Name: zerag
Version: 0.1.0
Summary: Tiny RAG with implicit knowledge graph, powered by zvec
Project-URL: Homepage, https://github.com/Wayy-Research/zrag
Project-URL: Repository, https://github.com/Wayy-Research/zrag
Project-URL: Issues, https://github.com/Wayy-Research/zrag/issues
Author-email: Wayy Research <dev@wayy.io>
License-Expression: MIT
License-File: LICENSE
Keywords: embeddings,knowledge-graph,rag,vector-search,zvec
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: Indexing
Requires-Python: <3.13,>=3.10
Requires-Dist: click
Requires-Dist: sentence-transformers
Requires-Dist: zvec>=0.2.0
Provides-Extra: dev
Requires-Dist: pymupdf; extra == 'dev'
Requires-Dist: pytest; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Provides-Extra: pdf
Requires-Dist: pymupdf; extra == 'pdf'
Description-Content-Type: text/markdown

# zrag

Tiny RAG with implicit knowledge graph, powered by [zvec](https://github.com/alibaba/zvec).

Ingest a directory of files (text, source code, and optionally PDFs) into a local vector store. Query it in natural language, then explore connections through an implicit knowledge graph. No servers, no API keys.

## Install

```bash
pip install zerag
```

The distribution is published as `zerag`; the import package and the CLI command are both `zrag`.

For PDF support:

```bash
pip install "zerag[pdf]"
```

## Quick Start

### CLI

```bash
# Index a codebase
zrag ingest ./src

# Search it
zrag query "how does authentication work?"

# Explore the knowledge graph
zrag graph "authentication" --depth 1

# Check stats
zrag stats
```

### Python API

```python
from zrag import ZragStore

with ZragStore("./my_index") as store:
    # Ingest files
    store.ingest("./docs")

    # Query
    results = store.query("how does auth work?", topk=5)
    for r in results:
        print(f"{r['file_path']}:{r['chunk_index']} ({r['score']:.3f})")
        print(f"  {r['content'][:100]}")

    # Knowledge graph
    graph = store.graph("auth", depth=1)
    print(f"{len(graph['nodes'])} nodes, {len(graph['edges'])} edges")
```

## How It Works

**Vector search** — Files are chunked (~512 chars with paragraph-aware boundaries), embedded locally with all-MiniLM-L6-v2 (384-dim), and stored in a zvec HNSW index. Everything runs in-process.
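
To make the chunking step concrete, here is a minimal sketch of paragraph-aware chunking. It is not zrag's actual implementation: the function name `chunk_text`, the greedy packing strategy, and the choice to apply overlap only when an oversized paragraph is hard-split are assumptions made for illustration. The defaults mirror the CLI's `--chunk-size` and `--overlap` values.

```python
import re


def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Greedy paragraph-aware chunking (illustrative sketch, not zrag's code)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    # Treat runs of blank lines as paragraph boundaries.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= chunk_size:
            # The paragraph still fits, so keep packing the current chunk.
            current = candidate
            continue

        # The paragraph does not fit: close the current chunk first.
        if current:
            chunks.append(current)
            current = ""

        if len(para) <= chunk_size:
            current = para
        else:
            # Oversized paragraph: fall back to a sliding window with overlap.
            step = chunk_size - overlap
            for start in range(0, len(para), step):
                chunks.append(para[start:start + chunk_size])
                if start + chunk_size >= len(para):
                    break

    if current:
        chunks.append(current)
    return chunks
```

Packing whole paragraphs keeps related sentences in one chunk where possible, and the sliding window keeps a paragraph longer than `chunk_size` within the size limit.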

**Implicit knowledge graph** — No stored graph. Edges are derived at query time:
- `same_file` — chunks from the same file (free from metadata)
- `adjacent` — consecutive chunks in a file (free from metadata)
- `similar` — chunks above a cosine similarity threshold (computed via zvec)
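
The three edge types above can be sketched as follows. This is a hedged illustration, not zrag's code: `derive_edges`, the `id` and `vector` fields, and the brute-force pure-Python cosine are assumptions (zrag computes similarity via zvec). The `file_path` and `chunk_index` fields match the result keys shown in the Python API example, and the default threshold matches the CLI's `--threshold`.

```python
import math
from itertools import combinations


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def derive_edges(chunks: list[dict], threshold: float = 0.5) -> list[tuple[str, str, str]]:
    """Derive (source_id, target_id, edge_type) triples without storing a graph."""
    edges = []
    for a, b in combinations(chunks, 2):
        if a["file_path"] == b["file_path"]:
            # Metadata-only edges: no vector math required.
            edges.append((a["id"], b["id"], "same_file"))
            if abs(a["chunk_index"] - b["chunk_index"]) == 1:
                edges.append((a["id"], b["id"], "adjacent"))
        # Assumption: a score equal to the threshold counts as similar.
        if cosine(a["vector"], b["vector"]) >= threshold:
            edges.append((a["id"], b["id"], "similar"))
    return edges


# Hypothetical chunk records with toy 2-dimensional vectors.
chunks = [
    {"id": "a0", "file_path": "a.md", "chunk_index": 0, "vector": [1.0, 0.0]},
    {"id": "a1", "file_path": "a.md", "chunk_index": 1, "vector": [0.0, 1.0]},
    {"id": "b0", "file_path": "b.md", "chunk_index": 0, "vector": [1.0, 0.1]},
]
edges = derive_edges(chunks)
```

Comparing every pair is quadratic in the number of chunks, so it is only suitable for illustration. Because edges are recomputed from metadata and vectors, nothing besides the chunks themselves has to be kept in sync when files change.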

**Supported file types** — `.md`, `.txt`, `.py`, `.js`, `.ts`, `.tsx`, `.jsx`, `.json`, `.yaml`, `.yml`, `.toml`, `.csv`, `.html`, `.rst`, `.pdf`. Unknown extensions are attempted as UTF-8 text; binary files are skipped.

## CLI Reference

```
zrag [--index PATH] <command>

Commands:
  ingest <dir>   Ingest files from a directory
    --chunk-size   Chunk size in chars (default: 512)
    --overlap      Overlap between chunks (default: 64)

  query <text>   Search the index
    --topk         Number of results (default: 5)

  graph <text>   Build implicit knowledge graph
    --depth        Expansion depth (default: 1)
    --topk         Seed results (default: 5)
    --threshold    Similarity threshold for edges (default: 0.5)

  stats          Show index statistics
```

## Requirements

- Python 3.10-3.12 (zvec constraint)
- No API keys, no servers — embeddings run locally

## License

MIT — [Wayy Research](https://github.com/Wayy-Research)
