Metadata-Version: 2.4
Name: attune-rag
Version: 0.1.22
Summary: Lightweight, LLM-agnostic RAG pipeline with pluggable corpora. Works with Claude, Gemini, or any LLM.
Author-email: Patrick Roebuck <admin@smartaimemory.com>
License:                                  Apache License
                                   Version 2.0, January 2004
                                http://www.apache.org/licenses/
        
           Licensed under the Apache License, Version 2.0 (the "License");
           you may not use this file except in compliance with the License.
           You may obtain a copy of the License at
        
               http://www.apache.org/licenses/LICENSE-2.0
        
           Unless required by applicable law or agreed to in writing, software
           distributed under the License is distributed on an "AS IS" BASIS,
           WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
           See the License for the specific language governing permissions and
           limitations under the License.
        
        Copyright 2026 Patrick Roebuck / Smart AI Memory
        
Project-URL: Homepage, https://github.com/Smart-AI-Memory/attune-rag
Project-URL: Repository, https://github.com/Smart-AI-Memory/attune-rag
Project-URL: Documentation, https://github.com/Smart-AI-Memory/attune-rag/blob/main/README.md
Project-URL: Changelog, https://github.com/Smart-AI-Memory/attune-rag/blob/main/CHANGELOG.md
Project-URL: Issues, https://github.com/Smart-AI-Memory/attune-rag/issues
Keywords: rag,retrieval-augmented-generation,llm,claude,gemini,grounding,provenance,attune
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Text Processing :: Markup :: Markdown
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: structlog>=24.0
Requires-Dist: jinja2>=3.1.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.0
Requires-Dist: jsonschema>=4.20
Provides-Extra: attune-help
Requires-Dist: attune-help<0.11,>=0.10.1; extra == "attune-help"
Provides-Extra: claude
Requires-Dist: anthropic<1.0,>=0.95; extra == "claude"
Provides-Extra: gemini
Requires-Dist: google-genai<2.0,>=1.0; extra == "gemini"
Provides-Extra: author
Requires-Dist: attune-author>=0.13.0; extra == "author"
Provides-Extra: all
Requires-Dist: attune-help<0.11,>=0.10.1; extra == "all"
Requires-Dist: anthropic<1.0,>=0.95; extra == "all"
Requires-Dist: google-genai<2.0,>=1.0; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: ruff>=0.4.0; extra == "dev"
Requires-Dist: black>=24.0; extra == "dev"
Requires-Dist: build>=1.0; extra == "dev"
Requires-Dist: twine>=5.0; extra == "dev"
Requires-Dist: attune-help<0.11,>=0.10.1; extra == "dev"
Requires-Dist: attune-author>=0.13.0; extra == "dev"
Requires-Dist: anthropic<1.0,>=0.95; extra == "dev"
Requires-Dist: google-genai<2.0,>=1.0; extra == "dev"
Dynamic: license-file

# attune-rag

Lightweight, LLM-agnostic RAG pipeline with pluggable
corpora. Works with Claude, Gemini, or any LLM.

- **No LLM SDK at install time.** All provider deps are
  optional extras. Two required runtime deps: `structlog`,
  `jinja2`.
- **Pluggable corpus.** Use attune-help (the default), any
  markdown directory, or your own `CorpusProtocol`.
- **Returns a prompt string + citation records** by default
  — `pipeline.run()` never opens a network connection. You
  call your own LLM however you like. Optional provider
  adapters ship convenience wrappers.
- **Optional hybrid retrieval.** `QueryExpander` and
  `LLMReranker` layer Claude Haiku on top of keyword
  retrieval to improve recall and precision — both opt-in,
  both fail-safe.

## Why attune-rag

Most RAG libraries ship features. attune-rag ships **measured
quality numbers** and gates merges against them. The CI badge
isn't "tests pass" — it's `P@1 ≥ 0.95, R@3 = 1.00, mean
faithfulness ≥ 0.9686` (locked at
[`docs/specs/release-quality-baseline/baseline-1.md`](docs/specs/release-quality-baseline/baseline-1.md))
plus per-axis CPU + wall-clock perf thresholds (locked at
[`docs/specs/downstream-validation/perf-baseline.md`](docs/specs/downstream-validation/perf-baseline.md)).

A PR that drops `mean_faithfulness` below `0.9686` fails CI
automatically. Same for any latency hot-path regressing past
`mean + 2σ`. That's the differentiator.

### vs LangChain / LlamaIndex

| | attune-rag | LangChain | LlamaIndex |
|---|---|---|---|
| Required runtime deps | 2 | many (transitively, ~30+) | many (~25+) |
| LLM SDK at install | none | bundled | bundled |
| Published quality regression thresholds | yes (P@1, R@3, faithfulness) | no | no |
| Published perf thresholds (wall + CPU) | yes | no | no |
| Citation primitives built-in | yes | add-on | add-on |
| "Get a string back, call your own LLM" | default | possible w/ effort | possible w/ effort |

LangChain and LlamaIndex are fantastic frameworks if you want
batteries-included orchestration. attune-rag is the alternative
when you want a RAG component you can drop into an existing app
without buying into a framework — and want the quality bar
quantified, not implied.

### What attune-rag is **not**

Honest exclusions, so you can self-disqualify if you need any
of these:

- **Not an agent framework.** No multi-step chains, no tool-use
  orchestration, no agent loops.
- **Not a document-parsing toolkit.** Bring your markdown
  already-parsed; use `unstructured.io` or similar upstream.
- **Not a vector DB integration.** Keyword retrieval is the
  default; you wire your own vector store if you need one (an
  `EmbeddingRetriever` is on the post-freeze roadmap — see
  below).
- **Not a one-line-install batteries-included framework.** That's
  LangChain / LlamaIndex. attune-rag is for the case where that's
  too much.

## Install

```bash
pip install attune-rag                     # core only
pip install 'attune-rag[attune-help]'      # + bundled help corpus
pip install 'attune-rag[claude]'           # + Claude adapter
pip install 'attune-rag[gemini]'           # + Gemini adapter
pip install 'attune-rag[all]'              # everything
```

## Quick start — Claude

```bash
pip install 'attune-rag[attune-help,claude]'
```

```python
import asyncio
from attune_rag import RagPipeline

async def main():
    pipeline = RagPipeline()  # defaults to AttuneHelpCorpus
    response, result = await pipeline.run_and_generate(
        "How do I run a security audit with attune?",
        provider="claude",
    )
    print(response)
    print("\nSources:", [h.entry.path for h in result.citation.hits])

asyncio.run(main())
```

## Quick start — Gemini

```bash
pip install 'attune-rag[attune-help,gemini]'
```

```python
response, result = await pipeline.run_and_generate(
    "...", provider="gemini", model="gemini-1.5-pro",
)
```

## Quick start — custom corpus, any LLM

```python
from pathlib import Path
from attune_rag import RagPipeline, DirectoryCorpus

pipeline = RagPipeline(corpus=DirectoryCorpus(Path("./my-docs")))
result = pipeline.run("How do I...?")

# Send result.augmented_prompt to whatever LLM you use.
# The pipeline itself does NOT call an LLM unless you use
# run_and_generate or call a provider adapter yourself.
```

## Hybrid retrieval (optional)

`QueryExpander` and `LLMReranker` require the `[claude]` extra and an
`ANTHROPIC_API_KEY`. Both are opt-in and fail-safe — any API error
falls back to keyword-only order automatically.

```python
from attune_rag import RagPipeline, LLMReranker, QueryExpander

# Reranker only (recommended for precision):
pipeline = RagPipeline(reranker=LLMReranker())

# Expander + reranker (max coverage):
pipeline = RagPipeline(
    expander=QueryExpander(),
    reranker=LLMReranker(),
)
```

## Template editor primitives (`attune_rag.editor`)

Headless toolkit for tools that need to validate, lint, and refactor a
template corpus — used by the [`attune-gui`](https://pypi.org/project/attune-gui/)
template editor and the [`attune-author`](https://pypi.org/project/attune-author/)
`edit` CLI, but works standalone with any
[`CorpusProtocol`](https://github.com/Smart-AI-Memory/attune-rag).

| API | What it does |
|-----|---------------|
| `load_schema()` | Loads `template_schema.json` (the v1 frontmatter contract: required `type` enum + `name`; optional `tags`, `aliases`, `summary`, `source`, `hash`; `additionalProperties: true`). |
| `parse_frontmatter(text)` / `validate_frontmatter(data)` | Split a template into frontmatter + body and report typed `FrontmatterIssue`s — used by linters and editors. |
| `lint_template(text, rel_path, corpus)` | Returns `Diagnostic[]` for schema violations, broken `[[alias]]` references, and depth-marker sequence errors. 1-indexed line/col ranges. |
| `autocomplete_tags(corpus, prefix, limit)` / `autocomplete_aliases(corpus, prefix, limit)` | Prefix-match completions ranked by frequency (tags) or lexical proximity (aliases). Sub-ms on 1k templates. |
| `find_references(corpus, name, kind)` | Locate every alias/tag/path occurrence across body, frontmatter, and `cross_links.json`. |
| `plan_rename(corpus, old, new, kind)` | Build a `RenamePlan` (one `FileEdit` per affected file with unified-diff hunks) for `kind="alias"` or `"tag"`. Raises `RenameCollisionError` on existing alias targets. |
| `apply_rename(corpus, plan)` | Atomically apply the plan (tempfile-per-file + sequential rename + drift-detection rollback). Returns the list of affected paths. |

Schema, lint, and rename are pure functions over `CorpusProtocol` — no I/O,
no global state. All three pieces are tested as a unit and used live by the
attune-gui editor's `/api/corpus/<id>/lint`, `/autocomplete`, and
`/refactor/rename/{preview,apply}` routes.

```python
from attune_rag import DirectoryCorpus
from attune_rag.editor import lint_template, plan_rename, apply_rename

corpus = DirectoryCorpus(Path("./templates")).load()

# Validate a template before saving
diagnostics = lint_template(
    text=Path("./templates/concepts/foo.md").read_text(),
    rel_path="concepts/foo.md",
    corpus=corpus,
)

# Rename an alias across the whole corpus
plan = plan_rename(corpus, old="oldname", new="newname", kind="alias")
print(f"Affects {len(plan.edits)} files")
affected = apply_rename(corpus, plan)
```

## Dashboard

```bash
attune-rag dashboard show    # live terminal dashboard
attune-rag dashboard render --out report.html  # HTML snapshot
```

## Quality baselines

attune-rag locks two baselines, both gated by CI. Thresholds
are empirically derived (`mean ± 2σ`) from back-to-back
benchmark runs on an unchanged HEAD — grounded, not guessed.

### Retrieval + faithfulness

| Metric | Threshold (current) | Source |
|---|---:|---|
| `precision_at_1` | **≥ 0.95** | retrieval, deterministic |
| `recall_at_3` | **= 1.00** | retrieval, deterministic |
| `mean_faithfulness` | **≥ 0.9686** | Claude judge, σ ≈ 0.005 |

Gated by [`.github/workflows/benchmark.yml`](.github/workflows/benchmark.yml).
Faithfulness gating engages when the PR touches retrieval,
reranker, expander, pipeline, prompts, or eval paths, or when
the PR title contains `[full-bench]`. Methodology + raw numbers
in [`docs/specs/release-quality-baseline/`](docs/specs/release-quality-baseline/baseline-1.md).

### Per-hot-path latency

Locked dual-axis (wall-clock + CPU-time) thresholds on the four
benchmarks. CPU-time is the gating axis (deterministic);
wall-clock is advisory through Phase 4 burn-in, then revisited.

| Benchmark | Axis | Mean | Threshold |
|---|---|---:|---:|
| `keyword_retriever_retrieve` | cpu | 3,212 µs | 34,493 µs |
| `directory_corpus_load` | cpu | 47 µs | 66 µs |
| `rag_pipeline_run` (retrieval-only) | cpu | 537 µs | 625 µs |
| `llm_reranker_rerank` | wall | 728 ms | 1.07 s |

Numbers measured from N=30 back-to-back runs on the
locked-baseline runner (Linux `ubuntu-latest`, CPython 3.11.15).
Two thresholds reflect different noise profiles:

- **`keyword_retriever_retrieve`** has a wide CPU band because its
  measured σ ≈ 15.6 ms reflects cold-cache effects on the first
  few iterations — the threshold formula is `mean + 2σ`,
  empirically derived rather than tuned for tightness.
- **`llm_reranker_rerank`** is wall-clock-only because Anthropic
  network variance (σ ≈ 170 ms) dominates the CPU axis; the
  gate is set generously and is advisory through Phase 4 W3.

Gated by [`.github/workflows/perf.yml`](.github/workflows/perf.yml)
per-PR (advisory comment in Phase 4 W1–W2, blocking in W3.1).
Raw numbers + hardware fingerprint + the full 8-row dual-axis
table:
[`docs/specs/downstream-validation/perf-baseline.md`](docs/specs/downstream-validation/perf-baseline.md).

### Why this is the differentiator

Most RAG libraries A/B-test internally and ship the result.
attune-rag publishes the thresholds, gates merges against them,
and re-measures whenever the corpus, judge prompt, or hardware
changes. The receipts are checked in.

## Bundled `.help/` corpus

The repo ships a polished `.help/` corpus that documents
attune-rag's own surface — 143 templates across 13 features ×
11 kinds (`concept`, `task`, `reference`, `quickstart`, `faq`,
`error`, `warning`, `tip`, `note`, `comparison`,
`troubleshooting`). Generated by
[`attune-author`](https://pypi.org/project/attune-author/) with
strict fact-check; queryable via `AttuneHelpCorpus` or as the
bundled default for `RagPipeline()`. See
[`.help/features.yaml`](.help/features.yaml) for the feature
map and [`.help/templates/`](.help/templates/) for the content.

The 13 features: `pipeline`, `retrieval`, `corpus`, `prompts`,
`provenance`, `providers`, `eval`, `benchmark`, `cli`, `editor`,
`dashboard`, `expander`, `reranker`.

### What faithfulness measures

Faithfulness scores how well an answer is **grounded in the retrieved
passages** — `1.0` means every claim in the answer is supported by a
cited source; lower scores mean some claims have no support in the
context. It catches hallucination in a way that `precision_at_k` and
`recall_at_k` can't: those only measure whether the *right documents*
were retrieved, not whether the *generated answer* actually used them.

attune-rag uses **Claude as the judge** via Anthropic's tool-use API
to produce a structured score in `[0.0, 1.0]` for each
`(query, answer, retrieved_context)` triple. The reported metric is
the mean over the golden query set. Aggregate σ ≈ `0.005` over 40
queries even though per-query judge non-determinism can swing 40+
percentage points on individual queries — averaging absorbs the noise.

### Run faithfulness manually

```bash
pip install 'attune-rag[claude]'
export ANTHROPIC_API_KEY=sk-ant-...

# Retrieval metrics only (free, deterministic):
attune-rag-benchmark --queries queries.yaml --json out.json

# Add faithfulness (~1 Claude API call per query, costs tokens):
attune-rag-benchmark --queries queries.yaml --with-faithfulness --json out.json

# Compare extended-thinking on vs off (2× judge cost):
attune-rag-benchmark --queries queries.yaml --with-faithfulness --compare-thinking --json out.json
```

The judge implementation lives at
`attune_rag.eval.faithfulness.FaithfulnessJudge`. Note: `attune_rag.eval.*`
is currently INTERNAL and may move — the `attune-rag-benchmark
--with-faithfulness` CLI is the stable contract.

For the methodology behind the `0.9686` threshold, the v1/v2 ground-truth
calibration runs, and the extended-thinking-vs-default decision record, see
[`docs/rag/faithfulness-thinking-calibration.md`](https://github.com/Smart-AI-Memory/attune-rag/blob/main/docs/rag/faithfulness-thinking-calibration.md).

## Roadmap — embeddings (post-freeze 0.2.0+)

Keyword retrieval + optional Claude reranker currently meet
the locked `P@1 ≥ 0.95, R@3 = 1.00` thresholds against the
attune-help golden set. The remaining hard queries
(3 of 28, currently `xpass`-gated under `[no-embeddings]`)
have zero token overlap against their target doc (e.g.
"vulnerability scan" → `tool-security-audit.md`). Closing
that gap needs vector search.

The plan is to ship `attune-rag[embeddings]` using
[`fastembed`](https://github.com/qdrant/fastembed) for local,
CPU-only embeddings — no new network dependency, no API key
required at retrieval time. Keyword retrieval stays the default;
embeddings layer in opt-in, same shape as `QueryExpander` and
`LLMReranker`. Shipping is paced by the
[Phase 4 feature-freeze](docs/specs/downstream-validation/)
currently in progress — public surface additions wait for the
0.2.0 cut.

See
[CHANGELOG.md](https://github.com/Smart-AI-Memory/attune-rag/blob/main/CHANGELOG.md)
for the decision record and remaining-gap analysis.

## Prompt caching (Claude only)

When using the Claude provider, `run_and_generate` automatically enables
[Anthropic prompt caching](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching)
on the stable RAG context prefix (≥ 1 024 chars). This eliminates
repeated token costs on the corpus portion of the prompt when the same
context block is reused across calls.

No configuration needed — the provider handles the `cache_control`
header automatically.

## Public API

attune-rag's public surface is documented below and snapshot-tested
in [tests/unit/test_api_surface.py](tests/unit/test_api_surface.py).
Formal SemVer commitments begin with the 0.2.0 release — see
[docs/POLICY.md](docs/POLICY.md) for the deprecation policy. Until
then the surface is honor-system: the lock test catches drift, but
treat 0.1.x as still-evolving.

**Top-level (`from attune_rag import ...`):**

- Pipeline — `RagPipeline`, `RagResult`
- Corpus — `CorpusProtocol`, `RetrievalEntry`, `DirectoryCorpus`,
  `AttuneHelpCorpus`
- Retrieval — `KeywordRetriever`, `RetrievalHit`, `RetrieverProtocol`
- Provenance — `CitationRecord`, `CitedSource`, `ClaimCitation`,
  `format_citations_markdown`, `format_claim_citations_markdown`
- Prompting — `build_augmented_prompt`, `PROMPT_VARIANTS`
- Hybrid retrieval — `QueryExpander`, `LLMReranker`

**PUBLIC submodules** (importable by qualified path):

- `attune_rag.corpus` — exposes `AliasInfo`, `DuplicateAliasError`
  in addition to the top-level corpus names
- `attune_rag.corpus.attune_help` — `AttuneHelpCorpus`
- `attune_rag.corpus.help_adapter` — `HelpCorpusAdapter` Protocol
- `attune_rag.providers` — `LLMProvider`, `get_provider`,
  `list_available`
- `attune_rag.editor` — template-editor primitives (lint, schema,
  rename, autocomplete, references); see "Template editor primitives"
  above for the symbol list
- `attune_rag.editor.{rename,schema,lint,autocomplete,references}` —
  the individual editor submodules

Anything not listed above is INTERNAL and may change in any release.
The underscore-prefixed editor modules (`attune_rag.editor._rename`
etc.) shipped in 0.1.x are deprecation shims as of 0.2.0; they
re-export the new non-underscore names and emit `DeprecationWarning`.
They are removed in 0.3.0.

## Status

v0.1.19 (alpha). In
[Phase 4 of the v1.0 roadmap](docs/specs/ROADMAP-v1.md) — a
four-week feature freeze + downstream validation soak; formal
`0.2.0` SemVer cut follows. Until then the public surface is
honor-system + lock-tested; deprecation policy at
[`docs/POLICY.md`](docs/POLICY.md).

Part of the attune ecosystem
([attune-ai](https://github.com/Smart-AI-Memory/attune-ai),
[attune-help](https://github.com/Smart-AI-Memory/attune-help),
[attune-author](https://github.com/Smart-AI-Memory/attune-author),
[attune-gui](https://github.com/Smart-AI-Memory/attune-gui)).

## License

Apache 2.0. See
[LICENSE](https://github.com/Smart-AI-Memory/attune-rag/blob/main/LICENSE).
