Metadata-Version: 2.4
Name: attune-rag
Version: 0.2.0
Summary: Lightweight, LLM-agnostic RAG pipeline with pluggable corpora. Works with Claude, Gemini, or any LLM.
Author-email: Patrick Roebuck <admin@smartaimemory.com>
License:                                  Apache License
                                   Version 2.0, January 2004
                                http://www.apache.org/licenses/
        
           Licensed under the Apache License, Version 2.0 (the "License");
           you may not use this file except in compliance with the License.
           You may obtain a copy of the License at
        
               http://www.apache.org/licenses/LICENSE-2.0
        
           Unless required by applicable law or agreed to in writing, software
           distributed under the License is distributed on an "AS IS" BASIS,
           WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
           See the License for the specific language governing permissions and
           limitations under the License.
        
        Copyright 2026 Patrick Roebuck / Smart AI Memory
        
Project-URL: Homepage, https://github.com/Smart-AI-Memory/attune-rag
Project-URL: Repository, https://github.com/Smart-AI-Memory/attune-rag
Project-URL: Documentation, https://github.com/Smart-AI-Memory/attune-rag/blob/main/README.md
Project-URL: Changelog, https://github.com/Smart-AI-Memory/attune-rag/blob/main/CHANGELOG.md
Project-URL: Issues, https://github.com/Smart-AI-Memory/attune-rag/issues
Keywords: rag,retrieval-augmented-generation,llm,claude,gemini,grounding,provenance,attune
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Text Processing :: Markup :: Markdown
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: structlog>=24.0
Requires-Dist: jinja2>=3.1.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.0
Requires-Dist: jsonschema>=4.20
Provides-Extra: attune-help
Requires-Dist: attune-help<0.11,>=0.10.1; extra == "attune-help"
Provides-Extra: claude
Requires-Dist: anthropic<1.0,>=0.95; extra == "claude"
Provides-Extra: gemini
Requires-Dist: google-genai<2.0,>=1.0; extra == "gemini"
Provides-Extra: author
Requires-Dist: attune-author>=0.13.0; extra == "author"
Provides-Extra: all
Requires-Dist: attune-help<0.11,>=0.10.1; extra == "all"
Requires-Dist: anthropic<1.0,>=0.95; extra == "all"
Requires-Dist: google-genai<2.0,>=1.0; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: ruff>=0.4.0; extra == "dev"
Requires-Dist: black>=24.0; extra == "dev"
Requires-Dist: build>=1.0; extra == "dev"
Requires-Dist: twine>=5.0; extra == "dev"
Requires-Dist: attune-help<0.11,>=0.10.1; extra == "dev"
Requires-Dist: attune-author>=0.13.0; extra == "dev"
Requires-Dist: anthropic<1.0,>=0.95; extra == "dev"
Requires-Dist: google-genai<2.0,>=1.0; extra == "dev"
Dynamic: license-file

# attune-rag

Lightweight, LLM-agnostic RAG pipeline with pluggable
corpora. Works with Claude, Gemini, or any LLM.

- **No LLM SDK at install time.** All provider deps are
  optional extras. Two required runtime deps: `structlog`,
  `jinja2`.
- **Pluggable corpus.** Use attune-help (the default), any
  markdown directory, or your own `CorpusProtocol`.
- **Returns a prompt string + citation records** by default
  — `pipeline.run()` never opens a network connection. You
  call your own LLM however you like. Optional provider
  adapters ship convenience wrappers.
- **Optional hybrid retrieval.** `QueryExpander` and
  `LLMReranker` layer Claude Haiku on top of keyword
  retrieval to improve recall and precision — both opt-in,
  both fail-safe.

## Why attune-rag

Most RAG libraries ship features. attune-rag ships **measured
quality numbers** and gates merges against them. The CI badge
isn't "tests pass" — it's `P@1 ≥ 0.95, R@3 = 1.00, mean
faithfulness ≥ 0.9686` (locked at
[`docs/specs/release-quality-baseline/baseline-1.md`](docs/specs/release-quality-baseline/baseline-1.md))
plus per-axis CPU + wall-clock perf thresholds (locked at
[`docs/specs/downstream-validation/perf-baseline.md`](docs/specs/downstream-validation/perf-baseline.md)).

A PR that drops `mean_faithfulness` below `0.9686` fails CI
automatically. Same for any latency hot-path regressing past
`mean + 2σ`. That's the differentiator.

### vs LangChain / LlamaIndex

| | attune-rag | LangChain | LlamaIndex |
|---|---|---|---|
| Required runtime deps | 2 | many (transitively, ~30+) | many (~25+) |
| LLM SDK at install | none | bundled | bundled |
| Published quality regression thresholds | yes (P@1, R@3, faithfulness) | no | no |
| Published perf thresholds (wall + CPU) | yes | no | no |
| Citation primitives built-in | yes | add-on | add-on |
| "Get a string back, call your own LLM" | default | possible w/ effort | possible w/ effort |

LangChain and LlamaIndex are fantastic frameworks if you want
batteries-included orchestration. attune-rag is the alternative
when you want a RAG component you can drop into an existing app
without buying into a framework — and want the quality bar
quantified, not implied.

Beyond drop-in retrieval, attune-rag is the grounding foundation
for the `attune-*` family's content-quality discipline. The
`attune-author` polish/fact-check pipeline uses attune-rag's
retrieval + faithfulness primitives to verify generated help
content is grounded in source material before it's marked
authoritative — the same `mean_faithfulness ≥ 0.9686` discipline
that gates this library's own benchmarks, extended to the
authoring loop.

### What attune-rag is **not**

Honest exclusions, so you can self-disqualify if you need any
of these:

- **Not an agent framework.** No multi-step chains, no tool-use
  orchestration, no agent loops.
- **Not a document-parsing toolkit.** Bring your markdown
  already-parsed; use `unstructured.io` or similar upstream.
- **Not a vector DB integration.** Keyword retrieval is the
  default; you wire your own vector store if you need one (an
  `EmbeddingRetriever` is on the post-freeze roadmap — see
  below).
- **Not a one-line-install batteries-included framework.** That's
  LangChain / LlamaIndex. attune-rag is for the case where that's
  too much.

## Install

```bash
pip install attune-rag                     # core only
pip install 'attune-rag[attune-help]'      # + bundled help corpus
pip install 'attune-rag[claude]'           # + Claude adapter
pip install 'attune-rag[gemini]'           # + Gemini adapter
pip install 'attune-rag[all]'              # everything
```

## Quick start — Claude

```bash
pip install 'attune-rag[attune-help,claude]'
```

```python
import asyncio
from attune_rag import RagPipeline

async def main():
    pipeline = RagPipeline()  # defaults to AttuneHelpCorpus
    response, result = await pipeline.run_and_generate(
        "How do I run a security audit with attune?",
        provider="claude",
    )
    print(response)
    print("\nSources:", [h.entry.path for h in result.citation.hits])

asyncio.run(main())
```

## Quick start — Gemini

```bash
pip install 'attune-rag[attune-help,gemini]'
```

```python
response, result = await pipeline.run_and_generate(
    "...", provider="gemini", model="gemini-1.5-pro",
)
```

## Quick start — custom corpus, any LLM

```python
from pathlib import Path
from attune_rag import RagPipeline, DirectoryCorpus

pipeline = RagPipeline(corpus=DirectoryCorpus(Path("./my-docs")))
result = pipeline.run("How do I...?")

# Send result.augmented_prompt to whatever LLM you use.
# The pipeline itself does NOT call an LLM unless you use
# run_and_generate or call a provider adapter yourself.
```

> 📖 **Building a quality corpus.** See [`docs/USER_CORPUS_GUIDE.md`](docs/USER_CORPUS_GUIDE.md)
> for the corpus-authoring discipline that produced the bundled
> attune-help corpus's 100% / 100% baseline + 100% paraphrased R@3:
> frontmatter aliases, multi-token intent, the `MIN_ALIAS_OVERLAP`
> knob, stemmer traps, the override file pattern, and the
> strict-dominance measurement loop. The guide is the v0 forerunner
> of the v1.0.0 framework framing (`user-corpus-onboarding` spec).

## Hybrid retrieval (optional)

`QueryExpander` and `LLMReranker` require the `[claude]` extra and an
`ANTHROPIC_API_KEY`. Both are opt-in and fail-safe — any API error
falls back to keyword-only order automatically.

```python
from attune_rag import RagPipeline, LLMReranker, QueryExpander

# Reranker only (recommended for precision):
pipeline = RagPipeline(reranker=LLMReranker())

# Expander + reranker (max coverage):
pipeline = RagPipeline(
    expander=QueryExpander(),
    reranker=LLMReranker(),
)
```

## Template editor primitives (`attune_rag.editor`)

Headless toolkit for tools that need to validate, lint, and refactor a
template corpus — used by the [`attune-gui`](https://pypi.org/project/attune-gui/)
template editor and the [`attune-author`](https://pypi.org/project/attune-author/)
`edit` CLI, but works standalone with any
[`CorpusProtocol`](https://github.com/Smart-AI-Memory/attune-rag).

| API | What it does |
|-----|---------------|
| `load_schema()` | Loads `template_schema.json` (the v1 frontmatter contract: required `type` enum + `name`; optional `tags`, `aliases`, `summary`, `source`, `hash`; `additionalProperties: true`). |
| `parse_frontmatter(text)` / `validate_frontmatter(data)` | Split a template into frontmatter + body and report typed `FrontmatterIssue`s — used by linters and editors. |
| `lint_template(text, rel_path, corpus)` | Returns `Diagnostic[]` for schema violations, broken `[[alias]]` references, and depth-marker sequence errors. 1-indexed line/col ranges. |
| `autocomplete_tags(corpus, prefix, limit)` / `autocomplete_aliases(corpus, prefix, limit)` | Prefix-match completions ranked by frequency (tags) or lexical proximity (aliases). Sub-ms on 1k templates. |
| `find_references(corpus, name, kind)` | Locate every alias/tag/path occurrence across body, frontmatter, and `cross_links.json`. |
| `plan_rename(corpus, old, new, kind)` | Build a `RenamePlan` (one `FileEdit` per affected file with unified-diff hunks) for `kind="alias"` or `"tag"`. Raises `RenameCollisionError` on existing alias targets. |
| `apply_rename(corpus, plan)` | Atomically apply the plan (tempfile-per-file + sequential rename + drift-detection rollback). Returns the list of affected paths. |

Schema, lint, and rename are pure functions over `CorpusProtocol` — no I/O,
no global state. All three pieces are tested as a unit and used live by the
attune-gui editor's `/api/corpus/<id>/lint`, `/autocomplete`, and
`/refactor/rename/{preview,apply}` routes.

```python
from attune_rag import DirectoryCorpus
from attune_rag.editor import lint_template, plan_rename, apply_rename

corpus = DirectoryCorpus(Path("./templates")).load()

# Validate a template before saving
diagnostics = lint_template(
    text=Path("./templates/concepts/foo.md").read_text(),
    rel_path="concepts/foo.md",
    corpus=corpus,
)

# Rename an alias across the whole corpus
plan = plan_rename(corpus, old="oldname", new="newname", kind="alias")
print(f"Affects {len(plan.edits)} files")
affected = apply_rename(corpus, plan)
```

## Dashboard

```bash
attune-rag dashboard show    # live terminal dashboard
attune-rag dashboard render --out report.html  # HTML snapshot
```

## Quality baselines

attune-rag locks two baselines, both gated by CI. Thresholds
are empirically derived (`mean ± 2σ`) from back-to-back
benchmark runs on an unchanged HEAD — grounded, not guessed.

### Retrieval + faithfulness

| Metric | Threshold (current) | Source |
|---|---:|---|
| `precision_at_1` | **≥ 0.95** | retrieval, deterministic |
| `recall_at_3` | **= 1.00** | retrieval, deterministic |
| `mean_faithfulness` | **≥ 0.9686** | Claude judge, σ ≈ 0.005 |

Gated by [`.github/workflows/benchmark.yml`](.github/workflows/benchmark.yml).
Faithfulness gating engages when the PR touches retrieval,
reranker, expander, pipeline, prompts, or eval paths, or when
the PR title contains `[full-bench]`. Methodology + raw numbers
in [`docs/specs/release-quality-baseline/`](docs/specs/release-quality-baseline/baseline-1.md).

### Per-hot-path latency

Locked dual-axis (wall-clock + CPU-time) thresholds on the four
benchmarks. CPU-time is the gating axis (deterministic);
wall-clock is advisory.

Numbers measured under the V2 multi-run methodology (5
invocations × 20 runs = 100 measurements per metric) on the
locked-baseline runner (Linux `ubuntu-latest`, CPython 3.11.15).
Inter-run and intra-run variance are tracked separately;
thresholds are `mean + 2σ × inter_run_stdev`. **Full 8-row
dual-axis table + hardware fingerprint + per-metric noise
profile:**
[`docs/specs/downstream-validation/perf-baseline.md`](docs/specs/downstream-validation/perf-baseline.md).

Why two threshold styles in the locked table:

- **`keyword_retriever_retrieve`** has a wider CPU band because
  measured intra-run variance reflects cold-cache effects on the
  first few iterations — empirically derived, not tuned for
  tightness.
- **`llm_reranker_rerank`** is wall-clock-only because Anthropic
  network variance dominates the CPU axis; the gate is set
  generously.

Gated by [`.github/workflows/perf.yml`](.github/workflows/perf.yml)
per-PR (blocking on the CPU axis as of W3.1).

### Why this is the differentiator

Most RAG libraries A/B-test internally and ship the result.
attune-rag publishes the thresholds, gates merges against them,
and re-measures whenever the corpus, judge prompt, or hardware
changes. The receipts are checked in.

## Bundled `.help/` corpus

The repo ships a polished `.help/` corpus that documents
attune-rag's own surface — 143 templates across 13 features ×
11 kinds (`concept`, `task`, `reference`, `quickstart`, `faq`,
`error`, `warning`, `tip`, `note`, `comparison`,
`troubleshooting`). Generated by
[`attune-author`](https://pypi.org/project/attune-author/) with
strict fact-check; queryable via `AttuneHelpCorpus` or as the
bundled default for `RagPipeline()`. See
[`.help/features.yaml`](.help/features.yaml) for the feature
map and [`.help/templates/`](.help/templates/) for the content.

The 13 features: `pipeline`, `retrieval`, `corpus`, `prompts`,
`provenance`, `providers`, `eval`, `benchmark`, `cli`, `editor`,
`dashboard`, `expander`, `reranker`.

### What faithfulness measures

Faithfulness scores how well an answer is **grounded in the retrieved
passages** — `1.0` means every claim in the answer is supported by a
cited source; lower scores mean some claims have no support in the
context. It catches hallucination in a way that `precision_at_k` and
`recall_at_k` can't: those only measure whether the *right documents*
were retrieved, not whether the *generated answer* actually used them.

attune-rag uses **Claude as the judge** via Anthropic's tool-use API
to produce a structured score in `[0.0, 1.0]` for each
`(query, answer, retrieved_context)` triple. The reported metric is
the mean over the golden query set. Aggregate σ ≈ `0.005` over 40
queries even though per-query judge non-determinism can swing 40+
percentage points on individual queries — averaging absorbs the noise.

The same discipline powers `attune-author`'s polish/fact-check
pipeline — generated help content is scored against retrieved
source passages before being marked authoritative. attune-rag's
faithfulness primitives aren't just instrumentation; they're the
contract the family's content-quality story is built on.

### Run faithfulness manually

```bash
pip install 'attune-rag[claude]'
export ANTHROPIC_API_KEY=sk-ant-...

# Retrieval metrics only (free, deterministic):
attune-rag-benchmark --queries queries.yaml --json out.json

# Add faithfulness (~1 Claude API call per query, costs tokens):
attune-rag-benchmark --queries queries.yaml --with-faithfulness --json out.json

# Compare extended-thinking on vs off (2× judge cost):
attune-rag-benchmark --queries queries.yaml --with-faithfulness --compare-thinking --json out.json
```

The judge implementation lives at
`attune_rag.eval.faithfulness.FaithfulnessJudge`. Note: `attune_rag.eval.*`
is currently INTERNAL and may move — the `attune-rag-benchmark
--with-faithfulness` CLI is the stable contract.

For the methodology behind the `0.9686` threshold, the v1/v2 ground-truth
calibration runs, and the extended-thinking-vs-default decision record, see
[`docs/rag/faithfulness-thinking-calibration.md`](https://github.com/Smart-AI-Memory/attune-rag/blob/main/docs/rag/faithfulness-thinking-calibration.md).

## Roadmap — embeddings (post-freeze 0.2.0+)

Keyword retrieval + optional Claude reranker currently meet
the locked `P@1 ≥ 0.95, R@3 = 1.00` thresholds against the
attune-help golden set. The remaining hard queries
(3 of 28, currently `xpass`-gated under `[no-embeddings]`)
have zero token overlap against their target doc (e.g.
"vulnerability scan" → `tool-security-audit.md`). Closing
that gap needs vector search.

The plan is to ship `attune-rag[embeddings]` using
[`fastembed`](https://github.com/qdrant/fastembed) for local,
CPU-only embeddings — no new network dependency, no API key
required at retrieval time. Keyword retrieval stays the default;
embeddings layer in opt-in, same shape as `QueryExpander` and
`LLMReranker`. With 0.2.0 cut, embeddings are a Phase 5
candidate — see
[`docs/specs/ROADMAP-v1.md`](docs/specs/ROADMAP-v1.md).

See
[CHANGELOG.md](https://github.com/Smart-AI-Memory/attune-rag/blob/main/CHANGELOG.md)
for the decision record and remaining-gap analysis.

## Prompt caching (Claude only)

When using the Claude provider, `run_and_generate` automatically enables
[Anthropic prompt caching](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching)
on the stable RAG context prefix (≥ 1 024 chars). This eliminates
repeated token costs on the corpus portion of the prompt when the same
context block is reused across calls.

No configuration needed — the provider handles the `cache_control`
header automatically.

## Public API

attune-rag's public surface is documented below and snapshot-tested
in [tests/unit/test_api_surface.py](tests/unit/test_api_surface.py).
Formal SemVer commitments are in effect as of 0.2.0 — see
[docs/POLICY.md](docs/POLICY.md) for the deprecation policy. Symbols
PUBLIC in 0.2.x stay PUBLIC through every 0.2.z; the snapshot test
catches drift.

**Top-level (`from attune_rag import ...`):**

- Pipeline — `RagPipeline`, `RagResult`
- Corpus — `CorpusProtocol`, `RetrievalEntry`, `DirectoryCorpus`,
  `AttuneHelpCorpus`
- Retrieval — `KeywordRetriever`, `RetrievalHit`, `RetrieverProtocol`
- Provenance — `CitationRecord`, `CitedSource`, `ClaimCitation`,
  `format_citations_markdown`, `format_claim_citations_markdown`
- Prompting — `build_augmented_prompt`, `PROMPT_VARIANTS`
- Hybrid retrieval — `QueryExpander`, `LLMReranker`

**PUBLIC submodules** (importable by qualified path):

- `attune_rag.corpus` — exposes `AliasInfo`, `DuplicateAliasError`,
  `load_aliases_from_file` in addition to the top-level corpus names
- `attune_rag.corpus.attune_help` — `AttuneHelpCorpus`
- `attune_rag.corpus.help_adapter` — `HelpCorpusAdapter` Protocol
- `attune_rag.providers` — `LLMProvider`, `get_provider`,
  `list_available`
- `attune_rag.measure_corpus` — `measure(...)` function +
  `MeasureResult` dataclass for scoring a corpus against a query
  set. CLI via `python -m attune_rag.measure_corpus ...` or the
  `attune-rag-measure` console script. See
  [`docs/USER_CORPUS_GUIDE.md`](docs/USER_CORPUS_GUIDE.md) §6 for
  the worked example.
- `attune_rag.editor` — template-editor primitives (lint, schema,
  rename, autocomplete, references); see "Template editor primitives"
  above for the symbol list
- `attune_rag.editor.{rename,schema,lint,autocomplete,references}` —
  the individual editor submodules

**Console scripts:**

- `attune-rag` — CLI entry point (`attune_rag.cli:main`)
- `attune-rag-measure` — quality measurement
  (`attune_rag.measure_corpus:main`); CI-suitable via `--watermark-r3`
  (non-zero exit on fail)

Anything not listed above is INTERNAL and may change in any release.
The underscore-prefixed editor modules (`attune_rag.editor._rename`
etc.) shipped in 0.1.x are deprecation shims as of 0.2.0; they
re-export the new non-underscore names and emit `DeprecationWarning`.
They are removed in 0.3.0.

## Status

**0.2.0 — first SemVer-binding cut.** Phase 4 of the v1.0 roadmap
landed cleanly: quality baselines (P@1 ≥ 0.95, R@3 = 1.00, mean
faithfulness ≥ 0.9686) hold; per-hot-path perf thresholds re-locked
under the V2 multi-run methodology (5 × 20 measurements); attune-gui
downstream blocking gate stayed green throughout. From 0.2.0 forward,
[`docs/POLICY.md`](docs/POLICY.md) §2 binds — symbols PUBLIC in 0.2.x
stay PUBLIC through every 0.2.z.

We hit our Phase 4 goals ~3 weeks ahead of the nominal calendar and
opted to ship early via the freeze-override mechanism rather than let
the cadence clock run out — getting the user-facing additions
(`attune-rag-measure` console script + `attune_rag.measure_corpus`
module for benchmarking your own corpus quality;
`load_aliases_from_file()` for file-based alias customization) into
your hands sooner. Override rationale + per-PR receipts at
[`docs/specs/api-v0.2.0-cut/`](docs/specs/api-v0.2.0-cut/).

Classifier stays at `3 - Alpha` — the Production/Stable flip is a
Phase 5 deliverable.

Part of the attune ecosystem
([attune-ai](https://github.com/Smart-AI-Memory/attune-ai),
[attune-help](https://github.com/Smart-AI-Memory/attune-help),
[attune-author](https://github.com/Smart-AI-Memory/attune-author),
[attune-gui](https://github.com/Smart-AI-Memory/attune-gui)).

## License

Apache 2.0. See
[LICENSE](https://github.com/Smart-AI-Memory/attune-rag/blob/main/LICENSE).
