Metadata-Version: 2.4
Name: testigo-recall
Version: 0.9.1
Summary: AI-powered codebase knowledge base — scan once, query cheap forever. MCP server for Claude Code, Cursor, GitHub Copilot.
Author-email: Adrian Klostermann <adrian.klostermann@gmail.com>
License: MIT
Project-URL: Homepage, https://www.testigo.tech
Project-URL: Source Code, https://www.testigo.tech
Project-URL: VS Code Extension, https://marketplace.visualstudio.com/items?itemName=testigo-recall.testigo-recall
Project-URL: Issues, https://www.testigo.tech/#contact
Keywords: code review,knowledge base,ai,claude,anthropic,mcp,model context protocol,codebase,documentation
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Topic :: Software Development :: Documentation
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: click>=8.1
Requires-Dist: fastapi>=0.110
Requires-Dist: uvicorn>=0.27
Requires-Dist: pydantic>=2.5
Requires-Dist: gitpython>=3.1
Requires-Dist: anthropic>=0.73
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23; extra == "dev"
Dynamic: license-file

# Testigo Recall

**Scan once, query cheap forever.** AI-powered codebase knowledge base that extracts behaviors, design decisions, and assumptions from your code — then serves them to AI coding agents via MCP.

## How It Works

```
Your Repos                   Knowledge Base Repo          AI Agent
┌──────────┐   scan/merge  ┌──────────────┐   MCP      ┌──────────────┐
│ repo-a   │──────────────>│ repo-a.db    │<───────────│ Claude Code  │
│ repo-b   │  (Claude AI)  │ repo-b.db    │  (query)   │ Cursor       │
│ repo-c   │               │ repo-c.db    │            │ Windsurf     │
└──────────┘               └──────────────┘            └──────────────┘
      │  PR                  GitHub Release               MCP Server
      │  review                stores per-repo DBs         auto-downloads all
      ▼
┌──────────────┐
│ Inline review│
│ comments on  │
│ Files Changed│
└──────────────┘
```

1. **Scan** — A CI pipeline (GitHub Actions or Azure Pipelines) runs `testigo-recall scan` on your codebase. Claude extracts structured facts (behaviors, design, assumptions) with concrete values. A repo summary is auto-generated.
2. **Review** — On every PR, `testigo-recall review` compares the diff against the knowledge base. Findings appear as inline comments on the exact lines in the "Files changed" tab. Clean PRs get no review (silence is correct).
3. **Update** — When a PR merges to main, `testigo-recall analyze` rescans affected modules and publishes the updated DB as a GitHub Release asset.
4. **Query** — AI agents connect via MCP and search the knowledge base instead of reading raw source files. The MCP server automatically downloads all DB files from the release and injects a repo catalog into the agent's context.

## Quick Start

### 1. Install

**CLI tool** — distributed as a private Docker image:

```bash
# Authenticate with the token provided during onboarding
echo "YOUR_TOKEN" | docker login ghcr.io -u testigo-recall --password-stdin

# Pull the latest image
docker pull ghcr.io/testigo-recall/testigo-recall:latest

# Verify
docker run --rm ghcr.io/testigo-recall/testigo-recall:latest --help
```

**MCP server** — install from PyPI:

```bash
pip install testigo-recall-mcp
```

### 2. Add the CI pipeline

> **Azure DevOps?** Skip to the [Azure DevOps Pipelines](#azure-devops-pipelines) section.

**GitHub Actions** — copy `.github/workflows/testigo-recall.yml` to your repository. Three jobs are included:

- **On PR** (`review-pr`) — reviews the diff against the knowledge base, posts inline comments on affected lines. Read-only — never modifies the DB.
- **On merge to main** (`update-knowledge-on-merge`) — rescans modules affected by the merge, publishes the updated DB.
- **On workflow_dispatch** (`full-scan`) — full re-scan of the entire codebase. Only needed for initial setup or after prompt changes.

### 3. Set repository secrets

| Secret | Required | Description |
|--------|----------|-------------|
| `ANTHROPIC_API_KEY` | Yes | Claude API key for AI extraction |
| `GH_PAT` | External repos only | GitHub PAT with `contents:write` access to the central repo |

### 4. Connect AI agents via MCP

**Claude Code** — add to `.mcp.json` in your project root:

```json
{
  "mcpServers": {
    "testigo-recall": {
      "command": "testigo-recall-mcp",
      "env": {
        "TESTIGO_RECALL_REPO": "your-org/your-central-repo",
        "GITHUB_TOKEN": "your-token-here"
      }
    }
  }
}
```

**GitHub Copilot / VS Code** — add to `.vscode/settings.json` (requires VS Code 1.99+):

```json
{
  "mcp": {
    "servers": {
      "testigo-recall": {
        "command": "testigo-recall-mcp",
        "env": {
          "TESTIGO_RECALL_REPO": "your-org/your-central-repo",
          "GITHUB_TOKEN": "your-token-here"
        }
      }
    }
  }
}
```

**Cursor / Windsurf** — use the same `.mcp.json` format as Claude Code.

The MCP server automatically downloads the latest DB files from the GitHub Release on startup. No manual setup needed.

**Environment variables:**

| Variable | Description |
|----------|-------------|
| `TESTIGO_RECALL_REPO` | GitHub repo with the knowledge base release (e.g. `owner/repo`) |
| `GITHUB_TOKEN` | Token for private repos (public repos work without it) |
| `TESTIGO_RECALL_AZURE_URL` | Azure Blob Storage URL (e.g. `https://account.blob.core.windows.net/container`) |
| `TESTIGO_RECALL_DB_PATH` | Override: use a local DB file instead of downloading |
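
As a rough illustration of how these variables could interact, here is a minimal sketch. Only the "local path overrides downloading" rule comes from the table above; the function name `resolve_kb_source` and the ordering between the Azure and GitHub sources are assumptions made for this example, not documented behavior.

```python
from typing import Mapping, Tuple


def resolve_kb_source(env: Mapping[str, str]) -> Tuple[str, str]:
    """Pick where the knowledge base comes from (hypothetical helper)."""
    # Documented: a local DB path overrides any download.
    if env.get("TESTIGO_RECALL_DB_PATH"):
        return ("local", env["TESTIGO_RECALL_DB_PATH"])
    # Assumption: Azure is checked before GitHub; the real order may differ.
    if env.get("TESTIGO_RECALL_AZURE_URL"):
        return ("azure", env["TESTIGO_RECALL_AZURE_URL"])
    if env.get("TESTIGO_RECALL_REPO"):
        return ("github", env["TESTIGO_RECALL_REPO"])
    raise LookupError("no knowledge base source configured")
```

Calling it with `os.environ` would show which source a given shell session selects under these assumptions.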

## Security

The extraction prompt explicitly blocks secrets, credentials, API keys, passwords, tokens, connection strings, and any value that looks like a secret. Facts describe *where* secrets come from and *how* they're used, never the actual values.

If you're scanning repos with hardcoded secrets (pipeline configs, Helm values), the knowledge base will document the pattern ("uses Azure Storage key from pipeline variable") without leaking the key itself.

## PR Code Review

The `review` command compares a PR diff against the knowledge base and posts inline comments directly on the affected lines in the "Files changed" tab.

### How it works

The reviewer uses a **three-stage model pipeline** (Haiku keyword extraction, Sonnet review, Haiku validation) that automatically scales to PR size:

**Small PRs (≤10 files)** — single-pass review:

1. **Haiku** extracts search keywords from the diff (fast, cheap)
2. Keywords + source file paths are searched against the KB via FTS5/BM25
3. **Sonnet** reviews the diff against all matched facts
4. **Haiku** validates findings to filter false positives

**Large PRs (>10 files)** — chunked review:

1. Files are grouped by directory (~8 files per group)
2. Each group gets its **own** Haiku keyword extraction and targeted fact search
3. Each group is reviewed by **Sonnet** independently with focused context
4. Findings are deduplicated across groups
5. **Haiku** validates all findings in one pass

Chunked review prevents information overload — instead of asking one model call to review 64 files against 800+ facts, each group gets ~150 targeted facts with full context about which other files are also changing in the PR. This catches cross-module issues (like missing feature flag guards on new scopes) that get lost in a single massive call.
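
The routing between the two paths can be sketched roughly as follows. The thresholds (10 files, about 8 files per group) come from the description above; the function name `plan_review` and the slicing of oversized directories are assumptions. This sketch also does not merge small directories to reach the target group size, which the real grouping may do.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, List

SMALL_PR_MAX_FILES = 10  # from the text: up to 10 files is a single pass
TARGET_GROUP_SIZE = 8    # from the text: roughly 8 files per group


def plan_review(changed_files: List[str]) -> List[List[str]]:
    """Return file groups: one group for small PRs, several for large ones."""
    if len(changed_files) <= SMALL_PR_MAX_FILES:
        return [list(changed_files)]

    # Bucket files by parent directory so related files are reviewed together.
    by_dir: Dict[str, List[str]] = defaultdict(list)
    for path in changed_files:
        by_dir[str(PurePosixPath(path).parent)].append(path)

    groups: List[List[str]] = []
    for directory in sorted(by_dir):
        files = sorted(by_dir[directory])
        # Assumption: an oversized directory is cut into consecutive slices.
        for start in range(0, len(files), TARGET_GROUP_SIZE):
            groups.append(files[start:start + TARGET_GROUP_SIZE])
    return groups
```

Each returned group would then get its own keyword extraction, fact search, and review call.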

Both paths produce the same output format — two sections:
- **KB Conflicts** — where the PR contradicts documented facts (with quoted evidence)
- **Code Review** — bugs, logic errors, and security issues independent of the KB

### Resolution tracking

When a developer pushes fixes, the reviewer automatically tracks which findings were addressed:

- **Fixed findings** get a **Resolved** marker with strikethrough on the original comment
- **Unfixed findings** are reposted fresh at their updated line positions
- **Old review bodies** are marked as superseded with a count of resolved issues
- The new review summary includes a resolution count (e.g. `Resolved: 2`)

This gives clear visibility into progress without losing context on what was fixed.
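
One way to picture the bookkeeping is the sketch below. How the reviewer actually matches findings between runs is not described here, so the `Finding` type and the choice of `(path, title)` as the identity of a finding are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Finding:
    path: str
    title: str
    line: int


def track_resolution(
    previous: List[Finding], current: List[Finding]
) -> Tuple[List[Finding], List[Finding]]:
    """Split earlier findings into resolved ones and ones to repost."""
    # Assumption: a finding is identified by file and title, so a shifted
    # line number alone does not make it look like a new finding.
    current_by_key = {(f.path, f.title): f for f in current}
    resolved = [f for f in previous if (f.path, f.title) not in current_by_key]
    # Unfixed findings are taken from the new run, which carries updated lines.
    reposted = [
        current_by_key[(f.path, f.title)]
        for f in previous
        if (f.path, f.title) in current_by_key
    ]
    return resolved, reposted
```

The length of `resolved` would correspond to the resolution count shown in the new review summary.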

### Severity levels

| Icon | Level | Meaning |
|------|-------|---------|
| :no_entry: | critical | Will break production or create a security hole |
| :warning: | warning | Incorrect logic or inconsistency that needs review |
| :information_source: | info | Suspicious pattern that may be intentional |

### What it doesn't flag

- Subjective design choices (naming, ordering, ceil vs floor)
- Absent documentation ("KB doesn't mention X" is not a conflict)
- Intentional feature removals (whole-function deletions are not contradictions)
- Style, formatting, missing tests, performance suggestions

Clean PRs with zero findings produce **no review at all** — silence is correct.

## Architecture

### Extraction

**Claude Sonnet 4.5** reads source files module-by-module. Two scan modes are available:

- **Real-time** (default) — 10 parallel workers, results stream in as each module completes. Uses prompt caching for ~90% input token savings on the system prompt.
- **Batch** (`--batch`) — submits all modules to the Anthropic Batch API, polls until complete. **50% cost savings** but async (typically 10-30 minutes). Best for large repos or initial full scans.

Both modes use structured outputs (JSON schema) to guarantee valid extraction responses. Facts are saved to the DB immediately per module.

Large modules are automatically split: >8 files or >60K chars triggers adaptive splitting by directory structure, with files >15KB isolated into their own units.
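
The splitting rule can be approximated with the sketch below. The thresholds are the ones quoted above; the function name `split_module`, the reading of "15KB" as 15,000 bytes, and the one-unit-per-directory grouping are assumptions, and the real adaptive splitting is likely more refined.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, List

MAX_FILES = 8              # more than 8 files triggers a split
MAX_CHARS = 60_000         # more than 60K chars triggers a split
LARGE_FILE_BYTES = 15_000  # assumption: "15KB" read as 15,000 bytes


def split_module(files: Dict[str, str]) -> List[List[str]]:
    """Split a module (path -> source text) into extraction units."""
    total_chars = sum(len(text) for text in files.values())
    if len(files) <= MAX_FILES and total_chars <= MAX_CHARS:
        return [sorted(files)]

    units: List[List[str]] = []
    by_dir: Dict[str, List[str]] = defaultdict(list)
    for path, text in sorted(files.items()):
        if len(text.encode("utf-8")) > LARGE_FILE_BYTES:
            units.append([path])  # large files become their own unit
        else:
            by_dir[str(PurePosixPath(path).parent)].append(path)
    # Assumption: the remaining files form one unit per directory.
    units.extend(by_dir[d] for d in sorted(by_dir))
    return units
```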

### Fact Categories

Each extracted fact has a category:

- **behavior** — What the code does: triggers, outcomes, data flows, error handling
- **design** — How it's built: architecture choices, patterns, protocols, config
- **assumption** — What it expects: required inputs, environment, scope boundaries

### Database

Facts are stored in a SQLite database (`{repo-name}.db`) with FTS5 full-text search. BM25 ranking weights: summary (10x), detail (5x), symbols (15x). Each fact includes:

- `category` — behavior, design, or assumption
- `summary` — short description
- `detail` — full explanation with concrete values
- `confidence` — 0.0 to 1.0
- `source_files` — which files this fact was extracted from
- `symbols` — function/class names for grep-based code navigation
- `pr_id` — module ID (`SCAN:path/to/module`)
- `repo` — repository name
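
To show what weighted BM25 ranking looks like in SQLite, here is a small self-contained sketch using Python's built-in `sqlite3` module (it needs an SQLite build with FTS5 enabled). The table name `facts_fts`, the column order, and the sample rows are assumptions; only the weights (10, 5, 15) come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed column order: summary, detail, symbols (the real schema may differ).
conn.execute("CREATE VIRTUAL TABLE facts_fts USING fts5(summary, detail, symbols)")
conn.executemany(
    "INSERT INTO facts_fts (summary, detail, symbols) VALUES (?, ?, ?)",
    [
        ("Session cookie handling", "Cookies expire after 30 minutes", "refresh_cookie"),
        ("Token refresh", "Mentions login flow in passing", "login"),
        ("Login flow", "Validates credentials", "authenticate"),
    ],
)


def search(query: str, limit: int = 5) -> list:
    """Return matching summaries, best match first."""
    rows = conn.execute(
        """
        SELECT summary
        FROM facts_fts
        WHERE facts_fts MATCH ?
        ORDER BY bm25(facts_fts, 10.0, 5.0, 15.0)  -- one weight per column
        LIMIT ?
        """,
        (query, limit),
    ).fetchall()
    return [row[0] for row in rows]
```

FTS5's `bm25()` returns lower (more negative) values for better matches, which is why the query sorts in ascending order.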

### How Data Updates Work

- **Full scan** replaces all `SCAN:*` facts for the scanned repo, module by module
- **Merge analysis** identifies affected modules from the diff, then rescans those modules completely — same quality as a full scan but only for the touched modules. Facts are saved as `SCAN:*`, keeping the DB as a single source of truth.
- **PR review** is read-only — it queries the KB but never modifies it
- Facts from other repos are never touched
- The DB always reflects the current state of the code — no stale or accumulating facts
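
The replace-per-module behavior can be sketched as a delete followed by an insert inside one transaction. The table layout below is a trimmed-down, hypothetical schema and `replace_module_facts` is an illustrative name; the actual storage code may look different.

```python
import sqlite3
from typing import List, Tuple


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical, trimmed-down table; the real schema has more columns.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS facts ("
        "repo TEXT, pr_id TEXT, category TEXT, summary TEXT)"
    )


def replace_module_facts(
    conn: sqlite3.Connection,
    repo: str,
    module_id: str,
    facts: List[Tuple[str, str]],
) -> None:
    """Swap in fresh facts for one module; other modules and repos stay untouched."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "DELETE FROM facts WHERE repo = ? AND pr_id = ?", (repo, module_id)
        )
        conn.executemany(
            "INSERT INTO facts (repo, pr_id, category, summary) VALUES (?, ?, ?, ?)",
            [(repo, module_id, category, summary) for category, summary in facts],
        )
```

Because old rows for the module are removed before the new ones are written, a rescan cannot leave stale facts behind for that module.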

## Workflow Setup

Testigo Recall provides **reusable workflows** so any repo can be onboarded with a minimal workflow file. The reusable workflows live in this repository under `.github/workflows/`:

| Reusable Workflow | Trigger | Description |
|---|---|---|
| `reusable-review.yml` | PR opened/updated | Reviews diff against KB, posts inline comments with resolution tracking |
| `reusable-update-kb.yml` | Push to main | Rescans affected modules, publishes updated DB |
| `reusable-full-scan.yml` | Manual dispatch | Full re-scan of the entire codebase |

### Adding Testigo Recall to any repo

Create `.github/workflows/testigo-recall.yml` in your repository:

```yaml
name: Testigo Recall

on:
  pull_request:
    types: [opened, synchronize]
  push:
    branches: [main]
  workflow_dispatch:

permissions:
  contents: write
  pull-requests: write

jobs:
  review-pr:
    if: github.event_name == 'pull_request'
    uses: testigo-recall/testigo-recall/.github/workflows/reusable-review.yml@main
    with:
      db-file: my-app.db          # must match the DB name used during scan
      repo-name: my-app           # must match --repo-name used during scan
    secrets:
      GH_PAT: ${{ secrets.GH_PAT }}
      ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}

  update-kb:
    if: github.event_name == 'push'
    uses: testigo-recall/testigo-recall/.github/workflows/reusable-update-kb.yml@main
    with:
      db-file: my-app.db
      repo-name: my-app
    secrets:
      GH_PAT: ${{ secrets.GH_PAT }}
      ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}

  full-scan:
    if: github.event_name == 'workflow_dispatch'
    uses: testigo-recall/testigo-recall/.github/workflows/reusable-full-scan.yml@main
    with:
      db-file: my-app.db
      repo-name: my-app
    secrets:
      GH_PAT: ${{ secrets.GH_PAT }}
      ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```

That's it: ~40 lines replace the full 300+ line workflow. All the review logic, resolution tracking, KB management, and comment posting are handled by the reusable workflows.

### Reusable workflow inputs

| Input | Required | Default | Description |
|---|---|---|---|
| `db-file` | Yes | — | Knowledge base filename (e.g. `my-app.db`) |
| `repo-name` | Yes | — | Repo name as stored in the KB |
| `central-repo` | No | `your-org/knowledge-base` | GitHub repo that stores all `.db` files as release assets |

### Private repo access

If `testigo-recall` is a **private repo**, you need to allow other repos under the same owner to call its reusable workflows:

```bash
gh api repos/testigo-recall/testigo-recall/actions/permissions/access \
  -X PUT -f access_level=user
```

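This sets Actions access to the "user" level, which lets other private repos owned by the same user account call the reusable workflows. If the repo belongs to an organization, use `access_level=organization` instead. Only needed once.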

### Central repo (testigo-recall itself)

The testigo-recall repo uses its own inline workflow (`.github/workflows/testigo-recall.yml`) since it installs from source (`pip install -e .`) rather than from git. This workflow includes the same resolution tracking and review features as the reusable workflows.

## Cross-Repo Dependencies

The `deps` command extracts inter-repo dependency edges from package manifests and source imports across 7 ecosystems using pure static analysis, with no AI calls. The edges are also refreshed automatically during `scan` and `analyze`.

### What it detects

| Ecosystem | Manifests | Source Imports | Example |
|-----------|-----------|---------------|---------|
| Go | `go.mod` | n/a | `go-packages` via Azure DevOps `/_git/` paths |
| JS/TS/Vue | `package.json` | `.js/.ts/.tsx/.jsx/.vue/.mjs/.cjs` | `@acme/shared` via scoped packages |
| Python | `pyproject.toml`, `requirements.txt`, `Pipfile` | `.py` | `my-lib` via package name |
| Java/Kotlin | `pom.xml`, `build.gradle(.kts)` | `.java/.kt/.kts` | `com.acme:core` via groupId |
| C#/.NET | `*.csproj`, `Directory.Packages.props`, `packages.config` | `.cs` | `Acme.Shared` via PackageReference |
| PHP | `composer.json` | n/a | `acme/core` via require |

### How scope mapping works

Cross-repo detection needs a **scope map** — a mapping from package name prefix to target repo name. This tells the scanner "if a dependency starts with `@acme/`, it belongs to the `acme-monorepo` repo."

1. **Auto-discovery (`--multi` mode)** — reads `package.json` `name` fields and Maven `groupId` across all repos. If a repo has a scoped name like `@acme/core` and a `workspaces` field, it becomes the target for that scope. Zero config needed.

2. **Explicit `--scope-map`** — pass one or more mappings manually. Works for all ecosystems: `@acme/:acme-monorepo`, `com.acme:acme-java`, `Acme.:acme-nuget`, `acme/:acme-php`, `my-lib:my-lib-repo`.

3. **Go deps need no config** — `go.mod` is self-describing. Azure DevOps `/_git/` paths are parsed automatically.
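
A scope map in the `prefix:repo` form shown above could be parsed and applied roughly like this. The helper names and the "longest prefix wins" tie-break are assumptions made for the sketch, not documented behavior.

```python
from typing import Dict, List, Optional


def parse_scope_map(entries: List[str]) -> Dict[str, str]:
    """Turn 'prefix:repo' strings into a prefix -> repo mapping."""
    mapping: Dict[str, str] = {}
    for entry in entries:
        # Split on the last colon so the repo name is always the final part.
        prefix, sep, repo = entry.rpartition(":")
        if not sep or not prefix or not repo:
            raise ValueError(f"expected 'prefix:repo', got {entry!r}")
        mapping[prefix] = repo
    return mapping


def resolve_repo(dependency: str, scope_map: Dict[str, str]) -> Optional[str]:
    """Return the target repo for a dependency name, or None if unmapped."""
    # Assumption: when several prefixes match, the longest one wins.
    matches = [prefix for prefix in scope_map if dependency.startswith(prefix)]
    return scope_map[max(matches, key=len)] if matches else None
```

With the mappings from item 2, `@acme/shared` would resolve to `acme-monorepo`, while an unscoped package such as `lodash` would resolve to nothing.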

### Usage examples

```bash
# Single repo — Go deps auto-detected, JS needs --scope-map
testigo-recall deps --repo /path/to/go-service
testigo-recall deps --repo /path/to/js-app --scope-map "@acme/:acme-monorepo"

# Python, Java, .NET, PHP — all use --scope-map
testigo-recall deps --repo /path/to/python-app --scope-map "my-lib:my-lib-repo"
testigo-recall deps --repo /path/to/java-app --scope-map "com.acme:acme-java"
testigo-recall deps --repo /path/to/dotnet-app --scope-map "Acme.:acme-nuget"
testigo-recall deps --repo /path/to/php-app --scope-map "acme/:acme-php"

# Multiple repos — auto-discovers scopes from package.json names and Maven groupIds
testigo-recall deps --repo /path/to/all-repos --multi

# Multiple scope maps (mix ecosystems)
testigo-recall deps --repo /path/to/repos --multi \
  --scope-map "@acme/:acme-monorepo" \
  --scope-map "com.acme:acme-java" \
  --scope-map "Acme.:acme-nuget"
```

### Auto-discovery behavior

When using `--multi` without `--scope-map`:

| Scenario | Result |
|----------|--------|
| One repo has `@scope/name` in package.json | Scope auto-mapped to that repo |
| Multiple repos share scope, one has `workspaces` | Monorepo (with workspaces) wins |
| Multiple repos share scope, none has `workspaces` | Warning — use `--scope-map` |
| No scoped packages found | Go deps still detected via go.mod |
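
The decision table above can be expressed as a short sketch. It covers only npm-style scopes, models each repo as a single manifest, and treats "more than one repo with `workspaces`" as ambiguous; the `RepoManifest` type and `discover_scopes` name are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class RepoManifest:
    repo: str
    package_name: str     # the package.json "name", e.g. "@acme/core"
    has_workspaces: bool  # whether package.json has a "workspaces" field


def discover_scopes(manifests: List[RepoManifest]) -> Tuple[Dict[str, str], List[str]]:
    """Map each npm scope to a repo, collecting warnings for ambiguous scopes."""
    candidates: Dict[str, List[RepoManifest]] = {}
    for manifest in manifests:
        name = manifest.package_name
        if name.startswith("@") and "/" in name:
            scope = name.split("/", 1)[0] + "/"
            candidates.setdefault(scope, []).append(manifest)

    scope_map: Dict[str, str] = {}
    warnings: List[str] = []
    for scope, repos in candidates.items():
        if len(repos) == 1:
            scope_map[scope] = repos[0].repo
            continue
        monorepos = [m for m in repos if m.has_workspaces]
        if len(monorepos) == 1:
            scope_map[scope] = monorepos[0].repo  # the monorepo wins
        else:
            warnings.append(f"ambiguous scope {scope}: use --scope-map")
    return scope_map, warnings
```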

## Azure DevOps Pipelines

Testigo Recall also supports Azure Pipelines as an alternative to GitHub Actions. Copy `azure-pipelines-template.yml` to your repo root as `azure-pipelines.yml`.

### Pipeline stages

| Stage | Trigger | Description |
|-------|---------|-------------|
| `review_pr` | Pull Request | Reviews diff against KB, posts inline thread comments on the PR |
| `update_kb` | Merge to main | Rescans affected modules, uploads updated DB to Azure Blob Storage |
| `full_scan` | Manual run | Full re-scan of the entire codebase |

### Step-by-step setup

#### 1. Create Azure Blob Storage

1. Create a **Storage Account** in Azure Portal (or reuse an existing one)
2. Create a **container** (e.g. `knowledge-base`) — this stores the `.db` files

#### 2. Create Azure Service Connection in ADO

The pipeline uses `AzureCLI@2` tasks to read/write blobs. This requires a service connection.

1. Go to **Azure DevOps** → your project → **Project Settings** (gear icon, bottom-left)
2. Under **Pipelines** → **Service connections** → **New service connection**
3. Select **Azure Resource Manager**
4. Configure:
   - **Identity type**: App registration (automatic)
   - **Credential**: Workload identity federation (recommended — no secrets to rotate)
   - **Scope level**: Subscription
   - **Subscription**: Select your Azure subscription
   - **Resource group**: Select the one containing your storage account (or leave empty for subscription-wide access)
   - **Service connection name**: e.g. `testigo-recall-blob`
5. Check **Grant access permission to all pipelines**
6. Click **Save**

#### 3. Assign RBAC role to the service connection

The service connection's identity needs permission to read/write blobs. Run this in Azure CLI (or Azure Portal → Storage account → Access Control → Add role assignment):

```bash
# Find the service principal created by the service connection
az ad app list --all --query "[?contains(displayName, 'YourOrg-YourProject')].{appId: appId, displayName: displayName}" -o table

# Get its object ID
az ad sp show --id <appId> --query id -o tsv

# Assign Storage Blob Data Contributor
az role assignment create \
  --assignee-object-id <objectId> \
  --assignee-principal-type ServicePrincipal \
  --role "Storage Blob Data Contributor" \
  --scope "/subscriptions/<subId>/resourceGroups/<rgName>/providers/Microsoft.Storage/storageAccounts/<accountName>"
```

Note: RBAC role assignments can take **5-10 minutes** to propagate.

#### 4. Set pipeline variables

Go to **Pipelines** → select your pipeline → **Edit** → **Variables** → add:

| Variable | Secret | Description |
|----------|--------|-------------|
| `ANTHROPIC_API_KEY` | Yes | Claude API key for AI extraction |
| `GHCR_TOKEN` | Yes | GitHub PAT with `read:packages` scope (provided during onboarding) |
| `TESTIGO_STORAGE_ACCOUNT` | No | Azure Storage account name (e.g. `testigorecallkb`) |
| `TESTIGO_CONTAINER` | No | Blob container name (e.g. `knowledge-base`) |
| `TESTIGO_AZURE_SERVICE_CONNECTION` | No | Name of the service connection from step 2 (e.g. `testigo-recall-blob`) |

#### 5. Enable CI trigger

By default, YAML `trigger:` sections may be **overridden** by the pipeline definition settings. To ensure merges to main trigger the `update_kb` stage:

1. Go to **Pipelines** → select your pipeline → **Edit** → **Triggers**
2. Under **Continuous integration**, check **Override the YAML continuous integration trigger**
3. Add branch filter: `+ main` (and `+ master` if needed)
4. Save

Without this, pushes to main won't trigger the pipeline and the KB won't auto-update.

#### 6. Add build validation policy (PR trigger)

The YAML `pr:` trigger alone **does not** auto-trigger builds for pull requests in Azure DevOps. You must add a **build validation branch policy**:

1. Go to **Project Settings** → **Repos** → **Policies**
2. Select the `main` branch (under **Branch Policies**)
3. Under **Build Validation**, click **+** (Add build policy)
4. Configure:
   - **Build pipeline**: Select your Testigo Recall pipeline
   - **Trigger**: Automatic
   - **Policy requirement**: Optional (non-blocking) — the review posts comments but shouldn't block merges
   - **Build expiration**: Immediately when `main` is updated
   - **Display name**: `Testigo Recall - PR Review`
5. Click **Save**

Without this policy, PRs will **not** trigger the `review_pr` stage automatically.

#### 7. Grant Build Service permissions

For the `review_pr` stage to post inline comments on PRs:

1. Go to **Project Settings** → **Repos** → **Security**
2. Find `{Project} Build Service ({org})` in the users list
3. Set **Contribute to pull requests** = **Allow**

#### 8. First run — permission prompt

The first time the pipeline uses the service connection, ADO will show a **permission prompt** on the build page. Click **Permit** to authorize the pipeline to use the service connection. This only happens once.

#### 9. Customize `azure-pipelines.yml`

Copy the template and adjust these values for your repo:

```yaml
variables:
  DB_FILE: "your-repo-name.db"          # must match --repo-name
  TESTIGO_STORAGE_ACCOUNT: "youraccount" # your storage account
  TESTIGO_CONTAINER: "knowledge-base"    # your container name
```

The template uses Docker to pull and run `testigo-recall` — no Python setup required on the build agent. The `GHCR_TOKEN` pipeline variable authenticates against `ghcr.io`.

### Multi-repo setup

Multiple repos can share the same Azure Blob Storage container. Each repo gets its own `.db` file named after the repo (`$(Build.Repository.Name).db`). To add a new repo:

1. Create the repo in your ADO project
2. Copy `azure-pipelines.yml` into the repo (the template works as-is — `$(Build.Repository.Name)` auto-resolves)
3. Create a pipeline definition pointing to that YAML
4. Set the same pipeline variables (or link a shared Variable Group)
5. Enable CI trigger in pipeline settings
6. Run a manual build to trigger the initial full scan

The MCP server auto-discovers all `.db` files in the container and merges them into a single knowledge base. Developers see facts from all repos with a single MCP config:

```json
{
  "mcpServers": {
    "testigo-recall": {
      "command": "testigo-recall-mcp",
      "env": {
        "TESTIGO_RECALL_AZURE_URL": "https://<account>.blob.core.windows.net/<container>"
      }
    }
  }
}
```

Use the `repo_name` parameter in `search_codebase` to scope queries to a specific repo.

### Developer MCP access

Developers need read access to the blob container to use the MCP server locally. Two options:

**Option A: Azure AD group (recommended for teams)**
1. Create an AD security group (e.g. "Testigo Recall Users")
2. Assign **Storage Blob Data Reader** on the storage account to the group
3. Add developers to the group — new devs just get added, no per-user RBAC

**Option B: Public container (simplest)**
- Set the container access level to **Blob (anonymous read)** in Azure Portal
- The `.db` files contain architectural summaries, not source code or secrets
- Zero auth needed for developers — MCP just works with the URL

With Option A, developers must have an active `az login` session. With Option B, no auth is needed. Use the MCP config from the multi-repo section above.

Note: On Windows, `az` is a `.cmd` wrapper — MCP server v0.5.1+ resolves the full path automatically via `shutil.which("az")`.

### Key differences from GitHub Actions

| GitHub Actions | Azure Pipelines |
|---|---|
| `actions/checkout@v4` | `checkout: self` (built-in) |
| `gh release download/upload` | `az storage blob download/upload` via `AzureCLI@2` task |
| `${{ secrets.X }}` | `$(X)` as pipeline variable |
| `github.token` | `$(System.AccessToken)` — built-in, no PAT needed |
| Single review with inline comments | Independent thread per comment |
| GitHub PAT for cross-repo access | Azure service connection (workload identity federation, no secrets) |
| CI trigger always on | CI trigger may need explicit enable in pipeline settings |
| PR trigger via `on: pull_request` | PR trigger requires **build validation branch policy** (YAML `pr:` alone is not enough) |

PR comments use the built-in `$(System.AccessToken)` — no PAT required. Blob storage auth uses the Azure service connection via `AzureCLI@2` tasks — no SAS tokens or manual `az login` needed in the pipeline.

## CLI Reference

> **Note:** The `--db` flag is a **global option** that must come **before** the subcommand.
> Correct: `testigo-recall --db my-app.db review ...`
> Wrong: `testigo-recall review --db my-app.db ...`

> **Docker usage:** If running via Docker, prefix commands with:
> `docker run --rm -v "$PWD:/repo" -w /repo ghcr.io/testigo-recall/testigo-recall:latest`

```bash
# Full codebase scan (real-time, 10 parallel workers)
# Generates repo summary and refreshes cross-repo deps automatically
testigo-recall scan --repo /path/to/repo --repo-name my-app

# Full codebase scan (batch mode, 50% cheaper)
testigo-recall scan --repo /path/to/repo --repo-name my-app --batch

# Analyze a merge (rescans affected modules, updates DB)
testigo-recall analyze --repo . --base origin/main --head HEAD --pr PR-42

# Review a PR against the knowledge base (read-only, outputs JSON)
testigo-recall review --repo . --base origin/main --head HEAD --pr PR-42 --repo-name my-app

# Use a specific database file (--db BEFORE the subcommand)
testigo-recall --db /path/to/my-app.db review --repo . --base origin/main --head HEAD --pr PR-42
testigo-recall --db /path/to/my-app.db scan --repo /path/to/repo --repo-name my-app
testigo-recall --db /path/to/my-app.db analyze --repo . --base origin/main --head HEAD --pr PR-42

# Extract cross-repo dependencies (15+ manifest types, 7 languages)
testigo-recall deps --repo /path/to/repos --multi
testigo-recall deps --repo /path/to/repo --scope-map "@acme/:acme-monorepo"

# Query the knowledge base
testigo-recall query search "authentication"
testigo-recall query search "session" --category design
testigo-recall query module "SCAN:backend/app/api"
testigo-recall query component "api_service.py"
testigo-recall query recent --category behavior --limit 10

# Start REST API server
testigo-recall serve --port 8000
```

## MCP Tools Available to AI Agents

When connected via MCP, agents get these tools automatically:

| Tool | Description |
|------|-------------|
| `search_codebase` | Full-text search across all facts. Supports semicolon-separated multi-query (e.g. `"payments; checkout; stripe"`) for batch search in one call |
| `get_module_facts` | Get all facts for a specific module |
| `get_recent_changes` | Get the most recently extracted facts |
| `get_component_impact` | Find all modules where a component appears |
| `list_modules` | List all scanned modules (compact summary without args, full list with `repo_name`) |
| `get_repo_dependencies` | Cross-repo dependency graph from package manifests |

The MCP server includes:
- **Multi-query batching** — semicolon-separated queries run as one call, deduplicated, reducing token usage by ~91%
- **Near-duplicate dedup** — collapses per-country/locale config repetition (e.g. same setting extracted for CZ/SK/PL/RO/IT)
- **Field stripping** — removes noise fields (source, timestamp, relevance) from responses to save tokens
- **Agent instructions** — injects search strategy and tool usage guidance into the agent's system prompt
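
Multi-query batching and field stripping can be pictured with the sketch below. The `batch_search` name, the `search_one` callback, and the assumption that each fact carries a unique `id` are illustrative; the noise field names are the ones listed above.

```python
from typing import Callable, Dict, List

NOISE_FIELDS = {"source", "timestamp", "relevance"}  # fields named in the list above


def batch_search(
    query: str, search_one: Callable[[str], List[Dict]]
) -> List[Dict]:
    """Run each semicolon-separated sub-query and merge results without duplicates."""
    seen = set()
    merged: List[Dict] = []
    for sub_query in (part.strip() for part in query.split(";")):
        if not sub_query:
            continue  # ignore empty parts such as a trailing semicolon
        for fact in search_one(sub_query):
            key = fact["id"]  # assumption: every fact has a unique id
            if key in seen:
                continue
            seen.add(key)
            # Drop noise fields so the response costs fewer tokens.
            merged.append({k: v for k, v in fact.items() if k not in NOISE_FIELDS})
    return merged
```

A fact returned by several sub-queries appears once in the merged result, in the position of its first hit.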

## Quality

Validated against 85 ground-truth facts from a production e-commerce codebase (8000+ facts across 20 repos):

| Metric | Score |
|--------|-------|
| Fact accuracy | 94.1% strict, 100% lenient |
| Hallucination rate | 0% (zero fabricated facts) |
| Source file attribution | 98.8% correct |
| Cross-repo isolation | 100% (no data leakage) |

## Cost

Extraction uses Claude Sonnet 4.5. Approximate costs per scan:

| Repo Size | Real-time | Batch (`--batch`) |
|-----------|-----------|-------------------|
| ~50 files | ~$0.30 | ~$0.15 |
| ~200 files | ~$1.00 | ~$0.50 |
| ~500 files | ~$2.50 | ~$1.25 |
| ~2000 files | ~$10.00 | ~$5.00 |

Real-time mode benefits from prompt caching (~90% discount on repeated system prompt tokens). Batch mode gives a flat 50% discount on all tokens. For large initial scans, batch mode is recommended.

PR review costs scale with PR size:

| PR Size | Approach | Approx Cost |
|---------|----------|-------------|
| 1-10 files | Single-pass | ~$0.10-0.30 |
| 11-30 files | Chunked (~4 groups) | ~$0.20-0.40 |
| 31-60+ files | Chunked (~8 groups) | ~$0.40-0.60 |

Querying via MCP costs nothing — it's just SQLite lookups.
