Metadata-Version: 2.4
Name: aperion-archivist
Version: 1.2.1
Summary: The Archivist - Documentation enforcement, validation, and synchronization CLI
Project-URL: Homepage, https://github.com/invictustitan2/aperion-doc-index
Project-URL: Documentation, https://github.com/invictustitan2/aperion-doc-index#readme
Project-URL: Repository, https://github.com/invictustitan2/aperion-doc-index
Project-URL: Bug Tracker, https://github.com/invictustitan2/aperion-doc-index/issues
Author: Aperion Team
License-Expression: MIT
License-File: LICENSE
Keywords: cli,documentation,linting,markdown,validation
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Documentation
Classifier: Topic :: Software Development :: Documentation
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: click>=8.1.0
Requires-Dist: httpx-retries>=0.3.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: jinja2>=3.1.0
Requires-Dist: jsonschema>=4.20.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.0.0
Requires-Dist: tomli>=2.0.0; python_version < '3.11'
Requires-Dist: watchdog>=4.0.0
Provides-Extra: dev
Requires-Dist: mypy>=1.10.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: ruff>=0.4.0; extra == 'dev'
Description-Content-Type: text/markdown

# The Archivist

> **"Code is ephemeral; Documentation is the contract.
> If the contract is broken, the system is broken."**

The Archivist is a documentation enforcement, validation, and synchronization
CLI tool. It does not suggest; it **enforces**.

## Features

- **Universal Header Validation** - Enforces mandatory YAML frontmatter on all
  markdown documents
- **Link Checking** - Detects broken internal links and anchors
- **Staleness Detection** - Flags documentation that has drifted behind source
  code
- **Document Indexing** - Pushes document vectors to The Cortex for semantic
  search
- **Document Scaffolding** - Generates properly formatted documents with headers

## Installation

```bash
pip install aperion-archivist
```

Or install from source:

```bash
git clone https://github.com/invictustitan2/aperion-doc-index.git
cd aperion-doc-index
pip install -e ".[dev]"
```

## Quick Start

### Validate Documentation

```bash
# Check all docs in ./docs directory
archivist check --doc-root docs

# Strict mode (treat warnings as errors)
archivist check --doc-root docs --strict

# JSON output for CI
archivist check --doc-root docs --format json
```

### Detect Stale Documentation

```bash
# Compare source (src/) against docs (docs/)
archivist stale --source-root src --doc-root docs

# Use content hashing for accurate detection (avoids false positives)
archivist stale --source-root src --doc-root docs --use-hashing
```
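
Timestamp comparison can flag a document as stale when a source file was merely touched (for example by a fresh checkout) without its content changing. The sketch below shows the general idea behind content hashing. It assumes nothing about how `--use-hashing` is implemented; `file_digest` and `changed_since` are hypothetical names.

```python
import hashlib
import os
import tempfile
from pathlib import Path

def file_digest(path: Path) -> str:
    """SHA-256 of the file's bytes; unaffected by the modification time."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def changed_since(path: Path, recorded_digest: str) -> bool:
    """True only if the content differs from the digest recorded earlier."""
    return file_digest(path) != recorded_digest

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "module.py"
    src.write_text("def f():\n    return 1\n", encoding="utf-8")
    recorded = file_digest(src)

    # Bump the mtime without changing the content (like `touch`).
    stat = src.stat()
    os.utime(src, (stat.st_atime, stat.st_mtime + 3600))
    print(changed_since(src, recorded))  # False: content is identical

    src.write_text("def f():\n    return 2\n", encoding="utf-8")
    print(changed_since(src, recorded))  # True: content changed
```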

### Sync Documentation

```bash
# Dry run - see what would be updated
archivist sync --source-root src --doc-root docs --dry-run

# Touch stale docs (marks them for review)
archivist sync --source-root src --doc-root docs
```

### Index to The Cortex

```bash
# Parse docs and push to vector store
archivist index --doc-root docs --cortex-url http://localhost:4949

# Incremental indexing (only changed docs)
archivist index --doc-root docs --incremental

# Dry run
archivist index --doc-root docs --dry-run
```

### Fix Missing Headers

```bash
# Auto-add Universal Headers to all docs missing them
archivist fix --doc-root docs --owner "team-alpha"

# Dry run - see what would be fixed
archivist fix --doc-root docs --dry-run
```

### Global Options

```bash
# Verbose output (debug logging)
archivist --verbose check --doc-root docs

# Quiet mode (errors only)
archivist --quiet check --doc-root docs

# JSON log format (for log aggregators)
archivist --log-format json check --doc-root docs
```

### Scaffold New Documents

```bash
# Create a new document with proper header
archivist scaffold docs/new-feature.md \
  --title "New Feature Guide" \
  --owner "team-alpha" \
  --category "guides" \
  --tags "feature,tutorial"
```

## Universal Header Format

Every markdown document **MUST** have a YAML frontmatter block with required
fields:

```yaml
---
title: Document Title
last_updated: 2026-01-15
owner: team-alpha
category: guides        # optional
tags:                   # optional
  - api
  - authentication
status: published       # optional
---

# Document Title

Content starts here...
```

### Required Fields (Default)

| Field          | Description                          |
| -------------- | ------------------------------------ |
| `title`        | Document title                       |
| `last_updated` | Last update date (YYYY-MM-DD format) |
| `owner`        | Team or individual responsible       |

## Configuration

Create `archivist.toml` in your project root:

```toml
[header]
# Required fields for Universal Header (this example adds "category" to the defaults)
required_fields = ["title", "last_updated", "owner", "category"]

[exclude]
# Directories to skip
patterns = [".venv", "node_modules", ".git", "__pycache__"]

[[mapping]]
# Map source files to their documentation
source_pattern = "src/**/*.py"
doc_pattern = "docs/api/{name}.md"

[[mapping]]
source_pattern = "lib/**/*.ts"
doc_pattern = "docs/lib/{name}.md"

[cortex]
url = "http://localhost:4949"
collection = "documentation"
```

## CI/CD Integration

### GitHub Actions

```yaml
name: Documentation Check

on:
  pull_request:
    paths:
      - 'docs/**'
      - 'src/**'

jobs:
  doc-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install Archivist
        run: pip install aperion-archivist

      - name: Validate Documentation
        run: archivist check --doc-root docs --strict --format json

      - name: Check for Stale Docs
        run: archivist stale --source-root src --doc-root docs
```

### Pre-commit Hook

```yaml
# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: archivist-check
        name: Validate Documentation
        entry: archivist check --doc-root docs
        language: system
        types: [markdown]
        pass_filenames: false
```

### Makefile

```makefile
.PHONY: docs-check docs-stale docs-index

docs-check:
	archivist check --doc-root docs --strict

docs-stale:
	archivist stale --source-root src --doc-root docs

docs-index:
	archivist index --doc-root docs --cortex-url $(CORTEX_URL)

docs: docs-check docs-stale
```

## Exit Codes

The Archivist uses specific exit codes for CI/CD scripting:

| Code | Name              | Meaning                                   |
| ---- | ----------------- | ----------------------------------------- |
| 0    | SUCCESS           | All checks passed                         |
| 1    | VALIDATION_FAILED | One or more documents failed validation   |
| 2    | HEADER_MISSING    | Universal Header missing from documents   |
| 3    | LINK_BROKEN       | Broken links detected                     |
| 4    | SCHEMA_INVALID    | Frontmatter fails JSON Schema validation  |
| 5    | STALE_DETECTED    | Documentation is stale (source newer)     |
| 10   | CONFIG_ERROR      | Configuration file invalid or missing     |
| 11   | PATH_NOT_FOUND    | Specified path does not exist             |
| 20   | CORTEX_ERROR      | Failed to communicate with Cortex         |

### Using Exit Codes in CI

```bash
# Capture the exit code without aborting under `set -e` (common in CI shells)
exit_code=0
archivist check --doc-root docs || exit_code=$?

case $exit_code in
  0) echo "All checks passed" ;;
  2) echo "Missing headers - run 'archivist fix'" ;;
  3) echo "Broken links found" ;;
  4) echo "Schema validation failed" ;;
  *) echo "Validation failed with code $exit_code" ;;
esac
```
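
For CI scripts written in Python, the table above can be mirrored as an `IntEnum`. This is a sketch for illustration: `ExitCode` and `describe` are hypothetical names defined here, not something the package is known to export.

```python
from enum import IntEnum

class ExitCode(IntEnum):
    """Exit codes copied from the table above."""
    SUCCESS = 0
    VALIDATION_FAILED = 1
    HEADER_MISSING = 2
    LINK_BROKEN = 3
    SCHEMA_INVALID = 4
    STALE_DETECTED = 5
    CONFIG_ERROR = 10
    PATH_NOT_FOUND = 11
    CORTEX_ERROR = 20

def describe(returncode: int) -> str:
    """Map a process return code to its name; unknown codes are reported as-is."""
    try:
        return ExitCode(returncode).name
    except ValueError:
        return f"UNKNOWN({returncode})"

print(describe(3))   # LINK_BROKEN
print(describe(42))  # UNKNOWN(42)
```

In a script, the return code would typically come from `subprocess.run(["archivist", "check", "--doc-root", "docs"]).returncode`.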

## Python Library API

The Archivist can also be used as a Python library, in addition to the CLI:

### High-Level Functions

```python
from pathlib import Path
from aperion_archivist import validate_docs, check_links, detect_staleness

# Validate all documentation
result = validate_docs(Path("docs"))
if not result.passed:
    for issue in result.issues:
        print(f"{issue.file}: {issue.message}")

# Check for broken links
link_result = check_links(Path("docs"))
print(f"Broken links: {link_result.broken_link_count}")

# Detect stale documentation
stale_result = detect_staleness(
    source_root=Path("src"),
    doc_root=Path("docs"),
)
for doc in stale_result.stale_docs:
    print(f"STALE: {doc.doc_path} ({doc.drift_days:.1f} days)")
```

### Cortex Integration (Async-First Architecture)

The Cortex client uses an **async-first** design to prevent blocking the
event loop when used in async contexts (FastAPI, agents, pipelines).

#### Async Usage (Recommended for Integration)

```python
import asyncio
from aperion_archivist.integrations import AsyncCortexClient

async def index_docs():
    async with AsyncCortexClient(
        base_url="http://localhost:4949",
        api_key="your-key",
    ) as client:
        # Health check
        if not await client.health_check():
            raise RuntimeError("Cortex unavailable")

        # Push chunks (`chunks` is assumed to be prepared beforehand; not shown here)
        result = await client.push_chunks(chunks)
        print(f"Indexed {result['indexed']} chunks")

        # Search
        results = await client.search("authentication", top_k=5)

asyncio.run(index_docs())
```

#### Shared Client for Connection Pooling

For high-throughput applications, set a shared client at startup:

```python
import httpx
from aperion_archivist.integrations import AsyncCortexClient

# At application startup
client = httpx.AsyncClient(timeout=30.0)
AsyncCortexClient.set_shared_client(client)

# All AsyncCortexClient instances now share connections
async def handler():
    cortex = AsyncCortexClient()  # Uses shared client
    await cortex.push_chunks(chunks)
```

#### Sync Usage (CLI/Scripts)

A synchronous wrapper is provided for non-async contexts:

```python
from aperion_archivist.integrations import CortexClient

# Sync context manager
with CortexClient() as client:
    result = client.push_chunks(chunks)
    print(f"Indexed: {result['indexed']}")
```
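
One common way to build such a wrapper is to drive an async client with `asyncio.run` from synchronous code. The sketch below shows that pattern with a stand-in client. `FakeAsyncClient` and `SyncWrapper` are hypothetical and do not reflect how `CortexClient` is actually implemented.

```python
import asyncio

class FakeAsyncClient:
    """Stand-in for an async client; it only counts what it is given."""

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        return False

    async def push_chunks(self, chunks):
        await asyncio.sleep(0)  # placeholder for network I/O
        return {"indexed": len(chunks)}

class SyncWrapper:
    """Blocking facade: each call runs the coroutine to completion."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False

    def push_chunks(self, chunks):
        async def _run():
            async with FakeAsyncClient() as client:
                return await client.push_chunks(chunks)

        # asyncio.run raises RuntimeError if an event loop is already running,
        # so a facade like this suits scripts and CLIs, not async code.
        return asyncio.run(_run())

with SyncWrapper() as client:
    print(client.push_chunks(["a", "b", "c"]))  # {'indexed': 3}
```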

## Project Structure

```
aperion-doc-index/
├── src/aperion_archivist/
│   ├── core/
│   │   ├── validator.py   # Universal Header enforcement
│   │   ├── linker.py      # Broken link detection
│   │   └── scanner.py     # Staleness detection
│   ├── generation/
│   │   ├── templates.py   # Jinja2 templates
│   │   └── parser.py      # Markdown chunking
│   ├── cli/
│   │   └── main.py        # CLI entry point
│   └── integrations/
│       └── cortex.py      # Vector store client
├── tests/
├── pyproject.toml
├── Dockerfile
└── README.md
```

## Development

```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Type check
mypy src

# Lint
ruff check src tests
```

## License

MIT
