# synix

> A build system for agent memory

## What it does

Synix transforms raw conversations into searchable, hierarchical memory with full provenance tracking. Declarative pipelines define how chat exports become episode summaries, monthly rollups, and core memory artifacts. When a config changes, only the affected layers rebuild. Think `make` or `dbt`, but for AI agent memory.

## Key concepts

- **Artifact** — immutable, versioned build output (transcript, episode, rollup, core memory). Content-addressed via SHA256.
- **Layer** — typed Python object in the build DAG. `Source` for inputs, `Transform` subclasses for LLM steps, `SearchIndex`/`FlatFile` for projections. Dependencies are object references via `depends_on`.
- **Pipeline** — declared in Python. `Pipeline.add(*layers)` routes Source/Transform to layers, SearchIndex/FlatFile to projections automatically.
- **Projection** — materializes artifacts into usable outputs. `SearchIndex` (SQLite FTS5 + optional embeddings), `FlatFile` (markdown context doc).
- **Provenance** — every artifact traces back to its inputs. Always included in search results.
- **Cache/Rebuild** — hash comparison: if inputs or prompt changed, rebuild. Otherwise skip.

## Installation and quick start

```bash
uvx synix init my-project
cd my-project
uvx synix build
uvx synix search "hiking"
```

## CLI commands

```bash
uvx synix init <name>                   # Scaffold new project with sources, pipeline, README
uvx synix build                         # Run pipeline, only rebuild what changed
uvx synix plan                          # Dry-run showing what would build without running transforms
uvx synix plan --explain-cache          # Plan with cache decision reasons per artifact
uvx synix search <query>                # Full-text search across indexed layers
uvx synix search <query> --mode hybrid  # Semantic + keyword search
uvx synix list [layer]                  # List artifacts, optionally filtered by layer
uvx synix show <id>                     # Display artifact content (resolves by label or ID prefix)
uvx synix lineage <id>                  # Show full provenance chain for an artifact
uvx synix validate                      # Run declared validators (experimental)
uvx synix verify                        # Check build integrity (hashes, provenance)
uvx synix clean                         # Delete build directory
uvx synix batch-build run               # Submit pipeline via OpenAI Batch API (experimental)
```
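
To make the idea behind `synix lineage` concrete, here is a rough sketch of walking a provenance chain from an artifact back to its sources. The `PARENTS` mapping, the artifact labels, and the `lineage` function are all hypothetical; how Synix actually stores and traverses provenance records is not shown here.

```python
# Hypothetical provenance records: artifact label -> labels of its direct inputs.
PARENTS = {
    "core-memory": ["monthly-2024-03"],
    "monthly-2024-03": ["ep-001", "ep-002"],
    "ep-001": ["t-001"],
    "ep-002": ["t-002"],
    "t-001": [],
    "t-002": [],
}


def lineage(artifact_id: str, parents: dict[str, list[str]]) -> list[tuple[int, str]]:
    """Return (depth, label) pairs, depth-first from the artifact down to its sources."""
    chain: list[tuple[int, str]] = []
    seen: set[str] = set()
    stack = [(0, artifact_id)]
    while stack:
        depth, current = stack.pop()
        if current in seen:
            continue  # shared inputs are reported once
        seen.add(current)
        chain.append((depth, current))
        # Push in reverse so parents are visited in their recorded order.
        for parent in reversed(parents.get(current, [])):
            stack.append((depth + 1, parent))
    return chain


for depth, label in lineage("monthly-2024-03", PARENTS):
    print("  " * depth + label)
```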

## Architecture overview

```
src/synix/
├── __init__.py            # Public API: Pipeline, Source, Transform, SearchIndex, FlatFile, Artifact
├── core/
│   └── models.py          # Layer hierarchy (Source, Transform, SearchIndex, FlatFile, Pipeline)
├── build/
│   ├── runner.py          # Execute pipeline — walk DAG, run transforms, cache artifacts
│   ├── plan.py            # Dry-run planner — per-artifact rebuild/cached decisions
│   ├── dag.py             # DAG resolution — build order from depends_on references
│   ├── pipeline.py        # Pipeline loader — import Python module, extract Pipeline object
│   ├── artifacts.py       # Artifact storage — save/load/query (filesystem-backed)
│   ├── provenance.py      # Provenance tracking — record and query lineage chains
│   ├── fingerprint.py     # Build fingerprints — synix:transform:v2 scheme
│   ├── llm_transforms.py  # Built-in LLM transforms (EpisodeSummary, MonthlyRollup, etc.)
│   ├── parse_transform.py # Source parser — ChatGPT/Claude JSON → transcript artifacts
│   ├── merge_transform.py # Merge transform — Jaccard similarity grouping
│   ├── transforms.py      # Transform base + registry (string dispatch fallback)
│   ├── validators.py      # Built-in validators (PII, SemanticConflict, Citation, etc.)
│   ├── fixers.py          # Built-in fixers (SemanticEnrichment, CitationEnrichment)
│   ├── projections.py     # Projection dispatch
│   └── cassette.py        # Record/replay for LLM + embedding calls
├── search/
│   ├── indexer.py         # SQLite FTS5 — build, query, shadow swap
│   ├── embeddings.py      # Embedding provider — fastembed, OpenAI, cached
│   └── retriever.py       # Hybrid search — keyword + semantic + RRF fusion
├── cli/                   # Click CLI commands
└── templates/             # Bundled demo pipelines (synix init, synix demo)
```
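
The hybrid retriever is described as fusing keyword and semantic results with RRF (reciprocal rank fusion). A minimal sketch of the general technique follows. The function name `rrf_fuse`, the artifact labels, and the constant `k=60` (the value commonly used in the RRF literature) are assumptions, not values read from `retriever.py`.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each item scores sum(1 / (k + rank)) over the lists containing it."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by label for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["ep-003", "ep-001", "ep-007"]   # e.g. from a full-text query
semantic_hits = ["ep-001", "ep-009", "ep-003"]  # e.g. from an embedding query

print(rrf_fuse([keyword_hits, semantic_hits]))
# ['ep-001', 'ep-003', 'ep-009', 'ep-007']
```

Because RRF works on ranks rather than raw scores, it can combine keyword and embedding results without normalizing their very different score scales.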

## Pipeline definition

Pipelines are Python files that declare your memory architecture:

```python
from synix import Pipeline, Source, SearchIndex, FlatFile
from synix.transforms import EpisodeSummary, MonthlyRollup, CoreSynthesis

pipeline = Pipeline("my-memory")
pipeline.source_dir = "./sources"
pipeline.build_dir = "./build"
pipeline.llm_config = {
    "model": "claude-sonnet-4-20250514",
    "temperature": 0.3,
    "max_tokens": 1024,
}

# Layer 0: auto-detect and parse source files
transcripts = Source("transcripts")

# Layer 1: one summary per conversation
episodes = EpisodeSummary("episodes", depends_on=[transcripts])

# Layer 2: group episodes by month
monthly = MonthlyRollup("monthly", depends_on=[episodes])

# Layer 3: synthesize everything into core memory
core = CoreSynthesis("core", depends_on=[monthly], context_budget=10000)

pipeline.add(transcripts, episodes, monthly, core)

# Projections — how artifacts become usable
pipeline.add(
    SearchIndex("memory-index", sources=[episodes, monthly, core],
                search=["fulltext", "semantic"],
                embedding_config={"provider": "fastembed", "model": "BAAI/bge-small-en-v1.5"}),
    FlatFile("context-doc", sources=[core], output_path="./build/context.md")
)
```
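
Because dependencies are object references, a build order can be derived by topologically sorting the `depends_on` graph. The sketch below uses the standard library's `graphlib` with a stand-in `Layer` dataclass. Both the dataclass and `build_order` are hypothetical and only mirror the shape of the pipeline above; they are not Synix's `dag.py`.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Layer:
    """Stand-in for a pipeline layer (hypothetical, not the synix class)."""
    name: str
    depends_on: list["Layer"] = field(default_factory=list)


def build_order(layers: list[Layer]) -> list[str]:
    """Return layer names so that every layer comes after its dependencies."""
    graph = {layer.name: {dep.name for dep in layer.depends_on} for layer in layers}
    # static_order() raises graphlib.CycleError if the graph contains a cycle.
    return list(TopologicalSorter(graph).static_order())


transcripts = Layer("transcripts")
episodes = Layer("episodes", depends_on=[transcripts])
monthly = Layer("monthly", depends_on=[episodes])
core = Layer("core", depends_on=[monthly])

# Declaration order does not matter; the sort recovers the dependency order.
print(build_order([core, monthly, transcripts, episodes]))
# ['transcripts', 'episodes', 'monthly', 'core']
```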

## Important constraints

- **SQLite only** — No external databases. Everything stored in filesystem + SQLite FTS5
- **No web UI** — Terminal-based tool with Rich formatting
- **Python 3.11+** — Uses modern Python features
- **LLM API keys required** — Supports OpenAI, Anthropic, and OpenAI-compatible endpoints
- **Local execution** — No hosted platform, runs on your machine
- **Experimental features** — Validation/fixing workflow and batch-build may change in future releases