# synix

> A build system for agent memory

## What it does

Synix transforms raw conversations into hierarchical memory structures with full provenance tracking and incremental rebuilds. Define your memory architecture in Python, build it like code, and change it freely: only the affected layers rebuild.

## Key concepts

- **Artifact** — immutable, versioned build output (transcript, episode, rollup, core memory). Content-addressed via SHA256.
- **Layer** — typed Python object in the build DAG. `Source` for inputs, `Transform` subclasses for LLM steps, `SearchIndex`/`FlatFile` for projections.
- **Pipeline** — declared in Python. Dependencies are object references via `depends_on`.
- **Projection** — materializes artifacts into usable outputs. `SearchIndex` (SQLite FTS5 + optional embeddings), `FlatFile` (markdown context doc).
- **Provenance** — every artifact traces back to its inputs, and search results always include that provenance chain.
- **Cache/Rebuild** — decided by fingerprint comparison: an artifact is rebuilt if its inputs, prompt, model config, or transform source code changed, and skipped otherwise.

## Installation and quick start

```bash
# Initialize with source data
uvx synix init my-project --from ~/exports/chatgpt.json
cd my-project

# Build the pipeline
uvx synix build

# Search with provenance
uvx synix search "rust programming"
uvx synix list
uvx synix show final-report
```

## CLI commands

```bash
uvx synix init <name>                 # Create new project with sources and pipeline
uvx synix build                       # Run pipeline, only rebuild what changed
uvx synix plan                        # Dry-run showing what would build
uvx synix search <query>              # Full-text search with provenance chains
uvx synix list [layer]                # List all artifacts, optionally by layer
uvx synix show <id>                   # Display artifact content and metadata
uvx synix lineage <id>                # Show full provenance tree
uvx synix validate                    # Run validators against build artifacts
uvx synix clean                       # Delete build directory
```

## Architecture overview

```
src/synix/
├── core/models.py         # Layer hierarchy (Source, Transform, SearchIndex, FlatFile, Pipeline)
├── build/
│   ├── runner.py          # Execute pipeline — walk DAG, run transforms, cache artifacts
│   ├── plan.py            # Dry-run planner — per-artifact rebuild/cached decisions
│   ├── dag.py             # DAG resolution — build order from depends_on references
│   ├── artifacts.py       # Artifact storage — save/load/query (filesystem-backed)
│   ├── provenance.py      # Provenance tracking — record and query lineage chains
│   ├── fingerprint.py     # Build fingerprints — synix:transform:v2 scheme
│   ├── llm_transforms.py  # Built-in LLM transforms (EpisodeSummary, MonthlyRollup, etc.)
│   └── transforms.py      # Transform base + registry
├── search/
│   ├── indexer.py         # SQLite FTS5 — build, query, shadow swap
│   ├── embeddings.py      # Embedding provider — fastembed, OpenAI, cached
│   └── retriever.py       # Hybrid search — keyword + semantic + RRF fusion
└── cli/                   # Click CLI commands
```
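
The `retriever.py` entry mentions RRF fusion. As a rough illustration of reciprocal rank fusion in general (not Synix's actual code), the sketch below merges a keyword ranking and a semantic ranking. The function name `rrf_fuse`, the artifact IDs, and the constant `k=60` (a commonly used default) are assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["ep-3", "ep-1", "ep-7"]   # e.g. from a full-text query
semantic_hits = ["ep-1", "ep-9", "ep-3"]  # e.g. from embedding similarity

fused = rrf_fuse([keyword_hits, semantic_hits])
print(fused)  # ['ep-1', 'ep-3', 'ep-9', 'ep-7']
```

RRF only needs ranks, not raw scores, so keyword and embedding results can be combined without normalizing their scoring scales.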

## Pipeline definition

```python
from synix import Pipeline, Source, SearchIndex
from synix.ext import MapSynthesis, ReduceSynthesis

pipeline = Pipeline("my-pipeline")
pipeline.source_dir = "./sources"
pipeline.llm_config = {
    "provider": "anthropic",
    "model": "claude-haiku-4-5-20251001",
}

# Parse source files
bios = Source("bios", dir="./sources/bios")

# 1:1 transform
work_styles = MapSynthesis(
    "work_styles",
    depends_on=[bios],
    prompt="Infer this person's work style:\n\n{artifact}",
)

# N:1 synthesis
report = ReduceSynthesis(
    "report",
    depends_on=[work_styles],
    prompt="Write team analysis from these profiles:\n\n{artifacts}",
    label="team-report",
)

pipeline.add(bios, work_styles, report)
pipeline.add(SearchIndex("search", sources=[work_styles, report]))
```
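
To make the dependency structure of this pipeline concrete, the build order implied by `depends_on` can be sketched with the standard library's `graphlib`. This is a standalone illustration, not Synix's `dag.py`: the `deps` mapping mirrors the layer names above by hand, and `affected` is a hypothetical helper showing which layers sit downstream of a change.

```python
from graphlib import TopologicalSorter

# Each layer maps to the layers it depends on (mirrors depends_on / sources above).
deps = {
    "bios": set(),
    "work_styles": {"bios"},
    "report": {"work_styles"},
    "search": {"work_styles", "report"},
}

build_order = list(TopologicalSorter(deps).static_order())
print(build_order)  # ['bios', 'work_styles', 'report', 'search']


def affected(changed: str) -> set[str]:
    """Return the changed layer plus every layer that depends on it, directly or not."""
    out = {changed}
    grew = True
    while grew:
        grew = False
        for layer, parents in deps.items():
            if layer not in out and parents & out:
                out.add(layer)
                grew = True
    return out


print(sorted(affected("report")))  # ['report', 'search']
```

Changing the `report` prompt would leave `bios` and `work_styles` untouched in this model, which is the layer-level intuition behind incremental rebuilds.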

## Important constraints

- **SQLite only** — no external databases; everything is filesystem-backed
- **No web UI** — pure CLI tool focused on build system workflow
- **Python 3.11+** — uses modern Python features for pipeline definition
- **LLM provider required** — needs `ANTHROPIC_API_KEY` or `OPENAI_API_KEY`
- **Content-addressed** — artifacts identified by SHA256, enables reliable caching
- **Fingerprint-based caching** — captures inputs, prompts, model config, and transform source code
- **Fail-fast design** — errors are never silently ignored; problems are always surfaced to the user