```
# synix

> A build system for agent memory — declarative pipelines that transform conversations into searchable, hierarchical memory with full provenance tracking

## What it does

Synix processes raw conversation data (ChatGPT exports, Claude exports, text files) through configurable LLM-powered transforms to create multi-altitude memory artifacts. Sources become episodes, episodes become summaries, and summaries become core memory. Content-addressed caching ensures that only changed components rebuild. The output is a system-prompt context document plus a search index for retrieval (RAG), with complete lineage from any memory back to its source conversations.

## Key concepts

- **Artifact** — immutable, versioned build output (transcript, episode, rollup, core memory) with SHA256 content addressing
- **Layer** — typed transform in the build DAG (Source for inputs, Transform subclasses for LLM steps, SearchIndex/FlatFile for projections)  
- **Pipeline** — Python-declared build definition with `Pipeline.add(*layers)` routing layers to projections automatically
- **Projection** — materialized output surface (SearchIndex for SQLite FTS5 + embeddings, FlatFile for markdown context docs)
- **Provenance** — every artifact traces back through the dependency graph to source conversations, always included in search results
- **Cache/Rebuild** — each artifact's fingerprint covers its inputs, prompt, and model config; an artifact is rebuilt only when that fingerprint changes and is otherwise served from cache
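
As a rough illustration of the Cache/Rebuild concept, the sketch below hashes inputs, prompt, and model config into one SHA-256 fingerprint and rebuilds only when it differs from the cached value. The helper names (`fingerprint`, `needs_rebuild`) and the JSON canonicalization are assumptions made for this example; they are not synix's actual `synix:transform:v2` scheme.

```python
import hashlib
import json


def fingerprint(input_hashes, prompt, model_config):
    """Hypothetical helper: fold everything that affects an output into one digest."""
    payload = json.dumps(
        {
            "inputs": sorted(input_hashes),  # sorted so input order does not matter
            "prompt": prompt,
            "model": model_config,
        },
        sort_keys=True,  # stable key order gives a stable digest
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_rebuild(cached_fingerprint, current_fingerprint):
    """Rebuild when nothing is cached yet or the fingerprint has changed."""
    return cached_fingerprint != current_fingerprint


config = {"provider": "anthropic", "model": "claude-haiku"}
before = fingerprint(["abc", "def"], "Summarize the episode.", config)
same = fingerprint(["def", "abc"], "Summarize the episode.", config)
edited = fingerprint(["abc", "def"], "Summarize the episode briefly.", config)

print(needs_rebuild(before, same))    # False: nothing changed, reuse the cache
print(needs_rebuild(before, edited))  # True: the prompt changed
print(needs_rebuild(None, before))    # True: first build, nothing cached
```

Because the digest depends only on content, editing one prompt invalidates the artifacts built from it while leaving unrelated artifacts cached.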

## Installation and quick start

```bash
# Scaffold a new project (uvx runs synix without a separate install step)
uvx synix init my-project
cd my-project

# Configure your API key (see pipeline.py for provider options)
export ANTHROPIC_API_KEY=your-key

# Build the pipeline  
uvx synix build

# Search and explore
uvx synix list                    # all artifacts by layer
uvx synix show final-report       # render specific artifact  
uvx synix search "your query"     # full-text search with provenance
```

## CLI commands

```bash
uvx synix build pipeline.py                          # Build pipeline, materialize projections
uvx synix plan pipeline.py                           # Dry-run showing rebuild/cached decisions  
uvx synix plan pipeline.py --explain-cache           # Plan with cache decision explanations
uvx synix search "query" [--layers episodes,core]    # Search with provenance chains
uvx synix lineage <artifact-id>                      # Show full provenance tree
uvx synix list                                       # All artifacts grouped by layer
uvx synix show <id>                                  # Display artifact content
uvx synix init <name>                                # Scaffold new project
```
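
To give a feel for what a provenance tree like the one printed by `synix lineage` represents, here is a minimal sketch that walks parent links from an artifact back to its sources. The `parents` mapping, the artifact IDs, and the `lineage` function are hypothetical; synix's real storage format and output layout may differ.

```python
def lineage(artifact_id, parents, depth=0):
    """Return indented lines for an artifact followed by all of its ancestors."""
    lines = ["  " * depth + artifact_id]
    for parent_id in parents.get(artifact_id, []):
        lines.extend(lineage(parent_id, parents, depth + 1))
    return lines


# Hypothetical provenance records: artifact ID -> IDs it was built from
parents = {
    "core-memory": ["monthly-2024-03"],
    "monthly-2024-03": ["episode-001", "episode-002"],
    "episode-001": ["transcript-001"],
    "episode-002": ["transcript-002"],
}

print("\n".join(lineage("core-memory", parents)))
# core-memory
#   monthly-2024-03
#     episode-001
#       transcript-001
#     episode-002
#       transcript-002
```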

## Architecture overview

```
src/synix/
├── __init__.py            # Public API: Pipeline, Source, Transform, SearchIndex, FlatFile, Artifact
├── core/
│   └── models.py          # Layer hierarchy (Source, Transform, SearchIndex, FlatFile, Pipeline)
├── build/
│   ├── runner.py          # Execute pipeline — walk DAG, run transforms, cache artifacts  
│   ├── plan.py            # Dry-run planner — rebuild/cached decisions per artifact
│   ├── dag.py             # DAG resolution — build order from depends_on references
│   ├── artifacts.py       # Artifact storage — save/load/query (filesystem-backed)
│   ├── provenance.py      # Provenance tracking — record and query lineage chains
│   ├── fingerprint.py     # Build fingerprints — synix:transform:v2 cache key scheme
│   ├── llm_transforms.py  # Built-in transforms (EpisodeSummary, MonthlyRollup, etc.)
│   └── transforms.py      # Transform base + registry
├── search/
│   ├── indexer.py         # SQLite FTS5 — build, query, shadow swap
│   ├── embeddings.py      # Embedding provider — fastembed, OpenAI, cached
│   └── retriever.py       # Hybrid search — keyword + semantic + RRF fusion
├── cli/                   # Click CLI commands
└── templates/             # Bundled demo pipelines
```
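
The retriever is described as fusing keyword and semantic results with RRF (reciprocal rank fusion). The sketch below shows the general technique under stated assumptions: the function name `rrf_fuse`, the document IDs, and the constant `k=60` (a commonly used default for RRF) are illustrative and are not taken from synix's `retriever.py`.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Merge ranked ID lists: each list contributes 1 / (k + rank) per document."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["ep-3", "ep-1", "ep-7"]   # e.g. from full-text search
semantic_hits = ["ep-1", "ep-9", "ep-3"]  # e.g. from embedding similarity

print(rrf_fuse([keyword_hits, semantic_hits]))
# ['ep-1', 'ep-3', 'ep-9', 'ep-7']
```

Documents that rank well in both lists rise to the top, and no score normalization between the two search backends is needed because only ranks are used.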

## Pipeline definition

Pipelines are Python files declaring layers with object references for dependencies:

```python
from synix import Pipeline, Source, SearchIndex
from synix.transforms import EpisodeSummary, MonthlyRollup, CoreSynthesis

pipeline = Pipeline("my-pipeline")
pipeline.llm_config = {"provider": "anthropic", "model": "claude-haiku"}

transcripts = Source("transcripts")
episodes = EpisodeSummary("episodes", depends_on=[transcripts])
monthly = MonthlyRollup("monthly", depends_on=[episodes])
core = CoreSynthesis("core", depends_on=[monthly])

pipeline.add(transcripts, episodes, monthly, core)
pipeline.add(SearchIndex("search", sources=[episodes, monthly]))
```
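
The `depends_on` references in the pipeline above form a DAG, and a build order can be derived from it by topological sorting. The following sketch uses the standard library's `graphlib` on plain layer names; it illustrates the idea only and makes no claim about how synix's `dag.py` is implemented.

```python
from graphlib import TopologicalSorter

# Layer name -> names of the layers it depends on (mirrors the pipeline above)
depends_on = {
    "transcripts": [],
    "episodes": ["transcripts"],
    "monthly": ["episodes"],
    "core": ["monthly"],
}

# static_order() yields every layer after all of its dependencies
build_order = list(TopologicalSorter(depends_on).static_order())
print(build_order)
# ['transcripts', 'episodes', 'monthly', 'core']
```

`TopologicalSorter` raises `graphlib.CycleError` if the dependencies contain a cycle, which is a convenient way to reject invalid pipelines early.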

Built-in transforms: EpisodeSummary (1:1), MonthlyRollup (group by month), TopicalRollup (group by topic), CoreSynthesis (N:1), Merge (similarity-based grouping). Configurable ext transforms: MapSynthesis, GroupSynthesis, ReduceSynthesis, FoldSynthesis.
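
To make the "group by month" shape concrete, the sketch below buckets episode IDs by calendar month before each bucket would be handed to an LLM rollup step. The `group_by_month` function, the `(episode_id, date)` tuples, and the `YYYY-MM` key format are assumptions for illustration, not `MonthlyRollup`'s actual interface.

```python
from collections import defaultdict
from datetime import date


def group_by_month(episodes):
    """Bucket (episode_id, date) pairs under a 'YYYY-MM' key."""
    groups = defaultdict(list)
    for episode_id, day in episodes:
        groups[day.strftime("%Y-%m")].append(episode_id)
    return dict(groups)


episodes = [
    ("ep-1", date(2024, 3, 2)),
    ("ep-2", date(2024, 3, 28)),
    ("ep-3", date(2024, 4, 5)),
]

print(group_by_month(episodes))
# {'2024-03': ['ep-1', 'ep-2'], '2024-04': ['ep-3']}
```

A 1:1 transform would instead produce one output per input, and an N:1 transform would collapse every input into a single output.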

## Important constraints

- SQLite-backed search index (no external databases required)
- Python 3.11+ required  
- LLM API key needed (OpenAI, Anthropic, or OpenAI-compatible)
- No web UI (CLI-based workflow)
- UV-native build system (`uv sync`, `uv run synix`)
- Local filesystem artifact storage
- Single-machine execution (distributed mesh is experimental)
```