# Vincio

> Vincio is a Python platform for context-engineered AI applications. It
> compiles prompts, memory, retrieval, tools, schemas, and policies into
> optimized, validated, observable, provider-neutral context packets, then
> validates and evaluates every output.

- Package: `pip install vincio` · Python 3.11+ · Apache 2.0
- Main entry point: `from vincio import ContextApp`

## Quickstart

```python
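from vincio import ContextApp

# Create an app and register a local docs folder with hybrid retrieval.
app = ContextApp(name="docs_qa")
app.add_source("docs", path="./docs", retrieval="hybrid")
# Restrict answers to content found in the registered sources.
app.set_policy("answer_only_from_sources", True)

result = app.run("How do I configure SSO?")
# The result carries the answer, its citations, a trace ID, and the cost in USD.
print(result.output, result.citations, result.trace_id, result.cost_usd)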
```

## Docs

- [Getting started](docs/getting-started.md)
- [Context packets & compiler](docs/concepts/context-packets.md)
- [Prompt compiler](docs/concepts/prompt-compiler.md)
- [Memory](docs/concepts/memory.md)
- [Retrieval](docs/concepts/retrieval.md)
- [Agents & workflows](docs/concepts/agents.md)
- [Evaluation](docs/concepts/evals.md)
- [Build a RAG app](docs/guides/build-rag-app.md)
- [Connect data sources](docs/guides/connectors.md)
- [Structured output](docs/guides/structured-output.md)
- [Add tools](docs/guides/add-tools.md)
- [Run evals](docs/guides/run-evals.md)
- [Optimize](docs/guides/optimize-context.md)
- [Performance & streaming](docs/guides/performance.md)
- [API reference](docs/reference/api.md)
- [CLI reference](docs/reference/cli.md)
- [Config reference](docs/reference/config.md)

## Key facts for code generation

- All public data contracts are Pydantic v2 models.
- Async-first: every engine exposes async methods (such as `arun`) plus sync wrappers.
- Providers: openai, anthropic, google, mistral, local (OpenAI-compatible),
  mock (deterministic, offline; generates schema-valid structured output).
- `ContextApp.run()` executes: normalize → classify → policy → memory →
  retrieve → compile context (score/dedupe/conflict/compress/budget) →
  compile prompt (cache-aware) → model (+bounded tool loop) → validate
  (schema/citations/policy, principled repair) → evaluate → trace → memory write.
- Streaming: `async for event in app.astream("...")` yields `stage`,
  `text_delta`, `partial_output` (incremental partial-JSON), and `tool_*`
  events, then `done` with the full `RunResult`; the server SSE endpoint
  emits the same events.
- Performance (0.2): bounded concurrent fan-out (`vincio.core.concurrency`),
  content-addressed compile/chunk/embedding caches (on by default),
  request coalescing + pooled provider transport, slim (zero-copy) packets,
  hard `max_latency_ms` deadline + cancellation propagation; VincioBench
  budgets gate CI (`benchmarks/check_budgets.py`).
- Retrieval (0.3): index modes bm25 | dense | sparse (SPLADE-style) |
  late_interaction (ColBERT-style MaxSim, PLAID-style compression) |
  hybrid | hybrid_full | graph | hybrid_graph, all fused by weighted RRF;
  query strategies hyde | multi_query | decompose | step_back
  (engine `query_strategies=` or config `retrieval.query_strategies`);
  chunking adds sentence_window | hierarchical/parent_document (use
  `AutoMergingIndex`) | contextual (+`contextualize_chunks` for LLM
  prefixes); `GraphRAG` (communities + summaries, global/local routing);
  `LiveIndex` (upsert/TTL/freshness) + `VectorIndex.migrate`.
- Connectors: `from vincio.connectors import connect`; built-in connectors
  are web, github, sql, s3, gcs, notion, confluence, and slack, with custom
  ones added via `register_connector`; usage:
  `app.add_source("kb", connector=connect("web", urls=[...]))`.
- Memory (0.4): `app.remember(content, user_id=...)` / `app.recall(query,
  user_id=...)`; scopes session | user | agent | tenant | organization |
  global; scoped handles `app.memory.for_user("u1").remember(...)`; hybrid
  lexical+vector+graph recall (`memory.hybrid_recall`, on by default);
  consolidation `await app.memory.consolidate(session_id, user_id=...)`
  (episodic→semantic, provenance in `consolidated_from`); hygiene: per-scope
  `memory.ttl_days`, importance-weighted `decay_pass()`, audited
  `edit`/`forget`/`export_owner_data`/`erase_owner_data`; run write-back via
  `memory.write_back: [input, evidence, tools]`; eval harness
  `vincio.memory.evaluate_memory` (recall precision, contradiction rate,
  staleness, personalization lift) gated in VincioBench.
- Output schemas: pass a Pydantic class as `output_schema=`; `result.output`
  is a validated instance.
- Evals: `Dataset.load("golden.jsonl")`, `app.evaluate(...)`,
  gates like `{"groundedness": ">= 0.95"}`; CLI `vincio eval run`.
- CLI: init, run, eval run/report, prompt lint/compile,
  trace show/replay/diff, optimize run, index build, memory
  inspect/remember/recall/forget/export/consolidate/decay.
- Server: `from vincio.server import create_app` (FastAPI; API key + JWT).

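The retrieval entry above says that all index modes are fused by weighted RRF (reciprocal rank fusion). The following is a minimal, standard-library sketch of that general technique, not Vincio's implementation: the function name `weighted_rrf`, the smoothing constant `k=60`, the retriever names, and the weights are all assumptions chosen for illustration.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked lists of document IDs with weighted reciprocal rank fusion.

    rankings: dict mapping a retriever name to its ranked list of doc IDs.
    weights:  dict mapping the same names to a non-negative weight
              (retrievers missing from `weights` default to 1.0).
    k:        smoothing constant; 60 is a commonly used default (assumption).
    """
    scores = defaultdict(float)
    for name, ranked_ids in rankings.items():
        weight = weights.get(name, 1.0)
        # Ranks start at 1, so the top hit contributes weight / (k + 1).
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical rankings from two retrievers over the same three documents.
rankings = {
    "bm25": ["a", "b", "c"],
    "dense": ["b", "c", "a"],
}
fused = weighted_rrf(rankings, weights={"bm25": 1.0, "dense": 2.0})
fused_ids = [doc_id for doc_id, _ in fused]
```

Because RRF works on ranks rather than raw scores, retrievers with incompatible score scales (for example BM25 and cosine similarity) can be combined without normalization; the weights only shift how much each ranked list contributes.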