# Vincio

> Vincio is a Python platform for context-engineered AI applications. It
> compiles prompts, memory, retrieval, tools, schemas, and policies into
> optimized, validated, observable, provider-neutral context packets, then
> validates and evaluates every output.

Package: `pip install vincio` · Python 3.11+ · Apache 2.0
Main entry point: `from vincio import ContextApp`

## Quickstart

```python
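from vincio import ContextApp

# Create an app and register a local docs folder with hybrid retrieval.
app = ContextApp(name="docs_qa")
app.add_source("docs", path="./docs", retrieval="hybrid")

# Enable the policy that limits answers to the registered sources.
app.set_policy("answer_only_from_sources", True)

result = app.run("How do I configure SSO?")
# The result exposes the answer, its citations, the trace id, and the cost.
print(result.output, result.citations, result.trace_id, result.cost_usd)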
```

## Docs

- [Getting started](docs/getting-started.md)
- [Context packets & compiler](docs/concepts/context-packets.md)
- [Prompt compiler](docs/concepts/prompt-compiler.md)
- [Memory](docs/concepts/memory.md)
- [Retrieval](docs/concepts/retrieval.md)
- [Agents & workflows](docs/concepts/agents.md)
- [Evaluation](docs/concepts/evals.md)
- [Observability](docs/concepts/observability.md)
- [Build a RAG app](docs/guides/build-rag-app.md)
- [Connect data sources](docs/guides/connectors.md)
- [Structured output](docs/guides/structured-output.md)
- [Reliability & guardrails](docs/guides/reliability-guardrails.md)
- [Add tools](docs/guides/add-tools.md)
- [Orchestrate multi-agent systems](docs/guides/orchestrate-agents.md)
- [Run evals](docs/guides/run-evals.md)
- [Test LLM apps with pytest](docs/guides/test-llm-apps.md)
- [Optimize](docs/guides/optimize-context.md)
- [Close the loop](docs/guides/close-the-loop.md)
- [Performance & streaming](docs/guides/performance.md)
- [Integrations: providers, vector stores, frameworks](docs/guides/integrations.md)
- [Coming from LangChain](docs/guides/migrate-from-langchain.md)
- [Coming from LlamaIndex](docs/guides/migrate-from-llamaindex.md)
- [Coming from Ragas](docs/guides/migrate-from-ragas.md)
- [Coming from Mem0](docs/guides/migrate-from-mem0.md)
- [API reference](docs/reference/api.md)
- [CLI reference](docs/reference/cli.md)
- [Config reference](docs/reference/config.md)

## Key facts for code generation

- All public data contracts are Pydantic v2 models.
- Async-first: every engine has `arun`/async methods plus sync wrappers.
- Providers: openai, anthropic, google, mistral, local (OpenAI-compatible),
  mock (deterministic, offline; generates schema-valid structured output);
  plus OpenAI-compatible presets groq | together | fireworks | openrouter |
  deepseek | perplexity | xai | nvidia (0.9) via `openai_compatible(name)` /
  `build_provider(name)` (`<NAME>_API_KEY` env) or any endpoint with
  `openai_compatible(base_url=..., api_key=...)`.
- `ContextApp.run()` executes: normalize → classify → policy → memory →
  retrieve → compile context (score/dedupe/conflict/compress/budget) →
  compile prompt (cache-aware) → model (+bounded tool loop) → validate
  (schema/citations/policy, principled repair) → evaluate → trace → memory write.
- Streaming: `async for event in app.astream("...")` — `stage`, `text_delta`,
  `partial_output` (incremental partial-JSON), `tool_*`, then `done` with the
  full `RunResult`; the server SSE endpoint emits the same events.
- Performance (0.2): bounded concurrent fan-out (`vincio.core.concurrency`),
  content-addressed compile/chunk/embedding caches (on by default),
  request coalescing + pooled provider transport, slim (zero-copy) packets,
  hard `max_latency_ms` deadline + cancellation propagation; VincioBench
  budgets gate CI (`benchmarks/check_budgets.py`).
- Retrieval (0.3): index modes bm25 | dense | sparse (SPLADE-style) |
  late_interaction (ColBERT-style MaxSim, PLAID-style compression) |
  hybrid | hybrid_full | graph | hybrid_graph, all fused by weighted RRF;
  query strategies hyde | multi_query | decompose | step_back
  (engine `query_strategies=` or config `retrieval.query_strategies`);
  chunking adds sentence_window | hierarchical/parent_document (use
  `AutoMergingIndex`) | contextual (+`contextualize_chunks` for LLM
  prefixes); `GraphRAG` (communities + summaries, global/local routing);
  `LiveIndex` (upsert/TTL/freshness) + `VectorIndex.migrate`.
- Connectors: `from vincio.connectors import connect` — web, github, sql,
  s3, gcs, notion, confluence, slack, custom via `register_connector`;
  `app.add_source("kb", connector=connect("web", urls=[...]))`.
- Memory (0.4): `app.remember(content, user_id=...)` / `app.recall(query,
  user_id=...)`; scopes session | user | agent | tenant | organization |
  global; scoped handles `app.memory.for_user("u1").remember(...)`; hybrid
  lexical+vector+graph recall (`memory.hybrid_recall`, on by default);
  consolidation `await app.memory.consolidate(session_id, user_id=...)`
  (episodic→semantic, provenance in `consolidated_from`); hygiene: per-scope
  `memory.ttl_days`, importance-weighted `decay_pass()`, audited
  `edit`/`forget`/`export_owner_data`/`erase_owner_data`; run write-back via
  `memory.write_back: [input, evidence, tools]`; eval harness
  `vincio.memory.evaluate_memory` (recall precision, contradiction rate,
  staleness, personalization lift) gated in VincioBench.
- Agents & orchestration (0.6): `app.agent(tools=..., planner="dag|dynamic|
  react|direct", max_steps=...)` → bounded `AgentState` with `metrics()`;
  crews `app.crew(members=[{"name", "description", "goal", "keywords",
  "budget_fraction", ...}], process="sequential|parallel|hierarchical",
  max_rounds=...)` → `CrewResult` (output, reports, delegations, blackboard
  snapshot) — members share a versioned `Blackboard`, run under scaled
  budget shares, and always terminate; durable graphs `app.graph(name)` →
  `StateGraph.add_node(name, fn)` (fn: dict state → dict updates) /
  `add_edge` / `add_conditional_edge(source, router, targets=)` /
  `compile(interrupt_before=, interrupt_after=, max_steps=)`; checkpoints
  persist per step in the app metadata store; `invoke`/`ainvoke` returns
  `GraphResult(status="done|interrupted|max_steps", state, thread_id)`;
  `resume(thread_id, value=...)` (on resume, a node that called
  `interrupt(state, payload)` re-runs and receives the value),
  `update_state(thread_id, values)` for
  edit-and-resume, `history(thread_id)` + `fork(checkpoint_id)` for
  time-travel replay; composition `compose(a) | b | c` (functions, agents,
  crews, workflows, graphs; results normalized), `parallel(name=step, ...)`,
  `branch(router, routes)`, streaming via `pipeline.astream(value)` →
  NodeEvents; a workflow approval gate without an `approval_fn` pauses the
  run (`status == "paused"`, `pending_approvals`), and
  `workflow.resume(result, approvals={"step": True})` continues without
  re-running completed steps; backends `LangGraphBackend().compile(graph)` and
  `OpenAIAgentsBackend().export_crew(crew)` export to those runtimes
  (lazy imports, no lock-in).
- Output schemas: pass a Pydantic class as `output_schema=`; `result.output`
  is a validated instance.
- Structured output & reliability (0.7): provider-native constrained
  decoding negotiated per run (`vincio.output.to_strict_json_schema`
  strict-sanitizes the schema for the decoder; validation runs on the
  original; the mode lands on the trace as `decoding=native|prompt`);
  grammar-style `choice_schema(options)` / `regex_schema(pattern)`;
  streaming validation — `partial_output` events carry `valid_prefix` /
  `validation_errors` (abort early when `valid_prefix` is False); typed
  signatures:
  `class Sig(Signature)` with `InputField`/`OutputField` (docstring =
  instruction) or `signature("question, context -> answer, confidence:
  float")`, executed via `Predict(sig, provider=, model=)` or
  `app.predictor(sig)(**inputs)` (typed result), optimizer target via
  `sig.to_prompt_spec()`; rails — `app.add_rail(name=, kind="topic|format|
  safety|custom", direction="input|output|both", action="block|warn|
  redact", blocked_topics=/allowed_topics=/max_chars=/require_pattern=/
  forbid_pattern=/detectors=["pii","secrets","injection"]/predicate=)` +
  `app.register_rail_predicate(name, fn)` — deterministic, enforced
  before/after every generation, violations audited as `rail:<name>`;
  self-correction — `app.enable_self_correction(max_cycles=2,
  max_cost_usd=0.05)` (bounded validate→critique→repair, structure-only,
  facts never invented) or `vincio.output.SelfCorrector`; multi-schema
  routing — `app.add_output_schema(schema, keywords=, task_types=, when=)`
  picks the contract per run; `SchemaRouter.classify/validate_any` for
  content-side validation.
- Evals: `Dataset.load("golden.jsonl")`, `app.evaluate(...)`,
  gates like `{"groundedness": ">= 0.95"}`; CLI `vincio eval run`.
- Evals & testing (0.5): metrics add faithfulness | answer_relevance |
  hallucination (strict number checks) | toxicity | bias |
  summarization_quality | knowledge_retention | conversation_relevance
  (conversation via `case.context["messages"]`); `GEvalJudge(provider,
  model=..., criteria=..., samples=N)` + `judge.calibrate(pairs)`;
  `SyntheticGenerator(seed=...).generate(docs, n=...)` (offline templates,
  LLM hook, provenance in `metadata.source_ids` + `rubric.facts`);
  `RedTeamSuite().run(app)` (canary-judged jailbreak/injection/leak/bias/
  toxicity probes, `attack_success_rate`, `detector_coverage`);
  `ExperimentTracker(path).log/compare/ablation` + `ab_test(a, b, metric)`
  (paired/Welch t-test, pure-Python p-values); `vincio.testing` —
  `assert_eval` / `assert_grounded` / `assert_metric` / `assert_safe` +
  pytest plugin with `vincio_snapshot` fixture
  (`--vincio-update-snapshots`).
- Observability (0.5): traces carry `session_id`/`thread_id` (pass
  `session_id=` to `app.run`), `trace.add_score`/`span.add_score` (runtime
  evaluators attach scores automatically), `trace.add_feedback(score=...,
  comment=...)` / `record_feedback(..., exporter=...)`;
  `sessions_from_traces(traces)` → `Session` aggregates;
  `dataset_from_traces(traces, min_feedback_score=...)` → eval dataset;
  viewer: `render_trace_text`, `trace_to_html` / `session_to_html`
  (self-contained static HTML), `trace_diff_html`; OTel exporter emits
  GenAI semantic conventions (`chat {model}`, `gen_ai.*`,
  `gen_ai.conversation.id`).
- Prompt registry (0.5): `PromptRegistry(dir).push(spec, tags=[...])` —
  content-hash versions, moving tags, `diff(name, a, b, rendered=True)`,
  `rollback(name)`, `link_eval(name, version, report)`.
- The closed loop (0.8): `loop = app.improvement_loop(metrics=[...],
  gates={"groundedness": ">= 0.8"})`; `result = loop.run(
  min_feedback_score=0.5)` — capture traces → `dataset_from_traces` →
  baseline eval → gated prompt optimization → promote (registry push +
  tag "production" + `link_eval` + applied to the app + `loop_promotion`
  audit entry + `loop.promoted` event); `dry_run=True` decides without
  acting; candidate evals are memory-write-free; CLI `vincio loop run
  --app app.py [--dataset ds.jsonl | --min-feedback X] [--gate ...]
  [--tag T] [--dry-run]`. Auto-memory from runs: `memory.write_back:
  [facts]` + `memory.fact_min_support` / `memory.max_facts_per_run` —
  evidence-supported output claims (`vincio.memory.extract_grounded_facts`)
  become candidate memories (`origin: run_fact`) through the guarded
  policy. Retrieval feedback: `RetrievalFeedback(app.retrieval,
  records_from_dataset(ds)).tune()` — gated, deterministic tuning of
  per-index RRF weights + heuristic-reranker blend;
  `recommend_chunking(reports_by_config, baseline=...)`. Pareto:
  `pareto_loop(candidates, evaluate_fn, dataset, baseline=,
  objectives=[ObjectiveSpec(...)], constraints={"cost": 0.01},
  prefer="accuracy")` → `ParetoResult` with `frontier.front` /
  `frontier.knee()`; promotion passes the same safety rules. Learned
  budgets: `BudgetLearner(evaluate_allocation).learn(dataset,
  task_type=...)` → `LearnedAllocations` (JSON save/load) installed via
  `app.use_learned_budgets(...)`. Guided search:
  `ContextOptimizer(...).optimize(dataset, strategy="hill_climb|anneal")`
  or `guided_search(space, evaluate, strategy=, budget=, seed=)` —
  deterministic, budget-bounded, gate-respecting.
- Integrations & DX (0.9): framework interop `from vincio.interop import
  add_langchain_tool, from_langchain_loader, from_langchain_retriever,
  from_langchain_embeddings, from_llamaindex_reader, from_llamaindex_retriever,
  from_llamaindex_embedding, add_llamaindex_tool` (duck-typed, no heavy import)
  + `to_langchain_*` / `to_llamaindex_*` (extras `vincio[langchain]` /
  `vincio[llamaindex]`); embedders `build_embedder("local|jina|voyage|cohere|
  <provider>")`; rerankers `build_reranker("heuristic|recency|authority|llm|
  cohere|jina|voyage")` (hosted ones are httpx-only); vector stores
  `from vincio.storage import build_vector_index` over memory | qdrant |
  pgvector | chroma | pinecone | lancedb (extras `vincio[chroma|pinecone|
  lancedb]`); domain packs `app.use_pack("support|engineering|finance|legal")`,
  `from vincio import load_pack, available_packs` (+ `register_pack`); notebook
  reprs `from vincio import enable_rich_reprs` (RunResult/Trace/EvalReport/
  MemoryItem/SearchHit); interactive inspector `vincio.tui.TUI` / `vincio tui`;
  typed config schema `vincio.core.config.config_json_schema()`.
- CLI: init (`--template minimal|rag|agent|eval`, `--provider`), run,
  config schema/validate/show, packs list/show, tui, eval run/report/dataset,
  prompt lint/compile/push/versions/diff/rollback, trace
  show/view/replay/diff/export/sessions/feedback, optimize run, loop run,
  index build, memory inspect/remember/recall/forget/export/consolidate/decay.
- Server: `from vincio.server import create_app` (FastAPI; API key + JWT).
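
The Retrieval entry above says that results from every index mode are fused by weighted RRF (reciprocal rank fusion). As a rough illustration of that technique only, and not Vincio's implementation, here is a minimal pure-Python sketch. The function name `weighted_rrf`, the smoothing constant `k=60`, and the index names and weights are assumptions made for the example.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights=None, k=60):
    """Fuse ranked lists of document ids with weighted reciprocal rank fusion.

    rankings: dict mapping an index name to a list of doc ids, best first.
    weights:  optional dict mapping index name to a weight (default 1.0).
    k:        smoothing constant; 60 is a common choice (an assumption here).
    """
    weights = weights or {}
    scores = defaultdict(float)
    for index_name, ranked_ids in rankings.items():
        w = weights.get(index_name, 1.0)
        for rank, doc_id in enumerate(ranked_ids, start=1):
            # Each list contributes w / (k + rank) for every document it returns.
            scores[doc_id] += w / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical rankings from two indexes; the dense index is weighted higher.
fused = weighted_rrf(
    {"bm25": ["a", "b", "c"], "dense": ["b", "c", "a"]},
    weights={"bm25": 1.0, "dense": 2.0},
)
print([doc_id for doc_id, _ in fused])  # ['b', 'a', 'c']
```

Because only ranks enter the formula, lists whose raw scores are on different scales (for example BM25 and cosine similarity) can be combined without normalization; the weights shift how much each list counts.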
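
The Evals entry above describes gates written as strings, such as `{"groundedness": ">= 0.95"}`. The sketch below shows one way such gate expressions could be parsed and checked against metric scores. It is an illustration under stated assumptions rather than Vincio's own logic: the helper `check_gates`, the set of supported operators, and the rule that a missing metric fails its gate are all hypothetical.

```python
import operator

# Comparison operators a gate expression may start with (assumed set).
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def check_gates(scores, gates):
    """Return {metric: passed} for gates such as {"groundedness": ">= 0.95"}."""
    results = {}
    for metric, expr in gates.items():
        expr = expr.strip()
        # Try two-character operators first so ">=" is not read as ">".
        for symbol in sorted(_OPS, key=len, reverse=True):
            if expr.startswith(symbol):
                threshold = float(expr[len(symbol):])
                # Assumption: a metric missing from the scores fails its gate.
                results[metric] = metric in scores and _OPS[symbol](
                    scores[metric], threshold
                )
                break
        else:
            raise ValueError(f"unsupported gate expression: {expr!r}")
    return results


# Hypothetical metric scores from an eval run.
outcome = check_gates(
    {"groundedness": 0.97, "toxicity": 0.02},
    {"groundedness": ">= 0.95", "toxicity": "< 0.01"},
)
print(outcome)  # {'groundedness': True, 'toxicity': False}
```

A CI job could fail the build when any value in the returned mapping is `False`.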
