# Vincio

> Vincio is a Python platform for context-engineered AI applications. It
> compiles prompts, memory, retrieval, tools, schemas, and policies into
> optimized, validated, observable, provider-neutral context packets, then
> validates and evaluates every output.

Package: `pip install vincio` · Python 3.11+ · Apache 2.0 · v2.0.0 (SemVer)

Main entry point: `from vincio import ContextApp`

## Quickstart

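```python
from vincio import ContextApp

app = ContextApp(name="docs_qa")
app.add_source("docs", path="./docs", retrieval="hybrid")  # index ./docs
app.set_policy("answer_only_from_sources", True)  # answer only from indexed sources

result = app.run("How do I configure SSO?")
print(result.output)     # the answer
print(result.citations)  # evidence the answer cites
print(result.trace_id)   # trace id for this run
print(result.cost_usd)   # run cost in USD
```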

## Docs

- [Getting started](docs/getting-started.md)
- [Context packets & compiler](docs/concepts/context-packets.md)
- [Prompt compiler](docs/concepts/prompt-compiler.md)
- [Memory](docs/concepts/memory.md)
- [Retrieval](docs/concepts/retrieval.md)
- [Agents & workflows](docs/concepts/agents.md)
- [Evaluation](docs/concepts/evals.md)
- [Observability](docs/concepts/observability.md)
- [Build a RAG app](docs/guides/build-rag-app.md)
- [Connect data sources](docs/guides/connectors.md)
- [Structured output](docs/guides/structured-output.md)
- [Reliability & guardrails](docs/guides/reliability-guardrails.md)
- [Cost, reliability & scale](docs/guides/cost-and-reliability.md)
- [Add tools](docs/guides/add-tools.md)
- [Orchestrate multi-agent systems](docs/guides/orchestrate-agents.md)
- [Run evals](docs/guides/run-evals.md)
- [Agentic evaluation & continuous quality](docs/guides/agentic-eval.md)
- [Test LLM apps with pytest](docs/guides/test-llm-apps.md)
- [Optimize context](docs/guides/optimize-context.md)
- [Close the loop](docs/guides/close-the-loop.md)
- [Performance & streaming](docs/guides/performance.md)
- [Integrations: providers, vector stores, frameworks](docs/guides/integrations.md)
- [MCP (Model Context Protocol) client + server](docs/guides/mcp.md)
- [A2A (agent-to-agent)](docs/guides/a2a.md)
- [Agent Skills (SKILL.md)](docs/guides/agent-skills.md)
- [Reasoning control & the Responses API](docs/guides/reasoning.md)
- [Enterprise governance & compliance](docs/guides/governance.md)
- [Coming from LangChain](docs/guides/migrate-from-langchain.md)
- [Coming from LlamaIndex](docs/guides/migrate-from-llamaindex.md)
- [Coming from Ragas](docs/guides/migrate-from-ragas.md)
- [Coming from Mem0](docs/guides/migrate-from-mem0.md)
- [vs LiteLLM / gateways](docs/comparisons/litellm.md)
- [API reference](docs/reference/api.md)
- [CLI reference](docs/reference/cli.md)
- [Config reference](docs/reference/config.md)
- [API stability & deprecation policy](docs/reference/stability.md)
- [Performance & quality SLOs](docs/reference/slo.md)
- [Threat model](docs/security/threat-model.md)

## Key facts for code generation

- All public data contracts are Pydantic v2 models.
- Async-first: every engine has `arun`/async methods plus sync wrappers.
- 2.0 (breaking window): `ContextApp`'s surface is grouped into lazy capability
  facades (`app.runs` / `.knowledge` / `.governance` / `.optimization` /
  `.serving` / `.training`); the flat `app.<method>` API still works.
  `EvidenceItem`/`ContextCandidate` carry a `modality` (text/image/table) and
  image/table payloads — the compiler scores, budgets, orders, and cites them
  together. Filter retrieval with a structured `FilterSpec`
  (`from vincio.retrieval import eq, in_, range_, and_, or_, not_, FilterSpec`)
  passed as `where=`; it pushes down to native backend filters. Enterprise
  providers `bedrock` / `vertex` / `azure` via `build_provider(...)` behind a
  pluggable `AuthStrategy`. Egress DLP is on by default and tunable via
  `security.egress_dlp` (off/warn/block); audit chains can be HMAC/Ed25519-signed
  (`security.audit_signing_key`). Eval: `lexical_overlap` is the lexical metric;
  `semantic_similarity` is embedding-backed; unscoreable cases return
  `MetricResult(skipped=True)` and are excluded from gates.
- Providers: openai, anthropic, google, mistral, local (OpenAI-compatible),
  mock (deterministic, offline; generates schema-valid structured output);
  plus OpenAI-compatible presets groq | together | fireworks | openrouter |
  deepseek | perplexity | xai | nvidia via `openai_compatible(name)` /
  `build_provider(name)` (`<NAME>_API_KEY` env) or any endpoint with
  `openai_compatible(base_url=..., api_key=...)`.
- `ContextApp.run()` executes: normalize → classify → policy → memory →
  retrieve → compile context (score/dedupe/conflict/compress/budget) →
  compile prompt (cache-aware) → model (+bounded tool loop) → validate
  (schema/citations/policy, principled repair) → evaluate → trace → memory write.
- Streaming: `async for event in app.astream("...")` — `stage`, `text_delta`,
  `partial_output` (incremental partial-JSON), `tool_*`, then `done` with the
  full `RunResult`; the server SSE endpoint emits the same events.
- Performance: bounded concurrent fan-out (`vincio.core.concurrency`),
  content-addressed compile/chunk/embedding caches (on by default),
  request coalescing + pooled provider transport, slim (zero-copy) packets,
  hard `max_latency_ms` deadline + cancellation propagation; VincioBench
  budgets gate CI (`benchmarks/check_budgets.py`).
- Retrieval: index modes bm25 | dense | sparse (SPLADE-style) |
  late_interaction (ColBERT-style MaxSim, PLAID-style compression) |
  hybrid | hybrid_full | graph | hybrid_graph, all fused by weighted RRF;
  query strategies hyde | multi_query | decompose | step_back
  (engine `query_strategies=` or config `retrieval.query_strategies`);
  chunking adds sentence_window | hierarchical/parent_document (use
  `AutoMergingIndex`) | contextual (+`contextualize_chunks` for LLM
  prefixes); `GraphRAG` (communities + summaries, global/local routing);
  `LiveIndex` (upsert/TTL/freshness) + `VectorIndex.migrate`.
- Connectors: `from vincio.connectors import connect` — web, github, sql,
  s3, gcs, notion, confluence, slack, custom via `register_connector`;
  `app.add_source("kb", connector=connect("web", urls=[...]))`.
- Memory: `app.remember(content, user_id=...)` / `app.recall(query,
  user_id=...)`; scopes session | user | agent | tenant | organization |
  global; scoped handles `app.memory.for_user("u1").remember(...)`; hybrid
  lexical+vector+graph recall (`memory.hybrid_recall`, on by default);
  consolidation `await app.memory.consolidate(session_id, user_id=...)`
  (episodic→semantic, provenance in `consolidated_from`); hygiene: per-scope
  `memory.ttl_days`, importance-weighted `decay_pass()`, audited
  `edit`/`forget`/`export_owner_data`/`erase_owner_data`; run write-back via
  `memory.write_back: [input, evidence, tools]`; eval harness
  `vincio.memory.evaluate_memory` (recall precision, contradiction rate,
  staleness, personalization lift) gated in VincioBench.
- Agents & orchestration: `app.agent(tools=...,
  planner="dag|dynamic|react|direct", max_steps=...)` → bounded `AgentState`
  with `metrics()`;
  crews `app.crew(members=[{"name", "description", "goal", "keywords",
  "budget_fraction", ...}], process="sequential|parallel|hierarchical",
  max_rounds=...)` → `CrewResult` (output, reports, delegations, blackboard
  snapshot) — members share a versioned `Blackboard`, run under scaled
  budget shares, and always terminate; durable graphs `app.graph(name)` →
  `StateGraph.add_node(name, fn)` (fn: dict state → dict updates) /
  `add_edge` / `add_conditional_edge(source, router, targets=)` /
  `compile(interrupt_before=, interrupt_after=, max_steps=)`; checkpoints
  persist per step in the app metadata store; `invoke`/`ainvoke` returns
  `GraphResult(status="done|interrupted|max_steps", state, thread_id)`;
  `resume(thread_id, value=...)` (a node-level `interrupt(state, payload)`
  re-runs and receives the value), `update_state(thread_id, values)` for
  edit-and-resume, `history(thread_id)` + `fork(checkpoint_id)` for
  time-travel replay; composition `compose(a) | b | c` (functions, agents,
  crews, workflows, graphs; results normalized), `parallel(name=step, ...)`,
  `branch(router, routes)`, streaming via `pipeline.astream(value)` →
  NodeEvents; workflow approval gates with no `approval_fn` pause
  (`status == "paused"`, `pending_approvals`) and
  `workflow.resume(result, approvals={"step": True})` continues without
  re-running done steps; backends `LangGraphBackend().compile(graph)` and
  `OpenAIAgentsBackend().export_crew(crew)` export to those runtimes
  (lazy imports, no lock-in).
- Output schemas: pass a Pydantic class as `output_schema=`; `result.output`
  is a validated instance.
- Structured output & reliability: provider-native constrained
  decoding negotiated per run (`vincio.output.to_strict_json_schema`
  strict-sanitizes the schema for the decoder; validation runs on the
  original; the mode lands on the trace as `decoding=native|prompt`);
  grammar-style `choice_schema(options)` / `regex_schema(pattern)`;
  streaming validation — `partial_output` events carry `valid_prefix` /
  `validation_errors` (abort early once `valid_prefix` is False); typed signatures —
  `class Sig(Signature)` with `InputField`/`OutputField` (docstring =
  instruction) or
  `signature("question, context -> answer, confidence: float")`, executed via
  `Predict(sig, provider=, model=)` or `app.predictor(sig)(**inputs)` (typed
  result), optimizer target via `sig.to_prompt_spec()`; rails —
  `app.add_rail(name=, kind="topic|format|safety|custom",
  direction="input|output|both", action="block|warn|redact",
  blocked_topics=/allowed_topics=/max_chars=/require_pattern=/
  forbid_pattern=/detectors=["pii","secrets","injection"]/predicate=)` +
  `app.register_rail_predicate(name, fn)` — deterministic, enforced
  before/after every generation, violations audited as `rail:<name>`;
  self-correction — `app.enable_self_correction(max_cycles=2,
  max_cost_usd=0.05)` (bounded validate→critique→repair, structure-only,
  facts never invented) or `vincio.output.SelfCorrector`; multi-schema
  routing — `app.add_output_schema(schema, keywords=, task_types=, when=)`
  picks the contract per run; `SchemaRouter.classify/validate_any` for
  content-side validation.
- Evals: `Dataset.load("golden.jsonl")`, `app.evaluate(...)`,
  gates like `{"groundedness": ">= 0.95"}`; CLI `vincio eval run`.
- Evals & testing: metrics add faithfulness | answer_relevance |
  hallucination (strict number checks) | toxicity | bias |
  summarization_quality | knowledge_retention | conversation_relevance
  (conversation via `case.context["messages"]`); `GEvalJudge(provider,
  model=..., criteria=..., samples=N)` + `judge.calibrate(pairs)`;
  `SyntheticGenerator(seed=...).generate(docs, n=...)` (offline templates,
  LLM hook, provenance in `metadata.source_ids` + `rubric.facts`);
  `RedTeamSuite().run(app)` (canary-judged jailbreak/injection/leak/bias/
  toxicity probes, `attack_success_rate`, `detector_coverage`);
  `ExperimentTracker(path).log/compare/ablation` + `ab_test(a, b, metric)`
  (paired/Welch t-test, pure-Python p-values); `vincio.testing` —
  `assert_eval` / `assert_grounded` / `assert_metric` / `assert_safe` +
  pytest plugin with `vincio_snapshot` fixture
  (`--vincio-update-snapshots`).
- Agentic evaluation & continuous quality (1.2): trajectory & tool-use metrics
  `tool_call_accuracy | tool_call_f1 | goal_accuracy | plan_adherence |
  plan_quality | step_efficiency | topic_adherence` score the agent
  `trajectory` carried on a RunOutput — build it with
  `RunOutput.from_agent_state(state)` / `from_crew_result(result)` /
  `from_trace(trace)` (no re-instrumentation); expected/optimal refs live in
  `rubric['expected_tools' | 'plan' | 'optimal_steps' | 'topic']`;
  `report.metric_families()` shows output-only vs trajectory side by side. Plus
  conversational `conversation_outcome` / `intent_resolution`. Multi-turn:
  `Simulator(seed=...).simulate(agent, Persona(goal=...))` (LLM-backed,
  deterministic template fallback) → `SimulatedConversation.to_eval_case()`;
  `dataset_from_traces(traces, group_by_session=True)` makes multi-turn goldens.
  Online/continuous: `app.add_online_evaluator(metric, sample_rate=...)` scores
  a sample of live runs off the hot path, writing a score time series to the
  store (`OnlineEvaluator.series()`); in tests, `await app.aflush_online()`
  drains the pending evaluations. Drift: `DriftMonitor(bus=, store=,
  score_threshold=, embedding_threshold=)`
  `.set_score_baseline/check_scores` + `.set_embedding_baseline/check_embeddings`
  raise a `drift.detected` event and persist `drift_baselines`; CLI
  `vincio eval drift baseline.json current.json`. Annotation: `AnnotationQueue`
  + `cohens_kappa(pairs, bins=)`; `GEvalJudge.calibrate(pairs)` now also returns
  `cohens_kappa` and `judge.gating_weight(threshold=)` / `queue.judge_trusted()`
  gate a judge on agreement; CLI `vincio eval annotate labels.jsonl`. A/B:
  `app.experiment(name, variants={...}, dataset=, metrics=[...])` → `Experiment`
  with `.compare()`, `.cost()`, `.significance(metric)`. Interconnection: every
  metric is reusable as a runtime guardrail — `app.add_metric_rail(metric,
  threshold=)` / `metric_guardrail(metric, threshold=)` — and as optimizer
  fitness via `AGENTIC_OBJECTIVES` (trajectory metrics are ordinary metrics).
- Observability: traces carry `session_id`/`thread_id` (pass
  `session_id=` to `app.run`), `trace.add_score`/`span.add_score` (runtime
  evaluators attach scores automatically), `trace.add_feedback(score=...,
  comment=...)` / `record_feedback(..., exporter=...)`;
  `sessions_from_traces(traces)` → `Session` aggregates;
  `dataset_from_traces(traces, min_feedback_score=...)` → eval dataset;
  viewer: `render_trace_text`, `trace_to_html` / `session_to_html`
  (self-contained static HTML), `trace_diff_html`; OTel exporter emits
  GenAI semantic conventions (`chat {model}`, `gen_ai.*`,
  `gen_ai.conversation.id`).
- Prompt registry: `PromptRegistry(dir).push(spec, tags=[...])` —
  content-hash versions, moving tags, `diff(name, a, b, rendered=True)`,
  `rollback(name)`, `link_eval(name, version, report)`.
- The closed loop: `loop = app.improvement_loop(metrics=[...],
  gates={"groundedness": ">= 0.8"})`; `result = loop.run(
  min_feedback_score=0.5)` — capture traces → `dataset_from_traces` →
  baseline eval → gated prompt optimization → promote (registry push +
  tag "production" + `link_eval` + applied to the app + `loop_promotion`
  audit entry + `loop.promoted` event); `dry_run=True` decides without
  acting; candidate evals are memory-write-free; CLI `vincio loop run
  --app app.py [--dataset ds.jsonl | --min-feedback X] [--gate ...]
  [--tag T] [--dry-run]`. Auto-memory from runs: `memory.write_back:
  [facts]` + `memory.fact_min_support` / `memory.max_facts_per_run` —
  evidence-supported output claims (`vincio.memory.extract_grounded_facts`)
  become candidate memories (`origin: run_fact`) through the guarded
  policy. Retrieval feedback: `RetrievalFeedback(app.retrieval,
  records_from_dataset(ds)).tune()` — gated, deterministic tuning of
  per-index RRF weights + heuristic-reranker blend;
  `recommend_chunking(reports_by_config, baseline=...)`. Pareto:
  `pareto_loop(candidates, evaluate_fn, dataset, baseline=,
  objectives=[ObjectiveSpec(...)], constraints={"cost": 0.01},
  prefer="accuracy")` → `ParetoResult` with `frontier.front` /
  `frontier.knee()`; promotion passes the same safety rules. Learned
  budgets: `BudgetLearner(evaluate_allocation).learn(dataset,
  task_type=...)` → `LearnedAllocations` (JSON save/load) installed via
  `app.use_learned_budgets(...)`. Guided search:
  `ContextOptimizer(...).optimize(dataset, strategy="hill_climb|anneal")`
  or `guided_search(space, evaluate, strategy=, budget=, seed=)` —
  deterministic, budget-bounded, gate-respecting.
- Reflective optimization & the data flywheel (1.4, all `@experimental`):
  reflective optimizer `app.reflective_optimize(dataset,
  strategy="reflective|mipro", budget=, minibatch_size=, gates=, apply=)` →
  `ReflectiveResult`
  (drop-in `OptimizationResult` with `.frontier` / `.reflections`); reads eval
  failures, proposes targeted edits (`HeuristicReflector` deterministic /
  `LLMReflector` with fallback), evolves a `ParetoFrontier` under a hard rollout
  budget, deterministic under seed; also `ImprovementLoop(optimizer="reflective")`
  / `vincio optimize reflective` / `vincio loop run --reflective`. Distillation
  flywheel: faithful export straight from RunResults, no capture flag needed —
  `results = [app.run(q) for q in prompts]; ts = app.export_training_set(
  runs=results, require_grounding=True, path=, format="openai|anthropic")`
  (RunResults carry the full output + cited evidence, and the runtime stamps
  `metadata['input']`; `export_training_set_from_runs(runs, ...)` is the
  standalone form) → `TrainingSet` (grounded, deduped, with provenance). Trace path
  (feedback-filtered) needs `app.enable_training_capture()` (covers streaming):
  `app.export_training_set(min_feedback_score=)` / `export_training_set(traces,...)`;
  `vincio distill --traces-dir --output --format`. Gate a cheaper
  student `app.distill(ts, held_out, teacher=, student=, min_quality_ratio=)` →
  `DistillationResult` (promotes a `ModelCascade` only on quality-hold + cost cut;
  `BootstrapFinetune` with an injected `trainer`). Learned compression:
  `from vincio.context import LLMLinguaCompressor` (drop-in
  `ContextCompiler.compressor`, protects numbers/entities/citations/query terms);
  faithfulness-gated adoption via `app.gate_compression(dataset)` →
  `CompressionTuningResult`, or `app.use_learned_compression()` (ungated);
  `compression_faithfulness` / `faithfulness_preserved`. Optimizer-judge
  calibration: `app.calibrate_judge(geval, samples)` (samples are
  `(case, output, human_score)`) reflectively tunes the judge's steps to maximize
  Cohen's κ, adopting only on a strict gain → `JudgeCalibrationResult`.
- Integrations & DX: framework interop `from vincio.interop import
  add_langchain_tool, from_langchain_loader, from_langchain_retriever,
  from_langchain_embeddings, from_llamaindex_reader, from_llamaindex_retriever,
  from_llamaindex_embedding, add_llamaindex_tool` (duck-typed, no heavy import)
  + `to_langchain_*` / `to_llamaindex_*` (extras `vincio[langchain]` /
  `vincio[llamaindex]`); embedders
  `build_embedder("local|jina|voyage|cohere|voyage-context|voyage-multimodal|cohere-multimodal|<provider>", dimensions=N)`
  (1.5: Matryoshka/MRL truncation, contextual & unified text+image multimodal
  embedders, query/document `input_type` hints via `embed_texts`); rerankers
  `build_reranker("heuristic|recency|authority|llm|cohere|jina|voyage")` (hosted
  ones are httpx-only); vector stores `from vincio.storage import
  build_vector_index` over memory | qdrant | pgvector | chroma | pinecone |
  lancedb | weaviate | milvus | elasticsearch | opensearch | vespa (extras
  `vincio[chroma|pinecone|lancedb|weaviate|milvus|elasticsearch|opensearch|
  vespa]`); layout-aware PDF extraction `load_document(path, layout=True)` /
  `extract_pdf_layout` (extra `vincio[pdf-layout]`); voice/realtime (1.5,
  optional) `from vincio.realtime import RealtimeSession, connect_realtime`
  (backends inprocess|openai|gemini; VAD, interruption, in-session tools via the
  permissioned runtime; `app.realtime_session(...)`; extra `vincio[realtime]`);
  domain packs `app.use_pack("support|engineering|finance|legal")`,
  `from vincio import load_pack, available_packs` (+ `register_pack`); notebook
  reprs `from vincio import enable_rich_reprs` (RunResult/Trace/EvalReport/
  MemoryItem/SearchHit); interactive inspector `vincio.tui.TUI` / `vincio tui`;
  typed config schema `vincio.core.config.config_json_schema()`.
- Stability & guarantees (1.0): SemVer on the frozen public surface
  (`vincio.__all__` / `vincio.stability.public_api()`); `from vincio import
  deprecated, experimental, stability_of` — `@deprecated(since=, removed_in=,
  alternative=)` / `@experimental(since=)` emit `VincioDeprecationWarning` /
  `VincioExperimentalWarning` (escalate to errors via `warnings.simplefilter`);
  nothing public is removed in a minor or patch release; it is deprecated first
  and removed only at the next major. Published SLOs (`benchmarks/slos.json`,
  `docs/reference/slo.md`) each enforced by a VincioBench budget held at least
  as strict (`tests/test_slos.py`). Security hardening: offline audit-chain
  verification (`from vincio.security import verify_audit_file`;
  `AuditLog.verify_file()`; `vincio audit verify <path>`); tool sandbox
  `setrlimit` CPU/memory/fd limits (`run_subprocess_sandboxed` /
  `SandboxedPython(max_cpu_seconds=, max_memory_bytes=, max_open_files=)`);
  releases ship a CycloneDX SBOM + SLSA provenance attestations. Threat model
  in `docs/security/threat-model.md`.
- Governance & compliance (1.6, experimental; `vincio.governance`): generated
  from the running system, no hosted dependency. Model/system cards
  `app.model_card()` / `app.system_card()` (`CardFormat` vincio|open_model_card|
  ai_card) / `vincio governance card`. Compliance mapping `app.compliance_report()`
  / `ComplianceMapper().map(redteam=, eval_report=, target=app)` over OWASP LLM
  Top 10 (2025) / OWASP Agentic / NIST AI RMF / MITRE ATLAS, evidence from
  `RedTeamSuite` + `EvalReport` + config (`ComplianceReport.coverage_rate` /
  `to_markdown()`); `vincio governance report --red-team --markdown`. AI-BOM
  `app.aibom()` (CycloneDX 1.6, `AIComponent.verify` / `sha256_file`); `vincio
  governance aibom`. EU AI Act: `from vincio.governance import
  mark_synthetic_content, ai_disclosure, data_summary` (C2PA-style manifest,
  bound by SHA-256); `governance.content_marking: true` marks every run.
  Lineage + erasure: `app.trace_lineage(source)`, `app.erase_source(source)`
  (purges every index/memory/cache, audited, idempotent); `vincio governance
  lineage|erase`. Residency: `app.set_residency(["eu"], provider_regions={...})`
  / `governance.allowed_regions` refuse egress as a blocking PolicyViolation
  (`ResidencyViolationError`); region is inferred from a region-pinned endpoint
  (`provider.base_urls`) via `infer_region_from_url` (AWS/GCP/Vertex/sovereign)
  with jurisdiction-aware matching (eu admits eu-west-1/europe-west4). Content
  signing: `mark_synthetic_content(..., signer=HmacSigner(secret))` +
  `verify_manifest(manifest, content, signer=)` (built-in symmetric HMAC, or your
  own `ContentSigner`); `app.content_signer` signs every marked run. Multilingual
  PII: `PIIDetector(locales=["fr", "de", "es", "in", "sg", "br", "uk"])` /
  `governance.locales`; per-language eval
  slicing `EvalReport.slice_by_tag("lang:")` / `tag_gap`; token tax
  `app.fertility.token_tax(lang)`. RAG-poisoning: `from vincio.security import
  PoisoningDetector` (authority/provenance signals + classifier hook; FP/FN via
  `PoisoningReport.telemetry`).
- The honest, fast spine (1.7, experimental): the advertised `Budget` is a hard
  cap on `app.run()`/`arun()` — `max_cost_usd`/`max_input_tokens`/
  `max_output_tokens`/`max_steps` raise `BudgetExceededError` (on the same audit
  chain as residency/policy), with a pre-flight input-token check and
  `RunConfig(enforce_budget_caps=False)` restoring the legacy soft cap for one
  minor release. Data-driven `ModelRegistry` (`from vincio import default_model_registry`
  / `ModelRegistry`): a versioned, config-overridable (`VINCIO_MODEL_REGISTRY`
  JSON/YAML overlay) catalog keyed by exact model id binding `ModelProfile`
  (capabilities + standard/batch pricing + context window + modalities + GA/
  deprecation/retirement dates); `capabilities()` and `PriceTable` derive from it
  (substring sniffing demoted to fallback) and an unknown model warns +
  emits `model.unknown` instead of billing $0. Provider-native token counters
  register behind the `TokenCounter` Protocol (`register_token_counter`);
  third-party adapters auto-register via the `vincio.providers`/`embedders`/
  `stores` entry-point groups. Opt-in semantic context scoring
  (`app.use_semantic_context_scoring()` / `retrieval.semantic_context_scoring`):
  embedding-cosine relevance, MMR selection (`mmr_lambda`), reranker
  `upstream_relevance` blended into relevance, and salient-unit value-level
  contradiction (numbers/dates) emitted as structured packet conflicts; default
  stays lexical. Unified run pipeline: streaming and non-streaming share one
  latency-deadline + cancellation epilogue; `app.submit(...)` returns a
  `RunHandle` whose `cancel()` propagates cooperative cancellation and still
  records the cancelled run; async store contract (`storage.base.asave`/`aquery`)
  keeps persistence off the event loop. Significance-gated promotion: `ab_test`
  now returns p-value + confidence interval + effect size, and `evolution_loop`
  blocks significant regressions / warns under-powered runs (loop audit records
  the verdict). Trace-replay executor `ReplayRunner(app).replay(traces,
  pin_tools=)` re-runs captured trace inputs and diffs output/trajectory/cost
  (reusing `trace_diff` + `EvalReport.diff`); `vincio trace replay --against
  <app>`. Sub-quadratic hot paths: inverted-index BM25, incremental MMR
  `_select`, memoized `count_tokens`, optional numpy vector path. Hardened
  detectors: injection normalization + recursive base64/hex/rot13 decode pre-pass
  (catches obfuscated attacks), a pluggable `DetectorBackend` Protocol on the
  PII/injection/secret detectors, `AccessController(require_explicit_tenant=True)`
  fails closed on untagged tenants, and `ComplianceMapper` marks a control
  `covered` only with measured red-team/eval evidence (config flags ⇒ `partial`).
- Provider/model rotation & swap regression (1.8, experimental): capability guard
  — `from vincio.providers import requirements_for, capability_check` intersect a
  request's needs (vision/tools/structured-output/reasoning/context) with a
  model's `ModelCapabilities`; unknown models are never blocked. `FailoverChain` /
  `HealthAwareFailover` guard by default (`guard_capabilities=False` opts out):
  skip a capability-mismatched substitution, classify a terminal lifecycle error
  (`is_lifecycle_error`), raise `ModelRetiredError` ("rotate now") when all
  candidates are retired; `CapabilityMismatchError` when none can serve. Router —
  `app.use_router(models, *, strategy="cheapest|fastest|least_busy",
  budget_usd=)` or `Router.from_models(provider, models)` picks the cheapest/
  fastest/least-busy *capable* model, downgrades to a per-request budget, emits a
  `model.routed` `RoutingDecision`. Swap gate — `app.gate_swap(candidate_model, *,
  baseline_model=, dataset=, traces=, gates=, repeats=, flake_quarantine=)` →
  `SwapVerdict` (`.passed`, `.reason`, `.regression`, `.replay`): replays golden
  traces (`ReplayRunner`) + `evaluate_gates` + `DriftMonitor` + `ab_test` +
  behavioral shape diff (tool-call/refusal rate, output length). Model-swap
  regression — `app.swap_regression(dataset, *, candidate_model=, baseline_model=,
  repeats=)` / `model_swap_regression(...)` → `SwapRegressionReport`
  (`.regressions`, `.cost`, `.worst_slices`, `.metric_tests`). `EvalRunner(...,
  repeats=N, repeat_aggregate="mean", flake_quarantine=True, flake_threshold=0.15)`
  records per-case mean/stdev and excludes flaky cases from gates. Shadow & canary
  — `app.shadow(candidate_model, *, candidate_provider=, block=)` →
  `ShadowProvider` (returns primary, dual-dispatches candidate, `.observations`/
  `.diff()`); `app.canary(candidate_model, *, percent=, score_fn=, min_samples=,
  regression_threshold=, prompt_name=)` → `CanaryRouter` (ramps %, online scoring,
  auto-rollback to primary + prompt-registry head, emits `canary.rollback`); both
  are `ModelProvider`s. Lifecycle — `app.watch_lifecycle(models=, as_of=)` /
  `LifecycleWatcher(...).scan()/propose_migration()` emit sunset alerts +
  `MigrationProposal` (successor or cheaper Pareto-dominating; `apply_to_cascade`/
  `apply_to_policy`/`apply_to_config`). Discovery — `ModelProvider.list_models()`
  (OpenAI/Anthropic/Google) + `ModelRegistry.reconcile(profiles)` /
  `discover_models(provider)` (offline-safe). Google/Vertex `GoogleBatchBackend`
  completes half-cost batch parity. New `vincio.__all__`: `Router`, `SwapGate`,
  `SwapVerdict`, `model_swap_regression`, `ShadowProvider`, `CanaryRouter`,
  `LifecycleWatcher`. Edge: routes by capability *and* cost inside your audit
  boundary, refuses mismatched substitutions, gates every swap on replayed golden
  traces with significance, and auto-rolls-back a canary — in-process, not a proxy.
- Documents & images flow OUT (1.9, experimental, `vincio.generation`): documents
  and media come out under the same guarantees as text in. DocumentBuilder —
  `app.build_document(source, *, format="markdown|html|docx|pdf|pptx",
  contract=DocumentContract(required_sections=, table_specs=[TableSpec(...)],
  min_words=, citations_per_section=))` turns a *validated* result (RunResult /
  mapping / Markdown) into a rendered, structurally-validated, provenance-audited
  `DocumentArtifact` (`document_generate` event); Markdown/HTML are dependency-free,
  DOCX/PDF/PPTX need `vincio[gen-docx|gen-pdf|gen-pptx]`. Repair is formatting-only.
  Plus `fill_text_template`/`fill_docx_form`/`fill_pdf_form` (typed citation-aware
  `Slot`s) and `generate_redline`. Cited reports — `app.cited_report(answer,
  evidence, *, format=, contract=CitationContract(min_coverage=, require_entailment=,
  min_entailment_rate=))` (async `acited_report`) resolves `[E1]` markers to numbered
  footnotes + a bibliography, computes sentence-level coverage and per-claim
  entailment (pluggable backend); unresolved markers surface. Image/TTS —
  `app.generate_image(prompt, *, provider=ImageProvider, model=)` (async
  `agenerate_image`) and `app.synthesize_speech(text, *, provider=SpeechProvider,
  voice=, format=)` (async); providers `MockImageProvider`/`OpenAIImageProvider`/
  `GoogleImageProvider`/`HTTPImageProvider` and `MockSpeechProvider`/
  `OpenAISpeechProvider`/`GoogleSpeechProvider`/`ElevenLabsSpeechProvider`; every
  asset C2PA-stamped (bound by SHA-256), budget-metered (`meter_media_cost`), audited
  (`image_generate`/`speech_synthesize`). Media-aware `mark_synthetic_content(bytes,
  media_type=, edited=)` + `embed_provenance` (PNG) + `write_sidecar_manifest`.
  Audio chat input — `ContentPart.audio` now renders to OpenAI `input_audio` /
  Gemini `inlineData` via `core.media.encode_audio_bytes`. Richer inputs —
  `load_pdf(path, ocr_engine=)` OCR fallback (`vincio[ocr]`, `extractor='ocr'`),
  `load_media(path, transcriber=MockTranscriber()|WhisperTranscriber())` →
  timestamped transcript Document, `figure_evidence(doc, crops=, analyzer=|ocr_engine=)`,
  `parse_html`/`structure_data`; new loaders PPTX/EPUB/RTF/ODT (dep-free), Parquet
  (`vincio[parquet]`), mbox, `.msg` (`vincio[msg]`) via a `ParserRegistry`
  (`register_loader`). Forms/KYC — `HeuristicFormExtractor().extract(text)` →
  `FormField`s (+bbox/confidence), `form_fields_to_evidence`; `DocumentAI` adapters
  (Textract/Azure/Google). EU AI Act conformity pack — `app.risk_tier(purpose=,
  domains=)` → `RiskAssessment` (advisory tier), `app.annex_iv(...)` (Annex IV tech
  docs) and `app.fria(..., affected_groups=)` (Art. 27 FRIA) rendered through the
  document engine, grounded by the live config/cards/compliance matrix, recorded as
  `conformity_doc`; ISO/IEC 42001 controls join `ComplianceMapper`. New
  `vincio.__all__`: `DocumentBuilder`, `DocumentContract`, `DocumentArtifact`,
  `CitedReportBuilder`, `CitationContract`, `generate_redline`, `ImageProvider`,
  `ImageGenRequest`, `MockImageProvider`, `SpeechProvider`, `SpeechRequest`,
  `MockSpeechProvider`, `RiskTierClassifier`, `AnnexIVBuilder`, `FRIAGenerator`.
  Edge: a generated image is C2PA-stamped and a cited report is per-claim *entailed*,
  not just `[E1]`-marked — documents/media out under text's exact guarantees, on one
  audit chain, in-process.
- The loop closes itself + the agentic frontier (1.10, experimental): continual,
  online, safe self-improvement plus deep research, self-editing memory, and
  computer-use on the same cited/audited spine. Online controller —
  `app.continuous_improvement(metrics=, golden=, sustain=, cooldown_s=,
  eval_budget=, quality_floor=, reoptimize=)` → `ContinuousImprovementController`;
  `.set_baseline(metric, values).attach()` subscribes to `drift.detected` +
  `eval.online`, debounces, and turns sustained drift into one gated action
  (re-eval / `ImprovementLoop` re-optimize / rollback to last known-good prompt),
  audited + restart-safe (persisted `controller_state`). Drift —
  `evals/drift.py` adds `ks_statistic`/`ks_drift`, `psi`, `rbf_mmd2`, a streaming
  `CUSUMDetector`, and `DriftMonitor.observe_score`/`check_distribution`.
  `OnlineEvaluator` persists its sampling counter (restart-safe;
  `observed_total()` aggregates across workers). Real reflector (GEPA proper):
  `LLMReflector(provider, model)` reads actual failing cases, `cluster_failures`
  groups them into modes, proposes schema-validated edits;
  `app.reflective_optimize(..., reflector="llm")` / `ImprovementLoop(reflector=)`,
  with a heuristic fallback offline. Experiment proposer: `app.experiment_proposer(...)`
  → `ExperimentProposer.rank/propose/run_next` ranks metrics to find the weakest one and
  schedules a prompt/retrieval/budget/routing/distillation experiment. Guarded
  bandits — `GuardedBanditRouter(entries, bandit="epsilon_greedy|ucb1|linucb")`
  (a `ModelProvider`) with a safety floor (no exploration on high-risk/safety
  traffic), per-arm regret, auto-freeze/rollback, persisted `bandit_state`;
  `LinUCB` is contextual; `app.use_bandit_router(models)`. Non-regression guard —
  `GoldenRegressionSuite(path)` records each promotion's fixed cases with
  provenance and `gate(report)` blocks a regressing candidate;
  `ImprovementLoop(golden_suite=)`. Deep research — `app.research(question, *,
  budget=ResearchBudget(breadth=, depth=, max_sources=, top_k=))` → `ResearchAgent`
  loops search→read→reflect→verify→synthesize over the query planners +
  `extract_grounded_facts`, dedups sources, emits a cited report via
  `CitedReportBuilder`; it reports `citation_coverage`, `grounding`, and `source_diversity` metrics.
  Memory OS — `app.enable_memory_os(scope=, owner_id=, max_core_tokens=)` →
  `MemoryOS` registers `memory_append`/`memory_replace`/`memory_search`/
  `memory_archive` as permissioned, audited tools over the guarded write pipeline,
  with a context-pressure pager (core↔archival). Agent loop: level-parallel DAG execution
  over `topological_levels`, a real `plan_and_execute` replanning loop
  (`Planner.replan`), and in-loop compaction (`ContextCompactor`) replacing fixed
  `[-8]`/`[:24]` slicing. Computer-use — `app.enable_computer_use(backend=
  "mock|playwright|provider", require_isolation=)` registers navigate/click/type/
  screenshot as approval-gated tools; pluggable `IsolationBackend` in
  `tools/sandbox.py` (`Subprocess`/`Container`/`GVisor`/`MicroVM`/`WASM`,
  `require_real_isolation`); hosted tools — `app.use_hosted_tools(["web_search",
  "file_search","code_interpreter","computer_use"])` surfaces OpenAI Responses
  built-ins as namespaced, permissioned tools. New `vincio.__all__`:
  `ContinuousImprovementController`, `ControllerDecision`, `ExperimentProposer`,
  `GoldenRegressionSuite`, `GuardedBanditRouter`, `ResearchAgent`,
  `ResearchBudget`, `ResearchReport`. New extra `vincio[computer-use]` (Playwright);
  new error `SandboxError`. Edge: continual self-improvement that is gated and
  reversible, deep research where every claim is cited and budget-bounded,
  self-editing memory that is provenance-tracked and audited, and computer-use that
  is isolated and audited. All of it runs on one packet/ledger/audit/trace, in-process,
  under a held-out non-regression guard that the field's offline bandits and thin GUI
  adapters lack.
- CLI: init (`--template minimal|rag|agent|eval`, `--provider`), run,
  config schema/validate/show, packs list/show, tui,
  eval run/report/dataset (`--group-by-session`)/drift/annotate/regress
  (`--baseline-model X --candidate-model Y`),
  prompt lint/compile/push/versions/diff/rollback, trace
  show/view/replay/diff/export/sessions/feedback, optimize run, optimize
  reflective, loop run (`--reflective`), distill, index build, memory
  inspect/remember/recall/forget/export/consolidate/decay,
  audit verify, governance card/report/aibom/lineage/erase, mcp tools/add/serve,
  providers list/lifecycle/discover/regress.
- Server: `from vincio.server import create_app` (FastAPI; API key + JWT).
- Protocols & interoperability (1.1, experimental): MCP client + server
  (`vincio.mcp`: `app.add_mcp_server(name, command=/url=/server=)`,
  `app.serve_mcp()`; stdio / Streamable HTTP / in-process; MCP tools register
  through the permissioned/sandboxed/audited runtime, resources become evidence
  with `origin: mcp:<server>`, sampling routes to the provider, elicitation to a
  human gate; OAuth 2.1 integration hooks). A2A (`vincio.a2a`: `app.serve_a2a(crew|graph)`,
  Agent Card + JSON-RPC task lifecycle, `RemoteA2AAgent` as a bounded crew
  delegate). Agent Skills (`vincio.skills`: `app.add_skill(path)`, `SKILL.md`
  with progressive disclosure, bundled scripts as sandboxed tools). Unified
  reasoning control: `RunConfig(reasoning_effort="low|medium|high")` /
  `thinking_budget_tokens` across OpenAI/Anthropic/Gemini, with thinking tokens
  billed; optional `OpenAIResponsesProvider` (`build_provider("openai_responses")`).
- Cost, reliability & scale (1.3, experimental): batch — `app.batch(inputs, *,
  discount=0.5)` / `app.abatch(...)` → `list[RunResult]`, or
  `BatchRunner(backend_or_provider, *, discount=0.5).run(requests)` →
  `BatchRunResult` (`.cost_usd`, `.succeeded`, `.failed`, `.by_id()`,
  reconcile via `BatchRequest(custom_id, request)` → `BatchResult.custom_id`);
  roughly half the standard price (`discount=0.5`); backends `InProcessBatchBackend` (offline/default)
  | `OpenAIBatchBackend(completion_window="24h")` | `AnthropicBatchBackend`;
  CLI `vincio batch app.py --input X [--input-file lines.txt] [--discount 0.5]
  [--output results.json]`. Resilience (compose inner→outer): retry →
  fallback → circuit-break — `CircuitBreaker(RetryingProvider(provider), *,
  failure_threshold=0.5, cooldown_s=30.0)` (a `ModelProvider`; `.state`/`.healthy`,
  `CircuitState` CLOSED/OPEN/HALF_OPEN, emits `circuit.opened|closed|half_open`),
  `HealthAwareFailover([(provider, label), ...])` (open breakers raise
  non-retryable `CircuitOpenError`, skipped in microseconds), `KeyPool(providers,
  *, rpm=, tpm=, breaker=True)` (round-robin health-aware across keys/regions,
  dual RPM+TPM `RateLimiter`, full-jitter backoff honoring `retry_after`).
  Cascade — `app.use_cascade(models, *, min_confidence=0.5, max_escalations=None,
  confidence=)` (cheap→strong, escalate on low confidence) or
  `ModelCascade.from_models([...])` / `ModelCascade([CascadeRung(model, provider=,
  min_confidence=)])`; `response_confidence(resp)` returns 1.0 on a clean stop, 0.0 on
  `length`/`content_filter`/`error`, and 0.2 when a schema was expected but the output
  did not parse. Budgets:
  `app.set_cost_budget(limit_usd=, scope="tenant|feature|user|global", id=,
  period="run|hour|day|month|total", on_breach="cap|degrade|queue_to_batch",
  degrade_model=, anomaly_factor=)`; `app.cost_report(by="tenant|feature|user|
  model|provider|run", since=)` → `CostReport` (`.total_usd`, `.rows`,
  `.print_summary()`); `app.cost_ledger`/`app.budget_manager` always present;
  `arun`/`astream` gained `feature=` (a cost-attribution dimension alongside
  `tenant_id`/`user_id`/`session_id`); events `cost.anomaly` /
  `cost.budget_exceeded`, hash-chained audit `cost_budget` (deny|degrade); CLI
  `vincio cost report --by ... [--db .vincio/vincio.db] [--json]`. Prompt caching —
  `app.enable_prompt_caching(ttl="5m|1h", min_prefix_tokens=1024)` (on by default
  via `provider_cache`/`provider_cache_ttl`/`provider_cache_min_prefix_tokens`
  config; `Message.cache_ttl`; model spans gain `cache_hit_rate` /
  `cached_input_tokens`). Live & sharded retrieval — `LiveIndex.upsert(chunks, *,
  ttl_seconds=)` / `upsert_stream(...)` → `UpsertStats(added, updated, unchanged,
  .reembedded)` (content-hash change detection — only changed chunks re-embed);
  `ShardedIndex(shards, *, router=, max_concurrency=8)` (Index protocol, chunks
  co-locate by `document_id` hash, parallel fan-out merges `top_k`). New
  `vincio.__all__` exports: `BatchRunner`, `CircuitBreaker`, `HealthAwareFailover`,
  `KeyPool`, `ModelCascade`, `CostLedger`, `CostBudget`, `BudgetManager`,
  `ShardedIndex`. Edge over gateways (LiteLLM/Bifrost/Portkey): the same failover,
  circuit breaking, cascades, cost attribution, enforced budgets, and batching, but
  in-process (no proxy hop), governed by the policy engine, and recorded on one trace.

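The 1.10 online controller is described as debouncing drift signals and turning *sustained* drift into one gated action, tuned by `sustain=` and `cooldown_s=`. The standalone sketch below shows one plausible reading of that debounce. It assumes `sustain` counts consecutive drift signals and `cooldown_s` is the minimum gap between actions; `SustainedDriftTrigger` is a hypothetical class, not part of `vincio`.

```python
import time
from collections.abc import Callable


class SustainedDriftTrigger:
    """Hypothetical debounce: fire once after `sustain` consecutive drift signals, then cool down."""

    def __init__(self, sustain: int = 3, cooldown_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.sustain = sustain
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.streak = 0
        self.last_fired: float | None = None

    def observe(self, drifting: bool) -> bool:
        # any clean check resets the streak, so short blips never trigger an action
        self.streak = self.streak + 1 if drifting else 0
        if self.streak < self.sustain:
            return False
        now = self.clock()
        if self.last_fired is not None and now - self.last_fired < self.cooldown_s:
            return False  # still cooling down from the previous action
        self.last_fired = now
        self.streak = 0
        return True
```

Injecting `clock` keeps the trigger deterministic in tests; in a real controller the `True` result would be the point where a gated re-eval, re-optimize, or rollback is chosen.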
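`evals/drift.py` is listed as adding `ks_statistic`/`ks_drift` and `psi`, but their signatures are not given here. As a hedged illustration of what those statistics compute, the sketch below implements the textbook two-sample Kolmogorov-Smirnov statistic and a Population Stability Index in plain Python. `ks_two_sample` and `population_stability_index` are hypothetical names, and the equal-width binning over the baseline range is an assumption (quantile bins are a common alternative).

```python
import bisect
import math


def ks_two_sample(a: list[float], b: list[float]) -> float:
    """Largest vertical gap between the empirical CDFs of two samples."""
    xs, ys = sorted(a), sorted(b)
    gap = 0.0
    for v in xs + ys:
        # share of each sample that is <= v (right-continuous empirical CDF)
        fa = bisect.bisect_right(xs, v) / len(xs)
        fb = bisect.bisect_right(ys, v) / len(ys)
        gap = max(gap, abs(fa - fb))
    return gap


def population_stability_index(expected: list[float], actual: list[float],
                               bins: int = 10, eps: float = 1e-6) -> float:
    """PSI over equal-width bins spanning the baseline (expected) range."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def shares(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            i = int((v - lo) / width)
            counts[min(max(i, 0), bins - 1)] += 1  # clamp out-of-range values into edge bins
        # floor empty bins at eps so the log term stays finite
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Each PSI term `(a - e) * ln(a / e)` is non-negative, so the index is zero only when the binned shares match exactly.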
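The streaming `CUSUMDetector` is likewise named without its parameters. A minimal two-sided tabular CUSUM is sketched below, assuming a known target mean, a slack `k`, and an alarm threshold `h`; `TabularCusum` and its field names are hypothetical, not Vincio's API.

```python
from dataclasses import dataclass


@dataclass
class TabularCusum:
    """Two-sided tabular CUSUM: alarms when cumulative deviation from `target` exceeds `threshold`."""
    target: float            # expected mean score when nothing has drifted
    slack: float = 0.05      # k: per-observation allowance before deviations accumulate
    threshold: float = 0.5   # h: alarm level for either cumulative sum
    upper: float = 0.0
    lower: float = 0.0

    def update(self, x: float) -> bool:
        # each sum grows only with deviations beyond the slack and never drops below zero
        self.upper = max(0.0, self.upper + (x - self.target - self.slack))
        self.lower = max(0.0, self.lower + (self.target - x - self.slack))
        if self.upper > self.threshold or self.lower > self.threshold:
            self.upper = self.lower = 0.0  # reset after signalling
            return True
        return False
```

Unlike a fixed-window test, CUSUM accumulates small persistent shifts, which suits a score stream that degrades gradually.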
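The non-regression guard (`GoldenRegressionSuite.gate(report)`) blocks a candidate that regresses on recorded golden cases. Assuming a report can be reduced to per-case scores, a minimal version of that check is a per-case comparison against the baseline; `find_regressions`, `gate`, and the `tolerance` parameter here are hypothetical.

```python
def find_regressions(baseline: dict[str, float], candidate: dict[str, float],
                     tolerance: float = 0.0) -> list[str]:
    """Golden case ids where the candidate is missing or scores below baseline - tolerance."""
    failing = []
    for case_id, base_score in baseline.items():
        score = candidate.get(case_id)
        if score is None or score < base_score - tolerance:
            failing.append(case_id)  # a missing case counts as a regression
    return sorted(failing)


def gate(baseline: dict[str, float], candidate: dict[str, float], tolerance: float = 0.0) -> bool:
    """Allow promotion only when no golden case regresses."""
    return not find_regressions(baseline, candidate, tolerance)
```

Checking per case rather than on the mean keeps a candidate from trading one broken case for small gains elsewhere.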
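The 1.10 agent loop runs a DAG level by level over `topological_levels`. A standalone version of that grouping is Kahn's algorithm with each frontier kept as one level, so every node in a level can run concurrently. `dag_levels` is a hypothetical name, and the sketch assumes dependencies arrive as a mapping from each node to the nodes it depends on.

```python
from collections import defaultdict


def dag_levels(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group DAG nodes into levels; every node's dependencies sit in earlier levels."""
    nodes = set(deps) | {d for ds in deps.values() for d in ds}
    indegree = {n: 0 for n in nodes}
    children: dict[str, list[str]] = defaultdict(list)
    for node, ds in deps.items():
        for d in ds:
            indegree[node] += 1
            children[d].append(node)
    level = sorted(n for n in nodes if indegree[n] == 0)
    levels: list[list[str]] = []
    seen = 0
    while level:
        levels.append(level)
        seen += len(level)
        nxt = []
        for n in level:
            for c in children[n]:
                indegree[c] -= 1
                if indegree[c] == 0:  # all dependencies finished in earlier levels
                    nxt.append(c)
        level = sorted(nxt)
    if seen != len(nodes):
        raise ValueError("dependency graph has a cycle")
    return levels
```

Each inner list could then be dispatched concurrently, for example with `asyncio.gather`, before moving on to the next level.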
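For the 1.3 resilience stack, `CircuitBreaker(..., failure_threshold=0.5, cooldown_s=30.0)` exposes `CLOSED`/`OPEN`/`HALF_OPEN` states. The sketch below assumes `failure_threshold` is a failure *rate* over a sliding window of recent calls; the window size, `min_calls`, and the class names are hypothetical, and Vincio's breaker may count failures differently.

```python
import time
from collections import deque
from collections.abc import Callable
from enum import Enum


class BreakerState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class FailureRateBreaker:
    """Opens when the failure rate over recent calls reaches a threshold; probes after a cooldown."""

    def __init__(self, failure_threshold: float = 0.5, cooldown_s: float = 30.0,
                 window: int = 10, min_calls: int = 4,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.min_calls = min_calls
        self.clock = clock
        self.recent: deque[bool] = deque(maxlen=window)  # True means the call failed
        self.state = BreakerState.CLOSED
        self.opened_at = 0.0

    def allow(self) -> bool:
        """Whether a call may proceed; an expired cooldown moves OPEN to HALF_OPEN."""
        if self.state is BreakerState.OPEN and self.clock() - self.opened_at >= self.cooldown_s:
            self.state = BreakerState.HALF_OPEN
        return self.state is not BreakerState.OPEN

    def record(self, failed: bool) -> None:
        if self.state is BreakerState.HALF_OPEN:
            # the probe result decides: success closes the breaker, failure re-opens it
            self.recent.clear()
            if failed:
                self._open()
            else:
                self.state = BreakerState.CLOSED
            return
        self.recent.append(failed)
        failure_rate = sum(self.recent) / len(self.recent)
        if len(self.recent) >= self.min_calls and failure_rate >= self.failure_threshold:
            self._open()

    def _open(self) -> None:
        self.state = BreakerState.OPEN
        self.opened_at = self.clock()
```

`min_calls` stops a single early failure from tripping the breaker at a 100% failure rate.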
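`KeyPool` is described as enforcing a dual RPM+TPM limit. One way to model that, shown here as a standalone sketch with hypothetical names, is a 60-second sliding window that admits a request only when both the request count and the token total would stay within their limits.

```python
import time
from collections import deque
from collections.abc import Callable


class DualRateLimiter:
    """Admit a request only if requests-per-minute and tokens-per-minute both stay in limit."""

    def __init__(self, rpm: int, tpm: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.rpm, self.tpm, self.clock = rpm, tpm, clock
        self.events: deque[tuple[float, int]] = deque()  # (timestamp, tokens) inside the window

    def try_acquire(self, tokens: int) -> bool:
        now = self.clock()
        while self.events and now - self.events[0][0] >= 60.0:
            self.events.popleft()  # drop entries older than the 60 s window
        used_tokens = sum(t for _, t in self.events)
        if len(self.events) + 1 > self.rpm or used_tokens + tokens > self.tpm:
            return False  # either limit would be exceeded
        self.events.append((now, tokens))
        return True
```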
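`LiveIndex.upsert` uses content-hash change detection so that only changed chunks re-embed. The sketch below shows the idea with SHA-256 over each chunk's text and a plain dict standing in for the index. The hash choice, `upsert_chunks`, and `ChunkUpsertStats` are assumptions, not Vincio's `UpsertStats`.

```python
import hashlib
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class ChunkUpsertStats:
    added: int = 0
    updated: int = 0
    unchanged: int = 0


def upsert_chunks(index: dict[str, tuple[str, list[float]]], chunks: dict[str, str],
                  embed: Callable[[str], list[float]]) -> ChunkUpsertStats:
    """`index` maps chunk id -> (content hash, vector); only new or changed text is embedded."""
    stats = ChunkUpsertStats()
    for chunk_id, text in chunks.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        stored = index.get(chunk_id)
        if stored is not None and stored[0] == digest:
            stats.unchanged += 1  # same content: keep the existing vector
            continue
        if stored is None:
            stats.added += 1
        else:
            stats.updated += 1
        index[chunk_id] = (digest, embed(text))  # embed only on add or change
    return stats
```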
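`ShardedIndex` co-locates chunks by a hash of `document_id` and merges a parallel fan-out into one `top_k`. The standalone sketch below assumes each shard is a callable returning `(score, chunk_id)` pairs; that interface, `shard_for`, and `fan_out_search` are hypothetical and are not Vincio's `Index` protocol.

```python
import hashlib
import heapq
from collections.abc import Callable, Sequence
from concurrent.futures import ThreadPoolExecutor

# hypothetical shard interface: (query, top_k) -> list of (score, chunk_id)
Shard = Callable[[str, int], list[tuple[float, str]]]


def shard_for(document_id: str, n_shards: int) -> int:
    """Stable hash routing, so every chunk of a document lands on the same shard."""
    digest = hashlib.sha256(document_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards


def fan_out_search(shards: Sequence[Shard], query: str, top_k: int = 5,
                   max_concurrency: int = 8) -> list[tuple[float, str]]:
    """Query all shards in parallel, then merge their hits into one global top_k."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        per_shard = list(pool.map(lambda shard: shard(query, top_k), shards))
    hits = (hit for shard_hits in per_shard for hit in shard_hits)
    return heapq.nlargest(top_k, hits, key=lambda hit: hit[0])
```

Asking each shard for its own `top_k` is enough: any hit in the global top `k` must also rank in the top `k` of the shard that holds it.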