Metadata-Version: 2.4
Name: vincio
Version: 4.2.0
Summary: Vincio: the context engineering platform for AI applications. Compiles prompts, memory, retrieval, tools, schemas, and policies into optimized, validated, observable model-ready context packets.
Project-URL: Homepage, https://github.com/Ohswedd/vincio
Project-URL: Documentation, https://github.com/Ohswedd/vincio/tree/main/docs
Project-URL: Repository, https://github.com/Ohswedd/vincio
Project-URL: Issues, https://github.com/Ohswedd/vincio/issues
Project-URL: Changelog, https://github.com/Ohswedd/vincio/blob/main/CHANGELOG.md
Author: Vincio Contributors
License:                                  Apache License
                                   Version 2.0, January 2004
                                http://www.apache.org/licenses/
        
           TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
        
           1. Definitions.
        
              "License" shall mean the terms and conditions for use, reproduction,
              and distribution as defined by Sections 1 through 9 of this document.
        
              "Licensor" shall mean the copyright owner or entity authorized by
              the copyright owner that is granting the License.
        
              "Legal Entity" shall mean the union of the acting entity and all
              other entities that control, are controlled by, or are under common
              control with that entity. For the purposes of this definition,
              "control" means (i) the power, direct or indirect, to cause the
              direction or management of such entity, whether by contract or
              otherwise, or (ii) ownership of fifty percent (50%) or more of the
              outstanding shares, or (iii) beneficial ownership of such entity.
        
              "You" (or "Your") shall mean an individual or Legal Entity
              exercising permissions granted by this License.
        
              "Source" form shall mean the preferred form for making modifications,
              including but not limited to software source code, documentation
              source, and configuration files.
        
              "Object" form shall mean any form resulting from mechanical
              transformation or translation of a Source form, including but
              not limited to compiled object code, generated documentation,
              and conversions to other media types.
        
              "Work" shall mean the work of authorship, whether in Source or
              Object form, made available under the License, as indicated by a
              copyright notice that is included in or attached to the work
              (an example is provided in the Appendix below).
        
              "Derivative Works" shall mean any work, whether in Source or Object
              form, that is based on (or derived from) the Work and for which the
              editorial revisions, annotations, elaborations, or other modifications
              represent, as a whole, an original work of authorship. For the purposes
              of this License, Derivative Works shall not include works that remain
              separable from, or merely link (or bind by name) to the interfaces of,
              the Work and Derivative Works thereof.
        
              "Contribution" shall mean any work of authorship, including
              the original version of the Work and any modifications or additions
              to that Work or Derivative Works thereof, that is intentionally
              submitted to Licensor for inclusion in the Work by the copyright owner
              or by an individual or Legal Entity authorized to submit on behalf of
              the copyright owner. For the purposes of this definition, "submitted"
              means any form of electronic, verbal, or written communication sent
              to the Licensor or its representatives, including but not limited to
              communication on electronic mailing lists, source code control systems,
              and issue tracking systems that are managed by, or on behalf of, the
              Licensor for the purpose of discussing and improving the Work, but
              excluding communication that is conspicuously marked or otherwise
              designated in writing by the copyright owner as "Not a Contribution."
        
              "Contributor" shall mean Licensor and any individual or Legal Entity
              on behalf of whom a Contribution has been received by Licensor and
              subsequently incorporated within the Work.
        
           2. Grant of Copyright License. Subject to the terms and conditions of
              this License, each Contributor hereby grants to You a perpetual,
              worldwide, non-exclusive, no-charge, royalty-free, irrevocable
              copyright license to reproduce, prepare Derivative Works of,
              publicly display, publicly perform, sublicense, and distribute the
              Work and such Derivative Works in Source or Object form.
        
           3. Grant of Patent License. Subject to the terms and conditions of
              this License, each Contributor hereby grants to You a perpetual,
              worldwide, non-exclusive, no-charge, royalty-free, irrevocable
              (except as stated in this section) patent license to make, have made,
              use, offer to sell, sell, import, and otherwise transfer the Work,
              where such license applies only to those patent claims licensable
              by such Contributor that are necessarily infringed by their
              Contribution(s) alone or by combination of their Contribution(s)
              with the Work to which such Contribution(s) was submitted. If You
              institute patent litigation against any entity (including a
              cross-claim or counterclaim in a lawsuit) alleging that the Work
              or a Contribution incorporated within the Work constitutes direct
              or contributory patent infringement, then any patent licenses
              granted to You under this License for that Work shall terminate
              as of the date such litigation is filed.
        
           4. Redistribution. You may reproduce and distribute copies of the
              Work or Derivative Works thereof in any medium, with or without
              modifications, and in Source or Object form, provided that You
              meet the following conditions:
        
              (a) You must give any other recipients of the Work or
                  Derivative Works a copy of this License; and
        
              (b) You must cause any modified files to carry prominent notices
                  stating that You changed the files; and
        
              (c) You must retain, in the Source form of any Derivative Works
                  that You distribute, all copyright, patent, trademark, and
                  attribution notices from the Source form of the Work,
                  excluding those notices that do not pertain to any part of
                  the Derivative Works; and
        
              (d) If the Work includes a "NOTICE" text file as part of its
                  distribution, then any Derivative Works that You distribute must
                  include a readable copy of the attribution notices contained
                  within such NOTICE file, excluding those notices that do not
                  pertain to any part of the Derivative Works, in at least one
                  of the following places: within a NOTICE text file distributed
                  as part of the Derivative Works; within the Source form or
                  documentation, if provided along with the Derivative Works; or,
                  within a display generated by the Derivative Works, if and
                  wherever such third-party notices normally appear. The contents
                  of the NOTICE file are for informational purposes only and
                  do not modify the License. You may add Your own attribution
                  notices within Derivative Works that You distribute, alongside
                  or as an addendum to the NOTICE text from the Work, provided
                  that such additional attribution notices cannot be construed
                  as modifying the License.
        
              You may add Your own copyright statement to Your modifications and
              may provide additional or different license terms and conditions
              for use, reproduction, or distribution of Your modifications, or
              for any such Derivative Works as a whole, provided Your use,
              reproduction, and distribution of the Work otherwise complies with
              the conditions stated in this License.
        
           5. Submission of Contributions. Unless You explicitly state otherwise,
              any Contribution intentionally submitted for inclusion in the Work
              by You to the Licensor shall be under the terms and conditions of
              this License, without any additional terms or conditions.
              Notwithstanding the above, nothing herein shall supersede or modify
              the terms of any separate license agreement you may have executed
              with Licensor regarding such Contributions.
        
           6. Trademarks. This License does not grant permission to use the trade
              names, trademarks, service marks, or product names of the Licensor,
              except as required for reasonable and customary use in describing the
              origin of the Work and reproducing the content of the NOTICE file.
        
           7. Disclaimer of Warranty. Unless required by applicable law or
              agreed to in writing, Licensor provides the Work (and each
              Contributor provides its Contributions) on an "AS IS" BASIS,
              WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
              implied, including, without limitation, any warranties or conditions
              of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
              PARTICULAR PURPOSE. You are solely responsible for determining the
              appropriateness of using or redistributing the Work and assume any
              risks associated with Your exercise of permissions under this License.
        
           8. Limitation of Liability. In no event and under no legal theory,
              whether in tort (including negligence), contract, or otherwise,
              unless required by applicable law (such as deliberate and grossly
              negligent acts) or agreed to in writing, shall any Contributor be
              liable to You for damages, including any direct, indirect, special,
              incidental, or consequential damages of any character arising as a
              result of this License or out of the use or inability to use the
              Work (including but not limited to damages for loss of goodwill,
              work stoppage, computer failure or malfunction, or any and all
              other commercial damages or losses), even if such Contributor
              has been advised of the possibility of such damages.
        
           9. Accepting Warranty or Additional Liability. While redistributing
              the Work or Derivative Works thereof, You may choose to offer,
              and charge a fee for, acceptance of support, warranty, indemnity,
              or other liability obligations and/or rights consistent with this
              License. However, in accepting such obligations, You may act only
              on Your own behalf and on Your sole responsibility, not on behalf
              of any other Contributor, and only if You agree to indemnify,
              defend, and hold each Contributor harmless for any liability
              incurred by, or claims asserted against, such Contributor by reason
              of your accepting any such warranty or additional liability.
        
           END OF TERMS AND CONDITIONS
        
           APPENDIX: How to apply the Apache License to your work.
        
              To apply the Apache License to your work, attach the following
              boilerplate notice, with the fields enclosed by brackets "[]"
              replaced with your own identifying information. (Don't include
              the brackets!)  The text should be enclosed in the appropriate
              comment syntax for the file format. We also recommend that a
              file or class name and description of purpose be included on the
              same "printed page" as the copyright notice for easier
              identification within third-party archives.
        
           Copyright 2026 Vincio Contributors
        
           Licensed under the Apache License, Version 2.0 (the "License");
           you may not use this file except in compliance with the License.
           You may obtain a copy of the License at
        
               http://www.apache.org/licenses/LICENSE-2.0
        
           Unless required by applicable law or agreed to in writing, software
           distributed under the License is distributed on an "AS IS" BASIS,
           WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
           See the License for the specific language governing permissions and
           limitations under the License.
License-File: LICENSE
Keywords: agents,ai,context-engineering,evaluation,llm,observability,prompts,rag,structured-output
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Requires-Dist: httpx>=0.27
Requires-Dist: pydantic>=2.5
Requires-Dist: pyyaml>=6.0
Requires-Dist: typing-extensions>=4.8
Provides-Extra: all
Requires-Dist: anthropic>=0.34; extra == 'all'
Requires-Dist: av>=12.0; extra == 'all'
Requires-Dist: boto3>=1.34; extra == 'all'
Requires-Dist: chromadb>=0.5; extra == 'all'
Requires-Dist: cryptography>=42.0; extra == 'all'
Requires-Dist: dspy-ai>=2.4; extra == 'all'
Requires-Dist: duckdb>=0.10; extra == 'all'
Requires-Dist: elasticsearch>=8.0; extra == 'all'
Requires-Dist: extract-msg>=0.48; extra == 'all'
Requires-Dist: fastapi>=0.110; extra == 'all'
Requires-Dist: fastembed>=0.3; extra == 'all'
Requires-Dist: google-cloud-bigquery>=3.0; extra == 'all'
Requires-Dist: google-cloud-storage>=2.16; extra == 'all'
Requires-Dist: google-genai>=0.3; extra == 'all'
Requires-Dist: haystack-ai>=2.0; extra == 'all'
Requires-Dist: lancedb>=0.10; extra == 'all'
Requires-Dist: langchain-core>=0.3; extra == 'all'
Requires-Dist: llama-cpp-python>=0.2; extra == 'all'
Requires-Dist: llama-index-core>=0.11; extra == 'all'
Requires-Dist: mistralai>=1.0; extra == 'all'
Requires-Dist: neo4j>=5.0; extra == 'all'
Requires-Dist: networkx>=3.0; extra == 'all'
Requires-Dist: openai>=1.40; extra == 'all'
Requires-Dist: opensearch-py>=2.4; extra == 'all'
Requires-Dist: opentelemetry-api>=1.24; extra == 'all'
Requires-Dist: opentelemetry-sdk>=1.24; extra == 'all'
Requires-Dist: pdfplumber>=0.11; extra == 'all'
Requires-Dist: pgvector>=0.2; extra == 'all'
Requires-Dist: pillow>=10.0; extra == 'all'
Requires-Dist: pinecone>=5.0; extra == 'all'
Requires-Dist: playwright>=1.40; extra == 'all'
Requires-Dist: psycopg[binary]>=3.1; extra == 'all'
Requires-Dist: pyarrow>=14.0; extra == 'all'
Requires-Dist: pydantic-settings>=2.2; extra == 'all'
Requires-Dist: pymilvus>=2.4; extra == 'all'
Requires-Dist: pypdf>=4.0; extra == 'all'
Requires-Dist: pypdfium2>=4.0; extra == 'all'
Requires-Dist: pytesseract>=0.3; extra == 'all'
Requires-Dist: python-docx>=1.1; extra == 'all'
Requires-Dist: python-pptx>=0.6; extra == 'all'
Requires-Dist: pyvespa>=0.40; extra == 'all'
Requires-Dist: qdrant-client>=1.9; extra == 'all'
Requires-Dist: rank-bm25>=0.2; extra == 'all'
Requires-Dist: redis>=5.0; extra == 'all'
Requires-Dist: reportlab>=4.0; extra == 'all'
Requires-Dist: sentence-transformers>=3.0; extra == 'all'
Requires-Dist: snowflake-connector-python>=3.0; extra == 'all'
Requires-Dist: sympy>=1.12; extra == 'all'
Requires-Dist: tiktoken>=0.7; extra == 'all'
Requires-Dist: transformers>=4.40; extra == 'all'
Requires-Dist: uvicorn>=0.29; extra == 'all'
Requires-Dist: weaviate-client>=4.5; extra == 'all'
Requires-Dist: websockets>=12.0; extra == 'all'
Requires-Dist: z3-solver>=4.12; extra == 'all'
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.34; extra == 'anthropic'
Provides-Extra: bigquery
Requires-Dist: google-cloud-bigquery>=3.0; extra == 'bigquery'
Provides-Extra: chroma
Requires-Dist: chromadb>=0.5; extra == 'chroma'
Provides-Extra: computer-use
Requires-Dist: playwright>=1.40; extra == 'computer-use'
Provides-Extra: cross-encoder
Requires-Dist: sentence-transformers>=3.0; extra == 'cross-encoder'
Provides-Extra: crypto
Requires-Dist: cryptography>=42.0; extra == 'crypto'
Provides-Extra: dev
Requires-Dist: coverage[toml]>=7.0; extra == 'dev'
Requires-Dist: mypy>=1.9; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Provides-Extra: docx
Requires-Dist: python-docx>=1.1; extra == 'docx'
Provides-Extra: dspy
Requires-Dist: dspy-ai>=2.4; extra == 'dspy'
Provides-Extra: duckdb
Requires-Dist: duckdb>=0.10; extra == 'duckdb'
Provides-Extra: elasticsearch
Requires-Dist: elasticsearch>=8.0; extra == 'elasticsearch'
Provides-Extra: fastembed
Requires-Dist: fastembed>=0.3; extra == 'fastembed'
Provides-Extra: gcs
Requires-Dist: google-cloud-storage>=2.16; extra == 'gcs'
Provides-Extra: gen-docx
Requires-Dist: python-docx>=1.1; extra == 'gen-docx'
Provides-Extra: gen-pdf
Requires-Dist: reportlab>=4.0; extra == 'gen-pdf'
Provides-Extra: gen-pptx
Requires-Dist: python-pptx>=0.6; extra == 'gen-pptx'
Provides-Extra: gguf
Requires-Dist: llama-cpp-python>=0.2; extra == 'gguf'
Provides-Extra: google
Requires-Dist: google-genai>=0.3; extra == 'google'
Provides-Extra: graph
Requires-Dist: neo4j>=5.0; extra == 'graph'
Requires-Dist: networkx>=3.0; extra == 'graph'
Provides-Extra: haystack
Requires-Dist: haystack-ai>=2.0; extra == 'haystack'
Provides-Extra: lancedb
Requires-Dist: lancedb>=0.10; extra == 'lancedb'
Provides-Extra: langchain
Requires-Dist: langchain-core>=0.3; extra == 'langchain'
Provides-Extra: llamaindex
Requires-Dist: llama-index-core>=0.11; extra == 'llamaindex'
Provides-Extra: local-neural
Requires-Dist: fastembed>=0.3; extra == 'local-neural'
Requires-Dist: llama-cpp-python>=0.2; extra == 'local-neural'
Requires-Dist: sentence-transformers>=3.0; extra == 'local-neural'
Provides-Extra: milvus
Requires-Dist: pymilvus>=2.4; extra == 'milvus'
Provides-Extra: mistral
Requires-Dist: mistralai>=1.0; extra == 'mistral'
Provides-Extra: msg
Requires-Dist: extract-msg>=0.48; extra == 'msg'
Provides-Extra: ocr
Requires-Dist: pillow>=10.0; extra == 'ocr'
Requires-Dist: pypdfium2>=4.0; extra == 'ocr'
Requires-Dist: pytesseract>=0.3; extra == 'ocr'
Provides-Extra: openai
Requires-Dist: openai>=1.40; extra == 'openai'
Provides-Extra: opensearch
Requires-Dist: opensearch-py>=2.4; extra == 'opensearch'
Provides-Extra: otel
Requires-Dist: opentelemetry-api>=1.24; extra == 'otel'
Requires-Dist: opentelemetry-sdk>=1.24; extra == 'otel'
Provides-Extra: parquet
Requires-Dist: pyarrow>=14.0; extra == 'parquet'
Provides-Extra: pdf
Requires-Dist: pypdf>=4.0; extra == 'pdf'
Provides-Extra: pdf-layout
Requires-Dist: pdfplumber>=0.11; extra == 'pdf-layout'
Provides-Extra: pinecone
Requires-Dist: pinecone>=5.0; extra == 'pinecone'
Provides-Extra: postgres
Requires-Dist: pgvector>=0.2; extra == 'postgres'
Requires-Dist: psycopg[binary]>=3.1; extra == 'postgres'
Provides-Extra: realtime
Requires-Dist: websockets>=12.0; extra == 'realtime'
Provides-Extra: redis
Requires-Dist: redis>=5.0; extra == 'redis'
Provides-Extra: retrieval
Requires-Dist: qdrant-client>=1.9; extra == 'retrieval'
Requires-Dist: rank-bm25>=0.2; extra == 'retrieval'
Provides-Extra: s3
Requires-Dist: boto3>=1.34; extra == 's3'
Provides-Extra: server
Requires-Dist: fastapi>=0.110; extra == 'server'
Requires-Dist: pydantic-settings>=2.2; extra == 'server'
Requires-Dist: uvicorn>=0.29; extra == 'server'
Provides-Extra: snowflake
Requires-Dist: snowflake-connector-python>=3.0; extra == 'snowflake'
Provides-Extra: splade
Requires-Dist: transformers>=4.40; extra == 'splade'
Provides-Extra: tokenizers
Requires-Dist: tiktoken>=0.7; extra == 'tokenizers'
Provides-Extra: verify
Requires-Dist: sympy>=1.12; extra == 'verify'
Requires-Dist: z3-solver>=4.12; extra == 'verify'
Provides-Extra: vespa
Requires-Dist: pyvespa>=0.40; extra == 'vespa'
Provides-Extra: video
Requires-Dist: av>=12.0; extra == 'video'
Requires-Dist: pillow>=10.0; extra == 'video'
Provides-Extra: weaviate
Requires-Dist: weaviate-client>=4.5; extra == 'weaviate'
Description-Content-Type: text/markdown

<p align="center">
  <img src="assets/banner.svg" alt="Vincio: the context engineering platform for AI applications" width="660">
</p>

<p align="center">
  <em>The scarce resource is not the model. It is the context you feed it.</em>
</p>

<p align="center">
  <a href="https://pypi.org/project/vincio/"><img src="https://img.shields.io/badge/vincio-4.0.0-B98B2E" alt="Vincio 4.0.0"></a>
  <a href="https://github.com/Ohswedd/vincio/actions/workflows/ci.yml"><img src="https://github.com/Ohswedd/vincio/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
  <img src="https://img.shields.io/pypi/pyversions/vincio?logo=python&logoColor=white" alt="Python 3.11+">
  <img src="https://img.shields.io/badge/license-Apache%202.0-4C6EF5" alt="Apache 2.0">
  <img src="https://img.shields.io/badge/tests-5858%20passing-2ea44f" alt="5858 tests passing">
  <img src="https://img.shields.io/badge/providers-OpenAI%20%C2%B7%20Anthropic%20%C2%B7%20Google%20%C2%B7%20Mistral%20%C2%B7%20local-B98B2E" alt="Providers: OpenAI, Anthropic, Google, Mistral, local, and OpenAI-compatible gateways">
</p>

---

**Vincio is a Python platform for building AI applications that you can trust in production.**
It takes everything that goes *into* a model (prompts, memory, retrieved evidence, tools, schemas,
and policies) and compiles it into an optimized, validated, observable **context packet**; then it
checks, measures, and traces everything that comes *out*. Named for **Leonardo da Vinci**,
it pairs engineering and craft in equal measure.

<p align="center">
  <img src="assets/pipeline.svg" alt="The run pipeline, governed end to end: raw input, normalize, redact and gate, retrieve and rank, compile context, call model, parse and validate, evaluate and guard, trace and cost, learn; with a governance layer across the whole run (policy and rails, PII redaction, injection defense, audit chain, EU AI Act, residency, cross-org)" width="840">
</p>

Most libraries help you *call* a model. Vincio governs the **boundary** between your application and
the model: what evidence is selected, how it is scored and budgeted, how the result is validated,
and what it cost. It runs on your model of choice across every major provider, with batching,
caching, failover, and cost tracking built in.

<p align="center">
  <img src="assets/why.svg" alt="Why Vincio: offline dev and CI (deterministic mock, no key, no cost); deterministic (security and validation in code, not model output); measured (every run traced and costed, eval-gated); one system (input to output, not a bag of utilities)" width="840">
</p>

<details>
<summary><b>Why you'd reach for it, in one line each</b></summary>

- **Runs on any model, offline when you want.** Call OpenAI, Anthropic, Google, Mistral, a local model, or any OpenAI-compatible gateway through one interface. No key yet? A deterministic mock runs the whole pipeline (retrieval, validation, evals, traces) for dev, tests, and CI, with no network and no cost.
- **Deterministic where it counts.** Security, permissions, and validation are enforced in code, never gated on model output. The same input compiles to the same packet.
- **Measured, not asserted.** Every run is traced and costed; every change can be gated by an eval suite before it ships.
- **One coherent system** from input to output, not a bag of utilities you wire together yourself.

</details>

## Contents

[Install](#install) · [Quickstart](#quickstart) · [What you can build](#what-you-can-build) ·
[Providers](#providers--models) · [Features](#features) · [Benchmarks](#benchmarks) ·
[How Vincio compares](#how-vincio-compares) · [Examples](#examples) · [CLI](#command-line) ·
[Architecture](#architecture) · [Docs](#documentation)

## Install

```bash
pip install vincio                  # core (the offline mock provider is built in)
pip install "vincio[openai]"        # + OpenAI    (also: anthropic, google, mistral)
pip install "vincio[chroma]"        # + a vector store (also: pinecone, lancedb, pgvector, …)
pip install "vincio[all]"           # every optional integration
```

Python 3.11+. The core depends only on `pydantic`, `httpx`, `pyyaml`, and `typing-extensions`;
every heavy integration (vector stores, OCR, server, OpenTelemetry, …) is an opt-in extra.

## Quickstart

```python
from vincio import ContextApp

app = ContextApp(name="docs_qa")
app.add_source("docs", path="./docs", retrieval="hybrid")
app.set_policy("answer_only_from_sources", True)

result = app.run("How do I configure SSO?")
print(result.output)      # the grounded answer
print(result.citations)   # the evidence it actually cited
print(result.trace_id)    # every run produces a full trace
print(result.cost_usd)    # …and a cost
```

To use a real model, set a provider and key, for example `export VINCIO_PROVIDER=openai
OPENAI_API_KEY=sk-...`, or pass `provider=` and `model=` to `ContextApp`. The same code runs against
OpenAI, Anthropic, Google, Mistral, a local model, or any OpenAI-compatible gateway. No key yet? Out
of the box it runs on a deterministic mock that emits schema-valid output, so you can build and test
the whole pipeline offline in CI.

## What you can build

**Typed output you can rely on**: declare a Pydantic schema, get a validated instance back:

```python
from pydantic import BaseModel
from vincio import ContextApp

class Triage(BaseModel):
    label: str
    confidence: float

app = ContextApp(name="triage", output_schema=Triage)
app.run("The dashboard crashes after login").output.label   # → a validated Triage
```

**Agents with tools, memory, and hard budgets**: permissioned tools, approval-gated writes, and a
loop that cannot run away:

```python
app = ContextApp(name="support", output_schema=RefundDecision)
app.add_memory(scope="user", strategy="semantic")
app.add_tool(lookup_order, permissions=["orders:read"])
app.add_tool(issue_refund, permissions=["refunds:write"], approval_required=True)
app.run("Refund my duplicate charge")
```

**Evaluation as a CI gate**: measure quality and block a regression before it ships:

```python
from vincio import Dataset
from vincio.evals import EvalCase, EvalRunner

dataset = Dataset(name="golden", cases=[EvalCase(id="c1", input="…", expected="…")])
runner = EvalRunner(app, metrics=["groundedness", "citation_accuracy"],
                    gates={"groundedness": ">= 0.8"})
report = runner.run(dataset)
assert all(g["passed"] for g in report.gates.values())   # fail the build on a regression
```

See **[Examples](#examples)** for twelve complete, runnable programs that cover the whole platform.

## Providers & models

Vincio calls real models in production. One interface routes to every major provider, with the
model-operations layer (reasoning control, half-cost batch, caching, failover, cost tracking) built
in. The deterministic mock is a development convenience, not the product: it lets you build and test
the whole pipeline with no key and no cost before you point it at a real model.

<p align="center">
  <img src="assets/providers.svg" alt="Providers and models: one interface over OpenAI, Anthropic, Google, Mistral, local models, and any OpenAI-compatible gateway, plus enterprise auth for Amazon Bedrock, Google Vertex, and Azure OpenAI. Model operations: unified reasoning control, batch at about half cost, prompt caching, circuit breaker and failover, key pool, and per-run cost tracking. With no key, a deterministic mock runs the whole pipeline for dev, tests, and CI." width="840">
</p>

<details>
<summary><b>Providers, model operations, and the mock</b></summary>

- **Providers**: OpenAI, Anthropic, Google (Gemini), Mistral, local models, and any OpenAI-compatible gateway (Groq, Together, Fireworks, OpenRouter, and the like) through one `ModelProvider` interface.
- **Enterprise auth**: Amazon Bedrock, Google Vertex, and Azure OpenAI via pluggable auth strategies (SigV4, service-account, Azure AD / key).
- **Model operations**: unified reasoning/thinking control across providers, batch backends (~50% cost), prompt-cache strategy, a circuit breaker with health-aware failover, a key pool, and a data-driven `ModelRegistry` (capabilities, pricing, lifecycle) that drives capability guards and shadow / canary dispatch.
- **The mock**: `MockProvider` is deterministic and emits schema-valid output, so the full pipeline (retrieval, validation, evals, traces, cost) runs offline in CI with no key and no cost. Use it for development and tests; use a real provider in production.

```python
# point an app at a real model (or set VINCIO_PROVIDER / the API key in the environment)
app = ContextApp(name="docs_qa", provider="openai", model="gpt-4o-mini")
```

</details>

## Features

Everything below is implemented, tested offline, and demonstrated by a runnable example. Use the
high-level `ContextApp`, or reach for any engine directly.

<p align="center">
  <img src="assets/features.svg" alt="One platform, every layer: context and prompts; retrieval and memory; agents and orchestration; output and evaluation; the closed loop; security and governance; protocols and interop; cross-org economy, edge and federated reach" width="840">
</p>

<details>
<summary><b>Every engine, in detail</b></summary>

**Context & prompts**
- Prompt compiler: typed prompt ASTs with `${variables}`, lint rules, cache-aware stable-prefix layout, versioning, hashing, and diffing.
- Context compiler: scores every candidate (relevance, novelty, authority, freshness, provenance, token cost, leakage risk), deduplicates, resolves conflicts, compresses, and packs to a token budget, with an *excluded-context report* explaining every omission.
- Tabular evidence: a typed, columnar `Dataset` and a deterministic `DataEncoder` that renders it header-once — schema, types, and units declared once, cells as delimited rows — lossless, columnar-accurate in token cost, and far cheaper than `json.dumps` or a Markdown table; `TableEvidence` scores and cites it like any other evidence.
- Dataset profiling & quality: `profile_dataset` computes a deterministic, bounded-memory column profile (cardinality, percentiles, histograms, null rate, exemplars); reservoir/stratified sampling stands a representative sample in for the whole; `fit_to_window` fits a table far larger than the window — profile plus sample — under a fixed token budget; and `DataQualityRails` screen for schema violations, constraint breaks, anomalies, and PII on the deterministic rail path.

**Retrieval & memory**
- Hybrid RAG: BM25 + dense + learned-sparse + late-interaction fused in one weighted RRF; query understanding (HyDE, multi-query, decomposition); sentence-window / auto-merging chunking; GraphRAG; structured metadata filters with tenant scope; text + image + table + video evidence as first-class scored candidates.
- Layered memory: session → episodic → semantic → tenant → graph, with a guarded write pipeline, confidence decay, contradiction resolution, bi-temporal recall, per-memory ACLs, and audited GDPR-style edit/forget/export.

**Agents & orchestration**
- Tools: permissioned registry (RBAC + ABAC), schema-from-typehints, a resource-limited sandbox, idempotent write guardrails with approval callbacks, and a grounded computer-use action plane.
- Agents: bounded DAG execution with planners (ReAct / plan-and-execute / hierarchical HTN), in-place plan repair, cost-aware action selection, and a budgeted deep-research agent.
- Orchestration: multi-agent crews with a shared blackboard, durable stateful graphs (checkpoint / resume / time-travel / human-in-the-loop), deterministic workflows, and a distributed durable-execution backend.

**Output, evaluation & observability**
- Structured output: Pydantic contracts, constrained decoding, streaming validation with early abort, bounded self-correction that repairs structure only (never invents facts), and DSPy-style typed signatures.
- Evaluation: golden datasets, 30+ metrics, deterministic / model / G-Eval judges, synthetic data, red-teaming, trajectory & tool-use scoring, drift detection, regression gates, and a `pytest` plugin.
- Observability: full trace span trees, OpenTelemetry export, a local trace viewer, a versioned prompt registry, and per-run cost tracking, no account or hosted backend required.

**The closed loop**
- Optimization: one reproducible cycle (trace → dataset → eval → optimize → promote): a reflective GEPA/MIPRO optimizer, a distillation flywheel, on-policy reinforcement from verifiable rewards, and gated deploy with canary + rollback. No promotion ships without clearing the gates.

**Security & governance**
- Security: deterministic PII / secret redaction (multilingual), prompt-injection defense and provable containment (taint tracking + capability tokens), RBAC / ABAC, tenant isolation, and a hash-chained, signed audit log with offline tamper verification.
- Governance: model / system cards, an OWASP / NIST / MITRE / ISO compliance matrix, an AI-BOM, provable erasure, a consent ledger, data-residency enforcement, formal invariant verification, agent identity & delegation, verified-reasoning certificates, and continuous assurance cases.

**Interop**
- Protocols: MCP (client *and* server), A2A agent-to-agent, and Agent Skills, all in-process.
- Ecosystem: import/export LangChain, LlamaIndex, Haystack, and DSPy assets; first-party data connectors; and any OpenAI-compatible model or vector store you already run.

**Reach further:** a cross-organization agent economy (negotiation, contracts, durable sagas, metering, settlement, arbitration, reputation, collateral & solvency proofs), an edge / WASM in-process runtime, on-device LoRA adaptation, federated learning with a differential-privacy accountant, and per-run energy / carbon accounting. See [`ROADMAP.md`](ROADMAP.md).

</details>

## Benchmarks

Three suites ship in [`benchmarks/`](benchmarks), all reproducible on your own machine. Every number
is measured live from both sides; a missing competitor is reported as skipped, never assumed.

### Head-to-head vs. real libraries

[`competitive.py`](benchmarks/competitive.py) runs Vincio against the *actual* library a team would
otherwise use (Apple Silicon, Python 3.13; ratios are the portable signal, not wall-clock).

<p align="center">
  <img src="assets/benchmark-headtohead.svg" alt="Head-to-head vs. real libraries: 30 to 40 times faster BM25 at 20k docs vs rank_bm25; 60 percent fewer tokens for the same answer vs LangChain and LlamaIndex; 1.4 to 1.8 times faster token counting vs tiktoken; 4 of 8 vs 1 of 8 malformed JSON recovered vs stdlib json" width="840">
</p>

<details>
<summary><b>Show the full table</b></summary>

| Operation | Vincio | Competitor | Result |
|---|---|---|---|
| BM25 query @ **20k docs** | `BM25Index` | `rank_bm25` | **~30–40× faster**: identical top-1 ranking |
| **Context assembly**: tokens sent for the same retrieved set | context compiler | LangChain `stuff` / LlamaIndex `compact` | **~60% fewer tokens**: answer retained |
| **Tabular encoding**: tokens for a 50×5 table | `DataEncoder` | `json.dumps` / `pandas.to_markdown` / TOON | **~66% fewer tokens** than `json.dumps`, lossless, typed schema |
| **Fit a 5k-row table into the window** | `fit_to_window` | `json.dumps` all rows / `pandas.describe` | **~99% fewer tokens**: profile + representative sample, size invariant to row count |
| Text chunking a 24k-word doc | `chunk_document` | LangChain / LlamaIndex splitters | **fastest**, chunks carry provenance |
| Token counting (~60k words) | `HeuristicTokenCounter` | `tiktoken` | **~1.4–1.8× faster**, zero-dependency, conservative |
| Malformed-JSON recovery | lenient parser | stdlib `json.loads` | **4/8 vs 1/8** recovered |
| Render with a missing variable | `PromptSpec.substitute` | `jinja2` | typed error vs. silently-empty render |

`rank_bm25` rescans every document per query; Vincio's inverted index only scans documents
containing a query term, so its lead grows with corpus size. The point isn't that every component
beats every specialist: a dedicated JSON-repair library recovers more than Vincio (by guessing,
which is unsafe for typed extraction). Vincio's edge is an **integrated, correct, governed**
pipeline, not a pile of single-purpose libraries.

</details>

### Orchestrator uplift: the same model, through Vincio

[`quality_uplift.py`](benchmarks/quality_uplift.py) measures what routing a model *through* Vincio
adds versus calling it directly, against real models on 15 company-specific policy questions a model
cannot know from pretraining (**4 models × 3 runs = 360 live calls**, OpenRouter, June 2026).

<p align="center">
  <img src="assets/benchmark-uplift.svg" alt="Grounded-answer accuracy, direct vs. through Vincio: gpt-4o-mini 2 to 100 percent; claude-3-haiku 0 to 91 percent; gemini-2.5-flash-lite 4 to 98 percent; llama-3.1-8b 2 to 89 percent; aggregate 2 to 95 percent" width="840">
</p>

<details>
<summary><b>Show the numbers and the honest read</b></summary>

**Deterministic mechanism metrics** (mechanical, so they hold for any model and run offline):

| Same model: direct vs. via Vincio | Direct | Via Vincio |
|---|--:|--:|
| Schema-valid object from realistic model outputs | 1/6 | **5/6** |
| Prompt-injection exfiltration via a tool call | compromised | **contained** |
| Context tokens to keep an early fact at 160 turns | 1,267 (lost) | **33 (retained)** |

**Grounded-answer quality on real models** (mean over runs, stochastic by a point or two):

| Model: direct vs. through Vincio | Direct correct | **Via Vincio correct** | Direct hallucinated | Cost per *correct* answer |
|---|--:|--:|--:|:--|
| `openai/gpt-4o-mini` | 2% | **100%** | 64% | **~62× cheaper** via Vincio |
| `anthropic/claude-3-haiku` | 0% | **91%** | 2%¹ | direct **never** correct (∞) |
| `google/gemini-2.5-flash-lite` | 4% | **98%** | 29% | **~67× cheaper** via Vincio |
| `meta-llama/llama-3.1-8b-instruct` | 2% | **89%** | 40% | **~29× cheaper** via Vincio |
| **Aggregate** | **2%** | **95%** | n/a | n/a |

<sub>¹ claude-3-haiku *abstains* (98% of the time) rather than guessing; better-aligned models say "I don't know," weaker ones confidently fabricate. Either way the model *alone* answers ~2%; the same model through Vincio's retrieval + grounding answers 89–100%, every answer cited.</sub>

The cost line is the honest punchline: a direct call is cheaper *per call*, but it answers almost
nothing correctly, so its cost **per correct answer** is 29–67× higher, or undefined when the model
gets *nothing* right on its own. Vincio is also faster per answer here (~1.3–1.6 s vs. ~1.7–2.5 s),
and token usage is roughly a wash. Full per-metric breakdown is in
[`benchmarks/README.md`](benchmarks/README.md). Reproduce with `VINCIO_PROVIDER=openrouter … python
benchmarks/quality_uplift.py`.

</details>

### VincioBench: the internal regression suite

[`vinciobench.py`](benchmarks/vinciobench.py) is **not a competitive claim**: it is the deterministic
mechanism suite that gates CI. Its families assert that each engine still *works* on a bundled
synthetic corpus, so a regression fails the build. The scores saturate by design (a small corpus
built to exercise each mechanism), which proves *the mechanism is intact*, not real-world
performance. The credible performance evidence is the two sections above.

## How Vincio compares

Each ecosystem below is strong in its focus area. This reflects **built-in, in-library** capability,
not what's reachable by adding a separate product or SaaS.

<p align="center">
  <img src="assets/compare-matrix.svg" alt="Capability matrix comparing Vincio, LangChain, LlamaIndex, DSPy, and Ragas across twelve capabilities including the scored context compiler, sparse and late-interaction and GraphRAG fusion, layered memory, permissioned tools, durable graphs, structure-only repair, built-in evals and CI gates, eval-driven optimization, native tracing and cost, deterministic security, MCP and A2A and Skills, and governance evidence. Vincio is first-class across all twelve." width="840">
</p>

<details>
<summary><b>Show the full matrix</b></summary>

| Capability | **Vincio** | LangChain | LlamaIndex | DSPy | Ragas |
|---|:--:|:--:|:--:|:--:|:--:|
| Scored, budgeted **context compiler** | ✅ | ➖ | ➖ | ❌ | ❌ |
| **Sparse + late-interaction + GraphRAG** in one fusion | ✅ | ➖ | ➖ | ❌ | ❌ |
| Layered **memory** (decay, conflicts, bi-temporal) | ✅ | ➖ | ➖ | ❌ | ❌ |
| **Permissioned** tool registry (RBAC/ABAC) | ✅ | ❌ | ❌ | ❌ | ❌ |
| **Durable graphs** + bounded crews | ✅ | ➖ | ❌ | ❌ | ❌ |
| Structured output + **structure-only repair** | ✅ | ➖ | ➖ | ✅ | ❌ |
| Built-in **evals + CI gates** | ✅ | ➖ | ➖ | ➖ | ✅ |
| Eval-driven **optimization** (gated promotion) | ✅ | ❌ | ❌ | ✅ | ❌ |
| Native **tracing + cost**, no account | ✅ | ➖ | ➖ | ❌ | ❌ |
| **Deterministic security** (PII / injection / audit) | ✅ | ❌ | ❌ | ❌ | ❌ |
| **MCP** client *and* server + **A2A** + **Skills** | ✅ | ➖ | ➖ | ➖ | ❌ |
| **Governance evidence** (cards · AI-BOM · erasure · residency) | ✅ | ❌ | ❌ | ❌ | ❌ |

✅ first-class in-library · ➖ partial or via an add-on/SaaS · ❌ not a focus. Ecosystems evolve, and
Vincio is built to *interoperate*: `vincio.interop` brings LangChain, LlamaIndex, Haystack, and DSPy
assets in (and hands Vincio's back). See the in-depth write-ups in
[`docs/comparisons/`](docs/comparisons).

</details>

## Examples

Twelve complete, heavily-commented programs in [`examples/`](examples); each runs **fully offline**
and teaches a whole theme end to end.

| # | Example | What it covers |
|--|---|---|
| 01 | [`quickstart`](examples/01_quickstart.py) | typed output · grounded QA with citations · trace & cost · a short conversation |
| 02 | [`retrieval_rag`](examples/02_retrieval_rag.py) | hybrid + sparse + late-interaction fusion · query understanding · GraphRAG · multimodal evidence |
| 03 | [`memory`](examples/03_memory.py) | scoped remember/recall · bi-temporal · decay & contradictions · GDPR forget/export |
| 04 | [`agents_and_tools`](examples/04_agents_and_tools.py) | permissioned tools · sandbox · planners · plan repair · deep research · computer-use |
| 05 | [`orchestration`](examples/05_orchestration.py) | crews + blackboard · durable graphs · workflows · distributed execution |
| 06 | [`structured_output`](examples/06_structured_output.py) | contracts · constrained decoding · streaming validation · self-correction · signatures |
| 07 | [`evaluation_observability`](examples/07_evaluation_observability.py) | datasets · metrics · judges · red-team · drift · tracing · prompt registry |
| 08 | [`optimization_self_improvement`](examples/08_optimization_self_improvement.py) | the closed loop · reflective optimizer · RLVR · canary deploy · local & federated adaptation |
| 09 | [`security_governance`](examples/09_security_governance.py) | PII/injection/containment · audit · governance evidence · identity · verified reasoning · assurance |
| 10 | [`interop_and_protocols`](examples/10_interop_and_protocols.py) | MCP client+server · A2A · Agent Skills · framework interop · connectors · packs |
| 11 | [`advanced_context`](examples/11_advanced_context.py) | reasoning control · test-time compute · long-horizon · world-model · semantic cache · record-replay |
| 12 | [`cross_org_economy`](examples/12_cross_org_economy.py) | negotiation · contracts · durable sagas · settlement · arbitration · solvency proofs |
| 13 | [`tabular_evidence`](examples/13_tabular_evidence.py) | typed columnar `Dataset` · the compact, lossless `DataEncoder` · columnar token cost · `TableEvidence` in the compiler |
| 14 | [`dataset_profiling`](examples/14_dataset_profiling.py) | `profile_dataset` · reservoir/stratified sampling · `fit_to_window` under a token budget · `DataQualityRails` screening |

```bash
cd examples && python 01_quickstart.py            # offline, no keys
export VINCIO_PROVIDER=openai OPENAI_API_KEY=sk-... && python 01_quickstart.py   # against a real model
```

## Command line

```bash
vincio init my-project --template rag   # scaffold config + app + golden set
vincio run app.py --input "..."         # run an app
vincio eval run golden.jsonl            # run an eval suite with CI gates + baseline compare
vincio trace view trace_123             # TUI trace tree with scores + feedback
vincio optimize run --target groundedness
vincio loop run --app app.py --gate groundedness=">= 0.8"   # one closed-loop cycle
vincio audit verify                     # verify the audit-log hash chain offline
vincio mcp serve app.py                 # expose an app as an MCP server
vincio serve --app app.py               # launch the HTTP API (health/readiness/metrics)
```

The full CLI is in the [CLI reference](docs/reference/cli.md). `vincio serve` launches a FastAPI
server (API-key + JWT auth, SSE streaming, Prometheus metrics); `from vincio.server import
create_app` embeds it.

## Architecture

One coherent pipeline from raw input to traced, validated result: the input engine normalizes and
scopes the request; memory, retrieval, tools, and the prompt compiler all feed the **context
compiler**, which scores, deduplicates, resolves conflicts, compresses, and budgets; the model runs
provider-neutral; and every output is validated, evaluated, secured, traced, costed, and written
back to memory.

<p align="center">
  <img src="assets/architecture.svg" alt="Vincio architecture: the input engine feeds the context compiler, which is also fed by memory, retrieval, tools, and the prompt compiler; the context compiler feeds provider-neutral model execution; the output is validated, evaluated, secured, traced, costed, and written back to memory" width="840">
</p>

See [`AGENTS.md`](AGENTS.md) for the package layout and [`docs/concepts/`](docs/concepts) for a tour
of each engine.

## Status

Vincio 4.0 is **feature-complete and in long-term support**. The public API is frozen under
[Semantic Versioning](https://semver.org/spec/v2.0.0.html) with a mechanical
[deprecation policy](docs/reference/stability.md); performance and quality targets are
[published as SLOs](docs/reference/slo.md) and gated by VincioBench; releases ship a CycloneDX SBOM
with SLSA provenance. New capabilities are added behind opt-in extras, never by breaking working
code. See [`ROADMAP.md`](ROADMAP.md) and [`MIGRATION.md`](MIGRATION.md).

Vincio is, and stays, a **library**. The building blocks for production (audit chain, retention,
tenant isolation, RBAC/ABAC, a server) ship in the package for you to deploy on your own
infrastructure. There is no hosted service.

## Documentation

The [documentation index](docs/README.md) maps every guide, concept, and reference page in a
reading order. Highlights:

- **[Getting started](docs/getting-started.md)**: install, your first app, offline development
- **Concepts**: [context packets](docs/concepts/context-packets.md) ·
  [prompt compiler](docs/concepts/prompt-compiler.md) · [memory](docs/concepts/memory.md) ·
  [retrieval](docs/concepts/retrieval.md) · [agents & workflows](docs/concepts/agents.md) ·
  [evaluation](docs/concepts/evals.md) · [observability](docs/concepts/observability.md)
- **Guides**: [build a RAG app](docs/guides/build-rag-app.md) ·
  [structured output](docs/guides/structured-output.md) ·
  [add tools](docs/guides/add-tools.md) ·
  [orchestrate multi-agent systems](docs/guides/orchestrate-agents.md) ·
  [run evals](docs/guides/run-evals.md) · [close the loop](docs/guides/close-the-loop.md) ·
  [performance & streaming](docs/guides/performance.md) · [integrations](docs/guides/integrations.md)
- **Protocols**: [MCP client + server](docs/guides/mcp.md) · [A2A](docs/guides/a2a.md) ·
  [Agent Skills](docs/guides/agent-skills.md) · [reasoning control](docs/guides/reasoning.md)
- **Migrating**: from [LangChain](docs/guides/migrate-from-langchain.md) ·
  [LlamaIndex](docs/guides/migrate-from-llamaindex.md) · [Ragas](docs/guides/migrate-from-ragas.md)
- **Security & governance**: [threat model](docs/security/threat-model.md) ·
  [security policy](SECURITY.md) · [governance & compliance](docs/guides/governance.md)
- **Reference**: [API](docs/reference/api.md) · [CLI](docs/reference/cli.md) ·
  [config](docs/reference/config.md) · [SLOs](docs/reference/slo.md) ·
  [stability & deprecation](docs/reference/stability.md)
- **Comparisons**: [LangChain](docs/comparisons/langchain.md) ·
  [LlamaIndex](docs/comparisons/llamaindex.md) · [DSPy](docs/comparisons/dspy.md) ·
  [CrewAI](docs/comparisons/crewai.md) · [Ragas](docs/comparisons/ragas.md) ·
  [and more](docs/comparisons)

## Contributing

Contributions are welcome. The test suite runs fully offline and must stay green:

```bash
pip install -e ".[dev]"
python -m pytest -q          # 5858 tests, no network or API keys required
ruff check vincio/ tests/
mypy vincio
```

See [`AGENTS.md`](AGENTS.md) for the codebase layout and engineering conventions.

## License

[Apache License 2.0](LICENSE) © Vincio Contributors.
