Metadata-Version: 2.4
Name: contractforge-core
Version: 0.1.0
Summary: Platform-neutral semantic core for contract-first data ingestion.
Project-URL: Homepage, https://github.com/marquesantero/contractforge-core
Project-URL: Documentation, https://marquesantero.github.io/contractforge-core/
Project-URL: Repository, https://github.com/marquesantero/contractforge-core
Project-URL: Issues, https://github.com/marquesantero/contractforge-core/issues
Project-URL: Changelog, https://github.com/marquesantero/contractforge-core/blob/main/CHANGELOG.md
Author: ContractForge contributors
License: MIT
License-File: LICENSE
Keywords: contractforge,data-contracts,data-platform,ingestion,lakehouse
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Database
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.10
Requires-Dist: eval-type-backport>=0.2
Requires-Dist: pydantic>=2.7
Requires-Dist: pyyaml>=6
Provides-Extra: dev
Requires-Dist: pytest>=8; extra == 'dev'
Description-Content-Type: text/markdown

<p align="center">
  <img src="docs/assets/logo/contractforge-logo.png" alt="ContractForge" width="520">
</p>

# ContractForge

**Define ingestion intent once. Run it natively anywhere.**

<p align="center">
  <a href="https://github.com/marquesantero/contractforge-core/actions/workflows/ci.yml"><img alt="CI" src="https://github.com/marquesantero/contractforge-core/actions/workflows/ci.yml/badge.svg"></a>
  <a href="https://github.com/marquesantero/contractforge-core"><img alt="Product" src="https://img.shields.io/badge/product-ContractForge-0B5FFF"></a>
  <a href="https://marquesantero.github.io/contractforge-core/"><img alt="Documentation" src="https://img.shields.io/badge/docs-online-2EA44F"></a>
  <a href="https://github.com/marquesantero/contractforge-core/tree/main/src/contractforge_core"><img alt="Core" src="https://img.shields.io/badge/core-contractforge--core-1F6FEB"></a>
  <a href="https://github.com/marquesantero/contractforge-core/tree/main/adapters/databricks"><img alt="Databricks adapter" src="https://img.shields.io/badge/adapter-databricks-FF7A00"></a>
  <a href="https://github.com/marquesantero/contractforge-core/tree/main/adapters/aws"><img alt="AWS adapter" src="https://img.shields.io/badge/adapter-aws-232F3E"></a>
  <a href="https://github.com/marquesantero/contractforge-core/tree/main/ai"><img alt="ContractForge AI" src="https://img.shields.io/badge/ai-contractforge--ai-7B3FE4"></a>
  <img alt="Python" src="https://img.shields.io/badge/python-%3E%3D3.10-blue">
  <a href="LICENSE"><img alt="License" src="https://img.shields.io/github/license/marquesantero/contractforge-core"></a>
</p>

<p align="center">
  <a href="https://marquesantero.github.io/contractforge-core/">Documentation</a>
  ·
  <a href="docs/quickstart.md">Quick Start</a>
  ·
  <a href="docs/adapters.md">Adapters</a>
  ·
  <a href="ai/README.md">ContractForge AI</a>
  ·
  <a href="docs/roadmap.md">Roadmap</a>
</p>

ContractForge is a multi-runtime, contract-first ingestion platform. It turns
governed ingestion intent into native platform execution and evidence while
keeping the contract vocabulary stable across Databricks, AWS and future
adapters.

The product remains **ContractForge**. `contractforge-core`,
`contractforge-databricks`, `contractforge-aws`, `contractforge-snowflake` and
`contractforge-ai` are functional package boundaries, not separate products.

It is built for data consultants, platform teams and engineering groups that
need repeatable governed ingestion across different client runtimes without
rewriting the framework for every platform.

<p align="center">
  <img src="docs/assets/diagrams/contractforge-flow.svg" alt="ContractForge flow from contract to semantic core, capability matcher, platform adapter and native artifacts" width="900">
</p>

## Why ContractForge

| Capability | What it means |
| --- | --- |
| Contract-first ingestion | Source, target, write mode, schema policy, transforms, quality, access, operations and evidence live in reviewed YAML contracts. |
| Honest portability | The planner returns `SUPPORTED`, `SUPPORTED_WITH_WARNINGS`, `REVIEW_REQUIRED` or `UNSUPPORTED`; it does not silently downgrade semantics. |
| Native adapters | Adapters such as Databricks and AWS translate the same intent into native runtime behavior instead of forcing a lowest-common-denominator engine. |
| Evidence as product surface | Runs, errors, quality, quarantine, schema changes, lineage, governance actions and cost signals are tracked consistently. |
| Reusable connections | Shared `connection.yaml` files centralize connector defaults; ingestion contracts override only dataset-specific fields. |
| AI-assisted project design | ContractForge AI turns prompts and schemas into reviewable projects, then validates them through Core and adapter planners. |

ContractForge is not a scheduler, a dbt replacement, a closed ingestion runtime
or a universal Spark wrapper. It is the semantic contract and adapter layer for
repeatable governed ingestion.

## How It Works

```text
Contract YAML
  -> Semantic Core
  -> Capability Matcher
  -> Abstract Execution Plan
  -> Platform Adapter
  -> Native Runtime + Evidence
```

The core owns portable semantics. Adapters own platform behavior.
The core does not import Spark, Databricks SDK, boto3, Azure SDK, Fabric SDK or Snowflake clients.
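
An import boundary like this can be guarded by a small check in a test suite. The sketch below is an illustration under stated assumptions, not a test that ships with the project: `FORBIDDEN_TOP_LEVEL` and `leaked_imports` are hypothetical names, and the listed top-level module names are a guess at how the SDKs above appear in `sys.modules`.

```python
import importlib
import sys

# Hypothetical list of top-level module names for the platform SDKs named above.
FORBIDDEN_TOP_LEVEL = ("pyspark", "databricks", "boto3", "azure", "snowflake")


def leaked_imports(
    module_name: str,
    forbidden: tuple[str, ...] = FORBIDDEN_TOP_LEVEL,
) -> list[str]:
    """Import module_name and list forbidden modules loaded as a side effect.

    Only modules that were not loaded before the call are reported, so run
    this in a fresh interpreter where module_name is not already cached.
    """
    before = set(sys.modules)
    importlib.import_module(module_name)
    newly_loaded = set(sys.modules) - before
    return sorted(
        name for name in newly_loaded if name.split(".")[0] in forbidden
    )
```

A call such as `leaked_imports("contractforge_core")` returning an empty list would be consistent with the boundary described above.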

## See It In 30 Seconds

```yaml
source:
  type: incremental_files
  path: s3://landing/orders
  format: json

target:
  catalog: main
  schema: bronze
  table: orders

mode: scd0_append
schema_policy: additive_only
quality_rules:
  not_null: [order_id]
```

Core planning result:

```text
SUPPORTED
```

The Databricks adapter may render Delta/Auto Loader/Asset Bundle artifacts. The
AWS adapter may render and deploy Glue Spark/Iceberg artifacts. Another adapter
may return `SUPPORTED_WITH_WARNINGS`, `REVIEW_REQUIRED` or `UNSUPPORTED` if it
cannot preserve the same semantics safely.
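
The four statuses lend themselves to an explicit gate in CI or deployment scripts. The following is a minimal sketch of such a gate. `PlanStatus` and `may_deploy` are hypothetical names that only mirror the status strings listed in this README; the real types in `contractforge_core` may differ, and the policy shown (review-required plans proceed only on explicit opt-in) is an assumption.

```python
from enum import Enum


class PlanStatus(str, Enum):
    """Hypothetical mirror of the four planner statuses named in this README."""

    SUPPORTED = "SUPPORTED"
    SUPPORTED_WITH_WARNINGS = "SUPPORTED_WITH_WARNINGS"
    REVIEW_REQUIRED = "REVIEW_REQUIRED"
    UNSUPPORTED = "UNSUPPORTED"


def may_deploy(status: PlanStatus, *, allow_review_required: bool = False) -> bool:
    """Return True when a plan with this status is allowed to proceed."""
    if status in (PlanStatus.SUPPORTED, PlanStatus.SUPPORTED_WITH_WARNINGS):
        return True
    if status is PlanStatus.REVIEW_REQUIRED:
        # Assumed policy: a human must opt in before these plans proceed.
        return allow_review_required
    # UNSUPPORTED never proceeds, whatever flags are set.
    return False
```

Keeping the gate as a pure function makes the policy easy to review and to test independently of any adapter.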

## Status And Roadmap

| Area | Status | Notes |
| --- | --- | --- |
| Core semantic model | Active | Contract models, semantic normalization, capability matching, abstract planning and evidence models are implemented. |
| Databricks adapter | Reference implementation | Delta, Unity Catalog, Auto Loader, Lakeflow planning, Asset Bundles, control tables, quality, governance, lineage, cost and dashboards are implemented inside the adapter boundary. |
| AWS adapter | Alpha with real E2E validation | Glue Spark/Iceberg planning, source support, quality/evidence, Lake Formation review/apply helpers, annotations, operations, S3 artifact publication, one-command Glue deployment and Glue job helper APIs are implemented. |
| ContractForge AI | Active | Deterministic review, project generation, diagnostics, provider routing and optional model-backed enrichment over the same core contract semantics. |
| Snowflake adapter | Alpha with real Snowflake validation | SQL warehouse runtime, hosted Snowpark procedure library runner, table/staged-file/SQL sources, write modes, quality, schema policy, governance, evidence, lineage and cost reconciliation are implemented and live-smoked. |
| Fabric adapter | Planned | Future adapters must depend on the core and declare platform capabilities explicitly. |

See [roadmap](docs/roadmap.md) for adapter maturity and release criteria.

## Compared With Alternatives

| Alternative | Difference |
| --- | --- |
| dbt | dbt models data after it lands. ContractForge defines how governed data arrives, is written, validated and evidenced. |
| Airbyte/Fivetran | They provide managed ingestion runtimes. ContractForge provides the contract and lets adapters execute natively in your platform. |
| Data contract tools | Validation is one slice. ContractForge covers source, write semantics, schema policy, quality, governance, evidence and native execution artifacts. |
| Platform-specific frameworks | ContractForge keeps platform implementations in adapters so the same semantics can be evaluated for other runtimes. |

## Install

From GitHub:

```bash
pip install "git+https://github.com/marquesantero/contractforge-core.git"
pip install "git+https://github.com/marquesantero/contractforge-core.git#subdirectory=adapters/databricks"
pip install "git+https://github.com/marquesantero/contractforge-core.git#subdirectory=adapters/aws"
pip install "git+https://github.com/marquesantero/contractforge-core.git#subdirectory=adapters/snowflake"
pip install "git+https://github.com/marquesantero/contractforge-core.git#subdirectory=ai"
```

Local development:

```bash
uv sync --all-extras
uv run pytest
```

Build wheels independently:

```bash
uv build --wheel
cd adapters/databricks && uv build --wheel
cd ../aws && uv build --wheel
cd ../snowflake && uv build --wheel
cd ../../ai && uv build --wheel
```

Release package names:

```bash
pip install contractforge-core contractforge-databricks contractforge-aws contractforge-snowflake contractforge-ai
```

## Project Shape

A complete ContractForge project keeps runtime concerns separate from contract
semantics:

```text
project.yaml
environments/
  databricks.environment.yaml
  aws.environment.yaml
connections/
  supabase.yaml
contracts/
  bronze/
    b_products/
      b_products.ingestion.yaml
      b_products.annotations.yaml
      b_products.operations.yaml
      b_products.access.yaml
```

Example shared connection:

```yaml
source:
  type: connector
  connector: postgres
  system: supabase
  options:
    url: "{{ secret:supabase/jdbc_url }}"
auth:
  type: basic
  username: "{{ secret:supabase/user }}"
  password: "{{ secret:supabase/password }}"
read:
  fetchsize: 20000
```

Example ingestion override:

```yaml
source:
  type: connection
  connection_path: project://connections/supabase.yaml
  table: public.products
  read:
    partition_column: product_id
    num_partitions: 8
```

The core resolves the connection before adapters plan or execute. Ingestion
values override global connection defaults.
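
The override rule can be pictured as a recursive merge in which the ingestion side wins on conflict. This is a minimal sketch of that idea, not the core's actual resolver: `merge_overrides` is a hypothetical name, the flat `read` mapping is a simplification of the YAML above, and the `fetchsize: 5000` override is an invented value for illustration.

```python
from copy import deepcopy
from typing import Any


def merge_overrides(defaults: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Recursively merge two mappings; values from overrides win on conflict."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_overrides(merged[key], value)
        else:
            # Scalars, lists and type mismatches: the override replaces the default.
            merged[key] = deepcopy(value)
    return merged


# Defaults from the shared connection, then dataset-specific overrides.
connection = {"read": {"fetchsize": 20000}}
ingestion = {"read": {"partition_column": "product_id", "num_partitions": 8, "fetchsize": 5000}}
resolved = merge_overrides(connection, ingestion)
print(resolved["read"])
```

Neither input is mutated, so the same shared connection can be merged into many ingestion contracts.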

## Platform Adapters

| Adapter | Package | Status | Native responsibilities |
| --- | --- | --- | --- |
| Databricks | `contractforge-databricks` | Reference implementation | Delta, Unity Catalog, Auto Loader, Lakeflow planning, Jobs, Asset Bundles, control tables, governance, lineage, cost and dashboards. |
| AWS | `contractforge-aws` | Alpha with real E2E validation | Glue Spark, Iceberg, Glue Catalog, Lake Formation review/apply helpers, S3 artifacts, Glue jobs, Athena/Iceberg evidence and cost records. |
| Fabric | `contractforge-fabric` | Planned | OneLake, Lakehouse tables, Data Pipelines, Dataflow Gen2 and Purview/Fabric metadata. |
| Snowflake | `contractforge-snowflake` | Alpha with real Snowflake validation | SQL warehouse runtime, hosted Snowpark procedure library runner with staged ZIP imports, table/staged-file/SQL sources, append/overwrite/upsert/hash-diff writes, quality, schema policy, governance, evidence/control tables, lineage, cost reconciliation and project deployment. Task graph live smoke still needs task grants. See [Snowflake adapter guide](docs/adapters/snowflake.md). |

Use the same project model for adapter deployment:

```bash
contractforge-databricks deploy-project examples/real-world/supabase-jdbc-medallion/project.yaml --target dev
contractforge-aws deploy-project examples/real-world/supabase-jdbc-medallion/project.yaml --dry-run --summary-only
```

## ContractForge AI

ContractForge AI is the planning and review companion. It can generate project
scaffolds from prompts and schemas, validate project folders, compare adapter
planning and produce clear HTML approval reports.

```bash
contractforge-ai guided-project \
  --intent "Create a Supabase medallion project for AWS and Databricks daily at 6 Sao Paulo time." \
  --schema schemas/products.json \
  --target contractforge-yaml \
  --allow-review-required \
  --output-dir generated/supabase

contractforge-ai validate-project-structure generated/supabase \
  --adapter databricks \
  --adapter aws \
  --format html > generated/supabase/project_validation.html
```

Model providers are optional. Deterministic validation and adapter planners
remain the source of truth; providers can explain or enrich, but they cannot
invent support status.
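
One way to enforce that rule in code is to make the status immutable and accept only explanatory text from a provider. The sketch below illustrates the principle with hypothetical names (`Review`, `apply_enrichment`); it is not the ContractForge AI implementation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Review:
    status: str            # set only by deterministic planners
    explanation: str = ""  # free text a model provider may enrich


def apply_enrichment(review: Review, provider_output: dict[str, str]) -> Review:
    """Take only the explanation from a provider; ignore any status it proposes."""
    explanation = provider_output.get("explanation", review.explanation)
    return replace(review, explanation=explanation)
```

Because `status` is never read from `provider_output`, a provider that returns `{"status": "SUPPORTED"}` for an unsupported plan has no effect on the result.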

## Core Planning Example

```python
from contractforge_core.capabilities import PlatformCapabilities
from contractforge_core.contracts import semantic_contract_from_mapping, validate_contract
from contractforge_core.planner import plan_contract

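# Validate the raw mapping against the contract models.
contract = validate_contract(
    {
        "source": {"type": "incremental_files", "path": "s3://landing/orders", "format": "json"},
        "target": {"catalog": "main", "schema": "bronze", "table": "orders"},
        "mode": "scd0_append",
        "schema_policy": "additive_only",
        "quality_rules": {"not_null": ["order_id"]},
    }
)

# Normalize the validated contract into platform-neutral semantics.
semantic = semantic_contract_from_mapping(contract)

# Declare what an example platform can do; merge is not supported here.
capabilities = PlatformCapabilities(
    platform="example",
    supports_append=True,
    supports_overwrite=True,
    supports_merge=False,
    evidence_stores=("audit_tables",),
)

# Match the contract against the declared capabilities.
result = plan_contract(semantic, capabilities)
# One of SUPPORTED, SUPPORTED_WITH_WARNINGS, REVIEW_REQUIRED or UNSUPPORTED.
print(result.status)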
```

## Package Boundaries

| Layer | Package | Responsibility |
| --- | --- | --- |
| Semantic core | `contractforge-core` | Contract models, validation, semantic normalization, capability matching, abstract plans, portability diagnostics and neutral evidence models. |
| Databricks adapter | `contractforge-databricks` | Databricks capabilities, rendering, runtime execution, governance, evidence filling and deployment helpers. |
| AWS adapter | `contractforge-aws` | AWS capabilities, Glue/Iceberg planning, runtime helpers, S3 publication, deployment helpers and evidence filling. |
| Snowflake adapter | `contractforge-snowflake` | Snowflake capabilities, SQL warehouse and Snowpark runtime, governance, evidence, lineage, cost reconciliation and project deployment. |
| AI companion | `contractforge-ai` | Deterministic review, project generation, diagnostics, provider routing, report generation and optional model-backed enrichment. |

Publication stays split: each package builds its own wheel. The core wheel owns
only `contractforge_core`; adapter wheels such as `contractforge-databricks` own
their adapter package and depend explicitly on `contractforge-core`, as future
adapters must.

See [publication packaging](docs/specs/publication-packaging.md).

## Documentation

| Topic | Link |
| --- | --- |
| Online site | [marquesantero.github.io/contractforge-core](https://marquesantero.github.io/contractforge-core/) |
| Documentation index | [docs/README.md](docs/README.md) |
| Quick start | [docs/quickstart.md](docs/quickstart.md) |
| Architecture | [docs/architecture.md](docs/architecture.md) |
| Contracts | [docs/contracts.md](docs/contracts.md) |
| Project YAML | [docs/project-yaml.md](docs/project-yaml.md) |
| Connection YAML | [docs/connection-yaml.md](docs/connection-yaml.md) |
| Adapters | [docs/adapters.md](docs/adapters.md) |
| Databricks adapter | [docs/databricks.md](docs/databricks.md) |
| AWS adapter | [docs/adapters/aws.md](docs/adapters/aws.md) |
| Test contracts across adapters | [docs/adapters/test-contracts-across-adapters.md](docs/adapters/test-contracts-across-adapters.md) |
| Connectors | [docs/connectors.md](docs/connectors.md) |
| Operations and evidence | [docs/operations.md](docs/operations.md) |
| ContractForge AI | [ai/README.md](ai/README.md) |
| Security | [docs/security.md](docs/security.md) |
| Adapter authoring | [docs/specs/adapter-authoring.md](docs/specs/adapter-authoring.md) |

Architecture contracts live under [docs/specs](docs/specs/), and decisions live
under [docs/adrs](docs/adrs/).

## Non-Goals

ContractForge is not:

- a scheduler;
- a universal Spark wrapper;
- a replacement for Databricks, Glue, Fabric, Snowflake or other runtimes;
- a promise that every contract runs everywhere;
- a dbt replacement;
- an orchestration engine;
- a GUI product in the core.

## License

MIT. See [LICENSE](LICENSE).
