Metadata-Version: 2.4
Name: cassetteql
Version: 0.1.1
Summary: S3-native cassette substrate for typed knowledge graphs: Dempster-Shafer reasoner, sheaf GNN, Kan-based schema migration. No GPU, 17MB SPLADE bundled.
License-Expression: Apache-2.0
Keywords: knowledge-graph,dempster-shafer,sheaf-gnn,situation-semantics,category-theory,ontology,infon,splade,calibrated-reasoning
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=2.0
Requires-Dist: transformers>=4.40
Requires-Dist: numpy>=1.24
Requires-Dist: pyarrow>=15
Requires-Dist: fsspec>=2024.1
Provides-Extra: s3
Requires-Dist: s3fs>=2024.1; extra == "s3"
Provides-Extra: agent
Requires-Dist: strands-agents>=1.0; extra == "agent"
Provides-Extra: aws
Requires-Dist: boto3>=1.28; extra == "aws"
Requires-Dist: s3fs>=2024.1; extra == "aws"
Provides-Extra: all
Requires-Dist: boto3>=1.28; extra == "all"
Requires-Dist: s3fs>=2024.1; extra == "all"
Requires-Dist: strands-agents>=1.0; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=8; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: jupyter; extra == "dev"
Requires-Dist: nbconvert; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"
Requires-Dist: hypothesis; extra == "dev"
Dynamic: license-file

# cassetteql

**S3-native knowledge graphs with calibrated reasoning.** Immutable cassette files, split Parquet indexes, a Dempster–Shafer reasoner that tells you when it doesn't know, a trained sheaf GNN prior, and Kan-based schema migration. The bundled SPLADE-tiny model (17 MB) means no GPU, no model download, no API keys.

```
  ┌──────────────────────────────────┐
  │  ▓▓      .inf  cassette      ▓▓  │  one cassette =
  │  ●                            ●  │  one ingest batch
  │  ╲╲─────────────────────────╱╱   │
  │   header  ·  records  ·  footer  │
  │     │         │           │      │
  │     │         │           └─ JSON: offsets + stats
  │     │         └─ gzip frames (range-addressable)
  │     └─ schema_ref · created_at
  └──────────────────────────────────┘
      immutable · content-addressed · S3-native
```
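The diagram describes three parts: a header carrying `schema_ref` and `created_at`, gzip frames that can be addressed by byte range, and a JSON footer holding offsets and stats. The sketch below models that layout in a few lines. It is a toy under stated assumptions, not the real `.inf` format. Each record is its own gzip member. The footer length is stored in a trailing 8-byte integer. The content address is the SHA-256 of the file bytes. The functions `write_cassette`, `read_footer`, and `read_record` are hypothetical names.

```python
import gzip
import hashlib
import json
import struct

# Assumption: an 8-byte little-endian footer length sits at the very end of the file.
TRAILER = struct.Struct("<Q")


def write_cassette(records, schema_ref, created_at):
    """Serialize records into a toy cassette; return (content_address, bytes)."""
    header = json.dumps({"schema_ref": schema_ref, "created_at": created_at}).encode() + b"\n"
    body = bytearray(header)
    offsets = []
    for rec in records:
        # One gzip member per record, so a single byte-range read can fetch
        # and decompress it on its own. mtime=0 keeps the bytes deterministic.
        frame = gzip.compress(json.dumps(rec, sort_keys=True).encode(), mtime=0)
        offsets.append([len(body), len(frame)])  # (start, length) of this frame
        body += frame
    footer = json.dumps({"offsets": offsets, "stats": {"n_records": len(records)}}).encode()
    body += footer + TRAILER.pack(len(footer))
    blob = bytes(body)
    # Content address: identical bytes always produce the identical name.
    return hashlib.sha256(blob).hexdigest() + ".inf", blob


def read_footer(blob):
    """Read the trailer, then the JSON footer just before it (a tail read on S3)."""
    (footer_len,) = TRAILER.unpack(blob[-TRAILER.size:])
    return json.loads(blob[-TRAILER.size - footer_len:-TRAILER.size])


def read_record(blob, i):
    """Fetch record i through its (start, length) offset: one range read on S3."""
    start, length = read_footer(blob)["offsets"][i]
    return json.loads(gzip.decompress(blob[start:start + length]))


name, blob = write_cassette(
    [{"s": "toyota", "p": "invest", "o": "solid_state"},
     {"s": "catl", "p": "invest", "o": "solid_state"}],  # toy records
    schema_ref="schema-v1",
    created_at="2024-01-01T00:00:00Z",
)
print(name[:12], read_record(blob, 1))
```

A reader never needs the whole object. It fetches the tail to get the offsets, then requests only the frames it wants. Writing the same records twice yields the same name, so re-uploading a cassette is a no-op.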

```bash
pip install cassetteql
```

The distribution is named `cassetteql`, but the import package is `cognition` (the same split as `Pillow` and `PIL`):

```python
from cognition.cassette import InfonStore, Query, Analyst
```

## One-minute start

```python
from cognition.cassette import InfonStore, Query

store = InfonStore("./data/chips", schema_path="schema.json")

# `documents` is your own batch of source documents.
# Delta ingest: idempotent, and it reports coverage diagnostics for free.
result = store.ingest(documents)
print(result["report"].summary())

# Calibrated single-claim verdict.
v = store.ask(Query().where(subject="toyota",
                             predicate="invest",
                             object="solid_state"))
print(v.label, v.mass.supports, v.mass.theta)   # e.g. SUPPORTS 0.53 0.29

# Multi-hop MCTS with retraction-aware chain mass.
v = store.connect("toyota", "catl")

# One tree walk resolves connectivity to many targets.
vs = store.any_of("toyota", {"catl", "lg", "samsung", "sk_hynix"})
```
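`v.mass` above carries a `(supports, refutes, theta)` triple, where θ is the mass left uncommitted. The README does not say how cassetteql pools evidence. As a hedged illustration, the sketch below implements the textbook Dempster's rule of combination on the two-element frame {SUPPORTS, REFUTES}, with θ as the mass on the whole frame. The `Mass` class and `combine` function are hypothetical, and the evidence values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mass:
    """Basic mass assignment on the frame {SUPPORTS, REFUTES}."""
    supports: float  # m({SUPPORTS})
    refutes: float   # m({REFUTES})
    theta: float     # m({SUPPORTS, REFUTES}): uncommitted, i.e. "don't know"


VACUOUS = Mass(0.0, 0.0, 1.0)  # no evidence at all: all mass stays on theta


def combine(a: Mass, b: Mass) -> Mass:
    """Dempster's rule of combination for two independent sources."""
    # Conflict K: product mass the sources place on disjoint singletons.
    conflict = a.supports * b.refutes + a.refutes * b.supports
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    norm = 1.0 - conflict
    supports = (a.supports * b.supports + a.supports * b.theta + a.theta * b.supports) / norm
    refutes = (a.refutes * b.refutes + a.refutes * b.theta + a.theta * b.refutes) / norm
    theta = (a.theta * b.theta) / norm
    return Mass(supports, refutes, theta)


e1 = Mass(0.6, 0.1, 0.3)  # made-up evidence from two sources
e2 = Mass(0.5, 0.2, 0.3)
pooled = combine(e1, e2)
print(pooled)                     # supports grows, theta shrinks
print(combine(VACUOUS, VACUOUS))  # Mass(supports=0.0, refutes=0.0, theta=1.0)
```

The vacuous mass is the identity for this rule. With no evidence, θ therefore stays at 1.0, which is the honest "I don't know" answer.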

Swap the root URI for `s3://bucket/prefix` and the same code runs against S3. Install the S3 extra first:

```bash
pip install 'cassetteql[s3]'
```

```python
from cognition.cassette import InfonStore

store = InfonStore("s3://acme/chips", schema_path="schema.json")
```

## What makes it different

| Feature | What it means |
|---|---|
| **Cassette substrate** | Immutable, content-addressed `.inf` files; split Parquet indexes per cassette; an append-only manifest chain. Delta ingest never rewrites existing cassettes, and a time-travel snapshot costs one JSON read. |
| **Calibrated verdicts** | Every answer carries a `(supports, refutes, theta)` mass. On claims the corpus can't answer, θ → 1.0 and the pruner short-circuits, so no range-gets are issued. |
| **Sheaf GNN prior** | A 140k-parameter encoder with per-relation-kind restriction maps, trained once on synthetic hypergraphs (no human labels). It reaches 99.2% on held-out synthetic data and improves reportive-edge anomaly accuracy by 94 percentage points over symbolic-only (6% → 100%). |
| **Schema migration** | `SchemaFunctor(rename, merge, delete)` rewrites cassettes under a new ontology via Kan pushforward. It is 62× faster than reingestion in the measured probe, and the original cassettes are left in place. |
| **Strands Analyst** | Nine tools exposed to any Strands agent, including schema, ingest, report, ask, connect, any_of, and findings. The system prompt enforces source citation and honest NEI ("not enough information") answers. |
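`SchemaFunctor` itself is not documented here, so the sketch below shows only the simplest special case of the idea. It assumes a schema with no relations between predicates and uses set semantics. In that setting, pushing data forward along a relabeling map amounts to renaming each record's predicate. Predicates merged into one land under the same name, and duplicates collapse. Deleted predicates are dropped. The input is never mutated, which mirrors "old cassettes stay." The `Infon` tuple and `migrate` function are hypothetical and are not cassetteql's API.

```python
from typing import Iterable, NamedTuple


class Infon(NamedTuple):
    subject: str
    predicate: str
    object: str


def migrate(infons: Iterable[Infon], rename=None, merge=None, delete=None) -> list[Infon]:
    """Relabel predicates under a new schema; returns a new list, input untouched."""
    rename = rename or {}
    merge = merge or {}
    delete = set(delete or ())
    out: list[Infon] = []
    seen: set[Infon] = set()
    for inf in infons:
        if inf.predicate in delete:
            continue  # predicate removed from the new schema
        new_pred = merge.get(inf.predicate, rename.get(inf.predicate, inf.predicate))
        moved = inf._replace(predicate=new_pred)
        if moved not in seen:  # set semantics: merged predicates may collapse duplicates
            seen.add(moved)
            out.append(moved)
    return out


old = [
    Infon("toyota", "invest", "solid_state"),
    Infon("toyota", "invests_in", "solid_state"),
    Infon("catl", "rumor", "toyota"),
]  # toy records
new = migrate(old, merge={"invest": "invests_in"}, delete={"rumor"})
print(new)  # [Infon(subject='toyota', predicate='invests_in', object='solid_state')]
```

Migration touches only the records that change. This is one plausible reason a migration step can beat a full reingest, which would re-run parsing and encoding from scratch.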

## Optional extras

```bash
pip install 'cassetteql[s3]'       # S3 via s3fs (GCS/Azure need gcsfs/adlfs)
pip install 'cassetteql[agent]'    # Strands Analyst
pip install 'cassetteql[aws]'      # Lambda container deploy + S3
pip install 'cassetteql[all]'      # everything optional
```

## Measured

Each result below comes from a reproducible probe: a standalone Python script that writes a temp store, runs the scenario, and asserts the result. The probes ship inside the source distribution.

| Probe | Symbolic only | With sheaf GNN |
|---|---|---|
| 10-claim actor-to-actor eval | 40% | **100%** |
| 2000-sample synthgen held-out | 88.5% | **99.2%** |
| Reportive-edge anomaly accuracy | 6% | **100%** |
| Range-gets per MCTS query at 300 cassettes | 20 | **1.4** |

Migration vs. reingest (10-infon store): **20 ms** to migrate versus 1245 ms to reingest, a 62× speedup.
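A probe follows the pattern described above: create a temp store, run one scenario, and assert the outcome. The skeleton below applies that pattern to the idempotent-ingest claim. It substitutes a hypothetical `ToyStore` that keeps one file per document, named by its SHA-256, in place of `InfonStore`, so it runs without cassetteql installed. The class and the `probe_idempotent_ingest` function are illustrative only.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class ToyStore:
    """Hypothetical stand-in for a store: one file per document, named by content hash."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def ingest(self, docs):
        """Write unseen documents; return how many were newly added."""
        added = 0
        for doc in docs:
            blob = json.dumps(doc, sort_keys=True).encode()
            path = self.root / (hashlib.sha256(blob).hexdigest() + ".json")
            if not path.exists():  # same content already stored: skip it
                path.write_bytes(blob)
                added += 1
        return added

    def count(self):
        return sum(1 for _ in self.root.glob("*.json"))


def probe_idempotent_ingest():
    docs = [{"text": "first toy document"}, {"text": "second toy document"}]
    with tempfile.TemporaryDirectory() as tmp:  # store is discarded after the probe
        store = ToyStore(tmp)
        assert store.ingest(docs) == 2
        assert store.ingest(docs) == 0  # re-ingesting the same batch adds nothing
        assert store.count() == 2
    return "ok"


if __name__ == "__main__":
    print(probe_idempotent_ingest())
```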

## Dependencies

| Package | Purpose | Required |
|---|---|---|
| `torch` ≥ 2.0 | Reasoner + GNN + SSL losses | yes |
| `transformers` ≥ 4.40 | SPLADE tokenizer/model | yes |
| `numpy` ≥ 1.24 | Linear algebra | yes |
| `pyarrow` ≥ 15 | Cassette indexes | yes |
| `fsspec` ≥ 2024.1 | Local + cloud paths | yes |
| `s3fs` ≥ 2024.1 | S3 backend | via `[s3]` |
| `strands-agents` ≥ 1.0 | Conversational Analyst | via `[agent]` |
| `boto3` ≥ 1.28 | Lambda deploy + ECR | via `[aws]` |

The 17 MB SPLADE-tiny model ships inside the wheel: one `pip install`, no follow-up download, no GPU.

## License

Apache-2.0. The bundled SPLADE-tiny-msmarco model is also Apache-2.0.
