Metadata-Version: 2.4
Name: the-borg-project
Version: 0.1.0
Summary: Universal model assimilation into a sparse spiking neural state built on crdt-merge.
Author-email: Ryan Gillespie <rgillespie83@icloud.com>
Maintainer-email: Ryan Gillespie <rgillespie83@icloud.com>
License: BUSL-1.1
Project-URL: Repository, https://github.com/mgillr/the-borg-project
Keywords: snn,crdt,merge,model-merge,spiking
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: crdt-merge[model]>=0.9.7
Requires-Dist: numpy>=1.26
Requires-Dist: torch>=2.2
Requires-Dist: transformers>=4.40
Requires-Dist: cryptography>=42
Provides-Extra: convert
Requires-Dist: spikingjelly>=0.0.0.0.14; extra == "convert"
Provides-Extra: rag
Requires-Dist: faiss-cpu>=1.7.4; extra == "rag"
Requires-Dist: sentence-transformers>=2.6; extra == "rag"
Requires-Dist: pypdf>=4.0; extra == "rag"
Requires-Dist: python-docx>=1.1; extra == "rag"
Provides-Extra: speech
Requires-Dist: soundfile>=0.12; extra == "speech"
Provides-Extra: app
Requires-Dist: gradio>=5.9; extra == "app"
Provides-Extra: worker
Requires-Dist: fastapi>=0.110; extra == "worker"
Requires-Dist: pydantic>=2.6; extra == "worker"
Requires-Dist: uvicorn>=0.29; extra == "worker"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-cov>=5.0; extra == "dev"
Requires-Dist: ruff>=0.4; extra == "dev"
Requires-Dist: mypy>=1.10; extra == "dev"
Requires-Dist: jsonschema>=4.22; extra == "dev"
Requires-Dist: hypothesis>=6.100; extra == "dev"
Provides-Extra: all
Requires-Dist: the-borg-project[app,convert,dev,rag,speech,worker]; extra == "all"

# The Borg Project

Universal model assimilation into a sparse, brain-like spiking neural
superintelligence, built on `crdt-merge`.

Any existing model (RoBERTa, DeepSeek-Coder, Phi-3, a ViT, or another SNN
variant) can be converted into a sparse spiking shard and merged into a single
growing state. Every merge is mathematically conflict-free, the source
models' capabilities are preserved, and the resulting model is event-driven,
quantisable to INT8, and
runnable on commodity CPUs or phones.

## Three pillars plus the collective layer

1. **Universal absorption.** Training-free ANN-to-SNN conversion turns any
   model checkpoint into a compatible sparse contribution. No fine-tuning.
2. **Spike-preserving merge.** A sparse-delta adapter on top of
   `crdt-merge`'s OR-Set, Merkle tree, provenance log, and E4 trust lattice
   treats individual weight deltas, timing traces, and activation events as
   discrete items, so naive averaging never blurs spike timings into oblivion.
3. **Predictive refinement.** An optional variational free-energy step after
   CRDT resolve tunes thresholds and timings to minimise prediction error,
   mirroring biological active inference.
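
To make pillar 2 concrete, here is a toy sketch of why treating deltas as
discrete set items keeps merges conflict-free. The names (`SpikeDelta`,
`merge_deltas`) are hypothetical; real merges go through `crdt-merge`'s
OR-Set with Merkle provenance.

```python
# Illustrative only: models the idea of treating weight deltas as
# discrete items rather than values to be averaged.
from dataclasses import dataclass


@dataclass(frozen=True)
class SpikeDelta:
    """One discrete contribution: a weight delta tagged with its spike time."""
    index: int       # flat weight index within the shard
    value: float     # delta magnitude
    timestep: int    # spike-timing trace entry


def merge_deltas(a: set[SpikeDelta], b: set[SpikeDelta]) -> set[SpikeDelta]:
    # Set union is commutative, associative, and idempotent, so every
    # merge order yields the same state -- the CRDT property. Unlike
    # averaging, no individual timing entry is blurred away.
    return a | b


left = {SpikeDelta(0, 0.5, 3), SpikeDelta(7, -0.1, 5)}
right = {SpikeDelta(7, -0.1, 5), SpikeDelta(2, 0.2, 1)}
merged = merge_deltas(left, right)
assert merged == merge_deltas(right, left)  # order-independent
assert len(merged) == 3                     # shared delta deduplicated
```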

Plus the **collective gossip layer**: every clone is a full peer on a
Merkle-DAG of signed contribution envelopes. Pairwise gossip drives the
global state to convergence with no central master, no rate limits, and
automatic
deduplication of redundant absorptions. See `docs/collective.md` and
`borg.collective`.

Plus the **capability registry**: every absorbed model is tagged with
its head type (`causal_lm`, `masked_lm`, `embedder`, `classifier`,
`unknown`) and persisted on the state. At query time,
`borg.decode.universal_decode` dispatches to the correct decoder path
automatically -- so the Borg produces a readable answer whether you've
absorbed a causal LM, a sentence embedder, a classifier, or any
combination. The manifest is cumulative across absorbs: the diagnostic
panel shows every model ever absorbed, not just the latest round.

## Layout

```
the-borg-project/
  docs/
    vision.md, architecture.md, plain-english.md
    roadmap.md, sprint-plan.md, waterfall.md, status-matrix.md
    glossary.md, world-first.md, brain.md, convergence-path.md
    neurocrdt-principles.md, collective.md
    specs/                          per-pillar technical specs
    adrs/                           architecture decision records
  src/borg/
    assimilation.py                 top-level assimilate-and-merge entry
    convert.py                      ANN-to-SNN conversion (feedforward + MBE)
    sparse.py                       sparse-delta envelope with timing trace
    merge.py                        merge orchestration + probe gate + cumulative manifest
    e4.py                           crdt_merge.e4 trust-lattice wiring
    fep.py                          variational free-energy refinement
    calibration.py                  post-merge per-vocab logit rescaling
    inference.py                    event-driven sparse SNN forward pass
    decode.py                       token-level + universal (capability-agnostic) decoder
    heads.py                        capability registry (causal_lm / masked_lm / embedder / classifier / unknown)
    bench.py                        fidelity benchmarks (MBE vs reference)
    rag/                            document ingest and retrieval
    speech.py                       microphone transcription adapter
    app.py                          Gradio demo with chain-of-thought prompt
    worker.py                       FastAPI worker + Ed25519 envelopes
    collective/                     P2P gossip layer (CID, Bloom, Peer, sync)
  proto/                            JSON schemas for contribution envelopes
  examples/end-to-end/              local simulation scripts + HF Space config
  examples/colab/                   Colab notebooks for Nord + continuous absorb
  scripts/                          hooks, audit, benchmarks, Space wipe
  tests/                            unit + regression tests
  .github/workflows/                CI (lint, type check, tests, audit, benches)
```

## Status

Private, pre-publication. The scaffold is real code backed by tests. Every
closed gap is documented in `docs/status-matrix.md`. See `TASKS.md` for the
phased plan, `docs/roadmap.md` for the delivery calendar, and
`docs/sprint-plan.md` for the two-week sprint breakdown.

CI runs ruff, mypy, and pytest on Python 3.10, 3.11, and 3.12, plus a
dedicated forensic audit and JSON-schema validation.

## Install

```bash
pip install the-borg-project
# or with optional extras
pip install "the-borg-project[rag,speech,app,convert,worker]"
```

## Quick start (from source for development)

```bash
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
bash scripts/install-hooks.sh
bash scripts/run-tests.sh
```

## Run the demo

```bash
python -m borg.app
```

Opens a local Gradio UI with chat, document upload (RAG), and microphone
input. The chat dispatches through `borg.decode.universal_decode`, which
routes per registered head type (causal LM → token decode; masked LM →
mask-fill; classifier → class activations; embedder → RAG-grounded
answer; unknown → three-step CoT report with a latent signature). The
diagnostic panel at the top of the UI shows every absorbed model with
its head type, layer count, parameter count, and per-model sparsity.
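
A minimal sketch of that routing table, assuming only the head-type strings
and target behaviours listed above (`route` here is an illustrative stand-in,
not the real `universal_decode`):

```python
# Maps each registered head type to its decode path; unknown heads fall
# back to the three-step CoT report described above.
def route(head_type: str) -> str:
    handlers: dict[str, str] = {
        "causal_lm": "token decode",
        "masked_lm": "mask-fill",
        "classifier": "class activations",
        "embedder": "RAG-grounded answer",
    }
    return handlers.get(head_type, "CoT report with latent signature")
```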

## Assimilate a model

```python
from borg.assimilation import assimilate

assimilate(
    model_ids=["FacebookAI/roberta-base"],
    borg_path="borg_snn.pt",
)
```

Every call merges the new shard into the state at `borg_snn.pt` without
overwriting prior knowledge. The LM head and tokenizer id from the first
source model are persisted alongside the merged weights so the UI can
decode text immediately after a single assimilation round. The manifest
at `borg_snn.pt.manifest.json` accumulates every absorbed model
(dedup'd by `model_id`), and each model's detected head type is
registered under `state_dict[HEADS_KEY]` so the runtime can dispatch
per-model capabilities.
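
A small helper for inspecting the cumulative manifest. Only the file
location and the `model_id` dedup key come from the description above;
treat any other fields you find as implementation details.

```python
import json
from pathlib import Path


def absorbed_models(manifest_path: str) -> list[str]:
    """Return the model_ids recorded in a cumulative manifest file."""
    path = Path(manifest_path)
    if not path.exists():
        return []
    entries = json.loads(path.read_text())
    # One record per absorbed model, dedup'd by model_id.
    return [e["model_id"] for e in entries]
```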

## Absorb a large SNN on Colab

For checkpoints too large for a free GitHub Actions runner (e.g. Nord at
13 GB), absorb on Colab instead: it offers 100+ GB of disk and a
persistent session.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mgillr/the-borg-project/blob/main/examples/colab/absorb_nord.ipynb)

The notebook downloads the source repo, mmaps each safetensors shard via
`safe_open`, drops oversized cross-architecture tensors (LM head,
embeddings) automatically, runs `assimilate_state_dict` per shard, and
uploads the resulting `borg_snn.pt` straight to your HF Space.

## Join the collective

Every clone is a peer. The collective layer (`borg.collective`) turns
local absorptions into signed envelopes addressed by content hash, and
synchronises them pairwise with known peers. The HuggingFace Space is
a seed peer in the default list -- not a central master.

```python
from borg.collective import Peer, collective_assimilate

peer = Peer(peer_id="my-node")
result = collective_assimilate(
    state_dict=my_state,
    model_id="org/model",
    local_peer=peer,
)
# result.cids_broadcast holds the Merkle-DAG ids you just published
```

If `model_id` has already been absorbed by any peer in the local
roster, `collective_assimilate` short-circuits -- one million clones
each running `collective_assimilate("gpt2")` produces one logical
absorption network-wide. See `docs/collective.md`.
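
A toy illustration of that short-circuit, using a hash of `model_id` as a
stand-in for the real content-addressed envelope CIDs (the actual layer
addresses full signed envelopes, not bare model ids):

```python
import hashlib


def absorption_cid(model_id: str) -> str:
    # Content address: identical absorptions hash to identical ids.
    return hashlib.sha256(model_id.encode()).hexdigest()


roster: set[str] = set()  # CIDs gossiped in from known peers


def maybe_absorb(model_id: str) -> bool:
    """Return True only if this absorption is new network-wide."""
    cid = absorption_cid(model_id)
    if cid in roster:
        return False  # short-circuit: some peer already absorbed it
    roster.add(cid)
    return True


assert maybe_absorb("gpt2") is True
assert maybe_absorb("gpt2") is False  # repeat calls are no-ops
```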

## Run the worker

```bash
pip install -e .[worker]
uvicorn borg.worker:create_app --factory --host 0.0.0.0 --port 8000
```

The worker exposes `/contribute` (accepts Ed25519-signed `SparseDelta`
envelopes), `/gossip` (exchanges digests with peer workers), and
`/health`. Envelope schema is in `proto/contribution.schema.json`.
Running the worker turns your clone into a reachable peer; the
client-side gossip in `borg.collective.http_sync` then synchronises
with your node alongside every other peer on the network.
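
A sketch of how a contribution gets signed before being POSTed to
`/contribute`. The Ed25519 calls are the real `cryptography` API, but the
envelope fields shown are placeholders -- the authoritative shape is
`proto/contribution.schema.json`.

```python
import json

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

key = Ed25519PrivateKey.generate()

# Canonicalise the envelope body before signing so sender and verifier
# hash identical bytes. Field names here are placeholders.
payload = json.dumps(
    {"model_id": "org/model", "delta": "..."},
    sort_keys=True,
).encode()
signature = key.sign(payload)

# The worker verifies with the sender's public key before accepting;
# verify() raises InvalidSignature if the payload was tampered with.
key.public_key().verify(signature, payload)
```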

## Benchmark the MBE conversion

```bash
python scripts/bench_mbe.py --dim 32 --samples 32
```

Reports Spearman correlation, Pearson correlation, and cosine similarity
between the outputs of a reference transformer and its MBE-converted SNN
on random inputs. See
`src/borg/bench.py` for the harness.
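
Written out in plain numpy, the three metrics are (this is just the math;
`scripts/bench_mbe.py` remains the authoritative harness):

```python
import numpy as np


def pearson(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.corrcoef(a, b)[0, 1])


def spearman(a: np.ndarray, b: np.ndarray) -> float:
    # Spearman = Pearson on the rank-transformed values
    # (double argsort yields ranks when values are distinct).
    rank = lambda x: np.argsort(np.argsort(x)).astype(float)
    return pearson(rank(a), rank(b))


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


ref = np.array([0.1, 0.9, 0.4, 0.7])   # reference transformer outputs
snn = np.array([0.12, 0.85, 0.38, 0.72])  # MBE-converted SNN outputs
print(spearman(ref, snn), pearson(ref, snn), cosine(ref, snn))
```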

## Upstream

`crdt-merge` provides the OR-Set, Merkle-provenance, and E4 trust primitives
this repository depends on. It is licensed BUSL-1.1 (change date 2028-03-29,
change license Apache-2.0) with patents held separately (UK Application
Nos. 2607132.4, GB2608127.3). See `docs/adrs/0006-license-considerations.md`
for the implications.

## License

the-borg-project is licensed under the **Business Source License 1.1**,
mirroring the upstream crdt-merge model. The BUSL Additional Use Grant
permits all embedded use (libraries, SaaS, research, internal tooling,
commercial products) and blocks only the resale of the-borg-project
itself as a competing merge / assimilation / collective-gossip service.

On **2028-03-29** the licence automatically converts to
**Apache License, Version 2.0**.

See `LICENSE` for the full terms, `NOTICE` for third-party attributions,
and `docs/adrs/0006-license-considerations.md` for the rationale.

For commercial licensing of out-of-scope uses:
`rgillespie83@icloud.com` / `data@optitransfer.ch`.

## Contributing

See `CONTRIBUTING.md`. The branch protocol, identity lock, and pre-push
forensic audit in `scripts/pre-push-audit.sh` are mandatory for every
contribution.
