Metadata-Version: 2.4
Name: demiourgos
Version: 0.1.0
Summary: Graph-only Mini-CPG scanner and query server for codebases
Author: Demiourgos
License: MIT
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: PyYAML>=6.0
Requires-Dist: redis>=5.0
Requires-Dist: fastapi>=0.115
Requires-Dist: uvicorn>=0.30
Requires-Dist: SQLAlchemy>=2.0
Requires-Dist: psycopg[binary]>=3.2
Requires-Dist: PyJWT>=2.9
Requires-Dist: tree-sitter>=0.22
Requires-Dist: tree-sitter-python>=0.23
Requires-Dist: tree-sitter-javascript>=0.23
Requires-Dist: tree-sitter-typescript>=0.23
Requires-Dist: watchfiles>=0.24
Requires-Dist: watchdog>=4.0
Requires-Dist: python-igraph>=0.11
Requires-Dist: mcp>=1.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: httpx>=0.27; extra == "dev"

# Demiourgos Backend

**The Structural Memory Engine for AI Coding Agents**

Demiourgos is a deterministic code analysis engine that builds a real-time graph of your entire codebase — every function, class, API route, and database model — and tracks exactly how they connect. When any file changes, it instantly computes the structural blast radius: which functions will break, which API routes are affected, and which execution flows are disrupted.

No LLM is used for code parsing. The analysis is mathematically exact.

> [!NOTE]
> **Living Document:** This README outlines our current architectural logic and intended trajectory. These are not hard-and-fast rules—we are actively building this engine and will iterate, refine, and change structural approaches wherever we see positive results during development.

---

## Table of Contents

1. [The Problem We Are Solving](#1-the-problem-we-are-solving)
2. [The Solution: Our Own Agent with Hive](#2-the-solution-our-own-agent-with-hive)
3. [The Universal Graph Architecture](#3-the-universal-graph-architecture)
4. [The Processing Pipeline](#4-the-processing-pipeline)
5. [Impact Scoring System](#5-impact-scoring-system)
6. [How Cohesion Scores Contribute to Impact](#6-how-cohesion-scores-contribute-to-impact)
7. [Context Slicing: When Pruning Works and When It Does Not](#7-context-slicing-when-pruning-works-and-when-it-does-not)
8. [Route Dependency Tracking for API Testing](#8-route-dependency-tracking-for-api-testing)
9. [The Worker-Judge Loop](#9-the-worker-judge-loop)
10. [Coding Style Preservation](#10-coding-style-preservation)
11. [Install and Run](#11-install-and-run)
12. [Configuration](#12-configuration)

---

## 1. The Problem We Are Solving

AI coding agents (Claude, Cursor, Windsurf, Codex) are excellent at writing code. But they are **blind to architecture**. They do not understand, structurally, how the functions in your codebase depend on each other, which database models feed which API routes, or which API must be called before another one works.

This blindness causes five recurring failures in existing AI agents. We are building an agent that fixes them: Demiourgos gives our agent (Hive) a structural memory layer that no other AI coding agent has. The exact problems we are solving are described below.

### 1.1 The Blast Radius Problem

This goes far beyond "IDE tells you a function has errors." An IDE can catch a syntax error in the same file. But it cannot tell you that renaming the `user_id` column in your `User` database model will break:

- The `get_user()` function that queries `User.user_id`
- The `create_order()` function that takes a `user_id` parameter from `get_user()`'s return value
- The `media_upload()` function that stores `user_id` as a foreign key in the `Media` table
- The `GET /users/{id}` API route that calls `get_user()` and returns the `user_id` field
- The `POST /orders` API route that calls `create_order()` which internally joins on `user_id`

That is 5 breakages across 4 files, 2 database models, and 2 API routes, all from a single column rename. The IDE sees none of this. The AI sees none of this. Only the graph sees it, because it tracks the edge from `DataModel(User)` → `READS_FROM` → `Function(get_user)` → `CALLS` → `Function(create_order)` → `READS_FROM` → `DataModel(Media)` → all the way up to `Route(POST /orders)` via `HANDLED_BY`.
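
As a rough illustration of the traversal described above, the sketch below walks a small hand-written dependency map breadth-first. The `DEPENDENTS` dictionary and the `blast_radius` helper are hypothetical stand-ins for the real graph store, and the edges are simplified from the example.

```python
from collections import deque

# Hypothetical impact edges: each key maps to the nodes that depend on it.
DEPENDENTS = {
    "DataModel(User)": ["Function(get_user)", "Function(media_upload)"],
    "Function(get_user)": ["Function(create_order)", "Route(GET /users/{id})"],
    "Function(create_order)": ["Route(POST /orders)"],
}


def blast_radius(changed: str, dependents: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk over dependents; returns affected nodes in discovery order."""
    seen = {changed}
    affected = []
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dependent in dependents.get(node, []):
            if dependent not in seen:
                seen.add(dependent)
                affected.append(dependent)
                queue.append(dependent)
    return affected
```

Calling `blast_radius("DataModel(User)", DEPENDENTS)` on this toy map reaches the three functions and both routes from the list above.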

### 1.2 The API Interdependency Problem

AI agents are notoriously bad at calling APIs correctly and writing integration tests. The reason is that APIs have hidden interdependencies — a chain of calls where one must succeed before the next one works.

Consider a real-world flow:

```
Step 1: POST /auth/login     → returns access_token
Step 2: GET  /users/me        → requires access_token in header → returns user_id
Step 3: POST /media/upload    → requires access_token AND user_id in body → returns media_id
Step 4: POST /orders          → requires access_token AND user_id AND media_id → creates order
```

An AI agent that tries to test `POST /orders` directly will get a `401 Unauthorized` because it does not know that `/auth/login` must be called first. Even if it remembers the auth step, it does not know that `user_id` comes from `GET /users/me` and `media_id` comes from `POST /media/upload`. It will invent fake values and get a `422 Unprocessable Entity`.

Demiourgos solves this because the graph stores each Route node and traces the data flow between them. It knows:
- `POST /orders` handler calls `validate_order(user_id, media_id)`
- `user_id` is tainted with origin `GET /users/me → response.id`
- `media_id` is tainted with origin `POST /media/upload → response.media_id`
- Both require `access_token` from `POST /auth/login → response.token`

The AI agent can now query the graph and get the exact API call chain with the correct parameter names and sources.
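
One way to picture the resulting call chain is as a topological sort over route prerequisites. The sketch below assumes the taint origins have already been reduced to a hypothetical `REQUIRES` map (route to the routes whose responses it needs); it uses the standard-library `graphlib` module, not the actual Demiourgos query API.

```python
from graphlib import TopologicalSorter

# Hypothetical prerequisites derived from taint origins in the example above.
REQUIRES = {
    "POST /auth/login": set(),
    "GET /users/me": {"POST /auth/login"},
    "POST /media/upload": {"POST /auth/login", "GET /users/me"},
    "POST /orders": {"POST /auth/login", "GET /users/me", "POST /media/upload"},
}


def call_chain(target: str, requires: dict[str, set[str]]) -> list[str]:
    """Return the target's prerequisite routes in a valid call order, target last."""
    needed, stack = set(), [target]
    while stack:
        route = stack.pop()
        if route not in needed:
            needed.add(route)
            stack.extend(requires[route])
    # Restrict the graph to the routes the target actually needs, then sort it.
    subgraph = {route: requires[route] & needed for route in needed}
    return list(TopologicalSorter(subgraph).static_order())
```

A test generator could walk the returned list in order, storing each response field under the name the next route expects.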

### 1.3 The Context Window Waste Problem

When an AI agent needs context about a function, it pulls in the entire function body and all related functions. But in a real codebase, functions are not short. A single controller function might be 300 lines long. The AI pulls in the full 300 lines even when only 12 lines (the specific branch that handles the user lookup) are relevant to the change.

Worse, related functions add up. If that 300-line function calls 8 other functions, each averaging 150 lines, the AI typically pulls in about 1,500 lines of context (300 + 8 × 150). But **not all of those functions actually matter to the change**.

Demiourgos serves a context slice that strips out the noise. First, it uses the graph and taint tracking to identify **only the specific downstream and upstream functions** that are actually affected by the change — ignoring the rest entirely. Then, even within those affected functions, it prunes irrelevant branches and error handlers with `[lines X-Y pruned]` markers. It delivers 47 highly relevant lines instead of 1500 blind ones.

### 1.4 The Runtime Logic Break Problem

Structural breaks (argument count mismatch) are easy to detect. But what about logic breaks that only appear at runtime?

Example:
```python
# Before: returns a list
def get_users():
    return db.query(User).all()   # returns [User, User, User]

# After: returns a generator (same signature, same type hint)
def get_users():
    yield from db.query(User).all()   # returns generator object
```

The function signature did not change. No parameter was added or removed. But every caller doing `len(get_users())` will now crash because generators do not support `len()`. This is a **logic break** that traditional static analysis misses.

Demiourgos catches this through taint tracking. It traces the return value of `get_users()` into every downstream variable that uses it. When the body changes (Soft Impact 0.1), the AI is alerted to review the tainted chain, and the context slicer serves exactly the affected lines.
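
The break itself is easy to reproduce in isolation. The snippet below is a standalone demonstration with made-up function names and plain lists instead of a database query; it shows only the runtime failure, not how Demiourgos detects it.

```python
def get_users_list():
    return ["alice", "bob", "carol"]        # a list supports len()


def get_users_generator():
    yield from ["alice", "bob", "carol"]    # a generator object does not


def count_users(fetch) -> int:
    """A caller that silently relies on the return value being sized."""
    return len(fetch())


list_count = count_users(get_users_list)

try:
    count_users(get_users_generator)
    generator_error = None
except TypeError as exc:
    generator_error = str(exc)   # the caller fails only when it actually runs
```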

### 1.5 The Missing Layers Problem

Even if you solve blast radius and API testing at the code level, there is still a gap: **business intent**. Why does this code exist? What requirement does it fulfill? What architectural decision constrains how it can be changed?

Demiourgos addresses this through a 4-layer architecture where each layer is connected to the others via a graph-of-graphs:

| Layer | Name | What It Tracks | Example |
|-------|------|---------------|---------|
| **Layer 1** | Business Layer (PRDG) | PRD sections, user stories, requirements | "Users must be able to securely process checkouts" |
| **Layer 2** | Capability Layer (FDG) | Features, feature dependencies | "Checkout Feature depends on Stripe API" |
| **Layer 3** | Reasoning Layer (ADR/FKG) | Architecture Decision Records, constraints | "ADR-012: Payment processing must run synchronously to prevent split-brain UI states" |
| **Layer 4** | Structural Layer (CPG/RDG/MDG) | Code graph, route graph, database models | Functions, API routes, database tables, and their edges |

#### Top-Down Execution: The Architect Workflow

A critical distinction in Demiourgos is **who writes which layer**:
- **Layer 4 (Structural)** is generated natively by deterministic parsers (like Tree-sitter). The user never writes it directly, and an AI agent can never hallucinate it. The code *is* the truth.
- **Layers 1, 2, and 3** are managed by a specialized **Architect Agent** collaborating directly with the User, controlled and continuously improved by the orchestration of Hive.

When a user demands a new feature, they do not just dump a massive prompt into a coding worker. The top-down flow works like this:
1. **User Story:** The User tells the Architect Agent: *"I need a synchronous checkout feature."*
2. **Evaluation:** The Architect evaluates the request against current implementations (Layer 4) and existing constraints (Layer 3). It points out the hard constraints: *"We currently process payments asynchronously. Making this synchronous will require rewriting the Stripe webhook handler without breaking the existing refund flow."*
3. **PRD & Plan:** The User and Architect discuss the trade-offs. Once agreed, the Architect writes the **PRD (Layer 1)**, updates the **Architecture constraints (Layer 3)**, and finalizes an execution plan.
4. **Execution:** The Architect dispatches Worker Agents. The Workers use the finalized constraints and the exact code map to implement the feature safely.
5. **Staging & Approval:** The Workers finish, run tests, and generate a final report with a deployed staging site. The User reviews the staging site, approves it, and the code goes to Production.

```mermaid
sequenceDiagram
    actor User
    participant Arch as "Architect Agent"
    participant L3 as "Layer 3 (Constraints)"
    participant L4 as "Layer 4 (Code Truth)"
    participant Worker as "Worker Agents"
    participant Stage as "Staging/Prod"

    User->>Arch: "I need synchronous checkouts" (User Story)
    Arch->>L4: Analyzes current code graph
    Arch->>L3: Checks existing architecture rules
    L4-->>Arch: Finds async Stripe webhook dependencies
    Arch->>User: "This breaks the existing webhook logic. Propose rewriting it?"
    User->>Arch: "Yes, get the PRD right and let's plan it."
    Arch->>L3: Finalizes PRD and ADRs
    Arch->>Worker: Dispatches implementation tasks
    Worker->>L4: Safely writes code using L4 context
    Worker->>Stage: Deploys to Staging Site
    Stage-->>User: Provides Staging URL & Impact Report
    User->>Stage: Approves for Production
```

#### Bottom-Up Impact & Stale Nodes

The graph works bottom-up as well: `Code (L4) → Features (L2) → Business Requirements (L1)`. 

However, **this bottom-up tracing does not run blind on every small keystroke**. If an agent or a developer makes a large number of tiny edits, tracing the impact all the way up to "Business Intent" on every save would overwhelm the user and waste compute budget.

But leaving those upper layers un-updated makes the graph stale. Demiourgos solves this through **Impact-Based Model Routing**:
- **Real-time (L4 only):** As code is edited, only local structural impact (L4) is computed by the deterministic parsers. This is instant and costs zero tokens.
- **Micro-Syncs (Small Models):** If the structural impact is small (e.g., adding a simple `created_at` field), Hive dispatches a fast, cheap model (like Mistral or Claude Haiku) to quickly read the L4 diff and quietly update the associated Feature nodes (L2) to prevent them from going stale.
- **Deep Checkpoints (Large Models):** When a Worker finishes a major task, or a Pull Request introduces large cross-module changes, Hive dispatches a heavy reasoning model (GPT-4o or Claude 3.5 Sonnet) to do a deep bottom-up trace. It maps the cumulative changes upward: *"Your changes to `auth.py` successfully implemented the Sync Checkout feature, but accidentally impacted the Guest Refund user story."* 

**Finding the Sweet Spot:**
We are still working out the exact thresholds for these checkpoints. Do we trigger a Micro-Sync on every "File Save"? Do we run a Deep Checkpoint on every "Git Push", or only when a PR is "Approved"? Balancing graph freshness, user interruption, and compute cost is the central goal of Hive's orchestration layer.
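
To make the routing idea concrete, here is a minimal sketch of what such a decision function could look like. Everything in it is an assumption for illustration: the event names, the `cross_module_threshold` default, and the rule that a small pull request only gets a Micro-Sync. The real thresholds are still undecided.

```python
from enum import Enum


class SyncTier(Enum):
    REALTIME_L4 = "deterministic L4 update only"
    MICRO_SYNC = "small model refreshes L2"
    DEEP_CHECKPOINT = "large model traces L4 up to L1"


def choose_tier(event: str, modules_touched: int, cross_module_threshold: int = 3) -> SyncTier:
    """Pick a sync tier. Event names and the threshold are placeholders, not tuned values."""
    if event == "task_finished":
        return SyncTier.DEEP_CHECKPOINT
    if event == "pull_request" and modules_touched >= cross_module_threshold:
        return SyncTier.DEEP_CHECKPOINT
    if event in {"file_save", "pull_request"}:
        return SyncTier.MICRO_SYNC
    # Any other edit event only triggers the token-free structural update.
    return SyncTier.REALTIME_L4
```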

**Demiourgos solves all of these problems** by building a deterministic, multi-layer graph that tracks every connection from business requirement to database column, scoring every change, and powering our own autonomous agent system — Hive.

---

## 2. The Solution: Our Own Agent with Hive

Demiourgos is not just an analysis engine. It is the structural backbone of **Hive** — our own autonomous AI coding agent.

Hive is an orchestration layer that coordinates multiple AI models, tools, and memory systems to perform complex coding tasks safely. Demiourgos is the "nervous system" that gives Hive structural awareness.

### 2.1 The Four Components

```mermaid
graph TB
    subgraph Hive ["Hive — The Orchestrator"]
        direction TB
        ORCH["Control Loop<br/>Task planning, routing, budget"]
    end

    subgraph Brain ["The Brain — Reasoning"]
        LLM["LLM (Claude / GPT-4o)"]
        PLAN["Task decomposition"]
        CODE["Code generation"]
        REASON["Impact reasoning"]
    end

    subgraph Hands ["The Hands — Action"]
        FS["File system I/O"]
        AST["Tree-sitter parsing"]
        GRAPH["FalkorDB writes"]
        SHELL["Shell commands"]
        API["External API calls"]
    end

    subgraph Memory ["The Memory — Alloy Net"]
        L1["Layer 1: PRDG (Business)"]
        L2["Layer 2: FDG (Capability)"]
        L3["Layer 3: ADR/FKG (Reasoning)"]
        L4["Layer 4: CPG/RDG/MDG (Structural Code)"]
    end

    ORCH --> Brain
    ORCH --> Hands
    ORCH --> Memory
    Brain --> Hands
    Memory --> Brain

    style Hive fill:#1a1a1a,stroke:#fff,color:#fff
    style Brain fill:#1a1a1a,stroke:#e040fb,color:#e040fb
    style Hands fill:#1a1a1a,stroke:#00bcd4,color:#00bcd4
    style Memory fill:#1a1a1a,stroke:#ffc107,color:#ffc107
```

**Hive (The Orchestrator):** The control loop. It receives a task from the user, decomposes it into subtasks, decides which AI model to use, manages the token budget, and routes work between the Brain, Hands, and Memory.

**The Brain (Reasoning):** The LLM (Claude, GPT-4o, Mistral). It plans tasks, writes code, reasons about impact, and generates human-readable summaries. The Brain never touches files directly — it always works through the Hands.

**The Hands (Action):** The tool layer. File system operations, Tree-sitter AST parsing, FalkorDB graph writes, shell commands, and external API calls. Every action is logged and reversible.

**The Memory (Alloy Net):** The fundamental structural graph of the system, acting as long-term context storage across four interconnected layers:

| Layer | Name | Purpose | Example |
|-------|------|---------|--------|
| **Layer 1** | PRDG (Business) | Requirements, PRDs, and User Stories | "User can upload photo" |
| **Layer 2** | FDG (Capability) | Features and Product rollouts | "Photo Upload Feature" |
| **Layer 3** | ADR (Reasoning) | Architectural Decisions & Constraints | "Must process uploads asynchronously" |
| **Layer 4** | Structural (Code) | Deterministic code graph (CPG/RDG/MDG) | `routes.py` calls `upload()` |

*(In addition to Alloy Net's structural truth, Hive also maintains a secondary **Semantic Memory** for team preferences like "always use camelCase", and an **Ephemeral Memory** for the active task's scratchpad).*

### 2.2 How Hive Handles Multiple Agents

Hive can run multiple Brain instances in parallel for large tasks:

1. **Task Decomposition:** Hive breaks a large task (e.g., "Refactor the authentication system") into independent subtasks ("Update password hashing," "Migrate session tokens," "Update API routes").

2. **Conflict Detection via Graph:** Before dispatching subtasks to parallel agents, Hive queries the Demiourgos graph to check if any subtasks share dependencies. If Agent A is editing `validate_user()` and Agent B is editing `create_session()` which calls `validate_user()`, Hive detects the conflict and serializes those two tasks.

3. **Graph Lock Regions:** Each agent "locks" the subgraph it is working on. Other agents can read it, but cannot write to overlapping nodes. This prevents merge conflicts at the architectural level.

4. **Result Merging:** After parallel agents finish, Hive runs a graph-level merge check. It re-scans all modified files, recomputes impact scores, and verifies that no cross-agent Hard Impacts (1.0) were introduced.
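
The conflict check in step 2 can be sketched as a set-overlap test on each subtask's call footprint. The `CALLS` map, the `footprint` helper, and the agent labels below are hypothetical; a real implementation would read the edges from the graph store.

```python
# Hypothetical CALLS edges: caller -> set of callees.
CALLS = {
    "create_session": {"validate_user"},
    "validate_user": set(),
    "hash_password": set(),
}


def footprint(function: str, calls: dict[str, set[str]]) -> set[str]:
    """The function itself plus everything it transitively calls."""
    seen, stack = set(), [function]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(calls.get(current, set()))
    return seen


def conflicts(tasks: dict[str, str], calls: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Pairs of agents whose footprints overlap and should therefore be serialized."""
    agents = sorted(tasks)
    prints = {agent: footprint(tasks[agent], calls) for agent in agents}
    return [
        (a, b)
        for i, a in enumerate(agents)
        for b in agents[i + 1:]
        if prints[a] & prints[b]
    ]
```

With tasks `{"A": "validate_user", "B": "create_session", "C": "hash_password"}`, only agents A and B overlap, so C can run in parallel with either.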

### 2.3 How Hive Keeps Improving

Every task that Hive completes generates a feedback signal:

- **Judge Pass Rate:** How often does the Worker's output pass the Judge on the first try? A low pass rate means the Brain needs better context or the task decomposition is too coarse.
- **Impact Prediction Accuracy:** After a change is deployed, did the predicted impact (1.0 / 0.5 / 0.1) match reality? False positives mean our diffing is too aggressive. False negatives mean we missed a dependency.
- **Context Slice Effectiveness:** Did the AI agent use all the lines in the context slice, or did it ignore half of them? Ignored lines mean our pruning is too generous.

These signals are stored in Semantic Memory and used to tune future task planning, context depth, and Judge strictness.

---

## 3. The Universal Graph Architecture

Instead of maintaining separate disconnected graphs that drift out of sync, Demiourgos merges three structural dimensions into **one unified Property Graph** stored in FalkorDB (a Redis-native graph database). 

*(Note: We conceptually group these nodes into three "Dimensions" — CPG, RDG, MDG — for easier understanding, but physically they all live together in the exact same database. There are no hard boundaries between them.)*

```mermaid
graph TB
    subgraph DimA ["Dimension A: Code Property Graph (CPG)"]
        M["Module: routes.py"]
        F1["Function: get_user"]
        F2["Function: create_user"]
        C1["Class: UserService"]
        C2["Class: BaseService"]
        M -->|CONTAINS| F1
        M -->|CONTAINS| F2
        M -->|CONTAINS| C1
        F1 -->|CALLS| F2
        C1 -->|EXTENDS| C2
    end

    subgraph DimB ["Dimension B: Route Dependency Graph (RDG)"]
        R1["Route: GET /users/id"]
        R2["Route: POST /users"]
    end

    subgraph DimC ["Dimension C: Model Dependency Graph (MDG)"]
        DM1["DataModel: User Table"]
    end

    R1 -->|HANDLED_BY| F1
    R2 -->|HANDLED_BY| F2
    F1 -->|READS_FROM| DM1
    F2 -->|WRITES_TO| DM1

    style R1 fill:#4caf50,color:#000
    style R2 fill:#4caf50,color:#000
    style DM1 fill:#ffc107,color:#000
    style M fill:#6366f1,color:#fff
    style F1 fill:#e040fb,color:#000
    style F2 fill:#e040fb,color:#000
    style C1 fill:#00bcd4,color:#000
    style C2 fill:#00bcd4,color:#000
```

### Dimension A — CPG (Code Property Graph)

The foundation. Tree-sitter parses the raw source code and extracts:
- **Module nodes** — one per source file
- **Function nodes** — every function and method, including overloads
- **Class nodes** — every class with inheritance chains
- **Symbol nodes** — variables and constants
- **CALLS edges** — which function calls which function
- **CONTAINS edges** — which module owns which function
- **IMPORTS edges** — which module depends on which module
- **EXTENDS edges** — which class inherits from which class

This layer is 100% deterministic. No LLM is involved. Tree-sitter is a parsing library with a C runtime; given the same source, it always produces the same syntax tree (strictly a concrete syntax tree, which this document refers to as the AST).

### Dimension B — RDG (Route Dependency Graph)

Framework plugins (like the FastAPI extractor) scan the AST for HTTP route decorators:

```python
@app.get("/users/{user_id}")
def get_user(user_id: str):
    return db.query(User).filter(User.id == user_id).first()
```

The plugin creates a `Route` node (`GET /users/{user_id}`) and connects it to `get_user()` via a `HANDLED_BY` edge.

**Why this matters:** The AI agent can now trace from any internal function up to the API route that exposes it. If a function breaks, the AI knows exactly which HTTP endpoint is affected.
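
A simplified version of this extraction can be written with Python's built-in `ast` module. This is only a sketch of the idea: the real plugin works on Tree-sitter output, and the `extract_routes` name and `HTTP_METHODS` set are assumptions made for the example.

```python
import ast

HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def extract_routes(source: str) -> list[tuple[str, str, str]]:
    """Return (METHOD, path, handler_name) for decorators shaped like @app.get("/path")."""
    routes = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for decorator in node.decorator_list:
            if (
                isinstance(decorator, ast.Call)
                and isinstance(decorator.func, ast.Attribute)
                and decorator.func.attr in HTTP_METHODS
                and decorator.args
                and isinstance(decorator.args[0], ast.Constant)
                and isinstance(decorator.args[0].value, str)
            ):
                routes.append((decorator.func.attr.upper(), decorator.args[0].value, node.name))
    return routes


SOURCE = '''
@app.get("/users/{user_id}")
def get_user(user_id: str):
    return None
'''
```

Each returned tuple corresponds to one `Route` node and one `HANDLED_BY` edge to the named handler.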

### Dimension C — MDG (Model Dependency Graph)

ORM plugins (like the SQLAlchemy extractor) scan for database model definitions and data access patterns:

```python
class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    email = Column(String)

def get_user(user_id):
    return db.query(User).filter(User.id == user_id).first()  # READS_FROM User
```

The plugin creates a `DataModel` node (`User` table) and connects `get_user` to it via a `READS_FROM` edge.

**Why this matters:** If a database column is renamed or deleted, the AI can trace the graph from the DataModel through all functions that read/write it, and up to the API routes that depend on those functions.
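
The read-detection half can be sketched the same way. The example below uses the standard-library `ast` module rather than Tree-sitter and only recognizes the `<obj>.query(Model)` pattern; `extract_reads` is a hypothetical helper name.

```python
import ast


def extract_reads(source: str) -> list[tuple[str, str]]:
    """Return (function_name, model_name) for each `<obj>.query(Model)` call in a function."""
    edges = []
    for function in ast.walk(ast.parse(source)):
        if not isinstance(function, ast.FunctionDef):
            continue
        for node in ast.walk(function):
            if (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "query"
                and node.args
                and isinstance(node.args[0], ast.Name)
            ):
                edges.append((function.name, node.args[0].id))
    return edges


SOURCE = '''
def get_user(user_id):
    return db.query(User).filter(User.id == user_id).first()
'''
```

Each pair maps to one `READS_FROM` edge from the function to the `DataModel` node.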

### The Power of One Unified Graph

Because all three dimensions live in a single FalkorDB instance, a single Cypher query can cross all dimensions:

```cypher
// "If I rename the 'email' column on User, which API routes are affected?"
MATCH (dm:DataModel {name: "User"})<-[:READS_FROM|WRITES_TO]-(f:Function)
MATCH (f)<-[:CALLS*0..5]-(upstream:Function)
MATCH (r:Route)-[:HANDLED_BY]->(upstream)
RETURN r.name AS affected_route, f.name AS via_function
```

This returns every API route whose handler reaches the `User` model within five call hops, which is the set of routes to re-check after the rename. No separate tool, no manual tracing, no guesswork.

---

## 4. The Processing Pipeline

When a developer (or an AI agent) saves a file, Demiourgos runs the following pipeline:

```mermaid
flowchart LR
    A["File Save<br/>Detected"] --> B["AST Parse<br/>(Tree-sitter)"]
    B --> C["Plugin Pass<br/>(FastAPI, SQLAlchemy, Taint)"]
    C --> D["Graph Diff<br/>(FalkorDB Update)"]
    D --> E["Impact Score<br/>(1.0 / 0.5 / 0.1)"]
    E --> F["Context Slice<br/>(Taint-Pruned Output)"]

    style A fill:#6366f1,color:#fff
    style B fill:#4caf50,color:#000
    style C fill:#00bcd4,color:#000
    style D fill:#e040fb,color:#000
    style E fill:#ffc107,color:#000
    style F fill:#ef5350,color:#fff
```

### Step 1: File Save Detected

The OS-native file watcher (`watchdog`, backed by FSEvents on macOS and inotify on Linux) detects the file change as soon as the OS reports it. There is no polling, so CPU overhead is negligible when nothing changes.

**File:** `watcher.py`

### Step 2: AST Parse (Tree-sitter)

The changed file is parsed by Tree-sitter. The result is a `ParsedModule` containing all definitions (functions, classes), calls, imports, and identifiers found in that file.

Each file is hashed with SHA-256. If the hash matches the previous scan, the file is skipped entirely. This gives the pipeline Change Data Capture (CDC) behavior: only changed files are reprocessed.

**File:** `parser_adapters.py`

**Supported Languages:** Python, TypeScript, JavaScript
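
The hash-based skip can be sketched with the standard-library `hashlib` module. The `changed_files` helper and the in-memory `previous` dictionary are assumptions for illustration; where the digests are actually persisted is not shown here.

```python
import hashlib


def file_digest(content: bytes) -> str:
    """SHA-256 digest of a file's raw bytes."""
    return hashlib.sha256(content).hexdigest()


def changed_files(files: dict[str, bytes], previous: dict[str, str]) -> list[str]:
    """Return paths whose digest differs from the last scan, updating `previous` in place."""
    changed = []
    for path, content in files.items():
        digest = file_digest(content)
        if previous.get(path) != digest:
            previous[path] = digest
            changed.append(path)
    return changed
```

On a first scan every file counts as changed; on later scans only files whose bytes differ are returned and handed to the parser.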

### Step 3: Plugin Pass (The Translation Layer)

After the universal AST parse, a set of framework-specific extractors is run. **This is the translation layer.**

Codebases are messy because every framework (FastAPI, Django, Express, Spring) uses different syntax for the exact same architectural concepts. The job of the plugins is to translate framework-specific syntax into a **universal, framework-agnostic graph structure** (e.g., `Route` → `HANDLED_BY` → `Function` → `READS_FROM` → `DataModel`).

Because the core Demiourgos engine (Diffing, Taint Tracking, Impact Scoring) only consumes this universal structure, the engine itself never needs to know what framework your code is written in. 

| Plugin | What It Detects (Framework Specific) | Translated To (Universal Structure) |
|--------|--------------------------------------|-------------------------------------|
| **FastAPI Extractor** | `@app.get()`, `@router.post()` decorators | Creates `Route` nodes and `HANDLED_BY` edges |
| **SQLAlchemy Extractor** | ORM model classes, `db.query()` calls | Creates `DataModel` nodes, `READS_FROM` / `WRITES_TO` edges |
| **Taint Tracker** | Variable assignments and data flow | Attaches `origins` data to arguments on `CALLS` edges |

This keeps the engine extensible. Without touching the internals of how Demiourgos scores impact or traces data flow, you can add support for a new framework (like Django, Express, or Prisma) by dropping a new `extractor.py` file into the `plugins/` directory that translates the framework's AST syntax into the universal nodes and edges. No internal wiring is required.

**Directory:** `plugins/`
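
To illustrate the shape of such a plugin, here is a hedged sketch of an extractor contract. The `Edge` dataclass, the `Extractor` protocol, `ToyRouteExtractor`, and `run_plugins` are all hypothetical names; the actual plugin interface in `plugins/` may differ.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Edge:
    source: str
    kind: str      # e.g. "HANDLED_BY", "READS_FROM", "WRITES_TO"
    target: str


class Extractor(Protocol):
    def extract(self, module_path: str, source: str) -> list[Edge]: ...


class ToyRouteExtractor:
    """Toy plugin: treats lines like `route GET /users get_users` as route declarations."""

    def extract(self, module_path: str, source: str) -> list[Edge]:
        edges = []
        for line in source.splitlines():
            parts = line.split()
            if len(parts) == 4 and parts[0] == "route":
                _, method, path, handler = parts
                edges.append(Edge(f"Route({method} {path})", "HANDLED_BY", f"Function({handler})"))
        return edges


def run_plugins(plugins: list[Extractor], module_path: str, source: str) -> list[Edge]:
    """Run every plugin over one module and collect the universal edges they emit."""
    return [edge for plugin in plugins for edge in plugin.extract(module_path, source)]
```

The point of the design is that downstream stages only ever see `Edge` values, never framework syntax.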

### Step 4: Graph Diff (FalkorDB Update)

The old subgraph for the changed file is deleted from FalkorDB. The new subgraph is written atomically. Only the changed file is touched. Hundreds of unchanged files are never reprocessed.

**File:** `graph_store.py`
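
The per-file replace can be pictured with a small in-memory stand-in. `InMemoryGraph` below is a hypothetical toy, not the FalkorDB-backed store in `graph_store.py`; it only shows why unchanged files are never touched.

```python
EdgeTuple = tuple[str, str, str]   # (source, kind, target)


class InMemoryGraph:
    """Toy graph store: edges are grouped by the file that produced them."""

    def __init__(self) -> None:
        self._edges_by_file: dict[str, set[EdgeTuple]] = {}

    def replace_file(self, path: str, new_edges: list[EdgeTuple]) -> tuple[set[EdgeTuple], set[EdgeTuple]]:
        """Swap one file's subgraph and return the (added, removed) edges."""
        old = self._edges_by_file.get(path, set())
        new = set(new_edges)
        self._edges_by_file[path] = new
        return new - old, old - new

    def edges(self) -> set[EdgeTuple]:
        """All edges currently stored, across every file."""
        return set().union(*self._edges_by_file.values())
```

The `(added, removed)` pair is the kind of delta a later scoring step could consume.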

### Step 5: Impact Score

The diffing engine compares the old and new versions of each function and computes a structural impact score.

**File:** `diffing.py`

(See Section 5 for the full scoring system.)

### Step 6: Context Slice

The context slicer packages up only the affected downstream code and feeds it to the AI agent. It uses taint analysis to include only the lines that actually matter.

**File:** `context_slicer.py`

(See Section 7 for the full slicing system.)

---

## 5. Impact Scoring System

Every time a function changes, Demiourgos scores the impact on every caller. The score tells the AI (or developer) exactly how dangerous the change is.

### Score Tiers

| Score | Level | Trigger | What It Means |
|-------|-------|---------|---------------|
| **1.0** | Hard Impact | Argument count changed, parameter deleted, declared return type changed | Callers **will** crash at runtime. The function signature contract is broken. |
| **0.5** | Medium Impact | New optional parameter added, parameter type annotation changed | Callers **might** need updating. The contract shifted but is not broken. |
| **0.1** | Soft Impact | Logic body changed, internal variable renamed | Callers are structurally unaffected. The output behavior may have changed. |

### How Scoring Works

The diffing engine (`diffing.py`) compares the old and new function structurally:

1. **Argument count comparison:** If the old function had 3 parameters and the new one has 4 (without a default value), that is a Hard Impact (1.0). Every caller passing 3 arguments will now fail.
2. **Type annotation diff:** If a parameter type changed from `str` to `int`, that is a Medium Impact (0.5). The function still accepts the same number of arguments, but the type contract changed.
3. **Body hash comparison:** If only the internal logic changed (the function still has the same signature), that is a Soft Impact (0.1).
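
The three checks can be sketched as a single function. The `Signature` dataclass and `impact_score` below are hypothetical and make two assumptions not stated above: "argument count" is interpreted as the count of required parameters, and an unchanged function scores `0.0`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    params: tuple[tuple[str, str, bool], ...]   # (name, annotation, has_default)
    return_type: str
    body_hash: str


def impact_score(old: Signature, new: Signature) -> float:
    """Score a change as 1.0 (hard), 0.5 (medium), 0.1 (soft), or 0.0 (no change)."""
    old_required = [p for p in old.params if not p[2]]
    new_required = [p for p in new.params if not p[2]]
    removed_names = {p[0] for p in old.params} - {p[0] for p in new.params}
    if len(old_required) != len(new_required) or removed_names or old.return_type != new.return_type:
        return 1.0   # the call contract is broken
    if old.params != new.params:
        return 0.5   # optional parameter added or annotation changed
    if old.body_hash != new.body_hash:
        return 0.1   # only the body changed
    return 0.0
```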

### Taint-Based Data Flow Tracking

Beyond structural diffing, the Taint Tracker traces the actual data flowing between functions.

Example:
```python
def caller():
    x = db.get_user()        # x is tainted with origin "db.get_user()"
    y = x                    # y inherits the taint
    process(y)               # CALLS edge records: argument[0].origins = ["db.get_user()"]
```

This means if `db.get_user()` changes its return type, Demiourgos knows that `process()` is affected — not just because of a generic CALLS edge, but because it traces the exact variable (`y`) that carries the tainted data.

The taint origins are stored directly on the CALLS edge as JSON:
```json
{
  "arguments": [
    { "value": "y", "position": 0, "origins": ["db.get_user()"] }
  ]
}
```
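
A much-reduced version of this propagation can be written with the standard-library `ast` module. The sketch below only handles straight-line assignments in a single function and bare call statements; `trace_call_arguments` is a hypothetical helper, and the real Taint Tracker works on Tree-sitter output.

```python
import ast


def trace_call_arguments(source: str) -> list[dict]:
    """Record argument origins for bare call statements in the first function of `source`."""
    function = ast.parse(source).body[0]
    taint: dict[str, list[str]] = {}
    records = []
    for stmt in function.body:
        if isinstance(stmt, ast.Assign) and len(stmt.targets) == 1 and isinstance(stmt.targets[0], ast.Name):
            target = stmt.targets[0].id
            if isinstance(stmt.value, ast.Call):
                taint[target] = [ast.unparse(stmt.value)]            # new taint source
            elif isinstance(stmt.value, ast.Name):
                taint[target] = list(taint.get(stmt.value.id, []))   # taint is inherited
        elif isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call):
            call = stmt.value
            records.append({
                "callee": ast.unparse(call.func),
                "arguments": [
                    {
                        "value": ast.unparse(arg),
                        "position": position,
                        "origins": taint.get(arg.id, []) if isinstance(arg, ast.Name) else [],
                    }
                    for position, arg in enumerate(call.args)
                ],
            })
    return records


SOURCE = '''
def caller():
    x = db.get_user()
    y = x
    process(y)
'''
```

For `SOURCE`, the single record's `arguments` list matches the JSON payload shown above.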

---

## 6. How Cohesion Scores Contribute to Impact

When Leiden community clustering (Phase 4) is active, the cohesion score of each cluster adds a dimension to impact analysis that raw scoring alone cannot provide.

### What Cohesion Measures

Cohesion is the ratio of actual internal edges to the maximum possible internal edges within a community:

```
cohesion = (actual internal edges) / (max possible internal edges)
```

- **1.0** — Every function in the cluster calls every other function. Very tightly coupled.
- **0.7+** — Strong functional grouping. A change inside this cluster is likely to stay inside it.
- **0.3-0.7** — Moderate grouping. Some functions are loosely connected.
- **Below 0.3** — Weak grouping. The cluster may be an artifact of the algorithm rather than a true functional unit.
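
As a worked example of the ratio, the sketch below computes cohesion for one cluster. It assumes directed `CALLS` edges, so the maximum for `n` members is `n * (n - 1)`, and it returns `0.0` for clusters with fewer than two members; both choices are assumptions, as is the `cohesion` function name.

```python
def cohesion(members: set[str], calls: set[tuple[str, str]]) -> float:
    """Ratio of internal directed call edges to the maximum possible, n * (n - 1)."""
    n = len(members)
    if n < 2:
        return 0.0
    internal = {
        (caller, callee)
        for caller, callee in calls
        if caller in members and callee in members and caller != callee
    }
    return len(internal) / (n * (n - 1))
```

For a three-function cluster with three internal calls, this gives `3 / 6 = 0.5`, which falls in the moderate band above.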

### How Cohesion Changes the Impact Story

| Scenario | Without Cohesion | With Cohesion |
|----------|-----------------|---------------|
| Change in a high-cohesion (0.87) cluster | "14 functions impacted" | "14 functions impacted, BUT all are in the Auth cluster (cohesion 0.87). The blast radius is self-contained. Low cross-system risk." |
| Change in a low-cohesion (0.3) cluster | "14 functions impacted" | "14 functions impacted across a LOOSELY GROUPED cluster (cohesion 0.3). These functions may not actually be related. Higher investigation risk." |
| Change that crosses 3 clusters | "47 functions impacted" | "47 functions across 3 clusters: Auth (0.87), Payment (0.72), Logging (0.41). The change breaks cluster boundaries. HIGH cross-system risk." |

### Cohesion as a Priority Signal

The AI agent uses cohesion to prioritize its repair work:

1. **High-cohesion cluster with impact:** Fix it first. The cluster is tightly coupled, so fixing one function likely fixes the whole cluster.
2. **Low-cohesion cluster with impact:** Investigate carefully. The functions may need individual fixes.
3. **Cross-cluster impact:** This is the highest risk. A change rippling across cluster boundaries often means a fundamental architectural decision has changed.

---

## 7. Context Slicing: When Pruning Works and When It Does Not

Context slicing is the process of extracting only the relevant lines of code for the AI agent. It works by tracing the graph from a target function, collecting all related functions, and pruning non-relevant lines from each source file.

### How Pruning Works

1. **Graph traversal:** Starting from the target function, the slicer walks the CALLS graph outward up to a configurable hop depth.
2. **AST Extraction:** For each related function discovered, it pulls the complete Abstract Syntax Tree (AST) for that function.

#### The Density & Pruning Algorithm

Once the slicer isolates a function, it doesn't just dump the whole body into the context. It runs a **density-based pruning algorithm**:

1. **Mark Anchor Nodes:** The slicer traverses the function's internal AST and flags "anchor nodes". An anchor is any line that:
   - Contains a call to another function in the traversal graph.
   - Modifies or reads a variable tainted by the diff.
   - Is part of the function signature, `return` statement, or `yield`.
2. **Expand Context Windows (The `k` Radius):** For every anchor node, the algorithm flags `k` lines above and `k` lines below it to preserve immediate local context (typically `k=2`).
3. **Merge Overlapping Windows:** If two anchor windows overlap, they are merged into one continuous block. 
4. **Calculate Function Density:** Once all blocks are merged, the slicer computes the overall **Pruning Density Score** for the function (`kept_lines / total_lines`). It then applies fixed threshold rules to decide if pruning is actually worth it:
   - **Density > 0.7 AND Function < 50 lines:** The slicer aborts pruning and serves the **full function**. Why? Because hiding 8 lines in a 40-line function saves almost no tokens, but replaces them with ugly `[lines X-Y pruned]` markers that confuse the LLM (fragmentation penalty > token savings).
   - **Density > 0.7 AND Function >= 50 lines:** The slicer **prunes**. Even high-density sections in massive 500-line functions are worth pruning to save isolated chunks of 30-40 lines of noise.
   - **Density <= 0.7:** The slicer **always prunes**. The signal-to-noise ratio is poor, and pruning will yield massive token savings.
5. **Ghost Declarations:** If a variable is needed in a kept section, but its original assignment falls into a pruned section, the slicer resurrects it as a "ghost declaration" (e.g., `user_id = ... # [pruned logic]`) so the AI understands the binding without needing the bloated code.
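
Steps 2 to 4 can be sketched in a few lines. This is a simplified illustration, not the slicer's real code: `merge_windows` and `should_prune` are hypothetical names, line numbers are 1-indexed and inclusive, and anchor detection and ghost declarations are left out.

```python
def merge_windows(anchors, total_lines, k=2):
    """Expand each anchor line by k lines and merge windows that overlap."""
    windows = []
    for line in sorted(anchors):
        start, end = max(1, line - k), min(total_lines, line + k)
        if windows and start <= windows[-1][1]:
            # Shares at least one line with the previous window: extend it.
            windows[-1][1] = max(windows[-1][1], end)
        else:
            windows.append([start, end])
    return [tuple(w) for w in windows]


def should_prune(windows, total_lines, density_cutoff=0.7, small_function=50):
    """Apply the density rules: only small, dense functions are served in full."""
    kept = sum(end - start + 1 for start, end in windows)
    density = kept / total_lines
    if density > density_cutoff and total_lines < small_function:
        return False, density  # fragmentation would cost more than it saves
    return True, density


# A 300-line function with anchors at the signature, two nearby lines, and the return.
windows = merge_windows([1, 120, 122, 300], total_lines=300)
prune, density = should_prune(windows, total_lines=300)
```

Here the windows around lines 120 and 122 merge into a single block (118 to 124), 13 of 300 lines are kept, and the function is pruned. A 40-line function with an anchor every fourth line would instead be served in full.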

### When Pruning Works Well

| Situation | Why Pruning Succeeds |
|-----------|---------------------|
| **Well-structured code with small functions** | Each function is self-contained. The graph cleanly identifies which functions matter. Pruning removes everything else. |
| **Clear module boundaries** | Functions in separate files with explicit imports. The graph has clean IMPORTS edges to follow. |
| **Typed function signatures** | Type annotations give the slicer confidence about data flow. It knows exactly which variables carry tainted data. |
| **Framework-detected routes and models** | The Route and DataModel nodes give extra entry/exit points for graph traversal, so the slicer captures the full chain from API to database. |
| **Shallow call chains (2-3 hops)** | The slicer captures the full context without pulling in hundreds of irrelevant functions. |

### When Pruning Does Not Work Well

| Situation | Why Pruning Struggles | Mitigation |
|-----------|----------------------|------------|
| **Giant monolithic functions (500+ lines) with scattered anchors** | When anchors are spread across the whole body, the merged context windows cover most of the function, so there is little left to prune and the slice stays large. | Break large functions into smaller ones. The graph naturally benefits. |
| **Heavy use of dynamic dispatch** (`getattr`, `eval`, `exec`) | Tree-sitter cannot see the call target. The graph has no edge to follow. The slicer misses the dependency. | The slicer falls back to including the entire file when it detects dynamic dispatch patterns. |
| **String-based queries** (raw SQL, raw HTTP calls) | `db.execute("SELECT * FROM users")` bypasses the ORM. The SQLAlchemy plugin cannot detect it. No `READS_FROM` edge is created. | Use ORM methods. Future plugins may parse SQL strings. |
| **Global mutable state** | A function modifies a global dictionary. Another function reads it. There is no CALLS edge between them — they communicate through side effects. | Taint tracking partially helps by tracing variable assignments, but cross-function globals are a known limitation. |
| **Deeply nested call chains (10+ hops)** | The slicer pulls in too many functions. The context slice becomes larger than the original file. | Configure `max_hops` to limit traversal depth. Phase 4 clustering helps by summarizing distant impacts at the cluster level. |
| **Decorators and metaclasses** | Heavy decorator wrapping can hide the real function from Tree-sitter. The AST sees the wrapper, not the inner function. | The parser has specific handling for common decorators (`@app.get`, `@staticmethod`). Custom decorators may need plugin support. |

### Slice Modes

When the AI agent requests context via the MCP `demiourgos_context` tool, it can specify a slice mode:

| Mode | What Is Included | Use Case |
|------|------------------|----------|
| **full** | The complete function body and all related functions | Deep investigation of complex logic |
| **skeleton** | Only function signatures and docstrings | Quick architectural overview |
| **auto** | Adapts based on function complexity and graph confidence | Default for AI workflow |
| **custom** | User-defined line ranges | Specific debugging |

### Token Savings and LLM Attention Theory

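Why go through the trouble of AST-level pruning and taint tracking? Because LLMs suffer from the "Lost in the Middle" effect: they recall information placed at the start or end of a long context more reliably than information buried in the middle. The less irrelevant code surrounds the lines that matter, the less there is for those lines to get lost in.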

### A Real-World Pruning Example

Imagine a 2,000-line `payment_service.py` file containing 14 functions. A developer changes a database column used by the `charge_stripe()` function. The AI agent needs to fix the downstream function `process_checkout()`. 

The `process_checkout()` function is 300 lines long, but the `charge_stripe()` call only happens inside one specific `if` branch.

**Without Slicing (Current AI Agents):**
The agent consumes the entire 2,000-line `payment_service.py` file.
1. **Token Cost:** ~15,000 tokens per prompt.
2. **Attention Dilution:** Extreme. The LLM must read 13 unrelated functions and 290 unrelated lines of `process_checkout()` just to find the 10 lines that matter.
3. **Hallucination Risk:** High. The LLM might accidentally rewrite an unrelated branch simply because it was present in the context.

**With Demiourgos Density Slicing:**
1. The Slicer isolates the `process_checkout()` function (drops the other 13 functions immediately).
2. It finds the `charge_stripe()` anchor node inside the AST.
3. It keeps the function signature, the anchor line, and a 2-line radius around the anchor.
4. It prunes the remaining 285 lines of `process_checkout()`, replacing them with `[lines X-Y pruned]`.

- **Tokens consumed:** ~250 tokens **(98% savings)**.
- **Amplifies signal-to-noise:** With the unrelated branches gone, nearly all of the remaining context is the tainted data flow, so the model's attention is not spread across irrelevant code.
- **Prevents collateral damage:** The AI cannot break unrelated logic branches because it cannot even see them.
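
The savings figure follows directly from the two token counts quoted in this example. A quick check (the helper name `token_savings` is just for illustration):

```python
def token_savings(before, after):
    """Fraction of tokens removed by slicing."""
    return 1 - after / before


FULL_FILE_TOKENS = 15_000  # whole payment_service.py, from the example
SLICED_TOKENS = 250        # pruned slice of process_checkout(), from the example

print(f"{token_savings(FULL_FILE_TOKENS, SLICED_TOKENS):.1%}")  # prints 98.3%
```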

### Expected Token Savings by Function Profile

How much context do we actually save? It depends heavily on the code profile:

| Code Profile | Slice Behavior | Token Savings | AI Performance Impact |
|--------------|----------------|---------------|-----------------------|
| **Giant Monolithic Functions** (500+ line procedural scripts) | Heavy pruning. Only specific branches containing anchors are kept. | **~85% to 95%** | Massive. Completely eliminates the "Lost in the Middle" syndrome. |
| **Object-Oriented Classes** (Classes with many small methods) | Medium pruning. Keeps the class definition and only the specific methods touched. | **~70% to 80%** | Very High. AI sees the class interface without the implementation noise of other methods. |
| **Utility Modules** (Many tiny, 10-line pure functions) | Light pruning. High density aborts pruning (fragmentation > savings). Entire functions sent. | **~40% to 60%** | Moderate. Savings come entirely from dropping the other utility functions in the file. |
| **God Files** (10,000+ line legacy files) | Extreme pruning. | **> 98%** | Critical. Makes it actually possible to use AI on legacy codebases without maxing out context limits. |

---

## 8. Route Dependency Tracking for API Testing

This section explains the specific problem of AI agents failing at API testing, and how Demiourgos solves it.

### Why AI Fails at API Testing Today

When an AI agent needs to test an API endpoint, it typically struggles in two ways:

1. **Payload Guessing (422 Errors):**
   - It reads the route handler, but the expected JSON body structure is defined in a Pydantic `UserCreate` model imported from a completely different file.
   - The AI guesses the payload and sends `{"email": "x"}` instead of the required nested structure `{"user": {"email": "x"}}`. It gets a `422 Unprocessable Entity` and burns tokens looping to fix it.

2. **Blind Dependency Breaks (500 Errors):**
   - It successfully calls the route, but gets a 500 error because a downstream dependency changed.
   - The AI cannot see that 4 hops down, `serialize_email()` crashed because a database column was silently renamed.

### How Demiourgos Fixes This

Because the Framework Extractors (like the FastAPI plugin) also map out the exact request schemas and their Pydantic/Zod dependencies, the AI has perfectly structured knowledge of **both** what goes into the route, and what happens after it. 

The AI agent can query exactly what object needs to be sent:

```text
"What is the required payload and full dependency chain for POST /users?"
```

Demiourgos returns:
```
Route: POST /users
  ├─ Schema (Expects):
  │    └─ UserCreate { user: dict(email: str, is_active: bool) }
  └─ Handler: create_user(payload: UserCreate)
       ├─ WRITES_TO: User (columns: id, email, created_at)
       └─ CALLS: send_welcome_email(payload.user.email)  ← HARD IMPACT 1.0
            Reason: User.email column no longer exists (renamed to email_address)
```

The AI now knows:
1. **Exactly how to build the JSON request payload** without guessing.
2. The complete execution chain from route to database.
3. Precisely which function to fix and what the new column name is.
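
To make the first point concrete, here is a small stdlib-only sketch of why the guessed payload is rejected and the schema-informed one is not. `shape_errors` and the dict-based schema are hypothetical stand-ins; in a FastAPI app the real validation is performed by Pydantic.

```python
# The UserCreate shape from the response above, written as nested dicts of types.
USER_CREATE = {"user": {"email": str, "is_active": bool}}


def shape_errors(schema, payload, path=""):
    """List the paths where payload is missing a field or has the wrong type."""
    errors = []
    for key, expected in schema.items():
        here = f"{path}.{key}" if path else key
        if not isinstance(payload, dict) or key not in payload:
            errors.append(f"{here}: missing")
        elif isinstance(expected, dict):
            errors.extend(shape_errors(expected, payload[key], here))
        elif not isinstance(payload[key], expected):
            errors.append(f"{here}: expected {expected.__name__}")
    return errors


guessed = {"email": "x"}                                 # what the AI guesses
informed = {"user": {"email": "x", "is_active": True}}  # built from the schema

print(shape_errors(USER_CREATE, guessed))   # ['user: missing']
print(shape_errors(USER_CREATE, informed))  # []
```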

### How a User Can Steer This

The graph works in both directions:

**Top-Down (Route → Database):**
```cypher
MATCH (r:Route {name: "GET /users/{id}"})-[:HANDLED_BY]->(f:Function)
// *0.. also covers a handler that touches the model directly (zero CALLS hops)
MATCH path = (f)-[:CALLS*0..]->(downstream:Function)
MATCH (downstream)-[:READS_FROM|WRITES_TO]->(dm:DataModel)
RETURN path, dm.name
```
"Starting from this API route, show me every database table it touches."

**Bottom-Up (Database → Route):**
```cypher
MATCH (dm:DataModel {name: "User"})<-[:READS_FROM|WRITES_TO]-(f:Function)
// *0.. also covers a route handler that touches the model directly
MATCH path = (f)<-[:CALLS*0..]-(upstream:Function)
MATCH (r:Route)-[:HANDLED_BY]->(upstream)
RETURN r.name, path
```
"Starting from this database table, show me every API route that depends on it."
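
For readers less familiar with Cypher, the bottom-up walk can be mimicked on plain dictionaries. Everything here is a hypothetical stand-in for the graph: the edge tables, the function names, and the assumption that each handler serves a single route.

```python
# Hypothetical edges mirroring the relationship types used in the queries above.
HANDLED_BY = {"GET /users/{id}": "get_user", "POST /users": "create_user"}
CALLS = {"get_user": ["load_user"], "create_user": ["save_user", "send_welcome_email"]}
READS_FROM = {"load_user": ["User"]}
WRITES_TO = {"save_user": ["User"], "write_audit": ["AuditLog"]}


def routes_touching(model):
    """Walk CALLS edges upward from every function that reads or writes `model`."""
    callers = {}
    for caller, callees in CALLS.items():
        for callee in callees:
            callers.setdefault(callee, []).append(caller)
    handler_to_route = {handler: route for route, handler in HANDLED_BY.items()}

    stack = [fn for edges in (READS_FROM, WRITES_TO)
             for fn, models in edges.items() if model in models]
    seen = set()
    while stack:
        fn = stack.pop()
        if fn in seen:
            continue
        seen.add(fn)
        stack.extend(callers.get(fn, []))
    # `seen` includes the starting functions, so a handler that touches
    # the model directly is reported as well.
    return sorted(handler_to_route[fn] for fn in seen if fn in handler_to_route)


print(routes_touching("User"))      # ['GET /users/{id}', 'POST /users']
print(routes_touching("AuditLog"))  # []
```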

**Lateral (Function → Function):**
```cypher
MATCH (f:Function {name: "validate_user"})<-[:CALLS]-(caller:Function)
MATCH (caller)-[:CALLS]->(sibling:Function)
WHERE sibling.name <> "validate_user"  // exclude the function we started from
RETURN DISTINCT sibling.name
```
"What other functions do the callers of `validate_user` also call?" This is useful for understanding the broader context of a change.

**Layer 4 Deep Code Structure (AST Pattern Matching):**
Because Layer 4 breaks code down into its literal Abstract Syntax Tree components (TryCatch blocks, Variable Declarations, Return statements), a user can query structural patterns that regex could never find:

```cypher
MATCH (f:Function)-[:AST_CHILD*]->(t:TryCatchBlock)
MATCH (t)-[:AST_CHILD*]->(c:CatchClause)
WHERE NOT (c)-[:AST_CHILD*]->(:CallExpression {name: "logger.error"})
RETURN f.name
```
*"Show me all functions that have a `try/catch` block that silently swallows errors without logging them."*
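
The same structural question can be tried out on Python source with the standard library's `ast` module. This is only an analogy for the graph query: Demiourgos parses with Tree-sitter and stores `AST_CHILD` edges, whereas this sketch walks an in-memory tree, and the sample source and helper names are made up for illustration.

```python
import ast

SOURCE = '''
def risky():
    try:
        charge()
    except Exception:
        pass

def careful():
    try:
        charge()
    except Exception as exc:
        logger.error("charge failed: %s", exc)
'''


def logs_error(handler):
    """True if the except body contains a call to logger.error(...)."""
    for node in ast.walk(handler):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "error"
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "logger"):
            return True
    return False


def silent_functions(source):
    """Names of functions with at least one except clause that never logs."""
    found = []
    for fn in ast.walk(ast.parse(source)):
        if isinstance(fn, (ast.FunctionDef, ast.AsyncFunctionDef)):
            handlers = [n for n in ast.walk(fn) if isinstance(n, ast.ExceptHandler)]
            if any(not logs_error(h) for h in handlers):
                found.append(fn.name)
    return found


print(silent_functions(SOURCE))  # ['risky']
```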

This completely changes how a user steers large-scale refactors. Instead of guessing where technical debt lives, they query the exact code structure directly from the database to give the Worker Agents a precise hit-list of functions to fix.

Both routing dimensions (`CALLS`) and structural dimensions (`AST_CHILD`) work seamlessly together because all edges are stored in the same unified graph. The user does not need to "switch between views." They simply change the query.

---

## 9. The Worker-Judge Loop

The Worker-Judge Loop is the core quality-assurance mechanism of Hive. No code is committed without passing the Judge.

```mermaid
flowchart LR
    H["Hive<br/>Initiate"] -->|TASK| W["Worker<br/>Execute"]
    W -->|DRAFT| J["Judge<br/>Validate"]
    J -->|PASS| SHIP["Ship Code"]
    J -->|FAIL| W
    J -->|BUDGET HIT| ESC["Escalate to Human"]

    style H fill:#1a1a1a,stroke:#fff,color:#fff
    style W fill:#1a1a1a,stroke:#00bcd4,color:#00bcd4
    style J fill:#1a1a1a,stroke:#e040fb,color:#e040fb
    style SHIP fill:#1a1a1a,stroke:#4caf50,color:#4caf50
    style ESC fill:#1a1a1a,stroke:#ffc107,color:#ffc107
```

### Step 1: Hive Initiates

Hive receives a task ("Fix the broken login endpoint"). It queries the Demiourgos graph to gather:
- Which functions are involved in the login endpoint (via Route → HANDLED_BY → Function → CALLS chain)
- The current impact scores on those functions
- A context slice of only the relevant code

Hive packages this into a structured prompt and dispatches it to the Worker.

### Step 2: Worker Executes

The Worker is the Brain (LLM) combined with the Hands (tools). It:
1. Reads the context slice from Demiourgos
2. Generates the code fix
3. Writes the file via the Hands
4. Triggers a Demiourgos re-scan (automatic on file save)
5. Collects the new impact scores

The Worker produces a **draft** — the code change plus the updated graph state.

### Step 3: Judge Validates

The Judge is a separate LLM call (can be the same model or a different one) that evaluates the Worker's output against a checklist:

| Check | What the Judge Verifies |
|-------|------------------------|
| **Structural integrity** | Did the change introduce any new Hard Impact (1.0) edges? If yes, are they resolved? |
| **Test passage** | Did existing tests pass? Were new tests added for new functionality? |
| **Graph constraints** | Are there orphaned nodes? Did any Process break at an early step? |
| **Policy compliance** | Does the code follow the team's coding style? (See Section 10) |
| **Scope containment** | Did the change stay within the requested scope, or did the Worker edit unrelated files? |

### Humans Drive Scope, Agents Drive Code (Layer 4)

In the normal Demiourgos flow, **human developers no longer write code manually**. The user exclusively manages Layer 1 (Business Stories) and Layer 2 (Features). The user talks to the **Architect Agent** to define the plan and constraints (Layer 3). Working from that plan, the **Worker Agents** are the *only* entities that modify Layer 4 (The Code Structure), except when a task is escalated to a human. The human developer acts as an executive reviewer, approving the final Pull Requests or Staging deployments.

### The Three Outcomes

**Pass → Ship:** All checks pass. The code is committed and the PR is ready for human review.

**Fail → Iterative Vectoring:** The Judge does not just say "try again." It provides exact structural feedback designed to **vector the Worker agent closer to the goal**. The Worker receives:
- The specific failing tests or graph constraints.
- The precise AST lines that caused the new failure.
- A dynamically updated context slice that now includes the downstream functions the Worker accidentally broke in its draft.

By feeding the Worker the exact blast radius of its own mistakes, Hive steers the LLM toward a working solution instead of leaving it to retry blindly. The loop repeats until the graph is stable, up to a maximum of 3 retries.

**Budget Hit → Escalate:** If the Worker still fails after the maximum number of retries, or the token budget is exhausted, Hive escalates to a human developer. It provides:
- The original task
- Everything the Worker tried
- The specific checks that keep failing
- The relevant context slice

The human can fix the issue manually or adjust the task decomposition.
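
The control flow of the three outcomes can be sketched as a small loop. This is a hedged illustration, not Hive's code: `run_loop`, `Verdict`, and the stub Worker and Judge are hypothetical, "max 3 retries" is read as one initial attempt plus up to three retries, and the token budget is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Verdict:
    passed: bool
    feedback: list = field(default_factory=list)


def run_loop(task, worker, judge, max_retries=3):
    """Return ("ship", draft) on a pass, or ("escalate", history) when retries run out."""
    history, feedback = [], []
    for _ in range(1 + max_retries):  # first attempt plus retries
        draft = worker(task, feedback)
        verdict = judge(draft)
        history.append((draft, verdict))
        if verdict.passed:
            return "ship", draft
        feedback = verdict.feedback  # structural feedback for the next Worker call
    return "escalate", history


def stub_worker(task, feedback):
    # Stand-in for the LLM: only uses the right column name once it is told.
    return "email_address" if feedback else "email"


def stub_judge(draft):
    if draft == "email_address":
        return Verdict(True)
    return Verdict(False, ["User.email was renamed to email_address"])


outcome, result = run_loop("fix send_welcome_email", stub_worker, stub_judge)
```

In this run the first draft fails, the Judge's feedback reaches the Worker, and the second draft ships. A Judge that never passes would produce four history entries and an escalation.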

---

## 10. Coding Style Preservation

One of the hardest problems with AI-generated code is that it does not match the team's existing coding style. The AI writes valid code, but it looks "foreign" in the codebase.

Demiourgos solves this through stored style preferences in Semantic Memory.

### How Style Is Captured

1. **Automatic style detection:** During the first full scan, Demiourgos analyzes the codebase for patterns:
   - Naming conventions (snake_case, camelCase, PascalCase)
   - Import ordering (stdlib first, then third-party, then local)
   - Quote style (single vs double)
   - Docstring format (Google-style, NumPy-style, reStructuredText)
   - Error handling patterns (try/except vs early return)
   - Indentation (spaces vs tabs, 2 vs 4)

2. **Manual style overrides:** Teams can create style files that the agent reads:
   - **Claude CLAUDE.md / skills files** — Stored instructions that Claude Code reads on every session. Teams can define coding rules, naming conventions, and architecture preferences here.
   - **Cursor .cursorrules** — Similar style rules for Cursor agents.
   - **.editorconfig** — Standard editor configuration for indentation, line endings, etc.

3. **Semantic Memory storage:** All detected and manual style preferences are stored in the Semantic Memory layer of the Alloy Net. When the Brain generates code, it receives these preferences as part of its system prompt.
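
As an illustration of the automatic detection step, naming conventions can be inferred by classifying identifiers and taking the most common class. The regular expressions and the `dominant_convention` helper are assumptions made for this sketch; single-word names such as `run` are ambiguous and are ignored.

```python
import re
from collections import Counter

PATTERNS = {
    "snake_case": re.compile(r"^[a-z]+(_[a-z0-9]+)+$"),
    "camelCase": re.compile(r"^[a-z]+([A-Z][a-z0-9]*)+$"),
    "PascalCase": re.compile(r"^([A-Z][a-z0-9]+)+$"),
}


def dominant_convention(names):
    """Return the most frequent naming convention, or None if nothing matches."""
    counts = Counter()
    for name in names:
        for label, pattern in PATTERNS.items():
            if pattern.match(name):
                counts[label] += 1
    return counts.most_common(1)[0][0] if counts else None


print(dominant_convention(["get_user", "load_user_by_id", "sendEmail", "run"]))
# snake_case
```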

### How Style Is Enforced

| Stage | What Happens |
|-------|-------------|
| **Brain prompt** | Hive injects style rules into the Brain's system prompt: "Use snake_case for Python functions, type hints on all parameters" |
| **Worker output** | The Worker generates code following the injected style rules |
| **Judge validation** | The Judge checks the output against the style rules. If naming convention is violated, it fails the check. |
| **Post-commit hook** | If a linter is configured (ruff, eslint), the Hands run it automatically. Lint failures are fed back to the Worker. |

### How Style Keeps Improving

Every time a human overrides the AI's code style (renames a variable, reformats an import), on the next scan Demiourgos detects the change and updates the style profile in Semantic Memory. Over time, the style profile converges to the team's actual preferences.

Example: If the AI uses `getUserById` but the human always renames it to `get_user_by_id`, after three occurrences, the Semantic Memory records: "This team uses snake_case for function names." Future generations use snake_case.
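
The "three occurrences" rule amounts to a counter with a promotion threshold. A minimal sketch, assuming a hypothetical `StyleProfile` class (the real Semantic Memory storage is not shown in this README):

```python
from collections import Counter


class StyleProfile:
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.observed = Counter()  # (aspect, preferred) -> times a human chose it
        self.rules = {}            # aspect -> promoted preference

    def record_override(self, aspect, preferred):
        """Count a human correction and promote it to a rule at the threshold."""
        self.observed[(aspect, preferred)] += 1
        if self.observed[(aspect, preferred)] >= self.threshold:
            self.rules[aspect] = preferred
        return self.rules.get(aspect)


profile = StyleProfile()
first = profile.record_override("function_names", "snake_case")
second = profile.record_override("function_names", "snake_case")
third = profile.record_override("function_names", "snake_case")
```

The first two overrides leave the profile unchanged; the third records `snake_case` as the rule for function names.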

---

## 11. Install and Run

### Requirements

- Python 3.10+ (for `pipx` / editable dev install)
- A running FalkorDB instance (Redis with the FalkorDB module)

### Install Channels

```bash
# Homebrew (recommended global install)
brew install demiourgos/tap/demiourgos
```

```bash
# Curl installer (binary install fallback)
curl -fsSL https://raw.githubusercontent.com/sarveshdakhore/demiourgos/main/demiourgos-backend/scripts/install.sh | sh
```

```bash
# pipx (Python-based global install)
pipx install demiourgos
```

```bash
# Local editable development install
cd demiourgos-backend
pip install -e .
```

### Update Checks

Every command run performs a cached update check (once per 24 hours by default) and
prints a non-blocking banner when a newer version exists. The banner includes the
correct upgrade command based on install channel (`brew`, `pipx`, `pip`, or `curl`).

```bash
# Inspect current/latest version and channel-specific upgrade command
demiourgos self update-status

# Force live check (ignore cache)
demiourgos self update-status --check-now
```

```bash
# Disable automatic update checks
export DEMIOURGOS_NO_UPDATE_CHECK=1

# Optional: point update checks to a custom release endpoint
export DEMIOURGOS_UPDATE_MANIFEST_URL=https://api.github.com/repos/sarveshdakhore/demiourgos/releases/latest
```
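
The throttling behaviour described above boils down to a timestamp comparison plus an opt-out. The sketch below is an assumption about how such a check could look, not the CLI's implementation: `should_check` is a hypothetical helper, and treating only the value `1` as "disabled" is a guess based on the export shown above.

```python
import os
import time

CHECK_INTERVAL_SECONDS = 24 * 60 * 60  # once per 24 hours by default


def should_check(last_checked, now=None, env=None):
    """Decide whether to contact the release endpoint on this command run."""
    env = os.environ if env is None else env
    if env.get("DEMIOURGOS_NO_UPDATE_CHECK") == "1":
        return False  # user opted out
    now = time.time() if now is None else now
    # No cached timestamp yet, or the cache is older than the interval.
    return last_checked is None or now - last_checked >= CHECK_INTERVAL_SECONDS
```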

### Quickstart

```bash
# Start hosted control-plane backend (auth/projects/keys/trials)
demiourgos control-plane --host 0.0.0.0 --port 8000 --workspace-root .

# Initialize a new project
demiourgos init --path /path/to/your/codebase

# Scan the codebase and build the graph
demiourgos scan --config .demiourgos.json

# Start the local graph server (data plane, not the hosted backend)
demiourgos serve --config .demiourgos.json --port 7788

# Start MCP server for AI agent tool calls (stdio)
demiourgos mcp --config .demiourgos.json

# Optional: enable MCP error reporting to hosted backend
export DEMIOURGOS_CONTROL_PLANE_URL=http://127.0.0.1:8000
export DEMIOURGOS_PROJECT_ID=<project_id>
export DEMIOURGOS_PROJECT_API_KEY=<dpk_key>

# Report circular dependency cycles
demiourgos report cycles --config .demiourgos.json

# Start the file watcher (auto-rescan on save)
demiourgos watch --config .demiourgos.json
```

### Docker Runtime Sanity

Run the backend stack with:

```bash
docker compose up -d --build
```

Avoid `docker compose run app ...` for long-running backend services. It creates an extra one-off app container and can make it look like two backend apps are running.

Quick duplicate check:

```bash
docker ps --format 'table {{.Names}}\t{{.Image}}\t{{.Ports}}' | grep demiourgos
```

### Docker (FalkorDB)

```bash
docker run -d -p 6379:6379 falkordb/falkordb:latest
```

---

## 12. Configuration

All configuration is in `demiourgos_config.yaml`:

```yaml
# Which directories to scan
include_directories:
  - "**/*"           # Default: scan everything

exclude_directories:
  - "node_modules"
  - ".venv"
  - "__pycache__"
  - ".git"

# FalkorDB connection
redis_host: "localhost"
redis_port: 6379

# Graph identity
graph_name: "my_project"
```

### Monorepo Support

For monorepos, use `include_directories` to scope the scan to specific services:

```yaml
include_directories:
  - "services/auth/**/*"
  - "services/payment/**/*"
  - "shared/models/**/*"

exclude_directories:
  - "services/legacy/**/*"
```
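
To show how these lists interact, here is a rough sketch of include/exclude resolution. The matching semantics are an assumption made for illustration (a pattern of the form `dir/**/*` selects every file below `dir`, and excludes win over includes); the scanner's actual pattern handling may differ.

```python
from pathlib import PurePosixPath

INCLUDE = ["services/auth/**/*", "services/payment/**/*", "shared/models/**/*"]
EXCLUDE = ["services/legacy/**/*"]


def matches(path, pattern):
    """Assumed semantics: 'dir/**/*' selects every file below 'dir'."""
    if pattern == "**/*":
        return True  # the default: scan everything
    return PurePosixPath(path).is_relative_to(pattern.removesuffix("/**/*"))


def is_scanned(path, include=INCLUDE, exclude=EXCLUDE):
    included = any(matches(path, p) for p in include)
    excluded = any(matches(path, p) for p in exclude)
    return included and not excluded


print(is_scanned("services/auth/api/routes.py"))  # True
print(is_scanned("services/legacy/old.py"))       # False
```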

---

## License

MIT (as declared in the package metadata). This project is under active development.

---

*Built with Tree-sitter, FalkorDB, Python, and a deep conviction that AI agents deserve structural truth, not token-wasting guesswork.*
