{% extends "base.html" %} {% set active_page = 'how-it-works' %} {% block title %}How It Works — Maude Front Desk{% endblock %} {% block content %}

The Big Idea

Imagine you have 16 pets. Each one needs food, water, and a check-up every few minutes. You could do it all yourself — but what if each pet could feed itself, call the vet when it's sick, and remember what medicine worked last time?

That's what this system does. Every service (we call them Rooms) watches itself, fixes itself, and learns from every problem it solves. No human needed.

Meet the Cast

Seven characters work together inside every Room. Each one has one job:

S

The Service

The actual program doing real work — Grafana, PostgreSQL, the PLC collector. It doesn't know it's being watched.

H

Health Loop

A timer that fires every 60 seconds. No AI — just rules. "Is it running? Can it respond? Memory OK?" It can restart things, but it can't think.

R

Room Agent

The brain. An AI model (Ollama, running on a GPU server) that can read logs, check disk, call tools, and reason about what's wrong. It doesn't just restart — it investigates.

T

Tool Registry

The toolbox. When the AI says "check the logs", the Tool Registry translates that into an actual command and runs it. It's the AI's hands.

M

Memory Store

Three-tier memory: sticky notes (.md files), a database (PostgreSQL), and vector search (Qdrant). Every event gets recorded so the system can find "what fixed this last time?"

C

Claude (T4 Backup)

The specialist. Only called when Ollama can't figure it out. Lives on the control plane, never on the shop floor. Last resort.

L

Training Loop

The coach. Every few hours, it harvests all past conversations, fine-tunes the AI model on real problems, and deploys the smarter version fleet-wide.

The Runtime Loop

Follow one problem from detection to resolution. Every box shows who does it and what happens next.

Health Loop

Timer Fires (Every 60s)

Five checks run on the service:

systemctl is-active? HTTP health endpoint? Memory < 90%? Disk < 80%? Error count < 10?
Health Loop

Decision Point

All 5 pass Log "healthy", send heartbeat, sleep 60s
Service down / endpoint fail Try restart (max 3/hour, 10min cooldown)
Restart rate-limited or won't help Escalate to Room Agent
Upstream dependency down Skip restart, log upstream issue
escalation triggered
Health Loop

Check Past Fixes

Before calling the AI, the Health Loop searches Qdrant for similar past problems. If a match is found, the old fix is attached as a hint.

trigger + context + past fix
Room Agent

The AI Wakes Up

Four things happen before it thinks:

a Load knowledge — .md files from git
b Recent activity — last 10 events from PostgreSQL
c Similar situations — vector search in Qdrant
d Build prompt — everything above + trigger → sent to Ollama
prompt sent to Ollama (GPU)
Room Agent + Ollama

The Tool-Use Conversation

The AI and tools have a back-and-forth conversation:

Ollama "Let me check the service health first."
Tool service_health → inactive, down 2 min
Ollama "Down. Let me check the logs."
Tool service_logs → ERROR: disk full, cannot write WAL
Ollama "Disk full. Restarting." → service_restart → success
Up to 10 rounds. When done, it outputs <summary> and <outcome> tags.
can't figure it out?
Claude (T4)

T4 Escalation

If Ollama gives up, the entire conversation — all tool calls and results — gets handed to Claude. It sees everything Ollama tried, then continues where it left off with the same tools.

This is rare. Claude is the safety net — never on the shop floor, only on the control plane.

outcome determined
Memory Store

Remember Everything

Win or lose, every run gets stored to all three tiers:

T1 Knowledge files — .md updates pushed to Gitea
T2 PostgreSQL — full row: trigger, actions, tools, outcome, model
T3 Qdrant — vector embedding with root cause and tools used
event published
Event Publisher

Tell the Hotel

A PG NOTIFY event fires so every other Room can see what happened. This is how Rooms learn from each other without talking directly.

Next health check, the memory is already there — and the system is smarter.

The Training Loop

Remembering past fixes is good. But the AI itself gets better over time — past conversations become training data, and training data becomes a smarter model.

Training Loop

1. Harvest

Every 6 hours, the Training Loop queries PostgreSQL for new agent conversations. If 100+ new examples exist since the last run, the pipeline starts.

Training Loop

2. Clean & Export

Conversations are converted to ChatML format with quality filters:

Has tool calls In English Has summary tags Not trivial
SFTP to GPU
GPU Server

3. Fine-Tune

QLoRA training on the GPU — the base model (Qwen2.5-7B, 4-bit) stays frozen while small adapter layers (0.53% of total params) learn from every conversation. After 3 epochs, the adapters merge back into one clean model.

Training Loop

4. Deploy

The merged model is loaded into Ollama on both GPU servers (sparky + sparked, active-active) with the system prompt baked in.

Model Manager

5. Rebuild Room Models

Each Room gets a custom Modelfile — same fine-tuned base, but with its own system prompt, knowledge, and domain specialization. 14 Room models rebuilt on the new foundation.

Every Room is now running a smarter model — trained on real problems from real Rooms.

What Triggers What

Event Who What Happens
60s timer tick asyncio Health Loop runs 5 checks
Check fails Health Loop Restart (if rate limit allows)
Restart fails / rate-limited Health Loop Escalate to Room Agent
Escalation fires base.py Room Agent runs with full context
Ollama returns tool_calls Room Agent Execute tools, feed results back
Ollama says "escalated" Room Agent Hand off to Claude (T4)
Agent loop ends Room Agent Store to 3 memory tiers, publish event
6h training timer Training Loop Harvest, train, deploy if threshold met
Scheduled cron asyncio Proactive health check (no escalation)

Three Layers of Memory

Tier 1

Knowledge Files

Simple text files the Room keeps about itself. Pushed to Gitea via git after every significant event.

Like sticky notes on your monitor.
Tier 2

PostgreSQL

Structured log of everything — timestamped, categorized, queryable. Every health check, restart, and agent run gets a row.

Like a detailed diary with dates and categories.
Tier 3

Qdrant Vectors

Memories turned into math so the system can find similar problems, not just exact matches — even if the words are different.

Like recognizing a song even when someone hums it differently.

By The Numbers

16
Autonomous Rooms
4
Escalation Tiers
3
Memory Layers
60s
Check Interval
7B
Model Parameters
0.53%
Trained via LoRA
2
GPU Servers
6h
Training Cycle

The Bottom Line

Two feedback loops. The fast loop remembers every fix instantly — next time a similar problem appears, the answer is already there. The slow loop fine-tunes the AI itself every few hours, so it gets better at problems it's never seen before.

It doesn't just heal — it evolves.

{% endblock %}