{% extends "base.html" %} {% set active_page = 'how-it-works' %} {% block title %}How It Works — Maude Front Desk{% endblock %} {% block content %}
Every Room takes care of itself. Here's how — start to finish.
Imagine you have 16 pets. Each one needs food, water, and a check-up every few minutes. You could do it all yourself — but what if each pet could feed itself, call the vet when it's sick, and remember what medicine worked last time?
That's what this system does. Every service (we call them Rooms) watches itself, fixes itself, and learns from every problem it solves. No human needed.
Seven characters work together inside every Room. Each one has one job:
The actual program doing real work — Grafana, PostgreSQL, the PLC collector. It doesn't know it's being watched.
A timer that fires every 60 seconds. No AI — just rules. "Is it running? Can it respond? Memory OK?" It can restart things, but it can't think.
The brain. An AI model (Ollama, running on a GPU server) that can read logs, check disk, call tools, and reason about what's wrong. It doesn't just restart — it investigates.
The toolbox. When the AI says "check the logs", the Tool Registry translates that into an actual command and runs it. It's the AI's hands.
Three-tier memory: sticky notes (.md files), a database (PostgreSQL), and vector search (Qdrant). Every event gets recorded so the system can find "what fixed this last time?"
The specialist. Only called when Ollama can't figure it out. Lives on the control plane, never on the shop floor. Last resort.
The coach. Every few hours, it harvests all past conversations, fine-tunes the AI model on real problems, and deploys the smarter version fleet-wide.
Follow one problem from detection to resolution. Every box shows who does it and what happens next.
Five checks run on the service:
Before calling the AI, the Health Loop searches Qdrant for similar past problems. If a match is found, the old fix is attached as a hint.
Four things happen before it thinks:
The AI and tools have a back-and-forth conversation:
<summary> and <outcome> tags.
If Ollama gives up, the entire conversation — all tool calls and results — gets handed to Claude. It sees everything Ollama tried, then continues where it left off with the same tools.
This is rare. Claude is the safety net — never on the shop floor, only on the control plane.
Win or lose, every run gets stored to all three tiers:
A PG NOTIFY event fires so every other Room can see what happened. This is how Rooms learn from each other without talking directly.
Remembering past fixes is good. But the AI itself gets better over time — past conversations become training data, and training data becomes a smarter model.
Every 6 hours, the Training Loop queries PostgreSQL for new agent conversations. If 100+ new examples exist since the last run, the pipeline starts.
Conversations are converted to ChatML format with quality filters:
QLoRA training on the GPU — the base model (Qwen2.5-7B, 4-bit) stays frozen while small adapter layers (0.53% of total params) learn from every conversation. After 3 epochs, the adapters merge back into one clean model.
The merged model is loaded into Ollama on both GPU servers (sparky + sparked, active-active) with the system prompt baked in.
Each Room gets a custom Modelfile — same fine-tuned base, but with its own system prompt, knowledge, and domain specialization. 14 Room models rebuilt on the new foundation.
Simple text files the Room keeps about itself. Pushed to Gitea via git after every significant event.
Structured log of everything — timestamped, categorized, queryable. Every health check, restart, and agent run gets a row.
Memories turned into math so the system can find similar problems, not just exact matches — even if the words are different.
Two feedback loops. The fast loop remembers every fix instantly — next time a similar problem appears, the answer is already there. The slow loop fine-tunes the AI itself every few hours, so it gets better at problems it's never seen before.
It doesn't just heal — it evolves.