Internal Architecture · Engineering Only

How AgentIQ Works — Full Detail

Every component, data path, table, and background loop. Do not share externally.

Your
agent
Developer's Agent Code Any Framework
agentiq.trace()
Called once per agent step. Passes: session_id, user_input, agent_response, response_time_ms, workflow_step
@watch decorator
Wraps any function. Captures input, output, and latency automatically. Generates session IDs.
fire and forget · returns in <1ms · never raises · never blocks
SDK
AgentIQ SDK sdk/client.py · agentiq/tracer.py
Thread-safe Queue
Max 500 events. put_nowait() — drops silently if full. Backpressure never reaches your agent.
Background Worker Thread
Daemon thread. Dequeues events and dispatches to DB write and HTTP send in parallel.
Local fallback log
agentiq_traces.log written on every event. Backup if DB and API are both unreachable.
two parallel paths — direct DB write + HTTP send
API &
Database
FastAPI api/main.py
POST /trace
Writes AgentLog row. Returns 201 immediately. Auth: AGENTIQ_API_KEY header.
GET /agents/{id}/patterns
Returns LossPattern rows for one agent. Scoped to agent_id — multi-tenant isolation.
PostgreSQL 4 tables only
AgentLog
Written by SDK. One row per agent step. Fields: session_id, step_number, input, output, tool_calls, latency_ms.
EvalResult
Written by judge. Fields: accuracy, goal_alignment, decision_quality, completeness, overall_score, passed, failure_type, failure_reason.
LossPattern
Written by analyzer. Fields: pattern_type, pattern_value, failure_count, pct_of_all_failures, root_cause, is_worsening.
AgentQualityConfig
Written by dashboard. Fields: industry, pass_threshold, weight_accuracy, weight_goal_alignment, weight_decision, weight_completeness.
background loops — never in the trace path
Background
loops
Eval Loop Every 30 seconds
LLM Judge agentiq/judge.py
Fetches up to 20 unevaluated AgentLog rows per agent. Scores each on 4 dimensions: accuracy · goal_alignment · decision_quality · completeness
Scoring: overall_score = weighted average. Default weights: accuracy 35% · goal_alignment 35% · decision 15% · completeness 15%. pass_threshold = 0.70.
Failure taxonomy:
wrong_answer tool_failure goal_drift incomplete hallucination context_loss loop
On bad JSON: retry once with simplified prompt. On 2nd failure: eval_error=True.
Analysis Loop Every 60 minutes
Pattern Analyzer agentiq/analyzer.py
Scans EvalResult rows where passed=False · eval_error=False for the last 7 days.
Groups failures by:
workflow_step (which step number)
failure_type (e.g. tool_failure)
Minimum threshold: 5 failures required before a LossPattern row is written.
root_cause: most common failure_reason string in the group. One plain English sentence.
is_worsening: current 7-day failure rate vs prior 7-day window.
dashboard reads from all 4 tables
Dashboard
Streamlit Dashboard agentiq/dashboard.py · 3 views
View 1 — Agent Overview
Quality score trend (7 days) · Today's pass rate · Active failure count · Top 3 failure types · Latency p50/p95
View 2 — Failure Feed
Auto-refresh every 30s. One-off failures appear immediately. Repeating patterns grouped with root_cause. Searchable, CSV export.
View 3 — Trace Viewer
Step-by-step waterfall with latency bars. Per-step scores. Failure reason in plain English. Fix recommendation.
Config Panel — writes to AgentQualityConfig
Adjust dimension weights per agent · Set pass_threshold · Choose industry preset (healthcare · retail · coding · marketing)
01
When does the developer see results?
t = 0s
Developer calls trace() → event queued → DB written → HTTP sent. Returns to caller in under 1ms.
t = 30s
Eval loop fires → Claude Haiku scores new AgentLog rows → EvalResult written → quality scores appear in dashboard.
t = 60 min
Analysis loop fires → run_all(db) → LossPattern rows written (if ≥5 failures detected) → loss patterns appear in Failure Feed.
02
Non-negotiable constraints
RuleWhere enforcedWhy
trace() never raisessdk/client.py — entire body in try/exceptAgentIQ bugs must never take down the developer's agent
trace() returns in <1msput_nowait() into queue — no network waitSame reason
Queue drops silently if fullQueue(maxsize=500) + put_nowait()Backpressure must never reach the agent
Eval never runs in trace pathBackground loop only — 30s after traceLLM calls are 200–800ms — cannot be in the hot path
failure_reason = one English sentenceJudge prompt — explicitly requiredRaw JSON must never reach dashboard or API clients
All SQL in db/queries.pyConvention + code reviewSingle place to audit, optimize, and test all queries
Every query scoped to agent_idAll routes + queries filter by agent_idMulti-tenant — one customer cannot see another's data