Metadata-Version: 2.4
Name: runneriq
Version: 6.1.0
Summary: AI-powered CI/CD runner intelligence for GitLab — priority-aware routing with carbon-conscious scheduling
Author-email: Md Asif Iqbal <asifdotpy@users.noreply.gitlab.com>
License: MIT
Project-URL: Homepage, https://gitlab.com/gitlab-ai-hackathon/participants/11553323
Project-URL: Repository, https://gitlab.com/gitlab-ai-hackathon/participants/11553323
Project-URL: Issue Tracker, https://gitlab.com/gitlab-ai-hackathon/participants/11553323/-/issues
Project-URL: Documentation, https://gitlab.com/gitlab-ai-hackathon/participants/11553323/-/wikis/home
Keywords: gitlab,ci-cd,runners,ai,carbon-aware,scheduling
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Build Tools
Classifier: Topic :: Software Development :: Quality Assurance
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests==2.31.0
Requires-Dist: python-dotenv==1.0.1
Requires-Dist: pyyaml==6.0.1
Requires-Dist: flask==3.0.2
Requires-Dist: anthropic==0.39.0
Requires-Dist: jsonschema==4.21.1
Requires-Dist: httpx>=0.27.0
Requires-Dist: mcp>=1.0.0
Requires-Dist: tenacity>=8.2.0
Requires-Dist: pydantic>=2.0
Provides-Extra: dev
Requires-Dist: pytest==8.0.0; extra == "dev"
Requires-Dist: pytest-cov==4.1.0; extra == "dev"
Requires-Dist: responses==0.24.1; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: bandit; extra == "dev"
Dynamic: license-file

<div align="center">

# 🎯 RunnerIQ

**Less noise. More signal. Zero alert fatigue.**

*Most decisions are instant. AI handles the hard ones. Advisory mode lets teams build trust before granting autonomy.*

[![pipeline status](https://gitlab.com/gitlab-ai-hackathon/participants/11553323/badges/main/pipeline.svg)](https://gitlab.com/gitlab-ai-hackathon/participants/11553323/-/pipelines)
[![coverage report](https://gitlab.com/gitlab-ai-hackathon/participants/11553323/badges/main/coverage.svg)](https://gitlab.com/gitlab-ai-hackathon/participants/11553323/-/commits/main)
[![PyPI](https://img.shields.io/pypi/v/runneriq)](https://pypi.org/project/runneriq/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![version](https://img.shields.io/badge/version-v6.1.0-blue)]()
[![tests](https://img.shields.io/badge/tests-1%2C171%2B%20passing-brightgreen.svg)]()
[![GitLab Duo](https://img.shields.io/badge/GitLab-Duo%20Agent%20Platform-orange.svg)](https://about.gitlab.com/)

> **1,171+ tests** · **8 focused modules** · **1 unified flow** · **Carbon-aware routing**
> Addresses [GitLab's 10-year-old runner scheduling issue](https://gitlab.com/gitlab-org/gitlab/-/issues/14976) (1,008+ comments)

[Wiki](https://gitlab.com/gitlab-ai-hackathon/participants/11553323/-/wikis/home) · [Architecture](ARCHITECTURE.md) · [Contributing](CONTRIBUTING.md) · [Timeline](TIMELINE.md)

</div>

---

## What is RunnerIQ?

**Less noise. More signal. Zero alert fatigue.**

RunnerIQ is an intelligent CI/CD operations layer that filters the noise so your team only sees what matters. Built on GitLab's Duo Agent Platform.

### The Problem
- 🔴 Pipeline fails at 2 AM → nobody notices until standup
- 🔴 10 "lint failed" alerts flood Slack → real failures get buried
- 🔴 Flaky tests trigger investigations → 30 min wasted, it was transient
- 🔴 Every failure looks the same → no severity, no context, no routing

### How RunnerIQ Fixes It

**You run one command. RunnerIQ handles the rest.**

1. Noisy alerts → RunnerIQ filters flaky tests, groups duplicates, routes by severity
2. Pipeline fails → RunnerIQ diagnoses the root cause in ~20 seconds
3. Runner selection → RunnerIQ recommends the optimal runner with carbon cost comparison

Most decisions are instant and free. AI kicks in only when genuine reasoning is needed — failure triage, anomaly explanation, carbon-aware trade-offs.

> 💡 **Under the hood:** 8 focused modules, a rules-first scoring engine, and a 4-stage noise reduction pipeline. [See Architecture →](#architecture)

## Key Differentiators

- ✅ **Noise reduction first** — 4-stage pipeline filters flaky tests, groups duplicates, routes by severity before anything reaches your team
- ✅ **Pipeline failure autopilot** — AI-powered root cause analysis in ~20 seconds, posted directly on your issue or MR
- ✅ **Instant decisions, zero API cost** — 85% of routing decisions are deterministic rules (<100ms, $0)
- ✅ **AI for the 15% that need real thinking** — Claude handles genuine toss-ups, failure triage, and carbon trade-offs
- ✅ **Starts as recommendations. Earns trust over time.** — Advisory → Supervised → Autonomous
- ✅ **Non-blocking by design** — RunnerIQ never replaces GitLab's scheduler, so there's nothing to fail over
- ✅ **Bonus: carbon-aware routing** — Real-time electricity grid data makes your CI/CD greener (MCP + Electricity Maps API)
- ✅ **Works WITH good tagging**, not instead of it
- ✅ **1,171+ tests passing** across 8 modules with 60%+ coverage enforced in CI
- ✅ **Intelligent Orchestration Flow** — AI-powered 4-module pipeline published to GitLab Duo Agent Platform

---

## 🔇 Noise Reduction — Built-In, Not Bolted On

RunnerIQ's alerting pipeline filters noise at 4 levels before anything reaches your team:

| Stage | Module | What It Does | Noise Reduced |
|-------|--------|-------------|---------------|
| 1 | `FlakyDetector` | Detects fail→pass retry patterns | ~30% of false alerts |
| 2 | `SuppressionEngine` | Rule-based filtering (allow_failure, experimental branches) | ~25% more |
| 3 | `AlertGrouper` | Batches similar alerts in 15-min windows | 10 alerts → 1 notification |
| 4 | `NotificationRouter` | Routes by severity with cooldown dedup | Right channel, right time |

**Result:** Only actionable alerts reach your team. Everything else is logged, grouped, or digested.
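The stage-1 fail→pass heuristic can be sketched in a few lines. This is an illustrative sketch only — class and method names are assumptions, not the real `FlakyDetector` API:

```python
class FlakyDetectorSketch:
    """Illustrative sketch (not the real FlakyDetector API): a job is
    treated as flaky when a failed run is followed by a passing retry
    of the same job in the same pipeline."""

    def __init__(self):
        self._last_status = {}  # (pipeline_id, job_name) -> last status

    def is_flaky(self, pipeline_id, job_name, status):
        key = (pipeline_id, job_name)
        previous = self._last_status.get(key)
        self._last_status[key] = status
        # fail -> pass on retry: the earlier failure alert gets suppressed
        return previous == "failed" and status == "success"


detector = FlakyDetectorSketch()
detector.is_flaky(42, "unit-tests", "failed")   # False — first failure, alert held
detector.is_flaky(42, "unit-tests", "success")  # True — retry passed: flaky
```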

---

## 🚨 Pipeline Failure Autopilot

> *"Save the Claude calls for things that actually need natural language reasoning like incident triage or root cause analysis."*
> — Useful-Process9033, r/devops

**That's exactly what we built.**

When your pipeline breaks, AI reads the job logs, fetches recent commit diffs, correlates error messages with code changes, classifies the failure type, and posts a structured diagnosis report directly on your issue or MR. No human needs to read 500 lines of log output. Verified live in [Session #3021351](https://gitlab.com/gitlab-ai-hackathon/participants/11553323/-/automate/agent-sessions/3021351).

> **v4.6.0 breakthrough:** The agent now posts structured Markdown diagnosis reports **directly inline on issues/MRs** and **autonomously creates follow-up tasks** with labels when it identifies gaps — demonstrated live in [Task #193](https://gitlab.com/gitlab-ai-hackathon/participants/11553323/-/work_items/193), which was created by the agent itself.

| Scenario | Handler | Latency | Cost |
|----------|---------|---------|------|
| Pipeline failed with cryptic error | **AI** (Autopilot) | ~20s | ~$0.01 |
| Classify failure: config error vs dependency issue | **AI** (Autopilot) | ~20s | ~$0.01 |
| "Is this a flaky test or a real regression?" | **AI** (Autopilot) | ~20s | ~$0.01 |
| Job duration spiked 3x — expected or anomaly? | **AI** (context) | ~2-3s | ~$0.003 |

---

## 🏃 Intelligent Runner Recommendations

> *"If GitLab would prioritize jobs on protected branches I'd be so happy"*
> — SchlaWiener4711, r/devops

RunnerIQ scores each runner on speed, fit, capacity, and carbon cost. When the top-2 runners score within 15%, AI breaks the tie by weighing carbon intensity, historical reliability, and workload patterns.

The recommendation tells your team: *"For this deploy job, runner-docker-large in FR (58 gCO₂/kWh) is optimal — 75% capacity available, exact tag match, and 83% lower carbon than runner-docker-medium in DE (340 gCO₂/kWh)."*

| Scenario | Handler | Latency | Cost |
|----------|---------|---------|------|
| Runner A: 20% load, Runner B: 90% load | **Rules engine** | <100ms | $0 |
| Standard deploy to tagged production runner | **Rules engine** | <100ms | $0 |
| Two runners score within 15% of each other | **AI** (toss-up) | ~2-3s | ~$0.003 |
| "Which runner minimizes CO₂ for this lint job?" | **AI** (carbon MCP) | ~2-3s | ~$0.003 |
| "Why did RunnerIQ recommend runner-gpu-2?" | **AI** (explain) | ~2-3s | ~$0.003 |

#### The 10-Year Problem, Solved

[GitLab #14976](https://gitlab.com/gitlab-org/gitlab/-/issues/14976) asked for runner priority. 1,008+ comments later, no solution exists. RunnerIQ delivers:

- **Priority scoring** — Production deploys score higher than lint jobs (configurable YAML rules)
- **Protected branch boost** — Jobs on `main`/`production` automatically get priority (exactly what SchlaWiener4711 asked for)
- **Intelligent recommendations** — Not just priority, but which specific runner is optimal and why
- **Failure diagnosis** — When pipelines break, AI-powered root cause analysis in ~20 seconds
- **Carbon awareness** — Every recommendation includes environmental impact data
- **Trust progression** — Starts as recommendations. Earns trust over time.

---

## 🛠️ CLI Tools

```bash
# Start monitoring — zero config needed
runneriq run

# Health check — verify your setup (5 checks: GitLab API, Anthropic, carbon, config, tests)
runneriq doctor

# Explain why a job was assigned to a specific runner
runneriq explain <job_id>

# View the audit trail of all decisions
runneriq audit

# Emergency: remove all RunnerIQ-managed tags
runneriq reset-tags
```

---

## 🌱 Bonus: Carbon-Aware Routing

> *Competing for the Eco-Friendly Agents prize ($3K)*

RunnerIQ includes real-time carbon intensity data from [Electricity Maps API](https://www.electricitymaps.com/), enabling carbon-aware runner selection. This is a **bonus capability** — the core value is noise reduction and failure diagnosis.

- 4 MCP tools for carbon data (zone intensity, forecast, optimal window, comparison)
- CO₂ savings tracker: FIFO vs intelligent routing comparison
- Priority-weighted carbon scoring in runner recommendations

**Why it's separate:** User research showed sustainability isn't yet a buying decision driver. We built it because it's technically interesting and prize-eligible, but it's not the reason you'd deploy RunnerIQ.

#### Priority-Based Carbon Weights

| Priority | Carbon Weight | Rationale |
|----------|:-------------:|-----------|
| **CRITICAL** | 5% | Speed is everything. Carbon is a tiebreaker only. |
| **HIGH** | 20% | Prefer green runner if <10% slower. |
| **MEDIUM** | 35% | Accept up to 20% speed trade-off for green. |
| **LOW** | 50% | Carbon is the primary factor. Check forecast for deferral. |
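A minimal sketch of how these weights could enter a runner score — the weight table is from above, but the linear blend itself is an illustrative assumption, not RunnerIQ's exact formula:

```python
# Carbon weights per priority class, taken from the table above.
CARBON_WEIGHT = {"CRITICAL": 0.05, "HIGH": 0.20, "MEDIUM": 0.35, "LOW": 0.50}

def blended_score(priority: str, perf_score: float, carbon_score: float) -> float:
    """Blend a performance score and a carbon score (both 0-100),
    weighting carbon by the job's priority class. Assumed formula."""
    w = CARBON_WEIGHT[priority]
    return (1 - w) * perf_score + w * carbon_score
```

A CRITICAL job keeps 95% of the decision on performance; a LOW job splits it evenly between speed and carbon.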

#### MCP Carbon Tools

| Tool | Purpose | Called When |
|------|---------|------------|
| `get_fleet_carbon_summary()` | Fleet-wide carbon ranking (greenest first) | Every tie-break decision (first call) |
| `estimate_job_carbon_cost()` | CO₂ estimate per runner: `Power(kW) × Duration(h) × Intensity` | Top 2-3 candidate runners |
| `get_carbon_forecast()` | Forecast + deferral recommendation | LOW/MEDIUM jobs in high-carbon zones |
| `get_carbon_intensity_now()` | Real-time intensity for a single zone | On-demand lookups |
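The `estimate_job_carbon_cost()` formula reduces to a one-liner; a sketch with assumed parameter names:

```python
def estimate_job_carbon_grams(power_kw: float, duration_h: float,
                              intensity_g_per_kwh: float) -> float:
    """Per the formula above: Power(kW) × Duration(h) × Intensity(gCO₂eq/kWh)."""
    return power_kw * duration_h * intensity_g_per_kwh

# 0.5 kWh of work (0.5 kW × 1 h) on two of the demo-mode grids:
estimate_job_carbon_grams(0.5, 1.0, 340)  # 170.0 g on the DE grid
estimate_job_carbon_grams(0.5, 1.0, 58)   # 29.0 g on the FR grid
```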

#### Carbon Metrics

| Metric | Description |
|--------|-------------|
| **CO₂ saved today** | Grams saved vs. FIFO baseline |
| **Green routing rate** | % of jobs routed to low-carbon runners |
| **Jobs deferred** | Jobs shifted to cleaner electricity windows |
| **Carbon per pipeline** | CO₂ footprint breakdown by runner/zone |
| **Fleet avg intensity** | Weighted average gCO₂eq/kWh across fleet |

#### Carbon Dashboard

A single-file HTML dashboard at `localhost:PORT/dashboard` with 3 screens:
- **Fleet Map**: Runner cards with carbon intensity badges (🟢/🟡/🔴) and utilization
- **Savings Tracker**: CO₂ saved today/week, green routing rate, 30-day trend
- **24h Forecast Heatmap**: Runners × hours grid, best batch job windows highlighted

#### Carbon Quick Start

```bash
# Option 1: Demo mode (no API key needed)
export CARBON_DEMO_MODE=true
python -m runneriq
# Open localhost:PORT/dashboard

# Option 2: Live data (free Electricity Maps token)
export ELECTRICITY_MAPS_TOKEN=your_token_here
export RUNNER_ZONE_runner_1=DE          # Germany
export RUNNER_ZONE_runner_2=DK-DK1      # Denmark (wind-heavy)
export RUNNER_ZONE_runner_3=FR           # France (nuclear, low carbon)
export RUNNER_ZONE_runner_4=US-CAL-CISO  # California (solar peaks midday)
python -m runneriq
```

Demo mode uses hardcoded intensities that show dramatic contrast: DE=340 (red), FR=58 (green), DK-DK1=95 (green), US-CAL-CISO=210 (amber).

#### Carbon Source Files

| File | Description |
|------|-------------|
| `src/carbon/models.py` | 6 dataclasses: CarbonIntensity, CarbonForecast, DeferDecision, etc. |
| `src/carbon/electricity_maps_client.py` | API client with caching, retry, triple fallback, demo mode |
| `src/carbon/mcp_server.py` | CarbonMCPTools: 4 tools + Anthropic tool definitions |
| `src/carbon/co2_tracker.py` | CO2SavingsTracker with file persistence |
| `src/carbon/settings.py` | All carbon env vars + demo config |
| `src/carbon/dashboard.py` | Flask blueprint: 4 API endpoints + HTML serving |
| `src/carbon/dashboard.html` | Self-contained dashboard (dark theme, auto-refresh) |

---

## Getting Started

RunnerIQ works out of the box with sensible defaults:
- **Zero config needed** — `runneriq run` starts monitoring immediately
- **Advisory mode by default** — recommends, never acts without permission
- **Customize when ready** — YAML config for priority rules, alert routing, suppression rules

Advanced features (carbon routing, custom scoring weights, PagerDuty integration) are available but never required.

---

> **For judges:** *"RunnerIQ starts with the #1 DevOps pain point: alert fatigue. Our 4-stage noise reduction pipeline filters flaky tests, deduplicates alerts, and routes by severity — before any AI is called. When pipelines actually break, the Autopilot diagnoses root cause in ~20 seconds. 85% of routing decisions are instant and free. Carbon-aware routing is built in. We built what the community asked for."*
>
> **For the critics:** *"You said 'save Claude for incident triage and root cause analysis.' We did. Our Pipeline Failure Autopilot reads job logs, correlates commits, classifies failures, and recommends fixes — in ~20 seconds. The rules engine handles scheduling. AI handles reasoning. And every recommendation includes the carbon cost of FIFO vs. intelligent routing."*
>
> **For enterprise users:** *"Predictable costs. Instant decisions, zero API cost for 85% of routing. AI-powered failure diagnosis. Carbon impact tracking. Full audit trail. Advisory by default — your team stays in control."*

---

## 📊 Simulated Impact Report

> *The following projections use realistic fleet parameters. Actual results depend on fleet size, job mix, and runner configuration.*

#### Scenario: Mid-Size Team (10 Runners, ~200 Jobs/Day)

| Metric | Without RunnerIQ | With RunnerIQ | Improvement |
|--------|-----------------|---------------|-------------|
| **Avg. job queue wait** | ~45s | ~12s | **73% reduction** |
| **Runner idle time** | ~35% | ~12% | **66% reduction** |
| **Failed job retries** (wrong runner) | ~8/day | ~1/day | **87% reduction** |
| **Carbon per pipeline** | 2.0 gCO₂e | 1.4 gCO₂e | **30% reduction** |
| **Monthly carbon savings** | — | ~120 gCO₂e | **~1.4 kgCO₂e/year** |

#### How the Savings Break Down

1. **Queue optimization** — Jobs are matched to the *right* runner immediately, not round-robin'd to whatever's free
2. **Carbon-aware routing** — When two runners score within 15%, AI picks the one in the lower-carbon region/time-zone
3. **Failure prevention** — Tag matching + capacity checks prevent "job assigned to incompatible runner" failures
4. **Idle reduction** — Workload balancing spreads jobs across the fleet instead of overloading hot runners

---

## 📊 Project Stats

| Metric | Value |
|--------|-------|
| **Tests** | 1,171+ passing |
| **Modules** | 5 focused modules (monitor, analyzer, assigner, optimizer, alerting) |
| **Merged MRs** | 135+ |
| **Agent Tools** | 10 (P0 + P1 + P2) |
| **CLI Commands** | 5 (`run`, `doctor`, `explain`, `audit`, `reset-tags`) |
| **MCP Tools** | 4 (carbon routing) |
| **Decision Split** | ~85% instant (<100ms) / ~15% AI (~2-3s) |
| **Language** | Python 3.10+ (96.4%) |
| **Carbon Data** | Real-time via Electricity Maps API |

---

## Architecture

The architecture diagram and decision flow below show the full technical picture. For most users, the [Getting Started](#getting-started) section above is all you need.

```mermaid
flowchart LR
    Fail["Pipeline fails"] --> Diag["🧠 AI diagnoses\n~20s, 5 tools"]
    Diag --> Report["Structured report\nposted on Issue/MR"]

    Job["Job needs\nrunner"] --> Score["Score all\ncompatible runners"]
    Score --> Gap{"Top-2 margin\n> 15%?"}
    Gap -- "Yes (85%)" --> Rules["✅ Rules recommend\n< 100ms, $0"]
    Gap -- "No (15%)" --> Claude["🧠 AI reasons\n~2-3s, with carbon"]
    Rules --> Rec["Advisory recommendation\n+ carbon comparison"]
    Claude --> Rec
    Rec --> Team["Team decides\n(or auto-apply\nwith --execute)"]

    style Rules fill:#dcfce7,stroke:#22c55e
    style Claude fill:#fef3c7,stroke:#f59e0b
    style Diag fill:#fef3c7,stroke:#f59e0b
    style Report fill:#dbeafe,stroke:#3b82f6
```

**Pipeline Failure Autopilot** (top flow): When a pipeline fails, AI analyzes failing jobs, reads log traces, correlates with recent commits, and posts a structured diagnosis report directly on the issue or MR. Triggerable from any comment via `@ai-runneriq-intelligent-orchestration-gitlab-ai-hackathon`.

**Intelligent Runner Routing** (bottom flow): For job assignment, the rules engine handles 85% of decisions instantly. AI is called only for genuine toss-ups (runners within 15% margin), where it weighs carbon intensity, historical reliability, and workload patterns.

<details>
<summary>Full system architecture diagram</summary>

```mermaid
flowchart TB
    Pipeline["🔄 GitLab CI/CD Pipeline"] --> RunnerIQ

    subgraph RunnerIQ["RunnerIQ (Non-Blocking Layer)"]
        direction TB
        FC["FlowController + RunContext"]
        FC --> A1["🔍 Module 1: Monitor\nTrack runner fleet"]
        A1 --> A2["📊 Module 2: Analyzer\nScore jobs 0-100"]
        A2 --> A3["🎯 Module 3: Assigner\nRules 85% + AI 15%"]
        A3 --> A4["⚡ Module 4: Optimizer\nPerformance + Carbon"]
    end

    subgraph Orchestration["🆕 Orchestration Flow (v4.6.0)"]
        Inline["→ Posts inline reports on issues"]
        AutoTask["→ Auto-creates follow-up tasks"]
    end

    A4 --> Orchestration
    RunnerIQ --> Fallback["If RunnerIQ is down → GitLab native FIFO takes over"]

    style RunnerIQ fill:#f0f4ff,stroke:#4a6cf7,stroke-width:2px
    style Orchestration fill:#ecfdf5,stroke:#22c55e,stroke-width:2px
    style Fallback fill:#fef3c7,stroke:#f59e0b
    style Pipeline fill:#e0e7ff,stroke:#6366f1
```

</details>

<details>
<summary>Decision flow (Module 3: Smart Assigner)</summary>

```mermaid
flowchart LR
    Job["Job arrives"] --> Score["Score all\ncompatible runners"]
    Score --> Check{"How many\nrunners?"}
    Check -- "0" --> Queue["Queue job"]
    Check -- "1" --> Direct["Direct assign\n< 10ms"]
    Check -- "2+" --> Margin{"Top-2 margin\n> 15%?"}
    Margin -- "Yes" --> Rules["Rules assign\n< 100ms"]
    Margin -- "No" --> Budget{"Token budget\navailable?"}
    Budget -- "Yes" --> Claude["AI\n~2-3s"]
    Budget -- "No" --> Rules

    style Claude fill:#fef3c7,stroke:#f59e0b
    style Rules fill:#dcfce7,stroke:#22c55e
    style Direct fill:#dcfce7,stroke:#22c55e
```

</details>

<details>
<summary>Scoring algorithm</summary>

```text
TOTAL_SCORE = CAPACITY(30%) + TAG_MATCH(25%) + CARBON(25%) + HISTORY(20%)

If top runner leads by >15% → Rules assign instantly ($0, <100ms)
If within 15% margin → AI breaks tie with context (~$0.003, ~2-3s)
```

</details>

<details>
<summary>Target metrics</summary>

| Metric | Target |
|--------|--------|
| Routing recommendations by rules | 85–90% |
| Routing recommendations by AI | 10–15% |
| Rules recommendation latency | < 100ms |
| AI recommendation latency | < 3s |
| Pipeline diagnosis latency | < 30s |
| Daily AI API cost | ~$0.50–$1.50 |
| Carbon savings vs. FIFO baseline | Tracked per job |

</details>

#### The Carbon Argument: FIFO vs. Intelligent Routing

GitLab's FIFO scheduler doesn't consider where runners are located or what the local electricity grid looks like. It picks the first available runner, regardless of carbon intensity.

RunnerIQ recommends the runner that balances performance AND carbon:

```
FIFO picks the first available runner:
  Job → runner-DE (340 gCO₂/kWh) — coal-heavy grid
  Cost: 0.5 kWh × 340 = 170g CO₂

RunnerIQ recommends:
  Job → runner-FR (58 gCO₂/kWh) — nuclear, low carbon
  Cost: 0.5 kWh × 58 = 29g CO₂
  Savings: 141g CO₂ per job (83% reduction)
```

Multiply by hundreds of jobs per day across a fleet, and the impact is significant. RunnerIQ doesn't force the routing — it **shows your team the carbon cost of each option** and recommends the greener path.

---

## The 5 Modules

#### Module 1: Runner Monitor 🔍

Polls the GitLab Runner API every 30 seconds. Maintains real-time state for every runner: status (online/offline/paused), active jobs, capacity, tags, and utilization. Detects state changes (runners going offline, stuck jobs >30min) and outputs structured JSON for downstream modules. Uses per-endpoint caching with stale-cache fallback on API errors.

**Source:** `src/agent1_monitor/` — `gitlab_client.py`, `runner_monitor.py`, `main.py`

#### Module 2: Job Analyzer 📊

Extracts job metadata from pipelines and calculates a **priority score (0-100)** using configurable YAML rules. Scores are a weighted combination of branch priority (main=100, feature=50), user role (maintainer=100, guest=25), and job type (deploy=100, lint=40). Classifies urgency as CRITICAL/HIGH/MEDIUM/LOW. Supports bonuses (manual trigger +10, retry +5), penalties (allow_failure -10), and SLA escalation (LOW jobs auto-promote to MEDIUM after 5 minutes). Non-production branches are capped at 75.

**Source:** `src/agent2_analyzer/` — `job_analyzer.py`, `priority.py`, `history.py`, `priority_config.yaml`
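The scoring recipe can be sketched as follows. The lookup values are the examples from the text above; the equal weighting of the three factors and the 50-point default are assumptions for illustration (the real weights live in `priority_config.yaml`), and SLA escalation is omitted:

```python
# Example values from the text above; unknown keys default to 50 (assumption).
BRANCH = {"main": 100, "production": 100, "feature": 50}
ROLE = {"maintainer": 100, "guest": 25}
JOB_TYPE = {"deploy": 100, "lint": 40}

def priority_score(branch, role, job_type, *,
                   manual_trigger=False, is_retry=False, allow_failure=False):
    """Sketch of Module 2's 0-100 priority score (assumed equal weights)."""
    base = (BRANCH.get(branch, 50) + ROLE.get(role, 50)
            + JOB_TYPE.get(job_type, 50)) / 3
    # bonuses and penalties from the text: +10 manual, +5 retry, -10 allow_failure
    base += 10 * manual_trigger + 5 * is_retry - 10 * allow_failure
    if branch not in ("main", "production"):
        base = min(base, 75)  # non-production branches are capped at 75
    return max(0, min(100, round(base)))
```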

#### Module 3: Smart Assigner 🧠

The AI decision engine. Receives runner states from Module 1 and prioritized jobs from Module 2. Scores each compatible runner on a 0-100 scale using four weighted factors:

| Factor | Weight | Description |
|--------|--------|-------------|
| Inverse utilization | 40% | Idle runners score higher |
| Tag match quality | 20% | Exact match = 100, superset = partial |
| Capacity headroom | 20% | `(max - active) / max × 100` |
| Historical performance | 20% | Duration ratio vs. fleet average |

When the top-2 runners score within 15%, AI is called for nuanced trade-off analysis. A `TokenBudgetTracker` enforces a daily cap (default 50K tokens/day) with automatic fallback to rules.
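The weighted score and the 15% escalation test can be sketched as below, using the factor weights from the table above; function names and input normalization are illustrative assumptions:

```python
def runner_score(utilization, tag_match, capacity_headroom, history):
    """0-100 runner score using the factor weights from the table above
    (each input assumed already normalized to 0-100)."""
    return (0.40 * (100 - utilization)   # inverse utilization: idle scores higher
            + 0.20 * tag_match
            + 0.20 * capacity_headroom
            + 0.20 * history)

def needs_ai_tiebreak(scores, margin=0.15):
    """True when the top two scores sit within `margin` of each other —
    the toss-up case RunnerIQ escalates to AI (token budget permitting)."""
    if len(scores) < 2:
        return False
    top, second = sorted(scores, reverse=True)[:2]
    return top == 0 or (top - second) / top <= margin
```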

**Trust model:** Advisory (default) → Supervised → Autonomous. Starts as recommendations. Earns trust over time. Every decision produces an immutable `AuditEntry`. Anomaly detection flags CRITICAL jobs on overloaded runners.

**Source:** `src/agent3_assigner/` — `smart_assigner.py`, `runner_scorer.py`, `claude_client.py`, `trust_model.py`, `hybrid_engine.py`, `priority_queue.py`

#### Module 4: Performance Optimizer 📈

Tracks historical metrics per runner and generates weekly Markdown reports. Calculates a **composite performance score (0-100)** per runner:

| Component | Weight | Formula |
|-----------|--------|---------|
| Throughput | 25% | `log₂(jobs + 1) × 12`, capped at 100 |
| Speed | 30% | `(fleet_avg / runner_avg) × 50` |
| Reliability | 30% | `(1 - failure_rate) × 100` |
| Utilization | 15% | Bell curve, optimal at 50-80% |
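Putting the four formulas together — the throughput/speed/reliability terms follow the table, while the caps and the exact bell-curve shape for utilization are assumptions sketched for illustration:

```python
import math

def composite_score(jobs, runner_avg_s, fleet_avg_s, failure_rate, utilization):
    """Composite 0-100 score per the table above. The speed cap and the
    utilization bell-curve shape (optimal 50-80%) are assumed."""
    throughput = min(100, math.log2(jobs + 1) * 12)
    speed = min(100, (fleet_avg_s / runner_avg_s) * 50)
    reliability = (1 - failure_rate) * 100
    if 50 <= utilization <= 80:          # optimal band scores full marks
        util = 100
    elif utilization < 50:
        util = utilization * 2           # linear ramp up to the band
    else:
        util = max(0, 100 - (utilization - 80) * 5)  # penalize overload
    return 0.25 * throughput + 0.30 * speed + 0.30 * reliability + 0.15 * util
```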

Detects four issue types with actionable recommendations:

| Issue | Threshold | Severity | Action |
|-------|-----------|----------|--------|
| Slow runner | > 2× fleet avg duration | ⚠️ Warning | Upgrade or retire |
| Underutilized | < 20% utilization | ℹ️ Info | Consolidate or decommission |
| High failure rate | > 5% failure rate | 🔴 Critical | Investigate infrastructure |
| Bottleneck | > 90% utilization | ⚠️ Warning | Add parallel runner |
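The four thresholds translate directly into guard clauses; a sketch with illustrative names (a runner can trip several at once, e.g. an overloaded runner that is both slow and failure-prone):

```python
def detect_issues(runner_avg_s, fleet_avg_s, utilization_pct, failure_rate):
    """Apply the four thresholds from the table above; returns
    (issue, severity) pairs. Names are illustrative, not the real API."""
    issues = []
    if runner_avg_s > 2 * fleet_avg_s:
        issues.append(("slow_runner", "warning"))
    if utilization_pct < 20:
        issues.append(("underutilized", "info"))
    if failure_rate > 0.05:
        issues.append(("high_failure_rate", "critical"))
    if utilization_pct > 90:
        issues.append(("bottleneck", "warning"))
    return issues
```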

Weekly reports include: summary with week-over-week deltas, top performers, needs-attention with inline recommendations, cost analysis, and a runner details table.

**Source:** `src/agent4_optimizer/` — `optimizer.py`, `performance_scorer.py`, `metrics_collector.py`, `report_generator.py`, `models.py`

#### Module 5: Alerting 🔇

The noise reduction pipeline. Filters, groups, and routes alerts through 4 stages before anything reaches your team. FlakyDetector identifies fail→pass retry patterns (~30% false alert reduction). SuppressionEngine applies rule-based filtering for `allow_failure` jobs and experimental branches (~25% more). AlertGrouper batches similar alerts in configurable time windows (10 alerts → 1 notification). NotificationRouter delivers to the right channel with severity-based routing and cooldown deduplication.

**Source:** `src/alerting/` — `flaky_detector.py`, `alert_grouper.py`, `suppression_engine.py`, `notification_router.py`, `models.py`, `config_schema.py`
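The stage-3 batching behavior ("10 alerts → 1 notification") can be sketched as fingerprint-plus-window bucketing. Field names here are illustrative, not the real `AlertGrouper` API:

```python
from collections import defaultdict

def group_alerts(alerts, window_s=900):
    """Sketch of stage-3 grouping: alerts sharing a fingerprint inside
    the same 15-minute window collapse into one notification."""
    buckets = defaultdict(int)
    for ts, fingerprint in alerts:
        buckets[(fingerprint, ts // window_s)] += 1
    # one notification per bucket, carrying the duplicate count
    return [{"fingerprint": f, "count": n} for (f, _), n in buckets.items()]


burst = [(t, "lint-failed") for t in range(0, 600, 60)]  # 10 alerts in 10 min
group_alerts(burst)  # one notification with count=10
```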

#### Module Tools (v2.1)

Each module now has expanded capabilities beyond core scheduling:

| Module | Core Tools | v2.1 Expansion | MR |
|--------|-----------|----------------|-----|
| **Module 1** (Monitor) | Runner status, capacity | `GetPipelineErrors`, `GetJobLogs`, `CiLinter` | !103, !106 |
| **Module 2** (Analyzer) | Job scoring, pipeline analysis | `GetMergeRequest`, `ListMergeRequestDiffs` | !106 |
| **Module 3** (Assigner) | Assignment, tag manipulation | `CreateIssue`, `CreateIssueNote`, `GitLabUserSearch` | !104, !107 |
| **Shared** | — | `GetProject`, `GetCurrentUser` | !105 |

**10 tools total, 32 tests.** All non-blocking — if any tool fails, the module falls back to existing behavior. Tools are standalone with constructor injection for easy testing and zero coupling to the core pipeline.

---

## Action Bridge: Advisory to Action

RunnerIQ defaults to **advisory mode** (recommend only). When you're ready, the Action Bridge lets it **influence job routing** by dynamically adding `runneriq:` prefixed tags to runners via the GitLab API.

```mermaid
flowchart LR
    A3["Module 3 decides:<br/>Job X to Runner B"] --> Check{"execute flag?"}
    Check -- "No (default)" --> Advisory["Log recommendation<br/>no changes"]
    Check -- "Yes (--execute)" --> Tag["TagManager.add_tag<br/>runneriq:preferred-X"]
    Tag --> TTL["Auto-revert<br/>after 5 min"]
    Tag --> A4["Module 4 reports:<br/>tags applied, success rate"]

    style Advisory fill:#dcfce7,stroke:#22c55e
    style Tag fill:#fef3c7,stroke:#f59e0b
```

#### Safety Guarantees

| Layer | Protection |
|-------|------------|
| **Dry-run default** | No tag changes unless `--execute` is passed |
| **Tag namespace** | Only manages `runneriq:` prefixed tags, never touches user-defined tags |
| **Auto-revert** | Every tag change has a TTL (default 5 min) and auto-reverts |
| **Kill switch** | `runneriq reset-tags` removes all RunnerIQ-managed tags instantly |
| **Audit trail** | Every action logged: runner, tag, reason, revert timestamp |
| **Graceful fallback** | If tag manipulation fails, falls back to advisory-only mode |
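The namespace and TTL guarantees can be sketched as follows — an illustrative sketch, not the real `TagManager` implementation in `src/action_bridge/tag_manager.py`:

```python
import time

class TagManagerSketch:
    """Illustrative sketch of the safety rules above: only `runneriq:`
    prefixed tags are managed, and every applied tag records an expiry
    so it can auto-revert after its TTL."""

    PREFIX = "runneriq:"

    def __init__(self, ttl_s=300):
        self.ttl_s = ttl_s
        self._applied = {}  # (runner_id, tag) -> expiry timestamp

    def add_tag(self, runner_id, tag, now=None):
        if not tag.startswith(self.PREFIX):
            raise ValueError("refusing to touch non-runneriq tags")
        now = time.time() if now is None else now
        self._applied[(runner_id, tag)] = now + self.ttl_s

    def expired(self, now=None):
        """Tags past their TTL, due for auto-revert."""
        now = time.time() if now is None else now
        return [key for key, expiry in self._applied.items() if expiry <= now]
```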

#### Usage

```bash
# Advisory mode (default) — recommend only, zero side effects
runneriq run

# Action mode — apply routing tags to runners via GitLab API
runneriq run --execute

# Emergency reset — remove ALL runneriq: tags from all runners
runneriq reset-tags

# View audit trail of all tag changes
runneriq audit
```

#### Trust Progression

**Advisory** (default) → **Supervised** (`--execute`, human reviews) → **Autonomous** (future: auto-execute after proven reliability)

Module 4's weekly report includes an **Action Bridge** section when `--execute` is used: tags applied, success rate, and per-job recovery events.

**Source:** `src/action_bridge/tag_manager.py` (17 tests), `src/agent3_assigner/smart_assigner.py` (wiring), `src/agent4_optimizer/optimizer.py` (reporting)

---

## 🔍 RunnerIQ Intelligent Orchestration — AI Flow

RunnerIQ includes an **Intelligent Orchestration** flow published to the [GitLab Duo Agent Platform](https://gitlab.com/gitlab-ai-hackathon/participants/11553323). It is the **single entry point** for all demo and submission interactions, combining pipeline diagnosis, job analysis, runner assignment, and performance optimization in one 4-module pipeline.

> **⚠️ Pivot note:** The original standalone diagnosis flow (`flows/diagnosis.yml` → `@ai-runneriq-intelligent-orchestration-gitlab-ai-hackathon`) encounters a `gRPC 16:Forbidden by auth provider` error due to hackathon platform auth constraints. The orchestration flow (`flows/runneriq.yml` → `@ai-runneriq-intelligent-orchestration-gitlab-ai-hackathon`) works correctly and includes all diagnosis capabilities plus the full 4-module pipeline. See [#189](https://gitlab.com/gitlab-ai-hackathon/participants/11553323/-/issues/189) for details.

#### How It Works

```mermaid
graph LR
    A[Trigger from Issue/MR] --> B["Module 1: Monitor\nget_project, get_pipeline_errors,\nget_job_logs, ci_linter"]
    B --> C["Module 2: Analyzer\nget_merge_request,\nlist_merge_request_diffs"]
    C --> D["Module 3: Assigner\ncreate_issue, create_issue_note,\ngitlab_user_search"]
    D --> E["Module 4: Optimizer\nPerformance report"]
    E --> F[Structured Orchestration Report]
```

1. **Trigger** — Mention `@ai-runneriq-intelligent-orchestration-gitlab-ai-hackathon` in any issue or MR comment, or select "RunnerIQ Intelligent Orchestration" in Duo Chat
2. **Monitor** (Module 1) — Diagnoses pipeline failures, fetches failing jobs, reads log traces, validates CI config
3. **Analyze** (Module 2) — Scores and prioritizes pending jobs by branch type, stage, MR context, and duration
4. **Assign** (Module 3) — Routes jobs to optimal runners using rules-first + AI toss-up engine
5. **Optimize** (Module 4) — Generates performance report with fleet utilization, throughput, and recommendations

#### Quick Start

**From an Issue or MR comment:**
```
@ai-runneriq-intelligent-orchestration-gitlab-ai-hackathon Diagnose the latest pipeline failure for this project
```

**From Duo Chat:**
1. Open GitLab Duo Chat
2. Select "RunnerIQ Intelligent Orchestration" flow
3. Ask: "Diagnose the failing pipeline in project 79476480"

#### Tools Used (across 4 modules)

| Module | Tools | Purpose |
|--------|-------|---------|
| **Monitor** | `get_project`, `get_pipeline_errors`, `get_job_logs`, `ci_linter`, `get_current_user` | Pipeline diagnostics and CI validation |
| **Analyzer** | `get_merge_request`, `list_merge_request_diffs`, `get_project`, `get_current_user` | Job priority scoring with MR context |
| **Assigner** | `create_issue`, `create_issue_note`, `gitlab_user_search`, `get_project`, `get_current_user` | Runner assignment and team notification |
| **Optimizer** | `get_project`, `get_current_user` | Performance reporting |

#### Example Output

```
## RunnerIQ Orchestration Report
**Project:** RunnerIQ (ID: 79476480)
**Pipeline:** #345 — FAILED

### Module 1: Pipeline Diagnosis
- **Classification:** dependency issue
- **Root Cause:** pip-audit found CVE-2024-XXXX in requests==2.31.0
- **Recommendation:** Upgrade requests to >=2.32.0

### Module 2: Job Priority Analysis
- 3 pending jobs scored: deploy (95), test (60), lint (40)

### Module 3: Runner Assignment
- DECISION: rules_engine | runner=runner-fr-large | reason=exact tag match, 75% capacity, lowest carbon

### Module 4: Performance Summary
- Fleet utilization: 55% | Green routing rate: 72%
- Top recommendation: Consolidate underutilized runner-c-small
```

#### What the Agent Does Autonomously

| Capability | How It Works |
|-----------|-------------|
| **Inline diagnosis reports** | Posts full Markdown reports directly on issues/MRs |
| **Follow-up task creation** | Auto-creates labeled tasks when it identifies gaps |
| **Decision transparency** | Every recommendation includes score breakdown + reasoning |
| **Carbon comparison** | Shows CO₂ cost of FIFO vs. intelligent routing |

#### Platform Architecture Alignment

| Layer | GitLab Definition | RunnerIQ Implementation |
|-------|------------------|------------------------|
| **Tool** | Exposes data; no reasoning | GitLab REST API, Claude API, scoring engine |
| **Agent** | Autonomous task performer | 4 specialists (Monitor, Analyzer, Assigner, Optimizer) |
| **Flow** | Orchestrates agents | `FlowController` with `RunContext` shared state |

#### Flow Definition

The orchestration flow is defined in [`flows/runneriq.yml`](flows/runneriq.yml) and published as `@ai-runneriq-intelligent-orchestration-gitlab-ai-hackathon` on the GitLab Duo Agent Platform. It chains 4 modules (Monitor → Analyzer → Assigner → Optimizer) with context passing between each stage. Performance: ~30–90 second execution, stable WebSocket, tested across 20+ sessions.

---

## RunnerIQ vs Alternatives

Based on feedback from **8 DevOps engineers on r/devops**, here's how RunnerIQ compares to suggested alternatives:

| Approach | Solves Capacity? | Solves Priority? | Cost | Complexity | Best For |
|----------|:---------------:|:----------------:|:----:|:----------:|:---------:|
| GitLab native FIFO | ❌ | ❌ | Free | None | Single runner setups |
| Semantic tagging (`deploy,prod`) | Partial | ❌ | Free | Low | Small fleets with predictable workloads |
| Dedicated runner pools | ✅ | Partial | High (idle runners) | Medium | Teams with budget for dedicated infra |
| EKS + Karpenter autoscaling | ✅ | ❌ | Variable | High (K8s expertise) | Cloud-native teams |
| On-demand provisioning | ✅ | Partial | Medium | High | Teams with K8s/cloud infra skills |
| **RunnerIQ** | ❌ (fixed fleet) | ✅ | ~$5-10/mo | Low (Python + API key) | Fixed fleet teams wanting priority routing |
| **RunnerIQ v2.0 + Karpenter** | ✅ | ✅ | TBD | Medium | Cloud-native teams wanting cost-optimized scaling |

**Key insight:** Dedicated pools and autoscaling solve **capacity**. RunnerIQ solves **priority**. These are different problems. See the [full comparison table](docs/alternatives-comparison.md) for 11 dimensions across 8 approaches.

### RunnerIQ vs GitLab Runner Autoscaling

| Aspect | Autoscaling | RunnerIQ | Combined |
|--------|------------|----------|----------|
| **Problem solved** | Capacity — "Do I have enough runners?" | Intelligence — "Which runner gets which job?" | Both |
| **How it works** | Spins up/down runners based on demand | Scores and routes jobs to optimal existing runners | RunnerIQ routes intelligently within autoscaled fleet |
| **Priority handling** | ❌ FIFO within each tag pool | ✅ Priority scoring (0-100) based on branch, stage, context | ✅ |
| **Cost optimization** | Reduces idle runner costs | Reduces wasted compute by better matching | Both |
| **Setup** | Runner config (`docker-machine`, `fleeting`) | Standalone sidecar + API token | Independent |

**They're complementary, not competing.** Autoscaling ensures you have the right number of runners. RunnerIQ ensures each runner gets the right job. A production hotfix still waits behind lint checks in an autoscaled FIFO queue — RunnerIQ fixes that.

**Current limitation:** RunnerIQ has zero autoscaling awareness today. It treats all runners as static entities. Autoscaling-aware scheduling (detecting scale-up/down events, coordinating with `fleeting` or `docker-machine`) is on the v2.0 roadmap.

---

## Technical Deep-Dive

*Answers to the 5 most common architecture questions, verified against the codebase.*

### Tag-Aware Routing

RunnerIQ uses GitLab runner tags as a **hard gate** before any scoring begins. If a job requires tags that a runner doesn't have, that runner is excluded entirely — no exceptions.

After tag filtering, tag match quality contributes 20% to the overall runner score (configurable).

**How it works:**
1. Job requires tags `[docker, gpu]`
2. Runner A has tags `[docker, gpu, linux]` → ✅ passes gate (superset match)
3. Runner B has tags `[docker, shell]` → ❌ excluded (missing `gpu`)
4. Remaining runners scored by: utilization (40%), tag match (20%), capacity (20%), history (20%)

**Configuration** (`runneriq.example.yaml`):
```yaml
runneriq:
  assigner:
    scoring:
      weights:
        utilization: 0.40
        tag_match: 0.20
        capacity: 0.20
        history: 0.20
```

**Code:** `src/agent3_assigner/runner_scorer.py` — `RunnerScorerV2.score_runners()` implements the tag gate (`required_tags.issubset(runner_tags)`) and weighted scoring.
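
The gate-then-score flow can be sketched in a few lines. This is an illustrative simplification, not the actual `RunnerScorerV2` code; the field names (`tags`, `factors`) and sample data are made up for the example:

```python
# Simplified sketch of the tag gate + weighted scoring described above.
# Field names and data are illustrative, not RunnerIQ's real schema.
WEIGHTS = {"utilization": 0.40, "tag_match": 0.20, "capacity": 0.20, "history": 0.20}

def score_runners(job_tags, runners):
    """Hard tag gate first, then rank eligible runners by weighted score."""
    required = set(job_tags)
    eligible = [r for r in runners if required.issubset(set(r["tags"]))]

    def weighted(r):
        return sum(w * r["factors"][k] for k, w in WEIGHTS.items())

    return sorted(eligible, key=weighted, reverse=True)

runners = [
    {"name": "runner-a", "tags": ["docker", "gpu", "linux"],
     "factors": {"utilization": 0.8, "tag_match": 0.9, "capacity": 0.7, "history": 0.6}},
    {"name": "runner-b", "tags": ["docker", "shell"],
     "factors": {"utilization": 0.9, "tag_match": 0.5, "capacity": 0.9, "history": 0.9}},
]
ranked = score_runners(["docker", "gpu"], runners)  # runner-b is gated out (no "gpu")
```

Note that the gate runs before any scoring: a runner with perfect factor scores but a missing required tag never enters the ranking.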

### Scope

| Component | Scope | API Endpoint |
|-----------|-------|-------------|
| Runner discovery (Module 1) | **Instance-level** — sees all runners visible to your API token | `GET /runners` |
| Pipeline analysis (Module 2) | **Project-level** — analyzes pipelines for one project | `GET /projects/{id}/pipelines` |
| Job assignment (Module 3) | **Project-level** — routes jobs within the configured project | Project-scoped |

**Current limitation:** RunnerIQ requires `GITLAB_PROJECT_ID` and analyzes one project at a time. For multi-project setups, run one RunnerIQ instance per project.

**v2.0 roadmap:** Group-level pipeline support (`GITLAB_GROUP_ID`) to analyze all projects in a group with a single instance.

### Integration Architecture

RunnerIQ runs as a **standalone sidecar process** alongside your GitLab instance. It does not modify your `.gitlab-ci.yml` or intercept GitLab's native scheduler.

```
┌──────────────┐     polls      ┌──────────────┐
│  RunnerIQ    │ ──────────────→│  GitLab API  │
│  (sidecar)   │     every 30s  │              │
└──────┬───────┘                └──────────────┘
       │
       ▼
  Advisory recommendations
  (logged for human review)
```

**How to run:**
```bash
# Full pipeline (Monitor → Analyze → Assign)
make run-pipeline

# Or individual modules
make run-monitor
make run-analyzer
make run-assigner
```

RunnerIQ is **non-blocking by design**. If RunnerIQ is down, removed, or misconfigured, your CI/CD runs exactly as it does today. Zero impact.

**v2.0 roadmap:** Webhook-based event-driven integration (`Pipeline` and `Job` hooks via Flask/FastAPI) for sub-second response times, with polling retained as a consistency fallback.

### Supported Runner Types

RunnerIQ works with **all GitLab runner types** — Docker, Shell, Kubernetes, custom — because it communicates exclusively through the GitLab REST API. No host-level agent or special runner configuration required.

| Metric | Source | What it measures |
|--------|--------|------------------|
| Runner status | `GET /runners/{id}` | Online/offline/paused |
| Active jobs | `GET /runners/{id}/jobs?status=running` | Current workload |
| Utilization | Calculated: `active_jobs / max_jobs` | Logical capacity usage |

**Current limitation:** Utilization is job-based, not resource-based. RunnerIQ knows "3 of 4 job slots are full" but not "CPU is at 90%." This is sufficient for job routing decisions but doesn't capture hardware-level bottlenecks.

**v2.0 roadmap:** Host-level metrics via Prometheus integration for CPU, memory, and disk-aware scheduling.
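
The utilization metric itself is a one-line ratio; a sketch with guardrails (this assumes `max_jobs` is configured per runner, since the runner API does not report a slot limit):

```python
def job_slot_utilization(active_jobs: int, max_jobs: int) -> float:
    """Logical capacity usage as a 0.0-1.0 fraction of job slots in use.

    active_jobs comes from GET /runners/{id}/jobs?status=running;
    max_jobs is assumed to come from RunnerIQ configuration.
    """
    if max_jobs <= 0:
        return 0.0  # avoid division by zero for misconfigured runners
    return min(active_jobs / max_jobs, 1.0)  # clamp transient over-counts
```

So "3 of 4 job slots are full" reports as `0.75`, regardless of what the host CPU is doing.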

---

### Versioning

RunnerIQ follows sprint-aligned versioning. Versions v1.x–v2.x were internal development
milestones during the hackathon build phase. v3.x–v5.x correspond to protected Git tags
from iterative CI/CD testing. v6.0.0 was the first public PyPI release, aligning with our
6-week development sprint. v6.1.0 is the submission release bundling `--live` mode,
simulation, and all final features.

See [CHANGELOG.md](CHANGELOG.md) for full release history.

---

## Installation

#### Prerequisites

- Python 3.10+
- (Optional) Anthropic API key for AI decisions

#### Install from PyPI (recommended)

```bash
pip install runneriq
runneriq doctor
runneriq run --mock --output json
```

That's it. `runneriq doctor` verifies your setup (5 checks: GitLab API, Anthropic, carbon, config, tests).

#### Try It on GitLab (Duo Agent Platform)

Mention the agent on any issue or MR:
> `@ai-runneriq-intelligent-orchestration-gitlab-ai-hackathon Diagnose the latest failing pipeline`

The agent posts a structured diagnosis report directly on the issue.

#### Verify Installation

```bash
runneriq doctor           # Health check (5 checks)
runneriq run --mock       # Run the demo (mock mode)
runneriq run              # Start monitoring (requires GITLAB_* env vars)
```

---

## Development

For contributors who want to work on RunnerIQ locally:

#### Option 1: pip install -e (editable mode)

```bash
git clone https://gitlab.com/gitlab-ai-hackathon/participants/11553323.git
cd 11553323

python -m venv .venv
source .venv/bin/activate

pip install -e .          # Install RunnerIQ + all dependencies
pip install -e ".[dev]"   # Also install dev tools (pytest, mypy, ruff, bandit)

cp .env.example .env      # Copy environment template
```

#### Option 2: Via Makefile

```bash
git clone https://gitlab.com/gitlab-ai-hackathon/participants/11553323.git
cd 11553323
make setup        # Creates venv, installs deps + dev tools, runs tests
make setup-quick  # Same but skips the test suite
```

#### Option 3: Manual with requirements.txt

```bash
git clone https://gitlab.com/gitlab-ai-hackathon/participants/11553323.git
cd 11553323

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt   # Install all dependencies
cp .env.example .env              # Copy environment template
```

#### Verify Development Setup

```bash
make demo                 # Run the demo (mock mode)
make test                 # Run all tests
make typecheck            # mypy strict mode (100% compliance on src/)
```

---

## Configuration

#### Environment Variables

The setup script auto-copies `.env.example` to `.env`. Edit it with your credentials:

```bash
$EDITOR .env
```

| Variable | Required | Description |
|----------|----------|-------------|
| `GITLAB_URL` | Yes | Your GitLab instance URL |
| `GITLAB_TOKEN` | Yes | Personal access token with `api` scope |
| `GITLAB_PROJECT_ID` | Yes | Numeric project ID |
| `ANTHROPIC_API_KEY` | For AI mode | Anthropic API key (Module 3 hybrid/claude_only) |
| `RUNNERIQ_LOG_LEVEL` | No | `DEBUG`, `INFO`, `WARNING`, `ERROR` (default: `INFO`) |
| `RUNNERIQ_POLL_INTERVAL` | No | Runner polling interval in seconds (default: `30`) |
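
A minimal loader for these variables might look like the following. This is a sketch only; RunnerIQ's actual loader is `src/config/runneriq_config.py`, and in practice `python-dotenv` populates `os.environ` from `.env` first:

```python
import os

def load_settings(env=None):
    """Read RunnerIQ settings; names and defaults match the table above."""
    env = os.environ if env is None else env
    return {
        "gitlab_url": env["GITLAB_URL"],                      # required
        "gitlab_token": env["GITLAB_TOKEN"],                  # required
        "project_id": env["GITLAB_PROJECT_ID"],               # required
        "log_level": env.get("RUNNERIQ_LOG_LEVEL", "INFO"),   # optional
        "poll_interval": int(env.get("RUNNERIQ_POLL_INTERVAL", "30")),
    }
```

The required variables fail fast with a `KeyError` rather than silently running against the wrong instance.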

#### YAML Priority Rules

RunnerIQ also uses a YAML config for priority rules. Copy and customize:

```bash
cp runneriq.example.yaml runneriq.yaml
```

#### Priority Rules (Module 2)

```yaml
runneriq:
  analyzer:
    priority:
      branch_weights:
        main: 100          # Production branches get highest priority
        "hotfix/*": 95     # Hotfixes are near-production
        develop: 75        # Development branch
        "feature/*": 50    # Feature branches
        default: 40        # Unknown branches

      user_role_weights:
        maintainer: 100
        developer: 75
        guest: 25

      job_type_weights:
        deploy: 100        # Deploys are highest priority
        build: 75
        test: 50
        lint: 40

      manual_trigger_bonus: 10
      non_production_cap: 75  # Feature branches capped at 75
```
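
One plausible way these rules combine into a 0-100 score, shown as a hedged sketch. The real aggregation lives in `src/agent2_analyzer/priority.py` and may mix the weights differently; the averaging step here is an assumption for illustration:

```python
from fnmatch import fnmatch

# Values copied from the YAML above; the combination logic is illustrative.
BRANCH_WEIGHTS = {"main": 100, "hotfix/*": 95, "develop": 75, "feature/*": 50}
DEFAULT_BRANCH_WEIGHT = 40
JOB_TYPE_WEIGHTS = {"deploy": 100, "build": 75, "test": 50, "lint": 40}
MANUAL_TRIGGER_BONUS = 10
NON_PRODUCTION_CAP = 75
PRODUCTION_BRANCHES = ("main", "hotfix/*")

def branch_weight(branch):
    for pattern, weight in BRANCH_WEIGHTS.items():
        if fnmatch(branch, pattern):
            return weight
    return DEFAULT_BRANCH_WEIGHT

def priority_score(branch, job_type, manual=False):
    """Average branch and job-type weights, add the manual bonus, apply the cap."""
    score = (branch_weight(branch) + JOB_TYPE_WEIGHTS.get(job_type, 50)) / 2
    if manual:
        score += MANUAL_TRIGGER_BONUS
    if not any(fnmatch(branch, p) for p in PRODUCTION_BRANCHES):
        score = min(score, NON_PRODUCTION_CAP)  # non_production_cap
    return min(round(score), 100)
```

With this sketch, a `deploy` on `main` scores 100, while even a manually triggered `deploy` on `develop` is held to the 75-point non-production cap.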

#### Decision Engine (Module 3)

```yaml
runneriq:
  assigner:
    decision_engine:
      mode: hybrid              # hybrid | rules_only | claude_only
      margin_threshold: 0.15    # Use AI when top-2 within 15%
      daily_token_budget: 50000 # Max AI tokens/day (0 = unlimited)
    trust_model:
      mode: advisory            # advisory | supervised | autonomous
      supervised_threshold: HIGH
```
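
The margin check at the heart of hybrid mode is simple; a sketch of the routing decision follows (illustrative only; the real engine is `src/agent3_assigner/hybrid_engine.py`):

```python
def choose_decision_path(scores, mode="hybrid", margin_threshold=0.15,
                         tokens_used=0, daily_token_budget=50_000):
    """Return 'rules_engine' or 'ai' given ranked runner scores.

    A sketch of the hybrid policy: escalate to AI only on a genuine
    toss-up (top two scores within the margin) and only within budget.
    """
    if mode == "rules_only" or len(scores) < 2:
        return "rules_engine"
    if mode == "claude_only":
        return "ai"
    top, second = sorted(scores, reverse=True)[:2]
    margin = (top - second) / top if top else 0.0
    budget_ok = daily_token_budget == 0 or tokens_used < daily_token_budget
    if margin < margin_threshold and budget_ok:
        return "ai"
    return "rules_engine"
```

This is why most decisions stay free and fast: a clear winner (margin at or above 15%) never touches the AI path, and an exhausted token budget degrades to rules rather than failing.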

---

## Usage

#### Run Modules

```bash
make run-monitor      # Module 1: Runner Monitor
make run-analyzer     # Module 2: Job Analyzer
make run-assigner     # Module 3: Smart Assigner (mock mode)
make run-optimizer    # Module 4: Performance Optimizer (mock mode)
make run-pipeline     # Full integration pipeline
make demo             # Live demo script
```

Run `make help` to see all 24 available targets.

#### Orchestrator (Unified Pipeline)

```bash
# Full 4-module pipeline (requires GITLAB_* env vars)
runneriq run

# Preview without executing assignments
runneriq run --dry-run

# Monitor + Analyze only (skip assignment and optimization)
runneriq run --mode analyze-only

# Mock mode (no API credentials needed, great for demos)
runneriq run --mock

# JSON output for scripting
runneriq run --mock --output json

# Or via Makefile
make run-orchestrator
```

#### Individual Modules (Advanced)

```bash
source .venv/bin/activate
export PYTHONPATH=src

python -m agent1_monitor.main
python -m agent2_analyzer.main --pipeline-id 12345 --project-id 67890
python -m agent3_assigner.main --mock
python -m agent4_optimizer.main --mock --output-format markdown
```

#### Module 4 Report Output

```bash
make run-optimizer
```

Produces:

```markdown
# RunnerIQ Performance Report
**Week of:** Feb 12-19, 2026

## Summary
- Total jobs: 532
- Avg completion time: 4.9 minutes (↓ 2.1 min from last week)
- Critical job delay: 0.8 minutes
- Runner utilization: 55%

## Top Performers 🏆
1. **runner-a-large**: 139 jobs, 2.4min avg, 99% uptime
2. **runner-d-medium**: 195 jobs, 3.6min avg, 98% uptime

## Needs Attention ⚠️
- **runner-c-small**: 52 jobs, 9.8min avg (2x slower than average)
  - Recommendation: Upgrade to medium specs or retire

## Cost Analysis 💰
- Total compute cost: $127 (↓ $213 from manual management)
- Cost per job: $0.15 (↓ from $0.41)
- Projected monthly savings: $340
```

---

## API Reference

#### GitLab APIs Used

| Endpoint | Module | Purpose |
|----------|--------|---------|
| `GET /api/v4/runners` | 1 | List all runners |
| `GET /api/v4/runners/:id` | 1 | Runner details |
| `GET /api/v4/runners/:id/jobs` | 1, 4 | Runner job history |
| `GET /api/v4/projects/:id/pipelines/:pid` | 2 | Pipeline metadata |
| `GET /api/v4/projects/:id/pipelines/:pid/jobs` | 2 | Job list |
| `GET /api/v4/projects/:id/repository/branches` | 2 | Branch info |
| `POST /api/v4/projects/:id/issues` | 4 | Create report issues |
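
All of these endpoints are plain authenticated GETs/POSTs; a minimal helper with `requests` (already a dependency) is sketched below. `gitlab_url` and `gitlab_get` are illustrative names, not RunnerIQ's actual client API:

```python
import os
import requests

def gitlab_url(path, base="https://gitlab.com"):
    """Build a GitLab v4 API URL for the given path."""
    return f"{base.rstrip('/')}/api/v4/{path.lstrip('/')}"

def gitlab_get(path, **params):
    """GET a GitLab v4 path, authenticating with GITLAB_TOKEN from the env."""
    resp = requests.get(
        gitlab_url(path, base=os.environ.get("GITLAB_URL", "https://gitlab.com")),
        headers={"PRIVATE-TOKEN": os.environ["GITLAB_TOKEN"]},
        params=params,
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

# e.g. jobs for pipeline 345 in project 79476480:
# jobs = gitlab_get("projects/79476480/pipelines/345/jobs")
```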

#### Anthropic Claude API

```python
# Used by Module 3 for toss-up decisions only
import os
import anthropic

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1000,
    messages=[{"role": "user", "content": prompt}],  # prompt built from job/runner context
)
```

#### Key Python Classes

| Class | Module | Description |
|-------|--------|-----------|
| `RunnerMonitor` | `agent1_monitor` | Real-time runner state tracking |
| `JobAnalyzer` | `agent2_analyzer` | Priority scoring and urgency classification |
| `SmartAssigner` | `agent3_assigner` | Hybrid rules + AI assignment engine |
| `TrustModel` | `agent3_assigner` | Advisory/Supervised/Autonomous trust tiers |
| `RunnerScorerV2` | `agent3_assigner` | 4-factor runner scoring with margin calculation |
| `ClaudeClient` | `agent3_assigner` | Claude API with token budget enforcement |
| `PriorityQueue` | `agent3_assigner` | SLA-aware job queue with auto-escalation |
| `FlowController` | `orchestrator` | Unified 4-module pipeline orchestrator |
| `RunContext` | `orchestrator` | Shared state passed through module pipeline |
| `PerformanceOptimizer` | `agent4_optimizer` | Module 4 pipeline orchestrator |
| `PerformanceScorer` | `agent4_optimizer` | Composite 0-100 scoring with issue detection |
| `ReportGenerator` | `agent4_optimizer` | Weekly Markdown report renderer |
| `MetricsCollector` | `agent4_optimizer` | Runner metrics aggregation |
| `ElectricityMapsClient` | `carbon` | Carbon intensity API with triple fallback + cache |
| `CarbonMCPTools` | `carbon` | 4 MCP tools for AI carbon-aware routing |
| `CO2SavingsTracker` | `carbon` | File-persisted CO2 savings + green routing rate |

---

## Security

RunnerIQ takes security seriously. Full details in [SECURITY.md](SECURITY.md).

- **Credentials:** All secrets (`GITLAB_TOKEN`, `ANTHROPIC_API_KEY`) via environment variables only. No hardcoded tokens.
- **CI scanning:** SAST, Secret Detection, and Dependency Scanning templates in every pipeline
- **Bandit:** Python-specific security linting blocks merge on medium+ severity findings
- **Advisory mode by default:** RunnerIQ recommends but never acts without explicit `--execute` opt-in
- **Auto-revert:** All tag changes include automatic rollback on failure
- **Audit trail:** Every assignment decision logged with full context
- **Local scan:** `make security` runs Bandit locally before pushing

---

## Test Suite

RunnerIQ ships **1,171+ tests** across all modules; CI enforces a 60%+ coverage floor.

```bash
make test             # Run all tests
make test-cov         # Run with coverage report (fail under 60%)
make test-agent1      # Module 1 tests only
make test-agent2      # Module 2 tests only
make test-agent3      # Module 3 tests only
make test-agent4      # Module 4 tests only
make test-integration # Integration and e2e tests
```

#### Test Distribution

| Module | Test Files | Key Coverage |
|--------|-----------|---------------|
| **Module 1** | `test_gitlab_client.py`, `test_runner_monitor.py` | API caching, stale fallback, state change detection |
| **Module 2** | `test_priority.py`, `test_history.py`, `test_job_analyzer.py`, `test_gitlab_client.py` | Priority overrides, non-production cap, env variable handling |
| **Module 3** | `test_smart_assigner.py`, `test_smart_assigner_v2.py`, `test_hybrid_engine.py`, `test_claude_integration.py`, `test_priority_queue.py`, `test_trust_mode.py`, `test_integration_e2e.py` | Hybrid routing, token budget, trust tiers, anomaly detection, SLA escalation |
| **Module 4** | `test_performance_scorer.py`, `test_metrics_collector.py`, `test_models.py` | Composite scoring, all 4 issue detectors, fleet average, edge cases |
| **Integration** | `test_full_pipeline.py`, `test_pipeline_integration.py`, `test_performance.py` | End-to-end pipeline, Module 1→2→3 flow, performance benchmarks |
| **E2E / Config** | `test_agent2_e2e.py`, `test_config_validation.py`, `test_error_handling.py`, `test_priority_scoring.py`, `test_integration.py` | Schema validation, error recovery, priority algorithm, config edge cases |
| **Module Tools** | `test_agent1_tools.py`, `test_agent2_tools.py`, `test_agent3_tools.py`, `test_agent3_user_search.py`, `test_shared_tools.py` | All 10 v2.1 tools: diagnostics, MR context, issue management, user search |
| **Alerting** | `test_flaky_detector.py`, `test_alert_grouper.py`, `test_suppression_engine.py`, `test_notification_router.py`, `test_config_schema.py` | 4-stage noise reduction: flaky detection, grouping, suppression, routing |
| **Action Bridge E2E** | `test_action_bridge_e2e.py` | Full advisory→action flow: tag add → verify → revert → Module 4 reports |
| **Smoke** | `test_smoke.py` | All module imports verified (alerting, core, carbon, orchestrator) |

#### CI Pipeline

| Stage | Jobs | Blocking? |
|-------|------|-----------|
| **Lint** | `black:format`, `flake8:lint`, `mypy:typecheck`, `pylint:analysis` | Yes (except pylint) |
| **Test** | `test:unit`, `test:integration`, `test:count-check` | Yes |
| **Carbon Tests** | `test_carbon_client`, `test_carbon_mcp_tools`, `test_carbon_routing_integration` (32 tests) | Yes |
| **Coverage** | `coverage:report` (≥ 60%) | Yes |
| **Security** | `security:bandit`, GitLab SAST, Secret Detection, Dependency Scanning | Yes (bandit); Advisory (safety) |

---

## Project Structure

```
flows/                            # GitLab Duo Agent Platform flows
├── runneriq.yml              # Intelligent Orchestration (public, WORKING — primary entry point)
├── diagnosis.yml             # Pipeline Failure Diagnosis (public, gRPC auth error — see #189)
├── README.md                 # Flow architecture docs
├── test-01-*.yml             # Test flows (private)
├── test-02-*.yml
├── test-03-*.yml
└── test-04-*.yml
src/
├── agent1_monitor/           # 🔍 Runner Monitor
│   ├── gitlab_client.py      #    GitLab API client with caching
│   ├── runner_monitor.py     #    State tracking & change detection
│   ├── main.py               #    CLI entry point
│   └── tests/
├── agent2_analyzer/          # 📊 Job Analyzer
│   ├── job_analyzer.py       #    Pipeline analysis orchestrator
│   ├── priority.py           #    Priority scoring engine
│   ├── history.py            #    Historical duration estimation
│   ├── priority_config.yaml  #    Configurable rules
│   └── tests/
├── agent3_assigner/          # 🧠 Smart Assigner
│   ├── smart_assigner.py     #    Main orchestrator (3-path routing)
│   ├── runner_scorer.py      #    4-factor runner scoring
│   ├── claude_client.py      #    Claude API + token budget
│   ├── trust_model.py        #    Advisory/Supervised/Autonomous
│   ├── hybrid_engine.py      #    Hybrid decision engine
│   ├── priority_queue.py     #    SLA-aware priority queue
│   └── tests/
├── agent4_optimizer/         # 📈 Performance Optimizer
│   ├── optimizer.py          #    Full pipeline orchestrator
│   ├── performance_scorer.py #    Composite scoring + issue detection
│   ├── metrics_collector.py  #    Runner metrics aggregation
│   ├── report_generator.py   #    Weekly Markdown reports
│   ├── models.py             #    Data models (RunnerMetrics, etc.)
│   └── tests/
├── orchestrator/             # 🎯 Unified Pipeline Orchestrator
│   ├── cli.py                #    CLI entry point (runneriq run)
│   ├── flow_controller.py    #    4-module pipeline with graceful degradation
│   ├── run_context.py        #    Shared state dataclass
│   └── tests/
├── carbon/                   # 🌍 Carbon-Aware Routing
│   ├── models.py             #    6 dataclasses (CarbonIntensity, etc.)
│   ├── electricity_maps_client.py  # API client + cache + triple fallback
│   ├── mcp_server.py         #    4 MCP tools for AI
│   ├── co2_tracker.py        #    CO2 savings tracker (file-persisted)
│   ├── settings.py           #    Carbon env vars + demo config
│   ├── dashboard.py          #    Flask API (4 endpoints)
│   └── dashboard.html        #    Single-file HTML dashboard
├── alerting/                 # 🔇 Noise Reduction Alerting
│   ├── flaky_detector.py     #    Fail→pass retry pattern detection
│   ├── alert_grouper.py      #    Time-window alert batching
│   ├── suppression_engine.py #    Rule-based alert filtering
│   ├── notification_router.py #   Severity-based channel routing
│   ├── models.py             #    Alert, AlertGroup, SuppressionResult, etc.
│   ├── config_schema.py      #    YAML config validation
│   └── tests/
├── common/                   #    Shared utilities
│   ├── logging_config.py     #    Structured JSON logging
│   ├── config_validator.py   #    YAML config validation
│   └── benchmarks.py         #    Performance measurement
├── config/                   #    Centralized configuration
│   └── runneriq_config.py    #    Config loader + validation
└── integration/              #    Cross-agent integration
    ├── full_pipeline.py      #    End-to-end pipeline runner
    └── tests/
```

---

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for full details. Quick summary:

1. **Branch** from `main`: `git checkout -b feat/your-feature`
2. **Format**: `make format`
3. **Lint**: `make lint`
4. **Type check**: `make typecheck`
5. **Test**: `make test`
6. **All checks**: `make check` (runs lint + typecheck + test)
7. **Commit**: Use [conventional commits](https://www.conventionalcommits.org/) (`feat:`, `fix:`, `docs:`, `test:`)
8. **Clean up**: `make clean`
9. **MR**: All CI checks must pass before merge

---

## Tech Stack

| Component | Technology |
|-----------|------------|
| Platform | GitLab Duo Agent Platform |
| AI Model | Anthropic Claude (Sonnet) |
| Language | Python 3.10+ |
| Package Manager | [uv](https://docs.astral.sh/uv/) (fast, Rust-based) |
| APIs | GitLab REST API v4 |
| Config | YAML (priority rules, thresholds) |
| Testing | pytest + pytest-cov |
| CI/CD | GitLab CI (lint → test → coverage → security) |
| Logging | Unified `setup_logging()` with `RotatingFileHandler`, JSON structured output |

## Graceful Degradation

RunnerIQ degrades gracefully through multiple layers. Because it is advisory and non-blocking, there is no single point of failure.

| Layer | Trigger | Behavior | Latency Impact |
|-------|---------|----------|---------------|
| **Full system** | Everything working | Rules + AI hybrid scoring | ~3ms (rules) / ~2-3s (AI) |
| **Rules-only** | AI API unavailable or token budget exhausted | Deterministic rules scoring, zero AI API calls | ~3ms |
| **Stale cache** | GitLab API errors | Last-known runner state used for scoring | ~3ms |
| **Passthrough** | RunnerIQ down or crashed | GitLab native FIFO scheduler continues unaffected | 0ms (RunnerIQ not in path) |

**Key design principle:** RunnerIQ is advisory. It recommends assignments but never blocks or intercepts GitLab's scheduler. There is no fallback mechanism that could itself fail, because RunnerIQ is never in the critical path.

**Code references:**
- AI → rules fallback: `src/agent3_assigner/smart_assigner.py` → `_decide_single_job()`
- Stale cache on API errors: `src/agent1_monitor/gitlab_client.py` → `get_stale()`
- Module 3 failure handling: `src/integration/full_pipeline.py` → `_run_agent3()`
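
Condensed into one hypothetical function, the layering reads roughly like this (illustrative only; the real logic is split across the three files listed above):

```python
def decide(job, fetch_runners, stale_cache, ai_decide=None):
    """Pick a runner, returning (decision, layer) to show which layer answered.

    fetch_runners: callable hitting the GitLab API; stale_cache: last-known
    runner state; ai_decide: optional AI scorer. All names are illustrative.
    """
    try:
        runners = fetch_runners()
        layer = "live"
    except Exception:
        runners = stale_cache          # layer: stale cache on API errors
        layer = "stale_cache"
    if ai_decide is not None:
        try:
            return ai_decide(job, runners), layer + "+ai"
        except Exception:
            pass                       # layer: AI unavailable, fall to rules
    best = max(runners, key=lambda r: r["score"])  # deterministic rules
    return best["name"], layer + "+rules"
```

Note the last layer is absent from the code on purpose: passthrough is what happens when this function is never called, because GitLab's own scheduler keeps running regardless.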

## Performance Targets

| Metric | Target |
|--------|--------|
| Runner polling | Every 30 seconds |
| Job analysis | < 5 seconds |
| Rules-based assignment | < 100ms |
| AI decision | < 3 seconds |
| Total assignment latency | < 20 seconds |

---

## Community Validation

- **1,008+ comments** across 3 GitLab issues ([10 years unsolved](https://gitlab.com/gitlab-org/gitlab/-/issues/14976))
- **11 DevOps engineers** validated on r/devops across 2 posts:
  - Engineering Manager, EKS fleet operator, first user-side pain confirmation
  - eltear1: 5 technical deep-dive questions on tag routing, scope, integration, runner types, autoscaling (drove 5 new README sections)
  - ArieHein: vision alignment on agents replacing CI/CD DSLs, MCP as task execution layer
- **0 competitors** in the intelligent runner orchestration space

#### Upstream Contribution

During development, debugging custom flow YAML configurations uncovered several undocumented runtime behaviors — including silent WebSocket closures caused by the `inputs` string format passing schema validation but failing at runtime. These findings were shared with the GitLab team and resulted in [gitlab-org/gitlab#591567](https://gitlab.com/gitlab-org/gitlab/-/issues/591567), where the AI Catalog team is now actively working on improved error messaging for misconfigured flows.

---

## ⚠️ Known Platform Limitations

RunnerIQ is built on the GitLab Duo Agent Platform, which is new and evolving. We document these limitations transparently, not as complaints, but to help users understand current boundaries and future possibilities.

#### API Integration Points (Future Enablement)

RunnerIQ's Smart Assigner currently operates in **Hybrid Mode**, accepting runner data via context/JSON. When the Duo Agent Platform expands to include these endpoints, RunnerIQ switches to **Live Mode** with zero code changes:

| Endpoint | Purpose in RunnerIQ | Priority | Status |
|----------|---------------------|----------|--------|
| `GET /projects/{id}/runners` | Discover available runners in the fleet | 🔴 Critical | Requires Maintainer role |
| `GET /runners/{id}` | Runner details: tags, status, capacity, region | 🔴 Critical | Requires Maintainer role |
| `GET /runners/{id}/jobs` | Current workload per runner (for balancing) | 🟡 High | Requires Maintainer role |
| `GET /projects/{id}/jobs?scope[]=pending` | Pending job queue (what needs assignment) | 🔴 Critical | Available |
| `GET /projects/{id}/jobs/{job_id}` | Job requirements: tags, resource needs | 🟡 High | Available |
| `POST /projects/{id}/jobs/{job_id}/play` | Execute the assignment decision | 🟢 Nice-to-have | Advisory mode works without this |

#### Hybrid Mode vs. Live Mode

```
TODAY (Hybrid Mode):
  User provides runner JSON in context → Smart Assigner reasons over it → Recommendation

FUTURE (Live Mode):
  Smart Assigner calls Runner API → Gets live fleet state → Reasons over it → Assignment
```

The **decision logic is identical**. Only the data source changes. This is by design.
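
The seam can be sketched as interchangeable data sources feeding one decision function (illustrative names, not actual RunnerIQ classes):

```python
from typing import Callable, List, Optional

# Illustrative seam: the decision function never knows where its data came from.
RunnerSource = Callable[[], List[dict]]

def context_source(runner_json: List[dict]) -> RunnerSource:
    """Hybrid Mode: fleet state supplied by the user as JSON/context."""
    return lambda: runner_json

def api_source(fetch: Callable[[], List[dict]]) -> RunnerSource:
    """Live Mode: fleet state fetched from the Runner API (when access lands)."""
    return fetch

def recommend(job_tags, source: RunnerSource) -> Optional[str]:
    """Identical decision logic over either source: best tag-eligible runner."""
    required = set(job_tags)
    eligible = [r for r in source() if required.issubset(set(r["tags"]))]
    return max(eligible, key=lambda r: r["score"])["name"] if eligible else None
```

Swapping `context_source(...)` for `api_source(...)` changes nothing inside `recommend`, which is the zero-code-change upgrade path described above.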

---

## 🔄 Platform Constraints & Our Pivot

RunnerIQ's Smart Assigner was designed to call Runner API endpoints (`GET /projects/:id/runners`, `GET /runners/:id`, `GET /runners/:id/jobs`) for live fleet data. During development, we discovered these endpoints require **Maintainer-level access**, which is beyond the Developer role available to hackathon participants.

#### What we did

1. **Built the full scoring engine** — tag match (40%), capacity (30%), performance (30%), with carbon-aware tiebreaking via AI
2. **Pivoted to a hybrid model** — the Smart Assigner accepts runner fleet data via context/JSON instead of calling live APIs
3. **Kept the decision logic identical** — only the data source changed, not the intelligence
4. **Designed for zero-change upgrade** — when the Duo Agent Platform expands Runner API access to Developer role, RunnerIQ switches to live fleet management with no code changes

#### Why this matters

GitLab's runner queue uses FIFO (first-in, first-out) scheduling — a [10-year-old problem](https://gitlab.com/gitlab-org/gitlab/-/issues/14976) with 1,008+ comments. RunnerIQ's scoring engine solves this by matching the right job to the right runner instantly. 85% of decisions are handled by the rules engine (free, <100ms). AI handles the 15% that need genuine reasoning — failure triage, anomaly explanation, and carbon-aware trade-offs when runners score within a 15% margin.

The triage brain is built and tested. The integration point is ready. The platform will catch up.

#### Validated by GitLab team

> _"it's good you found a way to still show the value by using simulated data!"_
> — **Mattias Michaux**, GitLab ([source](https://gitlab.com/gitlab-ai-hackathon/participants/11553323/-/issues/1#note_3128098028))

#### Bonus: Our debugging contributed back to GitLab

During this pivot, our flow debugging findings were adopted by GitLab as an official issue: [gitlab-org/gitlab#591567](https://gitlab.com/gitlab-org/gitlab/-/issues/591567). See [#123](https://gitlab.com/gitlab-ai-hackathon/participants/11553323/-/issues/123) for details.

---

## License

[MIT License](LICENSE) — Copyright (c) 2026 Md Asif Iqbal

---

<div align="center">

**Built for the [GitLab AI Hackathon 2026](https://about.gitlab.com/)**

*RunnerIQ: Less noise. More signal. Zero alert fatigue.*
*And your CI/CD should be greener while it's at it.*

</div>
