System Overview
RAM Used
—
— GB free
Running Models
—
none loaded
Requests Today
—
last 24 h
TTFT Median
—
time-to-first-token
Avg Tok / s
—
generation speed
KV Cache Saved
—
vs 4096 default
Live Activity
Sessions
Grouped by 5-min inactivity gaps
—
Loading…
Request Feed
—
| Model | TTFT | Total | Tokens | Profile | Time |
|---|---|---|---|---|---|
| Loading… | |||||
Optimizations
What autotune did & why
Real decisions per request — KV context, quantization, RAM relief, QOS. Click any row to expand.
—
—
Optimizations (24h)
—
KV context hits
—
Avg KV saved
—
Prompt cache pins
—
Fast/quality profiles
—
Avg TTFT (24h)
Loading…
Performance Trends
Requests per Hour
Last 24 hours
TTFT Over Time
Last 100 requests · green <400 ms · blue <1200 ms · yellow <2500 ms · red ≥2500 ms
RAM Usage Over Time
Peak RAM per request (GB) · high values indicate memory pressure
KV Cache Size Over Time
Estimated KV cache per request (GB) · lower = autotune optimization working
Token Throughput (tok/s)
Generation speed · lower values when multiple models share memory
What These Charts Show
RAM Usage — peak memory consumed per request. Spikes above your system's threshold trigger autotune optimizations.
KV Cache — estimated memory used by the attention cache. autotune's dynamic sizing keeps this proportional to actual prompt length.
Throughput — generation speed in tokens/sec. Drops when multiple models are loaded simultaneously or RAM pressure is high. Data points are highlighted when >1 model was loaded concurrently.
Raw vs Tuned · Context Window Optimization
KV Cache: Ollama Default (4096 tokens) vs autotune Dynamic Sizing
Ollama Default
4,096
tokens · fixed
autotune Average
—
tokens · dynamic
Context Reduction
—
—
KV Memory Saved
—
proportional
Avg TTFT (measured)
—
all-time average
—
Per-Model Breakdown
All Models
—
| Model | Requests | Avg TTFT | P95 TTFT | Avg Tok/s | Avg Context | Avg Elapsed | Total Tokens | Last Used |
|---|---|---|---|---|---|---|---|---|
| Loading… | ||||||||
Slow Requests
Requests Over 5 s
High-latency calls — investigate model size, RAM pressure, or context length
—
| Model | Elapsed | TTFT | Context | Profile | Time |
|---|---|---|---|---|---|
| Loading… | |||||
Model Library
Installed Models
All Ollama models available locally
| Model | Size | Parameters | Quantization | Family | Installed |
|---|---|---|---|---|---|
| Loading… | |||||
Model Catalog
Recommended Models
Best LLMs at every size — auto-updated
Analyzing your hardware…
API Keys
Active API Keys
—
| Name | Key Prefix | Req Today | Tokens Today | Last Used | Created | |
|---|---|---|---|---|---|---|
| Loading… | ||||||
Conversation History
All Conversations
Every session captured by the gateway
—
Loading conversation history…
Suggestions
Loading…
Settings
Data & Privacy
Local Storage
Write performance data and telemetry events to the local SQLite database. Disabling this stops run_observations, telemetry_events, and hardware_profiles from being written. Model catalog data is always stored regardless.
—
Remote Telemetry
Send anonymised performance metrics to the autotune Supabase project. Data includes: model ID, hardware fingerprint, throughput, TTFT, and context size. No conversation content is ever sent.
—
Data Retention
Days to keep run_observations rows before pruning. Set to 0 to keep forever. Use "Clean Up Now" to immediately remove rows beyond this limit.
days
Gateway
Default QOS Profile
Profile applied when no X-Autotune-Profile header is present. fast = low latency; balanced = general use; quality = best output.
Ollama URL Override
Custom Ollama base URL. Leave blank to use the AUTOTUNE_OLLAMA_URL env var or the default localhost:11434. Takes effect on next server restart.
Model Catalog Refresh Interval
How often (in hours) the model catalog is automatically refreshed from HuggingFace in the background.
hours
Security — Environment Variables read-only
These settings are configured via environment variables or the
.env file. Restart the server after changing them.Dashboard Admin Key
AUTOTUNE_ADMIN_KEY — required to log into the dashboard.
—
API Key Enforcement
AUTOTUNE_REQUIRE_API_KEY — when on, all /v1/* inference requests require a valid API key.
—
Request Body Limit
AUTOTUNE_MAX_BODY_BYTES — oversized requests are rejected with HTTP 413.
—
CORS Extra Origins
AUTOTUNE_CORS_ORIGINS — comma-separated extra allowed origins. Blank = localhost only.
—
Supabase Remote Telemetry URL
AUTOTUNE_SUPABASE_URL — custom Supabase project for telemetry. Leave unset to use the built-in autotune project.
—
Database
—
Run observations
—
Model catalog entries
—
Hardware profiles
—
Telemetry events
—
Security events
—
DB size
Security Posture
Loading…
Activity — Last 24 h
Login Failures
—
brute-force attempts
Invalid Keys
—
rejected /v1/* calls
Rate Limit Hits
—
IP lockouts triggered
Successful Logins
—
dashboard sessions
Keys Created
—
new API keys issued
Keys Revoked
—
invalidated
Security Event Log
Audit Trail
Persisted across restarts · auto-refreshes every 30 s
Loading security events…