RAM Used
— GB free
Running Models
none loaded
Requests Today
last 24 h
TTFT Median
time-to-first-token
Avg Tok / s
generation speed
KV Cache Saved
vs Ollama's 4,096-token default
Sessions
Grouped by 5-min inactivity gaps
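Session grouping by inactivity gaps can be sketched as a single pass over sorted request timestamps. A new session starts whenever the time since the previous request exceeds five minutes. This is a minimal illustration, not autotune's code: `group_sessions` is a hypothetical name, and treating a gap of exactly 300 s as part of the same session is an assumption.

```python
from typing import List

GAP_SECONDS = 5 * 60  # 5-minute inactivity gap, per the dashboard label

def group_sessions(timestamps: List[float], gap: float = GAP_SECONDS) -> List[List[float]]:
    """Split request timestamps (seconds) into sessions.

    A new session begins when the gap since the previous request is
    strictly greater than `gap` (assumption: an exact 300 s gap stays
    in the same session).
    """
    sessions: List[List[float]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)  # still within the inactivity window
        else:
            sessions.append([ts])    # gap exceeded (or first request): new session
    return sessions
```

For example, requests at 0, 100, 500, 700 and 2000 s form three sessions: `[0, 100]`, `[500, 700]` and `[2000]`.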
Request Feed
Model · TTFT · Total Tokens · Profile · Time
What autotune did & why
Per-request decisions: KV context, quantization, RAM relief, and QOS.
Optimizations (24h)
KV context hits
Avg KV saved
Prompt cache pins
Fast/quality profiles
Avg TTFT (24h)
Requests per Hour
Last 24 hours
TTFT Over Time
Last 100 requests · green <400 ms · blue <1200 ms · yellow <2500 ms · red ≥2500 ms
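The legend's colour bands reduce to a simple threshold lookup. The cut-offs (400, 1200 and 2500 ms) come from the legend above; the function name `ttft_color` is hypothetical, and reading the `<` / `≥` notation so that exactly 400 ms counts as blue is an assumption.

```python
def ttft_color(ttft_ms: float) -> str:
    """Map a time-to-first-token value in milliseconds to a legend colour."""
    if ttft_ms < 400:
        return "green"
    if ttft_ms < 1200:
        return "blue"
    if ttft_ms < 2500:
        return "yellow"
    return "red"  # 2500 ms or more
```

For instance, `[ttft_color(v) for v in (250, 900, 1800, 3100)]` yields `['green', 'blue', 'yellow', 'red']`.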
RAM Usage Over Time
Peak RAM per request (GB) · high values indicate memory pressure
KV Cache Size Over Time
Estimated KV cache per request (GB) · lower values mean autotune's dynamic sizing is saving memory
Token Throughput (tok/s)
Generation speed · typically drops when multiple models share memory
What These Charts Show
RAM Usage: peak memory consumed per request. Spikes above your system's threshold trigger autotune optimizations.
KV Cache: estimated memory used by the attention cache. autotune's dynamic sizing keeps this proportional to actual prompt length.
Throughput: generation speed in tokens/sec. Drops when multiple models are loaded simultaneously or RAM pressure is high. Data points are highlighted when more than one model was loaded concurrently.
KV Cache: Ollama Default (4096 tokens) vs autotune Dynamic Sizing
Ollama Default
4,096
tokens · fixed
autotune Average
tokens · dynamic
Context Reduction
KV Memory Saved
proportional
Avg TTFT (measured)
all-time average
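Why the KV saving is proportional: in a standard transformer, KV cache memory grows linearly with context length (one key and one value vector per layer, per KV head, per token), so shrinking the context from 4,096 tokens shrinks the cache by the same ratio. A rough sketch, assuming an fp16 cache (2 bytes per element) and illustrative dimensions (32 layers, 8 KV heads, head dimension 128) that are not taken from any particular model; it ignores KV-cache quantization and runtime overhead, and the function names are hypothetical.

```python
def kv_cache_bytes(n_ctx: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV cache size in bytes (factor 2 = keys + values)."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem

def kv_savings(dynamic_ctx: int, default_ctx: int = 4096) -> float:
    """Fraction of KV memory saved versus a fixed default context."""
    return 1 - dynamic_ctx / default_ctx

MiB = 1024 * 1024
fixed = kv_cache_bytes(4096, n_layers=32, n_kv_heads=8, head_dim=128)
dynamic = kv_cache_bytes(1024, n_layers=32, n_kv_heads=8, head_dim=128)
print(fixed // MiB, dynamic // MiB, kv_savings(1024))
```

With these assumed dimensions, a fixed 4,096-token context needs 512 MiB of KV cache, while a 1,024-token context needs 128 MiB: a 75% saving, matching the context reduction.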
All Models
Model · Requests · Avg TTFT · P95 TTFT · Avg Tok/s · Avg Context · Avg Elapsed · Total Tokens · Last Used
Requests Over 5 s
High-latency calls — investigate model size, RAM pressure, or context length
Model · Elapsed · TTFT · Context · Profile · Time
Installed Models
All Ollama models available locally
Model · Size · Parameters · Quantization · Family · Installed
Recommended Models
Best LLMs at every size — auto-updated
Active API Keys
Name · Key Prefix · Req Today · Tokens Today · Last Used · Created
All Conversations
Every session captured by the gateway
Data & Privacy
Local Storage
Write performance data and telemetry events to the local SQLite database. Disabling this stops run_observations, telemetry_events, and hardware_profiles from being written. Model catalog data is always stored.
Remote Telemetry
Send anonymised performance metrics to the autotune Supabase project. Data includes: model ID, hardware fingerprint, throughput, TTFT, and context size. No conversation content is ever sent.
Data Retention
Number of days to keep run_observations rows before they are pruned. Set to 0 to keep them forever. Use "Clean Up Now" to immediately remove rows older than this limit.
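Retention pruning amounts to one parameterised DELETE against the cutoff timestamp. A minimal sketch using Python's built-in `sqlite3` module; the `created_at` column (Unix epoch seconds) and the function name `prune_run_observations` are hypothetical, since the real schema of `run_observations` is not shown here.

```python
import sqlite3
import time
from typing import Optional

SECONDS_PER_DAY = 86_400

def prune_run_observations(conn: sqlite3.Connection, retention_days: int,
                           now: Optional[float] = None) -> int:
    """Delete run_observations rows older than retention_days; return rows removed.

    A retention of 0 means "keep forever", so nothing is deleted.
    Assumes a hypothetical `created_at` column holding Unix epoch seconds.
    """
    if retention_days <= 0:
        return 0
    current = time.time() if now is None else now
    cutoff = current - retention_days * SECONDS_PER_DAY
    cur = conn.execute(
        "DELETE FROM run_observations WHERE created_at < ?", (cutoff,)
    )
    conn.commit()
    return cur.rowcount  # number of rows the DELETE removed
```

Passing `now` explicitly keeps the function deterministic for testing; in normal use it falls back to the current time.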
days
Gateway
Default QOS Profile
Profile applied when no X-Autotune-Profile header is present. Options: fast (low latency), balanced (general use), and quality (best output).
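The header-or-default rule can be sketched as below. HTTP header names are case-insensitive, so the lookup lowercases them first. The function name `resolve_profile`, lowercasing the header value, and falling back to the default for unrecognised values are all assumptions, not documented autotune behaviour.

```python
from typing import Mapping

PROFILES = ("fast", "balanced", "quality")

def resolve_profile(headers: Mapping[str, str], default_profile: str) -> str:
    """Return the QOS profile for a request, falling back to the configured default."""
    # Header names are case-insensitive in HTTP, so normalise them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    requested = lowered.get("x-autotune-profile", "").strip().lower()
    # Assumption: a missing or unknown value uses the default profile.
    return requested if requested in PROFILES else default_profile
```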
Ollama URL Override
Custom Ollama base URL. Leave blank to use the AUTOTUNE_OLLAMA_URL environment variable or, if that is unset, the default localhost:11434. Takes effect after the next server restart.
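The lookup order described above (dashboard override, then `AUTOTUNE_OLLAMA_URL`, then the default) can be sketched as follows. `resolve_ollama_url` is a hypothetical name, and writing the default with an explicit `http://` scheme is an assumption about how the address is stored.

```python
import os
from typing import Mapping, Optional

DEFAULT_OLLAMA_URL = "http://localhost:11434"  # Ollama's standard local address

def resolve_ollama_url(override: Optional[str],
                       env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the Ollama base URL: dashboard override, then env var, then default."""
    if override and override.strip():
        return override.strip()
    environ = os.environ if env is None else env
    from_env = environ.get("AUTOTUNE_OLLAMA_URL", "").strip()
    return from_env or DEFAULT_OLLAMA_URL
```

Passing `env` explicitly makes the precedence easy to test without touching the real process environment.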
Model Catalog Refresh Interval
How often (in hours) the model catalog is automatically refreshed from HuggingFace in the background.
hours
Security: Environment Variables (read-only)
These settings are configured via environment variables or the .env file. Restart the server after changing them.
Dashboard Admin Key
AUTOTUNE_ADMIN_KEY — required to log into the dashboard.
API Key Enforcement
AUTOTUNE_REQUIRE_API_KEY — when on, all /v1/* inference requests require a valid API key.
Request Body Limit
AUTOTUNE_MAX_BODY_BYTES — oversized requests are rejected with HTTP 413.
CORS Extra Origins
AUTOTUNE_CORS_ORIGINS — comma-separated extra allowed origins. Blank = localhost only.
Supabase Remote Telemetry URL
AUTOTUNE_SUPABASE_URL — custom Supabase project for telemetry. Leave unset to use the built-in autotune project.
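Two of these settings reduce to simple checks. A sketch, assuming `AUTOTUNE_CORS_ORIGINS` is split on commas with surrounding whitespace and empty entries dropped, and that a request counts as oversized when its body is strictly larger than the `AUTOTUNE_MAX_BODY_BYTES` limit. Both function names are hypothetical, and the built-in localhost origins are not modelled here.

```python
from typing import List, Mapping, Optional

def parse_cors_origins(env: Mapping[str, str]) -> List[str]:
    """Extra allowed origins from AUTOTUNE_CORS_ORIGINS; empty list = localhost only."""
    raw = env.get("AUTOTUNE_CORS_ORIGINS", "")
    return [origin.strip() for origin in raw.split(",") if origin.strip()]

def body_limit_status(content_length: int, max_body_bytes: int) -> Optional[int]:
    """Return 413 (HTTP Content Too Large) for oversized bodies, else None."""
    return 413 if content_length > max_body_bytes else None
```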
Database
Run observations
Model catalog entries
Hardware profiles
Telemetry events
Security events
DB size
Security Score
Always-on Protections
Login Rate Limiting · Body Size Limit · Session Revocation · Audit Log · CORS Locked · Server Header Removed · Cache-Control: no-store · Dashboard Rate Limiting · Permissions-Policy · DB File: 600
Login Failures
failed dashboard login attempts
Invalid Keys
rejected /v1/* calls
Rate Limit Hits
IP lockouts triggered
Successful Logins
dashboard sessions
Keys Created
new API keys issued
Keys Revoked
API keys invalidated
Audit Trail
Persisted across restarts · auto-refreshes every 30 s