Active Profile

Determines which per-mode dashboard panel is shown and which cost-attribution bucket requests land in.

Current (env): {{ active_profile }}

Vault Injection

Controls whether TokenPak injects relevant vault context blocks into each request, and sets the retrieval parameters used to select them.

Enable vault injection
Injects up to top_k matching blocks into the system prompt
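
As a rough illustration of what "up to top_k matching blocks" could look like, the sketch below ranks candidate blocks by a relevance score and appends the best ones to the system prompt. The helper name `inject_vault_blocks`, the `(score, text)` block shape, and the choice to append rather than prepend are assumptions for illustration, not TokenPak's actual internals.

```python
from typing import List, Tuple


def inject_vault_blocks(
    system_prompt: str,
    scored_blocks: List[Tuple[float, str]],
    top_k: int,
) -> str:
    """Append up to top_k highest-scoring blocks to the system prompt.

    Hypothetical helper: scored_blocks is a list of (relevance_score, text).
    """
    if top_k <= 0 or not scored_blocks:
        # Nothing to inject; leave the prompt untouched.
        return system_prompt
    # Highest score first; keep at most top_k blocks.
    best = sorted(scored_blocks, key=lambda b: b[0], reverse=True)[:top_k]
    context = "\n\n".join(text for _, text in best)
    return f"{system_prompt}\n\n{context}"
```

If fewer than top_k blocks match, all of them are injected, which is why the setting is worded as an upper bound.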

Budget Enforcement

Enforces a monthly token budget. Requests that would exceed the limit are rejected with a 429 response.

Enable budget controller
Hard-stops requests once the monthly limit is reached
Controller status: {% if budget_controller_enabled %} On {% else %} Off {% endif %}
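
The enforcement rule described above can be sketched as a simple pre-flight check. This is a minimal sketch under assumptions: the function name `check_budget`, its parameters, and the response message are hypothetical, and it assumes a request that lands exactly on the limit is still allowed.

```python
from typing import Tuple


def check_budget(
    used_tokens: int,
    request_tokens: int,
    monthly_limit: int,
) -> Tuple[int, str]:
    """Return an (HTTP status, message) pair for a prospective request.

    Hypothetical helper: rejects only when the request would push
    usage past the monthly limit.
    """
    if used_tokens + request_tokens > monthly_limit:
        return 429, "monthly token budget exceeded"
    return 200, "ok"
```

For example, with 900 tokens already used against a limit of 1000, a 100-token request passes and a 200-token request is rejected with 429.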

Cache Invalidation Alerts

Fires a webhook or Slack notification when the cache hit rate drops below the configured threshold.

Enable webhook alerts
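
One way to picture the alert condition is as a comparison between the observed hit rate and the threshold. The sketch below is illustrative only: `should_alert` and its parameters are hypothetical names, and skipping the alert when there is no traffic is an assumption.

```python
def should_alert(hits: int, misses: int, threshold: float) -> bool:
    """Return True when the cache hit rate is below the threshold.

    Hypothetical helper: threshold is a fraction between 0 and 1.
    """
    total = hits + misses
    if total == 0:
        # No traffic yet, so there is no meaningful hit rate to alert on.
        return False
    return hits / total < threshold
```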

Provider Routing

The failover chain is configured in ~/.tokenpak/config.yaml under failover.chain. Edit that file directly to change provider order, model maps, or credential env vars. View current usage →

Local-First Routing

Routes requests to a local Ollama instance first; falls back to cloud providers on error.

⚠ Safety warning: Enabling local-first routing sends all requests to {{ ollama_upstream }} by default. If Ollama is not running, requests fall back to the cloud provider chain, but latency will spike. Only enable this if you have Ollama running locally.
Enable local-first routing
Ollama at {{ ollama_upstream }} (restart required)
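
The local-first behavior amounts to a try-then-fall-back pattern. The following is a minimal sketch, assuming the local and cloud calls are plain callables; `route_local_first`, `call_local`, and `call_cloud` are hypothetical names rather than TokenPak's actual routing code.

```python
from typing import Callable


def route_local_first(
    prompt: str,
    call_local: Callable[[str], str],
    call_cloud: Callable[[str], str],
) -> str:
    """Try the local upstream first; fall back to the cloud chain on error."""
    try:
        return call_local(prompt)
    except Exception:
        # Any local failure (for example, Ollama not running) triggers the
        # fallback. The failed local attempt is where the added latency comes from.
        return call_cloud(prompt)
```

Because every request attempts the local upstream before falling back, an unreachable Ollama instance adds delay to each call, which is the latency spike the warning refers to.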

Compliance Routing

Pin all requests to a specific provider endpoint for compliance or data-residency requirements.

{% if compliance_provider %}
Current compliance provider: {{ compliance_provider }}
{% endif %}