Production hardening

Everything in this page is opt-in: the framework’s defaults already work for single-tenant scripts and demos. The settings below are what you flip on when you’re putting the agent in front of real users on real infrastructure.

The whole page is organised around a single theme — multi-tenancy without footguns. One shared Agent (and one Memory, one Budget, one AuditLog) backing N users requires more than just passing user_id= everywhere; it requires bounded state, per-user caps, scoped permissions, observable extraction, and pluggable secret resolution. Each section below maps a production concern to the framework primitive that closes it.

Per-user budget caps

StandardBudget tracks tokens / cost / wall clock both globally and per user_id. The global caps catch runaway aggregate usage; the per-user caps stop one tenant from exhausting another’s quota.

from datetime import timedelta
from jeevesagent import Agent
from jeevesagent.governance.budget import BudgetConfig, StandardBudget

agent = Agent(
    "...",
    budget=StandardBudget(BudgetConfig(
        # Global: applies to all users combined.
        max_tokens=10_000_000,
        max_cost_usd=100.0,
        max_wall_clock=timedelta(hours=1),
        # Per-user: applies to each user_id's bucket independently.
        per_user_max_tokens=100_000,
        per_user_max_cost_usd=2.0,
        per_user_max_wall_clock=timedelta(minutes=10),
        soft_warning_at=0.8,  # warn at 80% of either cap
    )),
)

A run is blocked when either its user’s cap or the global cap is exceeded — whichever fires first. result.interrupted = True and result.interruption_reason = "budget:per_user_max_tokens" (or the matching global field) on a blocked run.

Inspect a single user’s running totals at any time:

usage = agent._budget.usage_for("alice")
# {"tokens_in": 1240, "tokens_out": 880, "tokens_total": 2120,
#  "cost_usd": 0.0042}

Both the agent loop and direct callers pass user_id through to the budget automatically — you don’t need to thread it manually.

Per-user permission policies

PerUserPermissions routes the policy decision per user_id. One Agent can run in BYPASS mode for staff while still gating destructive tools for end users.

from jeevesagent import Agent, Mode, PerUserPermissions, StandardPermissions

policies = {
    "admin_alice": StandardPermissions(mode=Mode.BYPASS),
    "service_account": StandardPermissions(
        mode=Mode.DEFAULT,
        allowed_tools=["read", "search"],
    ),
}

perms = PerUserPermissions(
    policies=policies,
    default=StandardPermissions(
        mode=Mode.DEFAULT,
        denied_tools=["delete_account", "send_email"],
    ),
)

agent = Agent("...", permissions=perms)

Unknown user_ids fall through to default. The framework forwards the live user_id from the active RunContext into every permissions.check(...) call — you don’t need to wire it manually.

Approval handler for Decision.ask_

StandardPermissions(mode=Mode.DEFAULT) returns Decision.ask_(...) when a tool is marked destructive=True. Without an approval handler, ask falls back to deny — the agent never silently bypasses the gate. Wire a handler to surface the decision to a human / Slack / ticket queue:

from jeevesagent import Agent
from jeevesagent.core.types import ToolCall

async def approve_via_slack(call: ToolCall, user_id: str | None) -> bool:
    """Return True to allow the tool call, False to deny."""
    msg_id = await slack.post(
        channel="#approvals",
        text=f"User {user_id} wants to run {call.tool}({call.args!r}). React 👍 to approve.",
    )
    reactions = await slack.wait_for_reaction(msg_id, timeout=300)
    return "👍" in reactions

agent = Agent(
    "...",
    permissions=StandardPermissions(mode=Mode.DEFAULT),
    approval_handler=approve_via_slack,
    tools=[delete_user_data],  # @tool(destructive=True)
)

Failure-mode contract:

  • Handler returns False → tool result is denied with reason="approval declined"; the run continues, the model sees the denial in the next turn.

  • No handler wired (approval_handler=None)denied with reason="approval required; no approver". Same UX as before M10.4 — single-tenant code that didn’t want approvals keeps working.

  • Handler raises → treated as deny + warning logged. A buggy approval flow must NOT silently green-light a gated tool.

Bounded per-user state

StandardBudget._by_user and InMemoryMemory._blocks hold per-user_id state in process. Without bounds, a runaway tenant or one-shot user_id explosion (e.g. someone sending one request per random UUID) grows the dict until the process OOMs. Both primitives now default to bounded state with LRU + idle-TTL eviction:

from jeevesagent import InMemoryMemory
from jeevesagent.governance.budget import BudgetConfig, StandardBudget

# Defaults: 100k users, 24h idle TTL.
budget = StandardBudget(BudgetConfig())  # implicit bounds
memory = InMemoryMemory()                # implicit bounds

# Tune for your workload:
budget = StandardBudget(
    BudgetConfig(),
    max_users=10_000,                # smaller cap for known tenant size
    user_idle_ttl_seconds=3_600,     # drop idle users after 1h
)
memory = InMemoryMemory(
    max_users=10_000,
    user_idle_ttl_seconds=3_600,
)

# Or disable bounding entirely (single-tenant or fixed N tenants):
budget = StandardBudget(BudgetConfig(), max_users=None, user_idle_ttl_seconds=None)

What eviction does:

  • LRU — when max_users is exceeded, the least-recently-touched user’s bucket is dropped. For StandardBudget that means the user’s running totals reset; for InMemoryMemory their working blocks are deleted.

  • TTL — a user idle longer than user_idle_ttl_seconds has their bucket dropped on the next access. Lazy: no background thread; the sweep runs on each touch.

Eviction is destructive in process — the bucket’s data is gone, not flushed elsewhere. Callers needing durable spill-to-disk should use SqliteMemory or PostgresMemory (which persist working blocks across restarts) instead of relying on the in-memory bound.

Auto-extract observability

AutoExtractMemory runs a small LLM extraction pass after every agent.run() to pull structured (subject, predicate, object) facts into the bi-temporal store. It’s on by default for real network adapters, which means it’s also on your bill — you should know when it fires and how long it takes.

Two telemetry signals are emitted when Agent(telemetry=...) is wired:

Metric

Type

Tags

Use

jeeves.auto_extract.duration_ms

histogram

user_id, status (ok/error)

Latency budget per tenant

jeeves.auto_extract.invocations

counter

user_id, status

Failure rate; cost attribution

A one-time-per-process INFO log notice tells you when the default-on heuristic fires:

INFO  jeevesagent.memory.auto_extract: AutoExtractMemory enabled
by default for this model class. Each remembered episode triggers
a small extraction call to pull (subject, predicate, object)
facts. Pass Agent(auto_extract=False) to disable, or
Agent(auto_extract=True) to silence this notice.

To disable for cost reasons, pass auto_extract=False:

agent = Agent("...", model="gpt-4o", auto_extract=False)

To enable explicitly (and silence the startup notice):

agent = Agent("...", model="gpt-4o", auto_extract=True)

Secrets resolution

Production agents shouldn’t hard-code API keys; they shouldn’t depend on os.environ either (process-level env vars leak across threads, get logged in crashes, and don’t rotate). The Secrets protocol gives you a pluggable resolver:

from jeevesagent import Agent, EnvSecrets, DictSecrets

# Default — reads from os.environ. Same behaviour as pre-M10.
agent = Agent("...", model="claude-opus-4-7", secrets=EnvSecrets())

# In-memory for tests or vault-fetched-once-at-startup:
agent = Agent(
    "...",
    model="claude-opus-4-7",
    secrets=DictSecrets({
        "ANTHROPIC_API_KEY": api_key_from_vault,
        "OPENAI_API_KEY": openai_key_from_vault,
    }),
)

Resolution order inside model adapters:

  1. Explicit api_key= argument on the model adapter

  2. secrets.lookup_sync(<ENV_VAR_NAME>) if a Secrets backend is wired

  3. os.environ[<ENV_VAR_NAME>] as the bare fallback

Production callers running on AWS / GCP / Vault should write a small custom adapter:

class VaultSecrets:
    """Pulls from HashiCorp Vault, caches into a local dict for
    the constructor-time ``lookup_sync`` path."""

    def __init__(self, vault_client, path: str) -> None:
        self._client = vault_client
        self._cache: dict[str, str] = {}

    async def resolve(self, ref: str) -> str:
        if ref in self._cache:
            return self._cache[ref]
        secret = await self._client.read(f"secret/data/{ref}")
        value = secret["data"]["value"]
        self._cache[ref] = value
        return value

    async def store(self, ref: str, value: str) -> None:
        await self._client.write(f"secret/data/{ref}", value=value)
        self._cache[ref] = value

    def redact(self, text: str) -> str:
        # Use the default regex set or extend with your own:
        from jeevesagent.security.secrets import _apply_redaction
        return _apply_redaction(text)

    def lookup_sync(self, ref: str) -> str | None:
        # Constructor-time path — return whatever's already in the
        # cache; if nothing's there, return None and let the caller
        # fall back to env-vars.
        return self._cache.get(ref)


# Pre-warm the cache at startup so lookup_sync hits during Agent()
# construction:
async def boot():
    vault = await connect_vault(...)
    secrets = VaultSecrets(vault, path="prod/jeeves")
    await secrets.resolve("ANTHROPIC_API_KEY")
    await secrets.resolve("OPENAI_API_KEY")

    agent = Agent("...", model="claude-opus-4-7", secrets=secrets)

secrets.redact(text) masks common API-key shapes (OpenAI, Anthropic, AWS access keys, GitHub PATs) — useful inside @agent.before_tool hooks that log tool args, or before payload strings hit the audit log.

Audit log with per-user attribution

AuditEntry carries a top-level user_id field. Every run_started / tool_call / tool_result / run_completed entry the framework writes is attributed to the active user:

from jeevesagent import Agent, FileAuditLog

agent = Agent(
    "...",
    audit_log=FileAuditLog("./audit.jsonl", secret="prod-secret"),
)

await agent.run("export my data", user_id="alice", session_id="s1")

# Compliance query — filter the log by user:
alice_entries = await agent._audit_log.query(user_id="alice")
# Or by session:
session_entries = await agent._audit_log.query(session_id="s1")
# Or both:
both = await agent._audit_log.query(user_id="alice", action="run_completed")

The HMAC signature on each entry includes user_id, so tampering that swaps a different user’s id into an entry breaks the signature. verify_signature(entries, secret) validates the chain on read.

Verifying isolation under load

bench/multi_tenant.py simulates N concurrent users × M turns through one shared Agent and reports p50 / p99 latency, RSS growth, isolation violations (cross-user data leakage), and budget accounting mismatches:

python bench/multi_tenant.py --users 100 --turns 3
python bench/multi_tenant.py --users 500 --turns 5  # stress

Output for the 500 × 5 stress run on a developer laptop:

============================================================
  Multi-tenant load bench
============================================================
  users          : 500
  turns / user   : 5
  total runs     : 2500

  p50 turn latency : 1008.04 ms
  p99 turn latency : 1057.01 ms
  RSS growth         : 10179 KB
  per-user growth    : 20.36 KB/user

  isolation violations : 0
  budget mismatches    : 0

  PASS: isolation + budget accounting hold under load

A smoke-test variant runs as part of the regular pytest suite (tests/test_multi_tenant_load.py) at lower scale (10 users × 2 turns) so a regression to the isolation contract gets caught in CI.

Migrating from 0.9.x to 0.10

The 0.10 release is additive — every change is opt-in. Three things to be aware of when you upgrade:

  1. Postgres memory_blocks schema migrated. The empty-string anonymous bucket is replaced with a reserved sentinel (__jeeves_anon_user__). The framework’s init_schema() runs the rewriting UPDATE automatically; if you bypass it and manage migrations yourself, run:

    UPDATE memory_blocks SET user_id = '__jeeves_anon_user__'
      WHERE user_id = '';
    ALTER TABLE memory_blocks ALTER COLUMN user_id
      SET DEFAULT '__jeeves_anon_user__';
    

    Attempting to use the sentinel as a real user_id now raises ValueError — defense against impersonating the anonymous bucket.

  2. Custom Memory / Permissions / HookHost / AuditLog impls without a user_id= kwarg emit JeevesDeprecationWarning. The shim layer still calls the legacy shape, so nothing breaks today. The deprecation will turn into a hard removal in 1.0. To silence:

    class MyMemory:
        async def remember(
            self, episode: Episode, *, user_id: str | None = None,
        ) -> str:
            ...
    
  3. StandardBudget._by_user and InMemoryMemory._blocks are now bounded by default. A workflow that legitimately pins >100k active users in process needs to opt out or raise the cap:

    StandardBudget(BudgetConfig(), max_users=None, user_idle_ttl_seconds=None)
    InMemoryMemory(max_users=None, user_idle_ttl_seconds=None)
    

That’s the whole upgrade. No public API got renamed; no constructor arg got removed.

What you turn on, listed

For a checklist-driven shipment, these are the production flips:

Concern

Opt-in

Per-user quota

BudgetConfig(per_user_max_*)

Tenant-specific permissions

PerUserPermissions(policies=, default=)

Human-in-the-loop for destructive tools

Agent(approval_handler=...)

Bounded in-process state

Defaults active; tune max_users / user_idle_ttl_seconds

Vault-backed API keys

Agent(secrets=VaultSecrets(...))

Auto-extract metrics

Agent(telemetry=OTelTelemetry(...))

Per-user audit

FileAuditLog(...) (attribution is automatic)

Load-test isolation

bench/multi_tenant.py

Pair this with the Production checklist in recipes for the broader operational concerns (durable runtime, persistent memory, sandbox, etc.).