---
orphan: true
---

# ADR: Isolation Hardening (2026-05-13)

**Status:** Accepted.
**Context:** Agent symlink incident in `scitex-stats-auditor` proof of
concept; sac's positioning vs. Clew reproducibility verification.

## Problem

On 2026-05-13 the first per-package auditor (`scitex-stats-auditor`)
ran under Apptainer with `sac-base.sif + overlay` and reported:
"the project venv targets `/opt/python3.12/bin/python3.12`, which is
missing on this host; I repaired it by symlinking it to
`/usr/bin/python3.12`." The agent's mental model was that it had
patched the host. Investigation showed the symlink only landed in the
container's overlay — the host was untouched. But two systemic gaps
were exposed:

1. **The agent thought it had host write access.** Apptainer's
   defaults make the container/host boundary porous enough that an
   agent can't distinguish "I patched the container's view of /opt"
   from "I patched the host."
2. **Operator-side prompts can't enforce isolation.** The prompt told
   the agent to be read-only; the agent ignored it. Prompt-level
   guardrails are not a security mechanism.

The deeper issue: sac's stated positioning is reproducible-by-default,
but its actual default behavior inherited Apptainer's HPC-convenience
defaults (auto-bind `$HOME`, `/tmp`, `/proc`, `/sys`, `/dev`; inherit
all env vars; share host namespaces). Convenience-first defaults are
upside-down for an agent runtime where the container is supposed to
be the security boundary.

## Decisions

### D1. Hardened isolation by default; `relaxed: true` to opt out.

`spec.apptainer.relaxed: false` is the default. sac auto-prepends
`--containall` (filesystem, PID, and IPC isolation, plus a cleaned
environment), `--cleanenv` (environment isolation; redundant with
`--containall` but explicit), and `--writable-tmpfs` (when no overlay
is declared) to the `apptainer` argv.

`relaxed: true` is an explicit opt-out for HPC-style convenience use
cases. Agents started with `relaxed: true` are **outside the Clew
verification chain** — their runs cannot be attested as reproducible.

**Rationale.** sac's differentiation against LangGraph / CrewAI /
AutoGen is "spec.yaml declares isolation; mechanism enforces it;
external verifier can attest it." Default-strict supports that
thesis directly. HPC users can opt in to the legacy behavior with
one line, and they pay the cost of falling out of the verification
chain (which they typically don't need anyway).
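
The flag selection described in D1 can be sketched as follows. This is a minimal illustration, not the code in `runtimes/_apptainer_runtime.py`: the `ApptainerIsolation` dataclass and the `isolation_flags` function are hypothetical names, and `has_overlay` stands in for however the spec actually records a declared overlay.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApptainerIsolation:
    """Hypothetical, trimmed-down stand-in for the relevant spec fields."""

    relaxed: bool = False      # D1 default: hardened
    has_overlay: bool = False  # True when the spec declares an overlay


def isolation_flags(spec: ApptainerIsolation) -> list[str]:
    """Return the flags to prepend to the apptainer argv under D1."""
    if spec.relaxed:
        # Explicit opt-out: keep Apptainer's HPC-convenience defaults.
        return []
    flags = ["--containall", "--cleanenv"]
    if not spec.has_overlay:
        # No overlay declared, so add an ephemeral writable layer instead.
        flags.append("--writable-tmpfs")
    return flags


print(isolation_flags(ApptainerIsolation()))
print(isolation_flags(ApptainerIsolation(has_overlay=True)))
print(isolation_flags(ApptainerIsolation(relaxed=True)))
```

The point of the shape is that the hardened path needs no configuration at all, while the relaxed path returns nothing and so leaves Apptainer's own defaults in force.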

### D2. Universal preflight via `$HOME`-visibility check, not per-path enumeration.

The preflight that sac auto-injects before user `startup_commands` is:

```bash
# Brace groups, not ( ... ): "exit" inside a subshell would not stop the script.
test "$(id -u)" != "0" || { echo 'ERROR: running as root' >&2; exit 1; }
test ! -d "$HOME" || { echo 'ERROR: host $HOME visible (isolation breach)' >&2; exit 1; }
```

**Rationale.** Per-path enumeration (`test ! -e $HOME/.gitconfig`,
`test ! -e $HOME/.ssh`, …) has unbounded false-negative risk: every
new credential store added in the next decade (`.kube/config`,
`.docker/config.json`, `.netrc`, `.npmrc`, `.pypirc`, `.gnupg/`,
`~/.config/anthropic/`, `~/.bash_history` with embedded secrets, …)
requires a new line. The `$HOME`-visibility check covers all of them
at once.

Under `--containall`, `$HOME` is NOT auto-bound — it should be
invisible inside the container. If the check fails, either `--containall`
isn't in effect or an operator-declared bind brought it in. Either way,
the agent shouldn't start.
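
One way to inject a static preflight ahead of the agent command is to wrap the inner argv in a single `bash -c` script that ends in `exec`. The sketch below assumes that approach. `PREFLIGHT` and `wrap_with_preflight` are hypothetical names, and the checks use `{ ...; }` groups so that `exit 1` terminates the script rather than a subshell.

```python
import shlex

# The two static D2 checks. Brace groups make `exit` end the whole script.
PREFLIGHT = "\n".join([
    '''test "$(id -u)" != "0" || { echo 'ERROR: running as root' >&2; exit 1; }''',
    '''test ! -d "$HOME" || { echo 'ERROR: host $HOME visible' >&2; exit 1; }''',
])


def wrap_with_preflight(inner: list[str]) -> list[str]:
    """Return an argv that runs the preflight, then exec's the inner command."""
    # shlex.join quotes each argument so spaces and metacharacters survive.
    script = f"{PREFLIGHT}\nexec {shlex.join(inner)}"
    return ["bash", "-c", script]


argv = wrap_with_preflight(["python", "-m", "agent", "--name", "a b"])
print(argv[2])
```

Because `PREFLIGHT` is a constant string, its hash does not depend on the agent being started; only the trailing `exec` line varies.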

**Operator opt-out** for paths that legitimately need to be visible:

```yaml
spec:
  apptainer:
    preflight_allow:
      - "$HOME/.gitconfig"   # acknowledged: agent needs read-only gitconfig
```

The opt-out is declared per-path, not as a blanket "disable preflight."

### D3. AgentCard exposes structured isolation block, not a flat enum.

Instead of `isolation_level: hardened | relaxed | custom`:

```json
"x-scitex-agent-container": {
  "isolation": {
    "level": "hardened",
    "containall": true,
    "cleanenv": true,
    "writable_tmpfs": false,
    "preflight_passed": ["uid-nonzero", "no-host-home"],
    "preflight_allowed": [],
    "binds_count": 3,
    "binds_writable_count": 0
  }
}
```

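`level: hardened` is the human shorthand for "`containall` and
`cleanenv` are true and `preflight_allowed` is empty".
`writable_tmpfs` is not part of the shorthand: per D1 the flag is only
added when no overlay is declared, so a hardened run with an overlay
reports `false`, as in the example above. External verifiers (Clew,
orochi attestation) read the structured booleans to attest specific
properties.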

**Rationale.** A flat `custom` label hides what's custom about it.
A run with `preflight_allow: [$HOME/.ssh]` and a run with
`preflight_allow: [$HOME/.aws]` are both `custom` under the enum but
have very different security profiles. Clew's verification chain wants
to attest specific properties: "did this run set containall? did it
allow any preflight bypasses?" The structured block answers those
directly.
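
To illustrate what the structured block buys a verifier, here is a minimal sketch of a check that reads the card and lists the properties it could not attest. The function name `isolation_findings` and the policy it encodes (for example, treating any writable bind as a finding) are assumptions for illustration, not Clew's actual rules.

```python
import json

CARD_JSON = """
{
  "x-scitex-agent-container": {
    "isolation": {
      "level": "hardened",
      "containall": true,
      "cleanenv": true,
      "writable_tmpfs": false,
      "preflight_passed": ["uid-nonzero", "no-host-home"],
      "preflight_allowed": [],
      "binds_count": 3,
      "binds_writable_count": 0
    }
  }
}
"""


def isolation_findings(card: dict) -> list[str]:
    """Return the properties a strict verifier would refuse to attest."""
    iso = card["x-scitex-agent-container"]["isolation"]
    findings = []
    if not iso.get("containall"):
        findings.append("containall not set")
    if not iso.get("cleanenv"):
        findings.append("cleanenv not set")
    # Each bypass is reported by path, which a flat `custom` label cannot do.
    for path in iso.get("preflight_allowed", []):
        findings.append(f"preflight bypass: {path}")
    if iso.get("binds_writable_count", 0) > 0:
        findings.append("writable binds present")
    return findings


card = json.loads(CARD_JSON)
print(isolation_findings(card))
```

A run that allowed `$HOME/.ssh` and a run that allowed `$HOME/.aws` produce different findings here, which is the distinction the flat enum would erase.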

## Considered and rejected

### Rejected: prompt-level isolation guarantees.

"Tell the agent in the prompt to be read-only." The PoC proved this
fails — the agent acknowledged the rule and then violated it. Prompts
are not a security mechanism. Mechanism-level enforcement is
non-negotiable for the Clew thesis.

### Rejected: `test -w <path>` permission-based preflight.

`test ! -w /opt` would fail inside any container with a writable
overlay shadowing `/opt`, even when no host damage is possible. The
intent (catch leaks) and the test (catch any write capability) don't
align under Apptainer's overlay model. Existence-based checks on
identity files (`$HOME` visibility) avoid this entirely.

### Rejected: enumerated per-path preflight (`.gitconfig`, `.ssh`, `.aws`, ...).

Forever incomplete; every new credential cache added by tooling
elsewhere requires a sac patch. The `$HOME`-visibility check covers
the same surface in one assertion.

### Rejected: `isolation_level: hardened | relaxed | custom` flat enum.

Loses verifier resolution. Two `custom` runs with very different
`preflight_allow` sets become indistinguishable on the card.

## Implementation

| Layer | Where | Status |
|---|---|---|
| `apptainer.relaxed: false` default | `config/_types.py::ApptainerSpec` | ✅ shipped (D1) |
| `--containall` auto-prepended | `runtimes/_apptainer_runtime.py` | ✅ shipped |
| `--cleanenv` auto-prepended | same | ✅ shipped (D1) |
| `--writable-tmpfs` when no overlay | same | ✅ shipped (D1) |
| Universal preflight injection | runtime wraps inner cmd with `bash -c "<preflight>\nexec <inner>"` | ✅ shipped (D2) |
| `spec.apptainer.preflight_allow` field | `config/_types.py::ApptainerSpec` | ⏳ deferred (out of D1–D4 scope) |
| AgentCard structured `isolation` block | `a2a/_card.py::project_card` | ✅ shipped (D3) |
| `sac agents check` D4 bind-target warning | `cli_pkg/build_cmds.py::check` | ✅ shipped (D4) |
| Regression tests | `tests/.../test__apptainer_runtime.py` + `tests/.../a2a/test__card.py` + `tests/.../cli_pkg/test_build_cmds.py` + `tests/.../runtimes/test__apptainer_isolation.py` + `tests/.../runtimes/test__apptainer_preflight.py` | ✅ shipped |

## Consequences

**Positive.**
- sac becomes the only A2A-compatible agent runtime that mechanically
  enforces isolation by default. Clear differentiation from LangGraph
  / CrewAI / AutoGen (no isolation concept) and from Docker (isolation
  exists but not declared / not attestable).
- Clew can build verification chains that attest specific isolation
  properties without sac-internal introspection.
- The `scitex-stats-auditor` symlink-confusion class of incidents
  becomes impossible: the agent can't "fix" the host because the host
  isn't reachable.

**Negative / tradeoffs.**
- Existing agents that worked under the old defaults will fail fast
  if they implicitly relied on `$HOME` auto-bind. Operators must
  either declare the bind explicitly or opt in to `relaxed: true`.
- The `--cleanenv` default removes `$PATH` inheritance; sac has to
  inject the container's expected `$PATH` itself.
- HPC-style "just run my command inside the container with my home
  visible" now requires an extra step (setting `relaxed: true`). This
  is intentional.

## Addendum: D2 refinement (2026-05-13 evening)

Initial D2 design proposed `test ! -d "$HOME"` as a universal
invariant. Pre-implementation verification against the
scitex-stats-auditor spec.yaml exposed an Apptainer-specific edge
case: **Apptainer creates the entire directory path of every bind
target as scaffolding**, so a bind like
`/home/$USER/proj/scitex-stats:/home/$USER/proj/scitex-stats:ro`
causes `/home/$USER` (== `$HOME`) to exist as a directory inside the
container — even under `--containall`, with no credential files
visible. The D2 check would false-fire on every agent that mirrors
host paths into the container.
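
The scaffolding effect can be modelled in a few lines. This sketch only models the behaviour described above (every ancestor of a bind target exists inside the container); it does not call Apptainer. The function names and the user `alice` are hypothetical.

```python
from pathlib import PurePosixPath


def scaffolded_dirs(bind_target: str) -> list[str]:
    """Ancestor directories that must exist for bind_target to be mounted."""
    # .parents runs from the nearest ancestor up to "/"; reverse it and
    # drop the root so the result reads top-down.
    parents = list(PurePosixPath(bind_target).parents)[::-1]
    return [str(p) for p in parents if str(p) != "/"]


def home_is_scaffolded(home: str, bind_targets: list[str]) -> bool:
    """True when some bind target makes `home` appear as a directory."""
    return any(home == t or home in scaffolded_dirs(t) for t in bind_targets)


print(scaffolded_dirs("/home/alice/proj/scitex-stats"))
print(home_is_scaffolded("/home/alice", ["/home/alice/proj/scitex-stats"]))
print(home_is_scaffolded("/home/alice", ["/srv/sources/scitex-stats"]))
```

With a host-mirroring target, `/home/alice` is one of the scaffolded ancestors, so `test ! -d "$HOME"` fails even though no credential file is visible. With a `/srv/` target it is not.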

Two solutions were considered:

- **Bind-aware preflight** (sac generates the preflight at runtime,
  knowing what binds it's about to create, and the preflight checks
  `$HOME` contents are a subset of the declared bind scaffolding).
  Rejected: dynamic preflight is harder to integrate into Clew's
  verification chain — verifier has to attest the *generator*, not
  just the executed script.
- **Container-canonical paths** (bind targets MUST use container-
  side conventional roots — `/srv/`, `/work/`, `/opt/`, `/data/` —
  never host-mirroring paths). Accepted.

### D4. Bind targets MUST be container-canonical paths.

Bind targets that mirror host paths (`/home/`, `/Users/`, `/root/`,
absolute Windows-style paths) are deprecated. Bind targets MUST live
under conventional container roots: `/srv/`, `/work/`, `/opt/`,
`/data/`.

Spec.yaml convention:

```yaml
spec:
  apptainer:
    binds:
      - $HOME/proj/scitex-stats:/srv/sources/scitex-stats:ro
      - $HOME/proj/scitex-dev:/srv/sources/scitex-dev:ro
```

Inside the container nothing under `/home/$USER` appears, so:

1. The D2 preflight (`test ! -d "$HOME"`) stays static and universal.
2. spec.yaml is **operator-agnostic** — the same spec runs cleanly
   for any user.
3. Verification chain receives a static preflight script with a
   stable sha256; no meta-verification required.

**Rationale, beyond the technical fix.** Container-canonical paths
match Docker / OCI best practice and break the "works on my
machine" failure mode (every agent that hardcodes
`/home/ywatanabe/proj/...` is operator-bound). The shared-path
convention was an HPC convenience artifact; Clew's reproducibility
context inverts it.

**sac-side enforcement (planned).** `sac agents check <name>` will
emit a warning when a bind target starts with `/home/`, `/Users/`, or
`/root/`. Future strict mode (`sac.audit.strict_binds: true`) makes
it an error.
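
A bind-target check along these lines could look like the sketch below. It is not the code in `cli_pkg/build_cmds.py`; `bind_target`, `check_bind_targets`, and the `strict` parameter are hypothetical names, and the parser assumes the `src[:dest[:opts]]` bind syntax with no colons inside paths.

```python
HOST_MIRROR_PREFIXES = ("/home/", "/Users/", "/root/")


def bind_target(bind: str) -> str:
    """Extract the container-side path from `src[:dest[:opts]]`."""
    parts = bind.split(":")
    # A bare `src` bind mounts at the same path inside the container.
    return parts[1] if len(parts) > 1 else parts[0]


def check_bind_targets(binds: list[str], strict: bool = False) -> list[str]:
    """Return warnings for host-mirroring targets; raise under strict mode."""
    problems = [
        f"bind target mirrors a host path: {bind_target(b)}"
        for b in binds
        if bind_target(b).startswith(HOST_MIRROR_PREFIXES)
    ]
    if strict and problems:
        raise ValueError("; ".join(problems))
    return problems


binds = [
    "$HOME/proj/scitex-stats:/srv/sources/scitex-stats:ro",
    "/home/alice/proj/x:/home/alice/proj/x:ro",
]
print(check_bind_targets(binds))
```

Only the container-side path is inspected, so a host-side `$HOME/...` source is fine as long as the target lives under a container root. The sketch reflects D4 as stated here; the D5 addendum later also admits targets under `/home/agent/`.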

**Convenience.** The runtime sets `SAC_WORKDIR=/srv/sources` inside
the container (when any bind targets land there), so `startup_prompts`
and operator scripts can use `cd "$SAC_WORKDIR/<pkg>"` without
hardcoding paths.
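
The "when any bind targets land there" condition can be sketched as below. `workdir_env` is a hypothetical helper, and how the resulting variable reaches the container under `--cleanenv` is left out.

```python
SAC_WORKDIR = "/srv/sources"


def workdir_env(bind_targets: list[str]) -> dict[str, str]:
    """Return {"SAC_WORKDIR": ...} only when a bind lands under /srv/sources."""
    lands_there = any(
        # Compare on a path boundary so "/srv/sources-old" does not match.
        t == SAC_WORKDIR or t.startswith(SAC_WORKDIR + "/")
        for t in bind_targets
    )
    return {"SAC_WORKDIR": SAC_WORKDIR} if lands_there else {}


print(workdir_env(["/srv/sources/scitex-stats"]))
print(workdir_env(["/data/in"]))
```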

### Order of execution

1. ADR addendum (this section) — done.
2. scitex-stats-auditor spec.yaml — bind targets translated to
   `/srv/sources/...`; startup_prompts updated to reference the
   container paths.
3. Implementation: D1 + D2 (static check) + D3 + D4 (CLI validator).
4. Restart scitex-stats-auditor against hardened sac; verify the
   static preflight passes end-to-end.
5. Preserve session.jsonl + preflight result for Clew supplementary.

## Addendum: D5 — canonical container HOME (2026-05-14)

Live verification of the hardened auditor exposed two issues with D2
as originally specified:

1. **Empty-`$HOME` false-fires.** Apptainer scaffolds `$HOME` from the
   inherited passwd entry regardless of bind targets (even under
   `--containall`), so `$HOME` is always a directory. The D2 check was
   relaxed to "`$HOME` is *empty*" — workable, but it forces bind
   targets out of `$HOME`, breaking operator-intuitive paths.
2. **`/srv/`-style targets force script rewrites.** Anything that
   references `~/proj/X` or `$HOME/proj/X` inside the container
   breaks.

### D5. Canonical container HOME = `/home/agent`.

sac auto-injects `--home /home/agent` (skipped only under
`apptainer.relaxed: true` or when the operator declared `--home`).
Inside the container:

- `$HOME == /home/agent`, operator-independent.
- Bind targets use the canonical HOME: `~/proj/X:/home/agent/proj/X`.
- The operator's actual host home is never scaffolded inside the
  container.
- `sac-base.sif` is rebuilt so UID-1000's passwd entry reads `agent`
  (not `ubuntu`); `whoami` matches `$HOME`.
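
The skip rule for the `--home` injection can be sketched as follows. `home_flags` is a hypothetical name, not the function in `runtimes/_apptainer_iso_flags.py`, and the sketch only recognizes the long `--home` form in operator-supplied arguments.

```python
CANONICAL_HOME = "/home/agent"


def home_flags(relaxed: bool, operator_args: list[str]) -> list[str]:
    """Return ["--home", "/home/agent"] unless D5 says to skip it."""
    declared = any(
        a == "--home" or a.startswith("--home=") for a in operator_args
    )
    if relaxed or declared:
        # Relaxed runs keep Apptainer's defaults; an explicit operator
        # choice is never overridden.
        return []
    return ["--home", CANONICAL_HOME]


print(home_flags(False, []))
print(home_flags(False, ["--home", "/home/custom"]))
print(home_flags(True, []))
```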

### D5 preflight (replaces D2's empty-`$HOME` check).

```bash
# uid != 0, OR /proc/self/uid_map confirms userns-fakeroot.
if [ "$(id -u)" = "0" ]; then
  awk '$1==0 && $2!=0 {found=1} END {exit !found}' /proc/self/uid_map \
    || exit 11   # real root, refuse
fi
test "$HOME" = "/home/agent" || exit 12  # canonical HOME
```

Two static checks, attestable by sha256. The "no host leak" property
follows from `--containall` + canonical `--home` + declared `binds:`.
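
The decision logic of the D5 preflight can be mirrored in Python to make the three outcomes explicit. This is an illustrative model of the shell script above, not a replacement for it; both function names are hypothetical. Each `uid_map` line has the form `inside-uid outside-uid length`.

```python
def root_is_userns_mapped(uid_map_text: str) -> bool:
    """Mirror the awk check: inside-uid 0 maps to a nonzero outside uid."""
    for line in uid_map_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and int(fields[0]) == 0 and int(fields[1]) != 0:
            return True
    return False


def preflight_exit_code(uid: int, uid_map_text: str, home: str) -> int:
    """Return 0 on pass, 11 for real root, 12 for a non-canonical HOME."""
    if uid == 0 and not root_is_userns_mapped(uid_map_text):
        return 11  # uid 0 with an identity mapping: real root, refuse
    if home != "/home/agent":
        return 12  # HOME is not the canonical container HOME
    return 0


# Real root in the initial user namespace uses the identity mapping.
print(preflight_exit_code(0, "0 0 4294967295", "/home/agent"))
# Fakeroot: uid 0 inside maps to an unprivileged uid outside.
print(preflight_exit_code(0, "0 1000 1", "/home/agent"))
```

The fakeroot case passes because the mapping shows that uid 0 inside the container corresponds to an unprivileged uid outside it.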

### Rationale.

- **D4 stays** but is no longer the only path. Bind destinations may
  live under `/home/agent/...` (intuitive) OR `/srv/`, `/work/`,
  `/opt/`, `/data/` (container-canonical). Both pass D5.
- **fakeroot opt-in (`apptainer.fakeroot: true`)** integrates: inside
  the container `id -u == 0`, but `/proc/self/uid_map` proves
  userns-mapping; the preflight accepts this without weakening the
  no-real-root guarantee.

### Implementation (D5).

| Layer | Where | Status |
|---|---|---|
| `apptainer.fakeroot: bool` | `config/_types.py::ApptainerSpec` | ✅ |
| Auto-prepend `--home /home/agent` | `runtimes/_apptainer_iso_flags.py` | ✅ |
| Auto-append `--fakeroot` when opted in | same | ✅ |
| Preflight rewrite (uid-map + canonical-HOME) | `runtimes/_apptainer_preflight.py` | ✅ |
| sac-base.sif: ubuntu → agent | `containers/apptainer-base.def` | ✅ recipe; SIF rebuild pending |
| Bind destination validation | `config/_parsers/_apptainer.py` | ✅ |
| Doc updates + AgentCard JSON example | `docs/isolation.md` | ✅ |

### Network isolation addendum (2026-05-14).

Peer review pressed the question of `--network=bridge` and surfaced
the A2A + MCP-over-loopback interop constraint. The outcome: stay with
the host network namespace for the Clew arXiv window, and plan a
migration to `--network=bridge` with a bridge-interface bind and
`sac-host` `/etc/hosts` injection that preserves MCP URL stability.
See `docs/isolation.md` §4.

## References

- [`docs/isolation.md`](../isolation.md) — the 10-category leak catalog this ADR's decisions close.
- [`docs/spec-reference.md`](../spec-reference.md) — `spec.apptainer.relaxed`, `spec.apptainer.fakeroot`.
- The 2026-05-13 scitex-stats-auditor incident — session.jsonl at
  `~/.scitex/agent-container/runtime/scitex-stats-auditor/` for the
  full trace.
