Run Cantrip in a VM with host GPU inference

Sandbox Cantrip in a Multipass VM and let it reach inference snaps running on the host's GPU, without GPU passthrough.

Cantrip works well sandboxed in a Multipass VM, but inference snaps such as qwen3-coder and gemma4 run best when they use the host's GPU directly. Multipass on Linux does not support PCI passthrough, and full VFIO passthrough on a hybrid-graphics laptop typically means the host loses its GPU while the VM runs. The pragmatic answer is to keep inference on the host, where the GPU lives, and let the VM reach those snaps over the Multipass bridge.

This guide walks through the setup. A helper script (scripts/setup-vm-inference-proxy.sh) automates the host side.

How it works

Each inference snap binds 127.0.0.1:<port> on the host. The Multipass VM sees the host as 10.42.160.1 (the mpqemubr0 bridge gateway) and cannot reach the host's loopback. A small socat forwarder per port listens on the bridge IP only, forwarding to host loopback:

cantrip VM (10.42.160.198)
   │  http://10.42.160.1:8332/v1/...
   ▼
mpqemubr0 (10.42.160.1)  ← socat listener
   │
   ▼
127.0.0.1:8332 (qwen3-coder snap, GPU)

The forwarder never binds on 0.0.0.0, and ufw is locked down so that only traffic arriving on mpqemubr0 from the VM subnet can reach it. This matters because the inference snaps do not authenticate requests.
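The forwarder for a single port can be sketched as one socat invocation. This is an illustration under stated assumptions, not the helper script's actual contents: the function name forwarder_cmd is hypothetical, and the bridge IP is the one shown in the diagram above. The function only prints the command, so it can be inspected before anything is run.

```shell
#!/usr/bin/env bash
# Sketch only: forwarder_cmd is a hypothetical helper, not part of the script.
BRIDGE_IP="${BRIDGE_IP:-10.42.160.1}"

forwarder_cmd() {
  local port="$1"
  # bind= restricts the listener to the bridge address (never 0.0.0.0);
  # fork serves each connection in a child process;
  # reuseaddr lets the listener restart without waiting on old sockets.
  printf 'socat TCP-LISTEN:%s,bind=%s,reuseaddr,fork TCP:127.0.0.1:%s\n' \
    "$port" "$BRIDGE_IP" "$port"
}

forwarder_cmd 8332
```

Because the listener is bound to the bridge address, connections arriving on any other host interface are refused before socat ever sees them.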

Prerequisites

Set up the host

Run the helper script. The defaults expose qwen3-coder (8332) and gemma4 (8336):

sudo bash scripts/setup-vm-inference-proxy.sh

Or pass an explicit port list:

sudo bash scripts/setup-vm-inference-proxy.sh 8328 8332 8336

The script:

  1. Installs socat if missing.
  2. Drops a templated systemd unit cantrip-inference-proxy@.service and enables one instance per port.
  3. If ufw is active, adds rules allowing only the VM subnet (10.42.160.0/24, ingress on mpqemubr0) to reach those ports.
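Steps 2 and 3 can be sketched roughly as follows. The unit contents, the socat path, and the exact ufw rule are assumptions made for illustration; the real script may generate something different. render_unit and ufw_rule are hypothetical names, and both only print text rather than changing the system.

```shell
#!/usr/bin/env bash
# Sketch only: the real script may generate different unit contents and rules.
BRIDGE_IFACE="${BRIDGE_IFACE:-mpqemubr0}"
BRIDGE_IP="${BRIDGE_IP:-10.42.160.1}"
BRIDGE_NET="${BRIDGE_NET:-10.42.160.0/24}"

# Print a templated unit; systemd expands %i to the instance name (the port).
# /usr/bin/socat is an assumed install path.
render_unit() {
  cat <<EOF
[Unit]
Description=Cantrip inference proxy (port %i)
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/usr/bin/socat TCP-LISTEN:%i,bind=${BRIDGE_IP},reuseaddr,fork TCP:127.0.0.1:%i
Restart=on-failure
RestartSec=2

[Install]
WantedBy=multi-user.target
EOF
}

# Print the ufw rule for one port: VM subnet only, arriving on the bridge.
ufw_rule() {
  local port="$1"
  printf 'ufw allow in on %s from %s to %s port %s proto tcp\n' \
    "$BRIDGE_IFACE" "$BRIDGE_NET" "$BRIDGE_IP" "$port"
}

render_unit
ufw_rule 8332
```

To try the sketch by hand, the unit text could be written to /etc/systemd/system/cantrip-inference-proxy@.service and an instance started with sudo systemctl enable --now cantrip-inference-proxy@8332.service. The Restart=on-failure line is a design guess: the bridge may not exist yet when the unit first starts, and retrying covers that case.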

If your Multipass install uses a different bridge or subnet, override the defaults with environment variables. Check the address assigned to the bridge with ip -4 addr show dev mpqemubr0, substituting your bridge name:

sudo BRIDGE_IFACE=mpqemubr0 BRIDGE_IP=10.42.160.1 BRIDGE_NET=10.42.160.0/24 \
    bash scripts/setup-vm-inference-proxy.sh
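Rather than copying the three values by hand, they can be derived from the bridge's address. The sketch below assumes the one-line output format of ip -4 -o addr show; first_cidr and cidr_to_net are hypothetical helpers, not part of the script.

```shell
#!/usr/bin/env bash
# Sketch only: first_cidr and cidr_to_net are hypothetical helpers.

# Read `ip -4 -o addr show dev <iface>` output on stdin and print the
# first address in CIDR form, e.g. 10.42.160.1/24.
first_cidr() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "inet") { print $(i + 1); exit } }'
}

# Turn an interface address (10.42.160.1/24) into its network (10.42.160.0/24).
cidr_to_net() {
  local ip prefix a b c d rest addr mask net
  ip="${1%/*}"
  prefix="${1#*/}"
  # Split the dotted quad into four octets.
  a="${ip%%.*}"; rest="${ip#*.}"
  b="${rest%%.*}"; rest="${rest#*.}"
  c="${rest%%.*}"; d="${rest#*.}"
  # Pack into a 32-bit integer, then clear the host bits.
  addr=$(( (a << 24) | (b << 16) | (c << 8) | d ))
  mask=$(( (4294967295 << (32 - prefix)) & 4294967295 ))
  net=$(( addr & mask ))
  printf '%d.%d.%d.%d/%d\n' \
    $(( (net >> 24) & 255 )) $(( (net >> 16) & 255 )) \
    $(( (net >> 8) & 255 )) $(( net & 255 )) "$prefix"
}

# Example use on the host (not executed here):
#   cidr="$(ip -4 -o addr show dev mpqemubr0 | first_cidr)"
#   sudo BRIDGE_IFACE=mpqemubr0 BRIDGE_IP="${cidr%/*}" \
#        BRIDGE_NET="$(cidr_to_net "$cidr")" \
#        bash scripts/setup-vm-inference-proxy.sh
```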

Verify from the VM:

multipass exec cantrip -- curl -s http://10.42.160.1:8332/v1/models
multipass exec cantrip -- curl -s http://10.42.160.1:8336/v1/models
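Alongside the VM-side check, it may be worth confirming on the host that nothing listens on a wildcard address for these ports. A minimal sketch, assuming ss from iproute2 is installed; wildcard_listeners is a hypothetical helper that reads ss -ltnH output on stdin and prints any listener bound to all interfaces on the given port.

```shell
#!/usr/bin/env bash
# Sketch only: wildcard_listeners is a hypothetical helper.
# Usage on the host: ss -ltnH | wildcard_listeners 8332
# Empty output means no wildcard listener was found for that port.
wildcard_listeners() {
  awk -v port="$1" '
    {
      # Column 4 of `ss -ltnH` is the local address:port pair.
      n = split($4, parts, ":")
      p = parts[n]
      addr = substr($4, 1, length($4) - length(p) - 1)
    }
    p == port && (addr == "0.0.0.0" || addr == "*" || addr == "[::]") { print $4 }
  '
}
```

Listeners on 10.42.160.1 (the forwarder) and 127.0.0.1 (the snap itself) are expected and are not reported.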

Point Cantrip at the host snaps

Cantrip's inference-snap provider normally auto-discovers the endpoint by running <snap-name> status. That fails inside the VM because the snap is installed only on the host. Use --base-url to bypass discovery:

# Inside the cantrip VM
cantrip --provider inference-snap \
        --snap-name qwen3-coder \
        --base-url http://10.42.160.1:8332/v1

Or use the generic provider:

cantrip --provider openai-compatible \
        --base-url http://10.42.160.1:8332/v1 \
        --model Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf

For gemma4, change the port to 8336 and the snap name to gemma4.
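If you switch between snaps often, the snap-to-port mapping can live in a small shell helper. This is only a convenience sketch: snap_port and snap_base_url are hypothetical names, and the mapping covers just the two default ports named in this guide.

```shell
#!/usr/bin/env bash
# Sketch only: snap_port and snap_base_url are hypothetical helpers.

# Map a snap name to the host port used in this guide.
snap_port() {
  case "$1" in
    qwen3-coder) echo 8332 ;;
    gemma4)      echo 8336 ;;
    *) echo "unknown snap: $1" >&2; return 1 ;;
  esac
}

# Build the base URL the VM should use for a given snap.
snap_base_url() {
  local port
  port="$(snap_port "$1")" || return 1
  printf 'http://%s:%s/v1\n' "${BRIDGE_IP:-10.42.160.1}" "$port"
}

snap_base_url gemma4   # prints http://10.42.160.1:8336/v1
```

Inside the VM this could be combined with the flags shown above, for example cantrip --provider inference-snap --snap-name gemma4 --base-url "$(snap_base_url gemma4)".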

Security notes

Remove the proxy

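# List every port you enabled (the defaults are 8332 and 8336)
for port in 8332 8336; do
  sudo systemctl disable --now "cantrip-inference-proxy@${port}.service"
done
# Remove the unit template, then reload systemd
sudo rm /etc/systemd/system/cantrip-inference-proxy@.service
sudo systemctl daemon-reload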

# Optional: revoke ufw rules
sudo ufw status numbered
sudo ufw delete <number>  # for each rule mentioning cantrip
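One thing to watch when deleting by number: ufw renumbers the remaining rules after each deletion, so it is safest to delete from the highest number down. The sketch below lists matching rule numbers in descending order. It assumes the rules carry the word "cantrip" (for example in a rule comment) so that they show up in the numbered listing; cantrip_rule_numbers is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch only: cantrip_rule_numbers is a hypothetical helper.
# Reads `ufw status numbered` output on stdin and prints the numbers of
# rules mentioning "cantrip", highest first.
cantrip_rule_numbers() {
  grep -i 'cantrip' \
    | sed -n 's/^\[ *\([0-9][0-9]*\)\].*/\1/p' \
    | sort -rn
}

# Example use on the host (not executed here):
#   for n in $(sudo ufw status numbered | cantrip_rule_numbers); do
#     sudo ufw --force delete "$n"   # --force skips the confirmation prompt
#   done
```

Review the listing before running the loop, since it deletes every rule whose line matches.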

Why not other approaches