Metadata-Version: 2.4
Name: tetherd
Version: 0.1.0
Summary: Anthropic-compatible local LLM server for Claude Code, backed by mlx-lm / mlx-vlm on Apple Silicon.
Author-email: Ryan Kim <ryankim.labs@gmail.com>
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: anthropic,apple-silicon,claude-code,gemma,llm,local-llm,mlx
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.12
Requires-Dist: litellm[proxy]>=1.60
Requires-Dist: mlx-vlm>=0.4.3
Requires-Dist: rich>=13
Requires-Dist: uvicorn>=0.30
Provides-Extra: dev
Requires-Dist: build>=1.2; extra == 'dev'
Requires-Dist: pytest>=8; extra == 'dev'
Requires-Dist: twine>=5; extra == 'dev'
Description-Content-Type: text/markdown

# tether

Local Anthropic-compatible LLM server for [Claude Code](https://claude.com/claude-code),
backed by [`mlx-lm`](https://github.com/ml-explore/mlx-lm) / [`mlx-vlm`](https://github.com/Blaizzy/mlx-vlm) on Apple Silicon.

One command launches the proxy and opens Claude Code in the same
terminal — inspired by `ollama launch claude`.

## Requirements

- macOS on Apple Silicon
- Python 3.12+
- [`claude`](https://claude.com/claude-code) on `PATH`

## Install

From PyPI (the distribution is named `tetherd` because the name `tether`
was already taken, but it still installs the `tether` command):

```bash
pipx install tetherd         # recommended — isolated venv, `tether` on PATH
# or
pip install tetherd
```

From GitHub (tracks `main`):

```bash
pipx install "git+https://github.com/ryank1m/tether.git"
```

For development (editable checkout):

```bash
git clone https://github.com/ryank1m/tether.git
cd tether
python3.12 -m venv .venv
.venv/bin/pip install -e ".[dev]"
```

All three install paths expose the `tether` console script and pull in
`mlx-vlm` and `litellm` as dependencies.
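
A quick way to confirm the script landed on `PATH`:

```bash
tether --help   # prints the flag reference shown later in this README
```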

### Getting a model

`tether` never downloads weights itself. Place a model directory
anywhere under `~/.tether/models/` (any folder containing a
`config.json` is auto-discovered) and pass `--model <name>` or pick
it interactively:

```bash
# Option A: download from HuggingFace (needs network)
huggingface-cli download mlx-community/gemma-4-e2b-it-4bit \
  --local-dir ~/.tether/models/gemma-4-e2b-it-4bit

# Option B: copy an already-downloaded model directory
#          into ~/.tether/models/ by any means you like
#          (scp, rsync, USB drive, sneakernet — tether only
#           cares that config.json is present).
```

## Usage

Launch Claude Code against a local MLX model. This is the default
path; no extra terminals or env vars are needed:

```bash
tether --model mlx-community/gemma-3-4b-it-4bit
```

Under the hood `tether` loads the model, brings up a LiteLLM
Anthropic-compatible proxy on `127.0.0.1:8080`, waits for it to be
ready, and then spawns `claude` as a child process with
`ANTHROPIC_BASE_URL`, `ANTHROPIC_DEFAULT_{OPUS,SONNET,HAIKU}_MODEL`,
`CLAUDE_CODE_SUBAGENT_MODEL`, and friends wired up automatically.
Ctrl-C exits Claude Code and cleans up the proxy.
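
For a sense of what that wiring looks like, here is a rough manual
equivalent, assuming the default bind address. The exported model ids
are illustrative rather than the exact values `tether` uses; the
proxy's wildcard route (see the curl check under "Advanced") accepts
any name:

```bash
# Approximately what `tether` exports before spawning claude
# (illustrative values, not necessarily the exact ids tether sets):
export ANTHROPIC_BASE_URL=http://127.0.0.1:8080
export ANTHROPIC_DEFAULT_OPUS_MODEL=mlx-local
export ANTHROPIC_DEFAULT_SONNET_MODEL=mlx-local
export ANTHROPIC_DEFAULT_HAIKU_MODEL=mlx-local
export CLAUDE_CODE_SUBAGENT_MODEL=mlx-local
claude
```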

Drop models into `~/.tether/models/` and pick one interactively:

```bash
tether
# ↑/↓ or j/k · Enter to select · q/Esc to cancel
```

### Local models (no HuggingFace involvement)

Passing an HF repo id (`mlx-community/gemma-4-e2b-it-4bit`) causes
`mlx_vlm.load` to revalidate the cached files against HF on every
start. To take HuggingFace out of the loop entirely, download once
and pass the directory directly:

```bash
huggingface-cli download mlx-community/gemma-4-e2b-it-4bit \
  --local-dir ~/.tether/models/gemma-4-e2b-it-4bit
tether --model ~/.tether/models/gemma-4-e2b-it-4bit
```

The TUI picker auto-discovers any subdirectory of `~/.tether/models/`
that contains a `config.json`, so a model you download there shows up
automatically when you run `tether` with no `--model`.
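
To check by hand what the picker will find, the same rule can be
approximated with `find` (assuming the default models dir):

```bash
# Any folder under ~/.tether/models/ containing a config.json:
find ~/.tether/models -name config.json -exec dirname {} \;
```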

Forward arguments to `claude` by placing them after a `--` separator:

```bash
tether --model mlx-community/gemma-3-4b-it-4bit -- --resume
```

List local models and exit:

```bash
tether --list
```

### Config directory

`tether` reads every `*.toml` file in `~/.tether/config/` at
startup. On first run it creates
`~/.tether/config/config.toml` with sensible defaults (including
a 64k-token context cap) and a commented template for every field.
It will never overwrite that file again — edit it freely.

```
~/.tether/config/
├── config.toml         # auto-created, always loaded first
├── 10-m4pro.toml       # optional drop-in (per-machine tweaks)
└── 20-work.toml        # optional drop-in (project-specific)
```

All files are merged on top of `config.toml` in **alphabetical
order** by filename. The `NN-name.toml` numeric-prefix convention
(same idea as systemd drop-ins) is the recommended way to control
ordering. Dotfiles, non-`.toml` files, and subdirectories are
ignored.
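
For example, with two drop-ins setting the same key, the
alphabetically-later file wins (filenames here are hypothetical):

```bash
printf '[model]\nmax_context = 16000\n' > ~/.tether/config/10-a.toml
printf '[model]\nmax_context = 32000\n' > ~/.tether/config/20-b.toml
# Effective max_context: 32000, because 20-b.toml sorts after 10-a.toml
```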

**Precedence** (highest → lowest):
1. CLI flags (`--max-context 4096`, `--memory-cap 18GB`, …)
2. Alphabetically-latest drop-in
3. Earlier drop-ins
4. `config.toml`
5. Built-in defaults

Every option can also be set from the command line for one-off
overrides — see `tether --help`.
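
For instance, a one-off run that beats whatever the config files say
(flags from the table below):

```bash
# CLI flags outrank every config file and drop-in for this run only:
tether --model mlx-community/gemma-3-4b-it-4bit --max-context 4096 --memory-cap 18GB
```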

Example drop-in for a 24 GB M4 Pro running the 26B MoE variant:

```toml
# ~/.tether/config/10-m4pro.toml
[model]
default = "gemma-4-26b-a4b-it-4bit"
memory_cap = "18GiB"
max_context = 32000
```

### Flags

| flag | default | purpose |
|---|---|---|
| `--model PATH_OR_REPO` | — | explicit model; skips the picker |
| `--host` | `127.0.0.1` | bind host for the proxy |
| `--port` | `8080` | bind port for the proxy |
| `--models-dir` | `~/.tether/models` | override discovery dir |
| `--list` | — | list local models and exit |
| `--serve-only` / `--no-claude` | — | run only the proxy, do not launch claude |
| `--claude-path` | `which claude` | explicit path to the claude binary |
| `--memory-cap` | from config | hard MLX memory ceiling, e.g. `18GB`, `20GiB` |
| `--max-context` | from config (64k) | truncate history so prompt stays under N tokens |
| `--log-level` | `warning` | uvicorn log level |
| `-- …` | — | everything after `--` is forwarded to `claude` |

## Advanced: standalone proxy

If you want to point a non–Claude Code client at the proxy, or run it
under systemd / launchd, use `--serve-only`:

```bash
tether --serve-only --model mlx-community/gemma-3-4b-it-4bit
```

Then point any Anthropic-Messages-compatible client at
`http://127.0.0.1:8080`. Quick curl check:

```bash
curl -s -X POST http://127.0.0.1:8080/v1/messages \
  -H 'Content-Type: application/json' \
  -H 'anthropic-version: 2023-06-01' \
  -d '{"model":"mlx-local","max_tokens":50,
       "messages":[{"role":"user","content":"Say pineapple."}]}'
```

The proxy registers a wildcard route, so any model name in the request
body routes to the loaded MLX model.
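
So a request naming a model that was never loaded still works
(the model id below is deliberately made up):

```bash
# "any-name-at-all" is not a registered model; the wildcard route
# still sends this request to the one loaded MLX model:
curl -s -X POST http://127.0.0.1:8080/v1/messages \
  -H 'Content-Type: application/json' \
  -H 'anthropic-version: 2023-06-01' \
  -d '{"model":"any-name-at-all","max_tokens":20,
       "messages":[{"role":"user","content":"ping"}]}'
```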

## Known limitations (v0.1)

- **Token usage counts are zero.** `mlx_vlm.generate` is not yet wired
  up to propagate prompt/completion token counts into the response.
- **Single session.** One shared prompt cache; two concurrent Claude
  Code sessions with different system prompts will thrash the cache.
  Fine for a single-user local server.
- **No auth / TLS.** Bind is `127.0.0.1` only.
- **Apple Silicon + macOS only.**

## Architecture

See `plan/plan-option-b.md` (proxy + custom provider),
`plan/plan-unified-launch.md` (single-terminal launch), and
`plan/plan-tool-use.md` (Gemma 4 tool-call wiring). Request flow:

```
tether
   ├── uvicorn (daemon thread)  ──► LiteLLM ──► MLXProvider ──► mlx_lm
   └── subprocess: claude (foreground, owns TTY)
                      │
                      └── POST /v1/messages → proxy thread
```

Everything lives in one process tree. The model is loaded once at
startup (eagerly, so load failures surface before Claude Code
launches) and reused across every subsequent call.
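
One way to observe that single process tree while a session is running
(output shape is illustrative):

```bash
# claude's parent PID should be the tether process:
ps -axo pid,ppid,command | grep -E 'tether|claude' | grep -v grep
```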
