Metadata-Version: 2.4
Name: deepseek-bridge
Version: 0.5.3
Summary: Local OpenAI-compatible proxy for AI coding tools and DeepSeek thinking models
Author: Yixing Lao
License-Expression: MIT
Keywords: cursor,deepseek,proxy,llm,openai-compatible,reasoning
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: Proxy Servers
Classifier: Topic :: Software Development
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: PyYAML>=6.0
Requires-Dist: textual>=8.2.5
Requires-Dist: urllib3>=2.0
Requires-Dist: orjson>=3.10
Provides-Extra: dev
Requires-Dist: black>=26.3.1; extra == "dev"
Requires-Dist: coverage>=7.10.0; extra == "dev"
Requires-Dist: mypy>=1.15.0; extra == "dev"
Requires-Dist: pre-commit>=3.0.0; extra == "dev"
Requires-Dist: ruff>=0.11.0; extra == "dev"
Dynamic: license-file

# DeepSeek Bridge

[![PyPI version](https://img.shields.io/pypi/v/deepseek-bridge)](https://pypi.org/project/deepseek-bridge/)
[![Python versions](https://img.shields.io/pypi/pyversions/deepseek-bridge)](https://pypi.org/project/deepseek-bridge/)
[![CI](https://github.com/breixopd/deepseek-bridge/actions/workflows/ci.yml/badge.svg)](https://github.com/breixopd/deepseek-bridge/actions/workflows/ci.yml)
[![License](https://img.shields.io/pypi/l/deepseek-bridge)](https://github.com/breixopd/deepseek-bridge/blob/main/LICENSE)

A local proxy that connects AI coding tools (Cursor, GitHub Copilot, Codex, and any OpenAI-compatible client) to DeepSeek's reasoning models by repairing the `reasoning_content` chain that these tools commonly drop from tool-call requests.

```bash
pip install deepseek-bridge
```

DeepSeek's [thinking-mode API](https://api-docs.deepseek.com/guides/thinking_mode#tool-calls) requires every assistant message in a multi-turn tool-call conversation to carry its complete `reasoning_content` back to the server. When a client omits this field, the API returns a 400 error. DeepSeek Bridge intercepts requests, restores the missing reasoning from a local cache, and forwards them upstream — no client-side changes needed.

## Features

### Reasoning Repair
- Injects missing `reasoning_content` into outgoing tool-call requests, drawing on reasoning previously cached from both regular and streamed DeepSeek responses.
- Displays thinking tokens in the client UI using collapsible Markdown `<details>` blocks.
- Cursor Agent Mode support: automatically converts Responses API payloads to Chat Completions format.

### Connection Resilience
- Connection pooling via `urllib3` with keep-alive and minimal retries.
- Bounded thread pool prevents thread exhaustion on long-running streaming connections.
- Configurable SSE read timeout (default 180 seconds) prevents hung threads on silent upstreams.
- Tunnel support (cloudflared by default, ngrok optional) with health check and automatic reconnection.
- Graceful shutdown on SIGTERM — active requests drain, reasoning cache is flushed.

### API Compatibility
- `system_fingerprint` in every streaming and non-streaming response.
- `x-request-id` UUID header on every response.
- OpenAI-standard error format.
- CORS headers enabled by default.
- `/v1/embeddings`, `/v1/health`, and `/v1/models` endpoints.
- `/v1/completions` legacy endpoint (auto-converts `prompt` to `messages`).
- Multimodal content arrays preserved.
- DeepSeek V4 thinking parameter support (`thinking`, `reasoning_effort`, `response_format`, `logprobs`).
- Silent mapping of legacy model names (`deepseek-chat`, `deepseek-reasoner`) to `deepseek-v4-flash`.

### Logging and Observability
- Persistent log files with `--log-dir`.
- Heartbeat and pool utilization counters.
- Full structured request traces with `--trace-dir`.
- Terminal UI dashboard with real-time metrics, config editing, and log viewing.

## TUI Dashboard

Starting with v0.2.0, DeepSeek Bridge opens a Terminal UI dashboard by default. The dashboard provides live monitoring and configuration:

- **Dashboard tab** — Real-time request metrics, uptime, tunnel status, and pool utilization.
- **Config tab** — Edit proxy settings (model, network, storage) without restarting.
- **Logs tab** — Streaming log viewer with auto-scroll.

Use `--headless` to disable the TUI and run in classic CLI mode.

## Why This Exists

DeepSeek's thinking-mode API enforces a strict contract: every assistant message that participates in a tool-call chain must include the full `reasoning_content` field. Some AI coding tools (including Cursor) drop this field from their chat transcript, causing DeepSeek to reject subsequent tool-call requests.

DeepSeek Bridge stores copies of `reasoning_content` from every response and patches missing entries back into requests before forwarding them upstream.
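
For illustration, here is roughly what a repaired tool-call turn looks like once the proxy has re-inserted the cached field (tool name, arguments, and reasoning text are placeholders):

```json
{
  "model": "deepseek-v4-pro",
  "messages": [
    {"role": "user", "content": "What's the weather in Paris?"},
    {
      "role": "assistant",
      "content": "",
      "reasoning_content": "The user wants current weather, so I should call the weather tool...",
      "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"}}
      ]
    },
    {"role": "tool", "tool_call_id": "call_1", "content": "22°C, sunny"}
  ]
}
```

Clients typically resend the assistant message without `reasoning_content`; the proxy looks the turn up in its cache and restores the field before forwarding, which is what prevents the 400 response.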

## Installation

```bash
# From PyPI
pip install deepseek-bridge

# From source
git clone https://github.com/breixopd/deepseek-bridge.git
cd deepseek-bridge
uv run deepseek-bridge
```

## Usage

```bash
# Full TUI dashboard (default)
deepseek-bridge

# Headless mode — no TUI, classic CLI output
deepseek-bridge --headless

# Run without tunnel (localhost only)
deepseek-bridge --tunnel none --port 9000

# Debug output with trace dumps
deepseek-bridge --debug --trace-dir ./dumps

# Use a custom config file
deepseek-bridge --config ./my-config.yaml

# Clear reasoning cache and exit
deepseek-bridge --clear-reasoning-cache

# Disable thinking display in client UI
deepseek-bridge --no-display-reasoning
```

On first run, DeepSeek Bridge creates:
- `~/.deepseek-bridge/config.yaml` — configuration file
- `~/.deepseek-bridge/reasoning_content.sqlite3` — reasoning cache

## Configuration

All settings are configurable via `~/.deepseek-bridge/config.yaml` or command-line overrides. Example configuration:

```yaml
model: deepseek-v4-pro
base_url: https://api.deepseek.com
thinking: enabled
reasoning_effort: max
display_reasoning: true
collapsible_reasoning: true

host: 127.0.0.1
port: 9000
tunnel: cloudflared
# ngrok_url: https://my-tunnel.ngrok.app  # optional: fixed ngrok endpoint
debug: false
cors: true
ollama: true
stream_read_timeout: 180
request_timeout: 300
```

## Client Setup

### Cursor

In Cursor, add a custom model with these settings:
- **Model**: `deepseek-v4-pro` (or `deepseek-v4-flash`)
- **API Key**: Your DeepSeek API key
- **Base URL**: Your tunnel HTTPS URL with `/v1` path (e.g., `https://app.example.com/v1`)

> **Note on tunnels**: Cursor blocks non-public URLs such as `localhost`. DeepSeek Bridge uses [Cloudflare Tunnel](https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/) by default — a free, persistent HTTPS tunnel with no bandwidth or time limits. Use `--tunnel none` to disable tunneling. Use `--tunnel ngrok` if you prefer [ngrok](https://ngrok.com).

### Cloudflare Tunnel Setup

Cloudflare Named Tunnels are free, persistent, support SSE streaming, and have no bandwidth/time limits. One-time setup:

```bash
# Install cloudflared
brew install cloudflare/cloudflare/cloudflared   # macOS
# Or download from: https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/downloads/

# Login and create a tunnel
cloudflared tunnel login
cloudflared tunnel create deepseek-bridge

# Point it at your domain
cloudflared tunnel route dns deepseek-bridge app.example.com
```
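
Depending on how you run cloudflared, the tunnel may also need a local config file that routes the hostname to the proxy's port. A minimal sketch (tunnel name, credentials path, and hostname are illustrative):

```yaml
# ~/.cloudflared/config.yml
tunnel: deepseek-bridge
# credentials file is created by `cloudflared tunnel create`
credentials-file: /Users/you/.cloudflared/<tunnel-id>.json
ingress:
  - hostname: app.example.com
    service: http://localhost:9000
  - service: http_status:404
```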

Then add your tunnel URL to `~/.deepseek-bridge/config.yaml`:

```yaml
tunnel: cloudflared
cf_url: https://app.example.com
```

Use `--tunnel cloudflared` on the CLI, or select `cloudflared` in the TUI dashboard.
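
The equivalent command-line invocation, using the flags from the CLI reference below:

```bash
deepseek-bridge --tunnel cloudflared --cf-url https://app.example.com
```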

### GitHub Copilot

Configure the Ollama endpoint in VS Code:

```json
{
  "github.copilot.chat.byok.ollamaEndpoint": "http://localhost:9000"
}
```

Then open Copilot Chat, navigate to "Manage Models", and your DeepSeek models appear automatically.
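
Copilot itself talks to the Ollama-compatible routes, but a quick way to confirm the proxy is up and serving models is the documented OpenAI-compatible listing endpoint:

```bash
curl http://localhost:9000/v1/models
```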

Agent Mode is supported — the proxy advertises `tool_calls` capability via the Ollama `/api/show` endpoint and handles reasoning repair across tool-call chains.

For the new `customOAIModels` path (VS Code Insiders 1.104+):

```json
{
  "github.copilot.chat.customOAIModels": {
    "deepseek-v4-pro": {
      "name": "DeepSeek V4 Pro",
      "url": "http://localhost:9000/v1/chat/completions",
      "toolCalling": true,
      "vision": false,
      "thinking": true,
      "maxInputTokens": 1000000,
      "maxOutputTokens": 384000,
      "streaming": true,
      "requiresAPIKey": true
    }
  }
}
```

### Other OpenAI-Compatible Clients

Any client that speaks the OpenAI `/v1/chat/completions` API can use DeepSeek Bridge. Set the client's base URL to `http://localhost:9000/v1` (or your tunnel URL).
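
For example, with the official `openai` Python package (the model name and API key are whatever you already use with DeepSeek; the port assumes the default configuration above):

```python
from openai import OpenAI

# Point the client at the local proxy instead of api.openai.com.
client = OpenAI(base_url="http://localhost:9000/v1", api_key="YOUR_DEEPSEEK_API_KEY")

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Explain what this proxy does in one sentence."}],
)
print(response.choices[0].message.content)
```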

## How It Works

1. **Request interception**: The proxy receives a `/v1/chat/completions` request from the client.
2. **Format detection**: If the request uses OpenAI Responses API format (common in Cursor Agent Mode), it is converted to Chat Completions format.
3. **Reasoning repair**: Each assistant message in the conversation is checked. Missing `reasoning_content` fields are looked up in the local SQLite cache and restored.
4. **Cache isolation**: Cache keys are scoped by a SHA-256 hash of the conversation prefix, upstream model, configuration, and API key, so different conversations and users never collide (see the sketch after this list).
5. **Response processing**: Reasoning content from the upstream response is cached for future requests. Display adapters mirror reasoning thoughts into Markdown `<details>` blocks visible in the client.
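
The cache keying in step 4 might look conceptually like the sketch below; the actual implementation's field selection and serialization almost certainly differ:

```python
import hashlib
import json


def cache_key(conversation_prefix: list[dict], model: str, config: dict, api_key: str) -> str:
    """Derive a scoped key so different conversations, models, and users never share entries."""
    payload = json.dumps(
        {
            "prefix": conversation_prefix,  # messages up to the assistant turn being repaired
            "model": model,
            "config": config,
            "api_key": api_key,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```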

## Known Limitations

### Cursor Sub-Agents

Cursor sub-agents do not inherit custom API base URL or API key settings. This is a Cursor-side bug (see [forum thread](https://forum.cursor.com/t/sub-agents-are-not-using-custom-openai-base-urls/152574)). Use the main agent (`Cmd+Shift+0` to toggle) for direct DeepSeek chat. Sub-agents that route through the proxy will work correctly; those that bypass it fall back to Cursor's built-in models.

### Cursor Agent Mode Responses API Format

Cursor Agent mode sends OpenAI Responses API-format payloads to the Chat Completions endpoint. DeepSeek Bridge detects and converts these automatically.
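
As a rough illustration of what that conversion involves (heavily simplified; real payloads carry tool definitions and more fields), a Responses-style body such as:

```json
{
  "model": "deepseek-v4-pro",
  "instructions": "You are a coding assistant.",
  "input": [
    {"role": "user", "content": [{"type": "input_text", "text": "List the files in src/"}]}
  ]
}
```

is rewritten into the Chat Completions shape before being forwarded upstream:

```json
{
  "model": "deepseek-v4-pro",
  "messages": [
    {"role": "system", "content": "You are a coding assistant."},
    {"role": "user", "content": "List the files in src/"}
  ]
}
```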

### Reasoning Display

Cursor's native reasoning UI is available only for Cursor's own models. For custom endpoints, reasoning content is mirrored into visible Markdown details blocks. Use `--no-display-reasoning` to disable this behavior.

## Development

```bash
# Run tests
uv run python -m unittest discover -s tests

# Format and lint
uv run pre-commit run --all-files

# Type check
uv run mypy src/ --check-untyped-defs

# Run with coverage
uv run coverage run -m unittest discover -s tests
uv run coverage report
```

## CLI Reference

| Flag | Default | Description |
|------|---------|-------------|
| `--headless` | off | Run without TUI |
| `--model` | `deepseek-v4-pro` | Fallback model when request omits it |
| `--thinking` | `enabled` | DeepSeek thinking mode |
| `--reasoning-effort` | `max` | Reasoning effort level |
| `--display-reasoning` | on | Show reasoning content in client UI |
| `--collapsible-reasoning` | on | Use collapsible Markdown for reasoning |
| `--host` | `127.0.0.1` | Bind address |
| `--port` | `9000` | Bind port |
| `--tunnel` | `cloudflared` | Tunnel service (none, cloudflared, ngrok) |
| `--cf-url` | none | Cloudflare tunnel public URL |
| `--ngrok-url` | none | Fixed ngrok endpoint URL |
| `--base-url` | `https://api.deepseek.com` | Upstream DeepSeek API URL |
| `--cors` | on | Send CORS headers |
| `--stream-read-timeout` | `180` | SSE read timeout in seconds |
| `--max-thread-pool` | `20` | Max concurrent request threads |
| `--max-pool-connections` | `10` | Max upstream connections |
| `--ollama` / `--no-ollama` | on | Enable/disable Ollama endpoints |
| `--log-dir` | none | Directory for persistent log files |
| `--trace-dir` | none | Directory for request trace dumps |
| `--debug` | off | Enable DEBUG-level log output |
| `--compact` | off | One-line-per-request output |
| `--config` | `~/.deepseek-bridge/config.yaml` | Config file path |
| `--no-log` | off | Disable all log file output |
| `--reasoning-content-path` | `~/.deepseek-bridge/reasoning_content.sqlite3` | Reasoning cache path |
| `--reasoning-cache-max-age-seconds` | `604800` | Max age of cached reasoning (seconds) |
| `--missing-reasoning-strategy` | `recover` | Strategy for missing reasoning (recover/reject) |
| `--max-request-body-bytes` | `20971520` | Max request body size in bytes |
| `--clear-reasoning-cache` | off | Clear reasoning cache and exit |
| `--version` | - | Print version and exit |

## License

MIT License

## Acknowledgements

Based on [yxlao/deepseek-cursor-proxy](https://github.com/yxlao/deepseek-cursor-proxy), the original project that started this work.
