Metadata-Version: 2.4
Name: databricks-mcp-server
Version: 0.4.2
Summary: A Model Context Protocol (MCP) server for Databricks
Author-email: Olivier Debeuf De Rijcker <olivier@markov.bot>
License: MIT
Keywords: ai,cursor,databricks,llm,mcp,model-context-protocol
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Internet :: WWW/HTTP :: HTTP Servers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: databricks-sdk
Requires-Dist: httpx
Requires-Dist: mcp[cli]>=1.2.0
Provides-Extra: cli
Requires-Dist: click; extra == 'cli'
Provides-Extra: dev
Requires-Dist: anyio; extra == 'dev'
Requires-Dist: black; extra == 'dev'
Requires-Dist: fastapi; extra == 'dev'
Requires-Dist: pylint; extra == 'dev'
Requires-Dist: pytest; extra == 'dev'
Requires-Dist: pytest-asyncio; extra == 'dev'
Description-Content-Type: text/markdown

# Databricks MCP Server

A production-ready **Model Context Protocol (MCP)** server that exposes Databricks REST capabilities to MCP-compatible agents and tooling. Version **0.4.2** introduces structured responses, resource caching, retry-aware networking, and end-to-end resilience improvements.

---

## Table of Contents
1. [Key Capabilities](#key-capabilities)
2. [Architecture Highlights](#architecture-highlights)
3. [Installation](#installation)
4. [Configuration](#configuration)
5. [Running the Server](#running-the-server)
6. [Integrating with MCP Clients](#integrating-with-mcp-clients)
7. [Working with Tool Responses](#working-with-tool-responses)
8. [Available Tools](#available-tools)
9. [Development Workflow](#development-workflow)
10. [Testing](#testing)
11. [Publishing Builds](#publishing-builds)
12. [Support & Contact](#support--contact)
13. [License](#license)

---

## Key Capabilities
- **Structured MCP Responses** – Each tool returns a `CallToolResult` with a human-readable summary in `content` and machine-readable payloads in `structuredContent` that conform to the tool’s `outputSchema`.
- **Resource Caching** – Large notebook/workspace exports are cached once and returned as `resource_link` content blocks with URIs such as `databricks://exports/{id}` (also reflected in metadata for convenience).
- **Progress & Metrics** – Long-running actions stream MCP progress notifications and track per-tool success/error/timeout/cancel metrics.
- **Resilient Networking** – A shared HTTP client injects request IDs, enforces timeouts, and retries transient Databricks failures (HTTP 408, 429, and 5xx) with exponential backoff.
- **Async Runtime** – Built on `mcp.server.FastMCP` with centralized JSON logging and concurrency guards for predictable stdio behaviour.
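
The retry policy described under Resilient Networking can be pictured with a short sketch. It is an illustration under stated assumptions, not the server's actual code: `is_retryable`, `backoff_delay`, and `call_with_retries` are hypothetical names, and the doubling schedule is assumed from the `API_MAX_RETRIES` and `API_RETRY_BACKOFF_SECONDS` settings.

```python
import time
from typing import Callable, Tuple

# Status codes treated as transient; 5xx is handled as a range below.
RETRYABLE_STATUSES = {408, 429}


def is_retryable(status: int) -> bool:
    """Return True for 408, 429, and any 5xx status."""
    return status in RETRYABLE_STATUSES or 500 <= status <= 599


def backoff_delay(attempt: int, base: float = 0.5) -> float:
    """Exponential backoff: base, 2*base, 4*base, ... (no jitter in this sketch)."""
    return base * (2 ** attempt)


def call_with_retries(
    send: Callable[[], Tuple[int, str]],
    max_retries: int = 3,
    base: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send` until it returns a non-retryable status or retries run out.

    `send` is a hypothetical zero-argument callable returning (status, body).
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if not is_retryable(status) or attempt == max_retries:
            return status, body
        sleep(backoff_delay(attempt, base))
    raise AssertionError("unreachable")  # the loop always returns
```

Passing `sleep` as a parameter keeps the sketch easy to test without real delays.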

## Architecture Highlights
- `databricks_mcp/server/databricks_mcp_server.py` – FastMCP server with tool registration, progress handling, metrics, and resource caching.
- `databricks_mcp/core/utils.py` – HTTP utilities with correlation IDs, retries, and error mapping to `DatabricksAPIError`.
- `databricks_mcp/core/logging_utils.py` – JSON logging configuration for stderr/file outputs.
- `databricks_mcp/core/models.py` – Pydantic models (e.g., `ClusterConfig`) used by tool schemas.
- Tests under `tests/` mock Databricks APIs to validate orchestration, structured responses, and schema metadata without shell scripts.

For an in-depth tour of data flow and design decisions, see [ARCHITECTURE.md](ARCHITECTURE.md).
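
As a rough picture of how a concurrency guard, a tool timeout, and per-tool metrics could fit together, consider the sketch below. `ToolRunner` and its method names are hypothetical and are not taken from the server's source; the default limits are assumed from the `MAX_CONCURRENT_REQUESTS` and `TOOL_TIMEOUT_SECONDS` settings.

```python
import asyncio
from collections import Counter
from typing import Any, Awaitable, Callable, Dict


class ToolRunner:
    """Hypothetical wrapper: bounded concurrency, timeout, outcome counters."""

    def __init__(self, max_concurrent: int = 8, timeout_seconds: float = 300.0):
        self._semaphore = asyncio.Semaphore(max_concurrent)
        self._timeout = timeout_seconds
        self.metrics: Dict[str, Counter] = {}

    async def run(self, name: str, func: Callable[[], Awaitable[Any]]) -> Any:
        counts = self.metrics.setdefault(name, Counter())
        async with self._semaphore:  # at most max_concurrent tools run at once
            try:
                result = await asyncio.wait_for(func(), timeout=self._timeout)
            except asyncio.TimeoutError:
                counts["timeout"] += 1
                raise
            except asyncio.CancelledError:
                counts["cancel"] += 1
                raise
            except Exception:
                counts["error"] += 1
                raise
            counts["success"] += 1
            return result
```

Each outcome is counted once and the original exception is re-raised, so callers still see the failure while the counters stay accurate.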

## Installation

### Prerequisites
- Python 3.10+
- [`uv`](https://github.com/astral-sh/uv) for dependency management and publishing

### Quick Install (recommended)
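Register the server with Cursor using the deeplink below. Its embedded configuration runs `uvx databricks-mcp-server` and forwards `DATABRICKS_HOST`, `DATABRICKS_TOKEN`, and `DATABRICKS_WAREHOUSE_ID` from your environment; new releases are picked up once `uv` refreshes its cache.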

```text
cursor://anysphere.cursor-deeplink/mcp/install?name=databricks-mcp&config=eyJjb21tYW5kIjoidXZ4IiwiYXJncyI6WyJkYXRhYnJpY2tzLW1jcC1zZXJ2ZXIiXSwiZW52Ijp7IkRBVEFCUklDS1NfSE9TVCI6IiR7REFUQUJSSUNLU19IT1NUfSIsIkRBVEFCUklDS1NfVE9LRU4iOiIke0RBVEFCUklDS1NfVE9LRU59IiwiREFUQUJSSUNLU19XQVJFSE9VU0VfSUQiOiIke0RBVEFCUklDS1NfV0FSRUhPVVNFX0lEfSJ9fQ==
```
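
To check what the deeplink will install before clicking it, the `config` query parameter can be decoded locally. The sketch below uses only the standard library; `decode_deeplink_config` is a hypothetical helper name, and it assumes the parameter is base64-encoded JSON, which holds for the link above.

```python
import base64
import json
from urllib.parse import parse_qs, urlparse

# The deeplink from this section, split across lines for readability.
DEEPLINK = (
    "cursor://anysphere.cursor-deeplink/mcp/install?name=databricks-mcp&config="
    "eyJjb21tYW5kIjoidXZ4IiwiYXJncyI6WyJkYXRhYnJpY2tzLW1jcC1zZXJ2ZXIiXSwiZW52"
    "Ijp7IkRBVEFCUklDS1NfSE9TVCI6IiR7REFUQUJSSUNLU19IT1NUfSIsIkRBVEFCUklDS1Nf"
    "VE9LRU4iOiIke0RBVEFCUklDS1NfVE9LRU59IiwiREFUQUJSSUNLU19XQVJFSE9VU0VfSUQi"
    "OiIke0RBVEFCUklDS1NfV0FSRUhPVVNFX0lEfSJ9fQ=="
)


def decode_deeplink_config(link: str) -> dict:
    """Extract and decode the base64 JSON held in the `config` query parameter."""
    query = parse_qs(urlparse(link).query)
    encoded = query["config"][0]
    return json.loads(base64.b64decode(encoded))


if __name__ == "__main__":
    print(json.dumps(decode_deeplink_config(DEEPLINK), indent=2))
```

For this link the decoded configuration names `uvx` as the command, `databricks-mcp-server` as its only argument, and three environment variables that are passed through by reference.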

### Manual Installation
```bash
# Clone and enter the repository
git clone https://github.com/markov-kernel/databricks-mcp.git
cd databricks-mcp

# Create an isolated environment (optional but recommended)
uv venv
source .venv/bin/activate  # Linux/Mac
# .\.venv\Scripts\activate  # Windows PowerShell

# Install package and development dependencies
uv pip install -e .
uv pip install -e ".[dev]"
```

## Configuration
Set the following environment variables (or populate `.env` from `.env.example`).

```bash
export DATABRICKS_HOST="https://your-workspace.databricks.com"
export DATABRICKS_TOKEN="dapiXXXXXXXXXXXXXXXX"
export DATABRICKS_WAREHOUSE_ID="sql_warehouse_12345"  # optional default
export TOOL_TIMEOUT_SECONDS=300
export MAX_CONCURRENT_REQUESTS=8
export HTTP_TIMEOUT_SECONDS=60
export API_MAX_RETRIES=3
export API_RETRY_BACKOFF_SECONDS=0.5
```
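
One way to read these variables into typed settings is sketched below. The `Settings` class and `load_settings` function are hypothetical, not the package's API. The sketch assumes that `DATABRICKS_HOST` and `DATABRICKS_TOKEN` are required and that the values shown above are the defaults, neither of which this README confirms.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    host: str
    token: str
    warehouse_id: Optional[str]
    tool_timeout_seconds: float
    max_concurrent_requests: int
    http_timeout_seconds: float
    api_max_retries: int
    api_retry_backoff_seconds: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping (assumed defaults, see lead-in)."""
    missing = [key for key in ("DATABRICKS_HOST", "DATABRICKS_TOKEN") if not env.get(key)]
    if missing:
        raise ValueError(f"Missing required variables: {', '.join(missing)}")
    return Settings(
        host=env["DATABRICKS_HOST"].rstrip("/"),  # avoid double slashes in URLs
        token=env["DATABRICKS_TOKEN"],
        warehouse_id=env.get("DATABRICKS_WAREHOUSE_ID") or None,
        tool_timeout_seconds=float(env.get("TOOL_TIMEOUT_SECONDS", "300")),
        max_concurrent_requests=int(env.get("MAX_CONCURRENT_REQUESTS", "8")),
        http_timeout_seconds=float(env.get("HTTP_TIMEOUT_SECONDS", "60")),
        api_max_retries=int(env.get("API_MAX_RETRIES", "3")),
        api_retry_backoff_seconds=float(env.get("API_RETRY_BACKOFF_SECONDS", "0.5")),
    )
```

Accepting any mapping rather than reading `os.environ` directly makes the loader straightforward to exercise in tests.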

## Running the Server
```bash
uvx databricks-mcp-server@latest
```
> Tip: pass `--refresh` before the package name (e.g., `uvx --refresh databricks-mcp-server@latest`) to force `uv` to resolve the latest PyPI release after publishing.

To adjust logging:
```bash
uvx databricks-mcp-server@latest -- --log-level DEBUG
```

## Integrating with MCP Clients

### Codex CLI (STDIO)
Register the server and inject credentials via the CLI:

```bash
codex mcp add databricks \
  --env DATABRICKS_HOST="https://your-workspace.databricks.com" \
  --env DATABRICKS_TOKEN="dapiXXXXXXXXXXXXXXXX" \
  --env DATABRICKS_WAREHOUSE_ID="sql_warehouse_12345" \
  -- uvx databricks-mcp-server@latest
# Right after a publish, use `uvx --refresh databricks-mcp-server@latest` to invalidate the uv cache
```

Or edit `~/.codex/config.toml`:

```toml
[mcp_servers.databricks]
command = "uvx"
args    = ["databricks-mcp-server@latest"]
startup_timeout_sec = 15
tool_timeout_sec    = 300

[mcp_servers.databricks.env]
DATABRICKS_HOST = "https://your-workspace.databricks.com"
DATABRICKS_TOKEN = "dapiXXXXXXXXXXXXXXXX"
DATABRICKS_WAREHOUSE_ID = "sql_warehouse_12345"
```

> Planning an HTTP deployment? Codex also supports `url = "https://…"` plus
> `bearer_token_env_var = "DATABRICKS_TOKEN"` or `codex mcp login` (with
> `experimental_use_rmcp_client = true`).

### Cursor
```jsonc
{
  "mcpServers": {
    "databricks-mcp-local": {
      "command": "uvx",
      "args": ["databricks-mcp-server@latest"],
      "env": {
        "DATABRICKS_HOST": "https://your-workspace.databricks.com",
        "DATABRICKS_TOKEN": "dapiXXXXXXXXXXXXXXXX",
        "DATABRICKS_WAREHOUSE_ID": "sql_warehouse_12345",
        "RUNNING_VIA_CURSOR_MCP": "true"
      }
    }
  }
}
```
Restart Cursor after saving and invoke tools as `databricks-mcp-local:<tool>`.

### Claude CLI
```bash
claude mcp add databricks-mcp-local \
  -s user \
  -e DATABRICKS_HOST="https://your-workspace.databricks.com" \
  -e DATABRICKS_TOKEN="dapiXXXXXXXXXXXXXXXX" \
  -e DATABRICKS_WAREHOUSE_ID="sql_warehouse_12345" \
  -- uvx databricks-mcp-server@latest
```

## Working with Tool Responses
`structuredContent` carries machine-readable payloads. Large artifacts are returned as `resource_link` content blocks using URIs like `databricks://exports/{id}` and can be fetched via the MCP resources API.

```python
# Assumes `session` is an initialized `mcp.ClientSession`.
result = await session.call_tool("list_clusters", {})
# Content blocks are typed objects, so select them by their `type` attribute.
summary = next((block.text for block in result.content if getattr(block, "type", "") == "text"), "")
clusters = (result.structuredContent or {}).get("clusters", [])
resource_links = [block for block in result.content if getattr(block, "type", "") == "resource_link"]
```

Progress notifications follow MCP’s progress token mechanism; Codex surfaces these messages in the UI while a tool runs.

### Example – SQL Query
```python
result = await session.call_tool("execute_sql", {"statement": "SELECT * FROM samples LIMIT 10"})
print(result.content[0].text)
rows = (result.structuredContent or {}).get("result", [])
```

### Example – Workspace File Export
```python
result = await session.call_tool("get_workspace_file_content", {
    "path": "/Users/user@domain.com/report.ipynb",
    "format": "SOURCE"
})
# Content blocks are typed objects, so match on the `type` attribute and read `.uri`.
resource_link = next((block for block in result.content if getattr(block, "type", "") == "resource_link"), None)
if resource_link:
    contents = await session.read_resource(resource_link.uri)
```
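
On the server side, the caching behind such links could look roughly like the sketch below. `ExportCache` is a hypothetical in-memory store, not the server's implementation; only the `databricks://exports/{id}` URI shape comes from this README, and the use of random hex IDs is an assumption.

```python
import uuid
from typing import Dict
from urllib.parse import urlparse


class ExportCache:
    """Hypothetical in-memory cache that hands out databricks://exports/{id} URIs."""

    def __init__(self) -> None:
        self._items: Dict[str, str] = {}

    def put(self, content: str) -> str:
        """Store content once and return the URI a resource_link would carry."""
        export_id = uuid.uuid4().hex
        self._items[export_id] = content
        return f"databricks://exports/{export_id}"

    def get(self, uri: str) -> str:
        """Resolve a URI back to the cached content; KeyError if the ID is unknown."""
        parsed = urlparse(uri)
        if parsed.scheme != "databricks" or parsed.netloc != "exports":
            raise ValueError(f"Not an export URI: {uri}")
        return self._items[parsed.path.lstrip("/")]
```

Returning a short URI instead of the full export keeps tool responses small, and the client fetches the payload only when it needs it.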

## Available Tools
| Category | Tool | Description |
| --- | --- | --- |
| Clusters | `list_clusters`, `create_cluster`, `terminate_cluster`, `get_cluster`, `start_cluster` | Manage interactive clusters |
| Jobs | `list_jobs`, `create_job`, `run_job`, `run_notebook`, `sync_repo_and_run_notebook`, `get_run_status`, `list_job_runs`, `cancel_run` | Manage scheduled and ad-hoc jobs |
| Workspace | `list_notebooks`, `export_notebook`, `import_notebook`, `delete_workspace_object`, `get_workspace_file_content`, `get_workspace_file_info` | Inspect and manage workspace assets |
| DBFS | `list_files`, `dbfs_put`, `dbfs_delete` | Explore DBFS and manage files |
| SQL | `execute_sql` | Submit SQL statements with optional `warehouse_id`, `catalog`, `schema_name` |
| Libraries | `install_library`, `uninstall_library`, `list_cluster_libraries` | Manage cluster libraries |
| Repos | `create_repo`, `update_repo`, `list_repos`, `pull_repo` | Manage Databricks repos |
| Unity Catalog | `list_catalogs`, `create_catalog`, `list_schemas`, `create_schema`, `list_tables`, `create_table`, `get_table_lineage` | Unity Catalog operations |

## Development Workflow
```bash
uv run black databricks_mcp tests    # format
uv run pylint databricks_mcp tests   # lint
uv run pytest                        # run the test suite
uv build                             # build the distributions
uv publish --token "$PYPI_TOKEN"     # upload to PyPI
```

## Testing
```bash
uv run pytest
```
The pytest suites mock the Databricks APIs, so the structured-output and transcript tests run deterministically without a live workspace.

## Publishing Builds
Ensure `PYPI_TOKEN` is available (via `.env` or environment) before publishing:
```bash
uv build
uv publish --token "$PYPI_TOKEN"
```

## Support & Contact
- Maintainer: Olivier Debeuf De Rijcker (olivier@markov.bot)
- Issues: [GitHub Issues](https://github.com/markov-kernel/databricks-mcp/issues)
- Architecture deep dive: [ARCHITECTURE.md](ARCHITECTURE.md)

## License

Released under the MIT License. See [LICENSE](LICENSE).
