Metadata-Version: 2.4
Name: databricks-docs-mcp
Version: 1.0.6
Summary: Lightweight Databricks Documentation MCP Server
Project-URL: Homepage, https://rokorolev.gitlab.io/databricks-docs-mcp
Project-URL: Repository, https://gitlab.com/rokorolev/databricks-docs-mcp
Project-URL: Issues, https://gitlab.com/rokorolev/databricks-docs-mcp/-/issues
Author-email: Roman Korolev <spark_development@yahoo.com>
License: MIT License
        
        Copyright (c) 2025
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Keywords: ai,databricks,documentation,llm,mcp
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Documentation
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: beautifulsoup4>=4.12
Requires-Dist: ddgs>=9.0
Requires-Dist: fastmcp>=2.0
Requires-Dist: httpx>=0.27
Requires-Dist: markdownify>=0.13
Requires-Dist: pydantic>=2.0
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-cov>=5; extra == 'dev'
Requires-Dist: pytest>=8; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Description-Content-Type: text/markdown

# Databricks Documentation MCP Server

A lightweight, stateless [Model Context Protocol (MCP)](https://modelcontextprotocol.io) server that lets AI assistants read and search [docs.databricks.com](https://docs.databricks.com) in real time — no local cache, no database, no crawling required.

## Features

- **Live fetch** — always returns current documentation content, never stale
- **Full-text search** — real-time, site-scoped search via DuckDuckGo `site:` operator
- **Docusaurus-aware extraction** — strips navigation, sidebars, and page chrome; returns clean markdown
- **Section extraction** — pull specific h2 sections from long reference pages
- **Pagination** — `start_index` and `max_length` parameters for large pages

## Prerequisites

- Python 3.10 or later
- [uv](https://docs.astral.sh/uv/) (recommended) or pip

## Installation

```bash
# run directly without installing (recommended)
uvx databricks-docs-mcp

# or install permanently
pip install databricks-docs-mcp
```

### MCP client configuration (release install)

#### Claude Desktop

Edit `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or  
`%APPDATA%\Claude\claude_desktop_config.json` (Windows):

```json
{
  "mcpServers": {
    "databricks-docs": {
      "command": "uvx",
      "args": ["databricks-docs-mcp"]
    }
  }
}
```

#### VS Code (GitHub Copilot)

Add to `.vscode/mcp.json` in your workspace:

```json
{
  "servers": {
    "databricks-docs": {
      "type": "stdio",
      "command": "uvx",
      "args": ["databricks-docs-mcp"]
    }
  }
}
```

> To pin a specific version, append it to the package name, for example `"args": ["databricks-docs-mcp==1.0.6"]`.

### From source (development)

```bash
git clone https://gitlab.com/rokorolev/databricks-docs-mcp.git
cd databricks-docs-mcp
uv sync --extra dev
```

MCP client configuration for a local clone (shown in the VS Code `mcp.json` format used above):

```json
{
  "servers": {
    "databricks-docs": {
      "type": "stdio",
      "command": "uv",
      "args": ["--directory", "/absolute/path/to/databricks-docs-mcp", "run", "databricks-docs-mcp"]
    }
  }
}
```

## Environment Variables

| Variable | Default | Description |
|---|---|---|
| `MCP_USER_AGENT` | `Mozilla/5.0 (compatible; DatabricksDocsMCP/1.0)` | HTTP User-Agent sent with every request |
| `FASTMCP_LOG_LEVEL` | `WARNING` | Log verbosity: `DEBUG`, `INFO`, `WARNING`, `ERROR` |
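
The table above can be read as a simple "environment value or documented default" lookup. The sketch below illustrates that idea only; `load_settings` and the fallback for unknown log levels are assumptions made for illustration, not the server's actual code.

```python
import os

# Defaults copied from the table above.
DEFAULT_USER_AGENT = "Mozilla/5.0 (compatible; DatabricksDocsMCP/1.0)"
DEFAULT_LOG_LEVEL = "WARNING"
VALID_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}


def load_settings(env=None):
    """Return (user_agent, log_level), falling back to the documented defaults.

    Hypothetical helper: `env` is any mapping and defaults to os.environ.
    """
    env = os.environ if env is None else env
    user_agent = env.get("MCP_USER_AGENT") or DEFAULT_USER_AGENT
    log_level = (env.get("FASTMCP_LOG_LEVEL") or DEFAULT_LOG_LEVEL).upper()
    if log_level not in VALID_LOG_LEVELS:
        # Assumption: an unrecognised level falls back to the default.
        log_level = DEFAULT_LOG_LEVEL
    return user_agent, log_level
```

In practice the variables are usually set in the MCP client configuration or in the shell that launches the server.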

## Tools

### `search_documentation`

Search docs.databricks.com using a site-scoped real-time web search.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `query` | string | — | Keywords or topic to search for |
| `limit` | integer | `10` | Maximum number of results to return (at most 30) |

Returns a JSON array of results, each with a URL, title, and snippet.
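
A site-scoped search amounts to prefixing the query with a `site:` filter and bounding the result count. The following is a minimal sketch of that preparation step under stated assumptions: `build_site_query` and `clamp_limit` are hypothetical names, and clamping out-of-range values (rather than rejecting them) is an assumption, not confirmed server behaviour.

```python
DOCS_SITE = "docs.databricks.com"
MAX_LIMIT = 30  # documented upper bound for `limit`


def build_site_query(query: str) -> str:
    """Prefix the user's keywords with a `site:` filter for the docs domain."""
    return f"site:{DOCS_SITE} {query.strip()}"


def clamp_limit(limit: int = 10) -> int:
    """Keep `limit` within 1..MAX_LIMIT (assumed behaviour, see lead-in)."""
    return max(1, min(limit, MAX_LIMIT))
```

The resulting string is what would be handed to the search backend; the backend call itself is omitted here.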

### `read_documentation`

Fetch a docs.databricks.com page as clean markdown.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | string | — | Full `docs.databricks.com` URL |
| `max_length` | integer | `5000` | Maximum characters to return per call |
| `start_index` | integer | `0` | Character offset for pagination |

Returns markdown-formatted page content with a continuation hint when the page is truncated.
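
Pagination with `start_index` and `max_length` can be pictured as slicing the converted markdown by character offset. This is an illustrative sketch, not the server's implementation: `paginate` is a hypothetical name and the wording of the continuation hint is invented.

```python
def paginate(content: str, start_index: int = 0, max_length: int = 5000) -> str:
    """Return one page of `content`, plus a hint when more text remains."""
    start_index = max(start_index, 0)
    chunk = content[start_index:start_index + max_length]
    next_index = start_index + len(chunk)
    if next_index < len(content):
        # Hypothetical hint format telling the caller where to resume.
        chunk += f"\n\n[truncated: call again with start_index={next_index}]"
    return chunk
```

A caller pages through a long article by repeating the request with the `start_index` from the hint until no hint is returned.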

### `read_sections`

Extract specific h2 sections from a docs page by heading title.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | string | — | Full `docs.databricks.com` URL |
| `section_titles` | string[] | — | h2 heading titles to extract (case-insensitive) |

Returns markdown of the matched sections only.
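
Case-insensitive h2 matching can be sketched on already-converted markdown, where each h2 appears as a `## ` heading. The helper below is a hypothetical illustration (the server may well operate on the HTML instead), and it assumes the page contains no fenced code blocks with lines that start with `## `.

```python
import re


def extract_sections(markdown: str, section_titles: list[str]) -> str:
    """Keep only the `## ` sections whose titles match, ignoring case."""
    wanted = {title.strip().lower() for title in section_titles}
    kept: list[str] = []
    keep = False
    for line in markdown.splitlines():
        match = re.match(r"^##\s+(.*\S)\s*$", line)
        if match:
            # A new h2 starts: keep it only if its title was requested.
            keep = match.group(1).lower() in wanted
        elif line.startswith("# "):
            # An h1 ends any section currently being kept.
            keep = False
        if keep:
            kept.append(line)
    return "\n".join(kept).strip()
```

Deeper headings such as `###` stay inside the h2 section that contains them.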

## Basic Usage

### Recommended workflow

**1. Search** for a topic:
```
search_documentation("Delta Live Tables pipeline settings")
```

**2. Read** the most relevant result:
```
read_documentation("https://docs.databricks.com/aws/en/dlt/settings.html")
```

**3. Extract** specific sections from large pages:
```
read_sections(
  "https://docs.databricks.com/aws/en/dlt/settings.html",
  ["Pipeline mode", "Compute settings"]
)
```

### Tips

- Databricks docs URLs follow the pattern `https://docs.databricks.com/<cloud>/en/<topic>/...`  
  Use `aws` for AWS, `gcp` for GCP, `azure` for Azure.
- Use `start_index` in `read_documentation` to page through long articles.
- Section titles for `read_sections` are matched case-insensitively against `<h2>` headings on the page.
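
The URL pattern from the first tip can be captured in a small helper for building and checking documentation URLs. This is a sketch only: `docs_url` and `is_docs_url` are hypothetical names, and the check covers just the `<cloud>/en/` prefix, not whether the page exists.

```python
from urllib.parse import urlparse

CLOUDS = {"aws", "gcp", "azure"}  # cloud segments listed in the tip above


def docs_url(cloud: str, topic_path: str) -> str:
    """Build a docs URL of the form https://docs.databricks.com/<cloud>/en/<topic>."""
    if cloud not in CLOUDS:
        raise ValueError(f"unknown cloud: {cloud!r}")
    return f"https://docs.databricks.com/{cloud}/en/{topic_path.lstrip('/')}"


def is_docs_url(url: str) -> bool:
    """Check that `url` starts with the documented <cloud>/en/ prefix."""
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    return (
        parsed.scheme == "https"
        and parsed.netloc == "docs.databricks.com"
        and len(parts) >= 2
        and parts[0] in CLOUDS
        and parts[1] == "en"
    )
```

For example, `docs_url("aws", "dlt/settings.html")` yields the URL used in the workflow above.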

## Development

```bash
uv sync --extra dev
```

### Lint

```bash
uv run ruff check src/ tests/
```

### Run tests

```bash
uv run --frozen pytest --cov --cov-branch --cov-report=term-missing
```

### Project structure

```
src/
  databricks_docs_mcp/
    server.py   # MCP server and tool definitions
    utils.py    # HTML extraction and formatting utilities
    models.py   # Pydantic models for search results
tests/
  test_server.py
  test_utils.py
```

## License

MIT — see [LICENSE](LICENSE).
