Metadata-Version: 2.4
Name: ibm-watsonxdata-mcp-server
Version: 0.1.3
Summary: Model Context Protocol (MCP) server for IBM watsonx.data - enables AI assistants to query and explore lakehouse data
Project-URL: Homepage, https://github.com/IBM/ibm-watsonxdata-mcp-server
Project-URL: Documentation, https://github.com/IBM/ibm-watsonxdata-mcp-server#readme
Project-URL: Repository, https://github.com/IBM/ibm-watsonxdata-mcp-server
Project-URL: Issues, https://github.com/IBM/ibm-watsonxdata-mcp-server/issues
Project-URL: Changelog, https://github.com/IBM/ibm-watsonxdata-mcp-server/blob/main/CHANGELOG.md
Author-email: IBM <noreply@ibm.com>
License: Apache-2.0
License-File: LICENSE
Keywords: ai,claude,fastmcp,ibm,lakehouse,llm,mcp,model-context-protocol,presto,spark,watsonx,watsonx-data
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: authlib>=1.6.9
Requires-Dist: cryptography>=46.0.7
Requires-Dist: fastmcp>=3.2.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: ibm-cloud-sdk-core>=3.20.0
Requires-Dist: opentelemetry-api>=1.27.0
Requires-Dist: opentelemetry-instrumentation-httpx>=0.48b0
Requires-Dist: opentelemetry-sdk>=1.27.0
Requires-Dist: pydantic-settings>=2.0.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: structlog>=24.0.0
Provides-Extra: dev
Requires-Dist: mypy>=1.13.0; extra == 'dev'
Requires-Dist: pre-commit>=4.0.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.24.0; extra == 'dev'
Requires-Dist: pytest-cov>=6.0.0; extra == 'dev'
Requires-Dist: pytest-env>=1.1.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: respx>=0.21.0; extra == 'dev'
Requires-Dist: ruff>=0.7.0; extra == 'dev'
Requires-Dist: types-requests; extra == 'dev'
Description-Content-Type: text/markdown

# IBM watsonx.data MCP Server

## Overview

The IBM watsonx.data MCP Server enables AI assistants to interact seamlessly with IBM watsonx.data lakehouses using natural language. It provides specialized tools across 6 categories for comprehensive lakehouse operations:

- **Platform Management**: Instance status and configuration
- **Engine Operations**: Manage and monitor Presto and Spark engines
- **Catalog Management**: Browse schemas, tables, and metadata; modify table structures
- **Query Execution**: Run SELECT, INSERT, UPDATE queries with query plan analysis
- **Spark Applications**: Submit, monitor, and manage Spark jobs
- **Data Ingestion**: Load data from object storage into lakehouse tables

Currently, it supports stdio transport mechanism for local subprocess. For comprehensive details on transport options, including implementation guidelines and security best practices, refer to the [MCP Transports Specification](https://modelcontextprotocol.io/specification/2025-11-25/basic/transports).

## Supported Features

### Core Capabilities
- **Multiple Tools** organized into 6 functional categories (see [TOOLS.md](TOOLS.md))
- **Platform Tools**: Instance details and status
- **Engine Tools**: Lifecycle management for Presto and Spark engines
- **Catalog Tools**: Schema and table discovery, metadata operations, DDL operations
- **Query Tools**: SELECT, INSERT, UPDATE execution with query plan analysis
- **Spark Application Tools**: Submit and manage Spark applications
- **Ingestion Tools**: Data loading from object storage (CSV, Parquet, JSON)

### Security & Authentication
- IBM Cloud IAM authentication with automatic token refresh
- Read and write operations with appropriate access controls

### Transport & Integration
- **Current**: stdio transport for local subprocess communication
- **Planned**: Streamable HTTP transport for remote deployments
- Compatible with Claude Desktop, IBM Bob, and other MCP-enabled AI assistants


## Architecture Overview

```mermaid
flowchart LR
    User --> Assistant[AI Assistant]
    Assistant -->|stdio/JSON-RPC| Server[watsonx.data MCP Server]
    Server -->|IAM Auth + API Calls| WX[watsonx.data Service]
    WX --> Engines[Presto & Spark Engines]
    Engines --> Lakehouse[Lakehouse Storage]

    style Server fill:#f3e5f5,stroke:#4a148c
    style WX fill:#e0f2f1,stroke:#00695c
    style Engines fill:#fff3e0,stroke:#ef6c00
```


### Query Execution Flow
```mermaid
sequenceDiagram
    participant A as AI Assistant
    participant S as MCP Server
    participant I as IBM Cloud IAM
    participant W as watsonx.data API
    participant E as Presto/Spark Engines

    A->>S: Natural-language request (MCP)
    S->>I: Request IAM token
    I-->>S: IAM access token
    S->>W: API request (catalog, SQL, schema...)
    W->>E: Query execution / metadata ops
    E-->>W: Results
    W-->>S: Response
    S-->>A: Structured MCP result
```

## Getting Started

### 1. Prerequisites

Before installation, ensure you have:

- **Python 3.11 or higher** ([Download](https://www.python.org/downloads/))
- **uv package manager** ([Install](https://github.com/astral-sh/uv))
- **IBM Cloud account** ([Create Account](https://cloud.ibm.com/docs/account?topic=account-account-getting-started))
- **watsonx.data instance** ([Provision Instance](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-tutorial_prov_lite_1)) and ([Setup](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-tutorial_hp_intro))
- **IBM Cloud API key** ([Create API Key](https://cloud.ibm.com/iam/apikeys))
- Gather Instance details:
  - **Base URL**: Obtain from your watsonx.data instance:
    - Option 1: Copy the hostname from your browser's address bar when accessing the instance, then append `/lakehouse/api`
    - Option 2: Navigate to instance details → **Data Access Service (DAS) endpoint**
    - Example format: `https://us-south.lakehouse.cloud.ibm.com/lakehouse/api`
  - **Instance CRN** (e.g., `crn:v1:bluemix:public:lakehouse:us-south:a/...`)
  - **IAM API Key** with access to watsonx.data instance, catalog and engines

### 2. Installation

#### Option 1: Using pip / pipx

```bash
pipx install ibm-watsonxdata-mcp-server
```

If pipx is not installed, you can install the MCP server using pip:

```bash
pip install --user ibm-watsonxdata-mcp-server
```

#### Option 2: Development Setup

```bash
# Clone repository
git clone https://github.com/IBM/ibm-watsonxdata-mcp-server.git
cd ibm-watsonxdata-mcp-server

# Install dependencies
uv sync

# Copy example configuration
cp examples/.env.example .env

# Edit with your credentials
export WATSONX_DATA_BASE_URL=https://us-south.lakehouse.cloud.ibm.com/lakehouse/api
export WATSONX_DATA_API_KEY=your_ibm_cloud_api_key_here
export WATSONX_DATA_INSTANCE_ID=crn:v1:bluemix:public:lakehouse:us-south:a/...

# Verify installation
uv run ibm-watsonxdata-mcp-server --transport stdio
```

### 3. Configure your AI Assistants

#### Integration with Claude Desktop

Find your Claude Desktop configuration file:

- **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
- **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
- **Linux**: `~/.config/Claude/claude_desktop_config.json`

Add this to `claude_desktop_config.json`:

**Option 1: Using pip/pipx install**

First, find the full path to the installed command:
```bash
# macOS/Linux
which ibm-watsonxdata-mcp-server

# Windows (PowerShell)
where.exe ibm-watsonxdata-mcp-server
```

Common installation paths:
- **macOS/Linux with pipx or pip --user**: `~/.local/bin/ibm-watsonxdata-mcp-server`
- **Windows with pipx**: `%USERPROFILE%\.local\bin\ibm-watsonxdata-mcp-server.exe`
- **System-wide install**: `/usr/local/bin/ibm-watsonxdata-mcp-server`

Then use the full path in your config:
```json
{
  "mcpServers": {
    "IBM watsonx.data MCP Server": {
      "command": "/path/from/which/command/ibm-watsonxdata-mcp-server",
      "args": ["--transport", "stdio"],
      "env": {
        "WATSONX_DATA_BASE_URL": "https://us-south.lakehouse.cloud.ibm.com/lakehouse/api",
        "WATSONX_DATA_API_KEY": "your_api_key_here",
        "WATSONX_DATA_INSTANCE_ID": "crn:v1:bluemix:public:lakehouse:us-south:a/..."
      }
    }
  }
}
```

**Option 2: Using development installation**
```json
{
  "mcpServers": {
    "IBM watsonx.data MCP Server": {
      "command": "/absolute/path/to/uv",
      "args": [
        "--directory",
        "/absolute/path/to/mcp-watsonx-data",
        "run",
        "ibm-watsonxdata-mcp-server"
      ],
      "env": {
        "WATSONX_DATA_BASE_URL": "https://us-south.lakehouse.cloud.ibm.com/lakehouse/api",
        "WATSONX_DATA_API_KEY": "your_api_key_here",
        "WATSONX_DATA_INSTANCE_ID": "crn:v1:bluemix:public:lakehouse:us-south:a/..."
      }
    }
  }
}
```

#### Integration with IBM Bob

Find your mcp_settings.json configuration file:
`~/Library/Application Support/IBM Bob/User/globalStorage/ibm.bob-code/settings/mcp_settings.json`

Different version will have different path. The exact path will be found in `Views and More Actions...` -> `MCP Servers` -> `Edit MCP`

**Option 1: Using pip/pipx install**

First, find the full path:
```bash
which ibm-watsonxdata-mcp-server
```

Then use that path in your config:
```json
{
  "mcpServers": {
    "IBM watsonx.data MCP Server": {
      "command": "/path/from/which/command/ibm-watsonxdata-mcp-server",
      "args": ["--transport", "stdio"],
      "env": {
        "WATSONX_DATA_BASE_URL": "https://us-south.lakehouse.cloud.ibm.com/lakehouse/api",
        "WATSONX_DATA_API_KEY": "your_api_key_here",
        "WATSONX_DATA_INSTANCE_ID": "crn:v1:bluemix:public:lakehouse:us-south:a/..."
      }
    }
  }
}
```

**Option 2: Using development installation**
```json
{
  "mcpServers": {
    "IBM watsonx.data MCP Server": {
      "command": "/absolute/path/to/uv",
      "args": [
        "--directory",
        "/absolute/path/to/mcp-watsonx-data",
        "run",
        "ibm-watsonxdata-mcp-server"
      ],
      "env": {
        "WATSONX_DATA_BASE_URL": "https://us-south.lakehouse.cloud.ibm.com/lakehouse/api",
        "WATSONX_DATA_API_KEY": "your_api_key_here",
        "WATSONX_DATA_INSTANCE_ID": "crn:v1:bluemix:public:lakehouse:us-south:a/..."
      }
    }
  }
}
```

## Available Tools
### Quick Reference

| Goal | Tool to Use | Example |
|------|-------------|---------|
| Check connectivity | `get_instance_details` | "What's my instance status?" |
| Find engines | `list_engines` | "List all engines" |
| Discover data | `list_schemas` | "What schemas exist?" |
| Find tables | `list_tables` | "Show tables in sales_db" |
| Understand schema | `describe_table` | "Describe customers table" |
| Query data | `execute_select` | "SELECT * FROM tpch.tiny.customer LIMIT 10" |


See [TOOLS.md](TOOLS.md) for currently supported tools and advanced usage patterns.

### Example Usage

Once configured with Claude Desktop, you can interact with your watsonx.data instance using natural language:

#### 1. Check Instance Status

**You ask:**
```
What watsonx.data instance am I connected to?
```

**Claude responds with:**
```
You're connected to watsonx.data instance:
- Instance ID: crn:v1:bluemix:public:lakehouse:us-south:a/abc123...
- Region: us-south
- Version: 2.0.0
- Status: active
- Serverless Spark: enabled
- Presto: enabled
```

#### 2. Discover Available Data

**You ask:**
```
What schemas are available in tpch catalog?
```

**Claude responds:**
```
Available Schemas:
- `tiny` - TPC-H benchmark data (smallest scale)
- `sf1` through `sf100000` - Various scale factors (1, 10, 100, 300, 1000, 3000, 10000, 30000, 100000)
- `information_schema` - System schema

**Tables in the "tiny" schema:**
1. customer
2. lineitem
3. nation
4. orders
5. part
6. partsupp
7. region
8. supplier

These are the standard TPC-H benchmark tables. Would you like me to:
- Describe the schema of one of these tables?
- Run a sample query on one of the tables?
- Explore tables from a different scale factor (sf1, sf100, etc.)?
```

#### 3. Explore Table Structure

**You ask:**
```
Show me the schema for the customer table in tiny schema
```

**Claude responds:**
```
Here's the schema for the **customer** table in the **tpch.tiny** schema:

Columns (8 total):

1. `c_custkey` - bigint (nullable)
2. `c_name` - varchar(25) (nullable)
3. `c_address` - varchar(40) (nullable)
4. `c_nationkey` - bigint (nullable)
5. `c_phone` - varchar(15) (nullable)
6. `c_acctbal` - double (nullable)
7. `c_mktsegment` - varchar(10) (nullable)
8. `c_comment` - varchar(117) (nullable)

Table Properties:
- Type: TABLE
- Primary Keys: None
- Partitions: None

This is a standard TPC-H customer table with customer ID, name, address, nation reference, phone, account balance, market segment, and comments. All columns are nullable.

Would you like me to run a query on this table or explore other tables?
```

#### 4. Query Data

**You ask:**
```
Show me the top 10 customers by account balance
```

**Claude executes:**
```sql
SELECT c_custkey, c_name, c_address, c_phone, c_acctbal, c_mktsegment FROM customer ORDER BY c_acctbal DESC LIMIT 10
```

**And presents results in a formatted table**
Here are the top 10 customers by account balance:

| Customer ID | Customer Name | Account Balance |
|---|---|---|
| 213 | Customer#000000213 | $9,987.71 |
| 45 | Customer#000000045 | $9,983.38 |
| 1106 | Customer#000001106 | $9,977.62 |
| 200 | Customer#000000200 | $9,967.60 |
| 140 | Customer#000000140 | $9,963.15 |
| 381 | Customer#000000381 | $9,931.71 |
| 43 | Customer#000000043 | $9,904.28 |
| 100 | Customer#000000100 | $9,889.89 |
| 780 | Customer#000000780 | $9,874.12 |
| 518 | Customer#000000518 | $9,871.66 |

The query executed in 42.7 seconds and returned 10 rows. Customer#000000213 has the highest account balance at $9,987.71.

#### 5. Multi-Step Analysis

**You ask:**
```
I need to analyze customer data. What's available and what does it look like?
```

**Claude:**
1. Let me check what's in your watsonx.data instance
2. I'll explore the tpch catalog
3. Here's the customer table schema
4. Runs a query to get customer data
5. Runs few more queries based on the columns in the customer table
6. Looks at the related tables
7. Presents a summary 

See [TOOLS.md](TOOLS.md) for more usage patterns and advanced examples.

## Development

### Setting Up Development Environment

1. **Clone the repository**
   ```bash
   git clone https://github.com/your-org/mcp-watsonx-data.git
   cd mcp-watsonx-data
   ```

2. **Install dependencies (including dev dependencies)**
   ```bash
   uv sync --extra dev
   ```

3. **Set up environment variables**
   ```bash
   cp examples/.env.example .env
   # Edit .env with your credentials
   export WATSONX_DATA_BASE_URL=https://us-south.lakehouse.cloud.ibm.com/lakehouse/api
   export WATSONX_DATA_API_KEY=your_ibm_cloud_api_key_here
   export WATSONX_DATA_INSTANCE_ID=crn:v1:bluemix:public:lakehouse:us-south:a/...
   ```

### Running Tests

Run the full test suite with coverage:
```bash
uv run pytest
```

Run tests with verbose output:
```bash
uv run pytest -v
```

Run specific test file:
```bash
uv run pytest tests/test_client.py
```

Run tests with coverage report:
```bash
uv run pytest --cov=lakehouse_mcp --cov-report=html
```

View coverage report:
```bash
open htmlcov/index.html  # macOS
xdg-open htmlcov/index.html  # Linux
start htmlcov/index.html  # Windows
```

### Code Quality

Run linting and formatting:
```bash
uv run ruff check .
uv run ruff format .
```

Run type checking:
```bash
uv run mypy src/
```

Run pre-commit hooks:
```bash
uv run pre-commit run --all-files
```

## Troubleshooting
See [TROUBLESHOOTING.md](TROUBLESHOOTING.md) | for common issues, diagnostics, and solutions

## Useful Links

- **IBM watsonx.data Docs**: https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-getting-started
- **IBM Cloud API Keys**: https://cloud.ibm.com/iam/apikeys
- **MCP Specification**: https://modelcontextprotocol.io/
