Metadata-Version: 2.4
Name: dwcp-redshift-mcp-server
Version: 1.0.0
Summary: MCP server that generates and tunes Redshift SQL from DDL with Ping Federate authentication
License: MIT
Keywords: aws,ddl,mcp,ping-federate,redshift,sql
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10
Requires-Dist: boto3>=1.34.0
Requires-Dist: botocore>=1.34.0
Requires-Dist: mcp>=1.0.0
Requires-Dist: redshift-connector>=2.1.0
Provides-Extra: dev
Requires-Dist: mypy; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Description-Content-Type: text/markdown

# DWCP Redshift SQL Generator & Tuner — MCP Server

A **Python MCP server** that lets you share Redshift DDL in a Kiro (or any MCP-compatible) chat session and get back:

- **Generated SQL** — optimised SELECT, aggregation, JOIN, and window-function queries tailored to your schema
- **Tuned SQL** — anti-pattern detection + rewritten queries following Redshift best practices
- **DDL Analysis** — sort key, distribution key, column encoding, and VARCHAR sizing advice
- **Live query execution** — run read-only SQL against Redshift clusters using Ping Federate (PingCredentialsProvider) authentication

---

## Quick Start (Plug & Play)

Once published to PyPI, users can run this server with a single `uvx` command — no git clone required!

### Step 1 — Create your clusters config

Create `~/.dwcp-redshift-mcp-server/clusters.json`:

```json
{
  "common": {
    "idp_host": "sso.fed.prod.aws.swacorp.com",
    "region": "us-east-1",
    "database": "swadb",
    "port": 5439,
    "ssl_insecure": true
  },
  "clusters": {
    "dev1-aina": {
      "host": "<cluster-endpoint>",
      "cluster_identifier": "<cluster-id>",
      "partner_sp_id": "urn:amazon:webservices:<account>:<app>"
    }
  }
}
```

### Step 2 — Add to Kiro MCP config

Create or edit `.kiro/settings/mcp.json`:

```json
{
  "mcpServers": {
    "dwcp-redshift-mcp-server": {
      "command": "uvx",
      "args": ["dwcp-redshift-mcp-server@latest"],
      "env": {
        "REDSHIFT_USER": "your-eid-or-xid",
        "REDSHIFT_PASSWORD": "your-ping-password"
      },
      "disabled": false,
      "autoApprove": [
        "generate_sql_from_ddl",
        "tune_sql",
        "analyse_ddl",
        "list_clusters_config",
        "execute_query_ping"
      ]
    }
  }
}
```

### Step 3 — Restart Kiro

That's it! The server will be downloaded and run automatically.

---

## Architecture

```
Kiro IDE (chat)
      │
      └─── dwcp-redshift-mcp-server  (this project)
                │  uvx dwcp-redshift-mcp-server@latest
                ├─ DDL-driven SQL generation & tuning (offline)
                └─ Live Redshift queries via PingCredentialsProvider
                        │
                        ├─ Ping Federate IdP  (SAML assertion)
                        ├─ AWS STS            (AssumeRoleWithSAML)
                        └─ Redshift cluster   (redshift-connector)
```

### Tools exposed to Kiro

| Tool | Description |
|---|---|
| `generate_sql_from_ddl` | Parse DDL → emit optimised SELECT / aggregation / JOIN / window SQL |
| `tune_sql` | Detect anti-patterns in existing SQL → return improved version |
| `analyse_ddl` | Report on distribution strategy, sort keys, column encoding |
| `list_clusters_config` | List available cluster names from clusters.json |
| `execute_query_ping` | Run read-only SQL on a named cluster via Ping Federate auth |

---

## Configuration

### Cluster Config Location

The server looks for `clusters.json` in this order:
1. `DWCP_CLUSTERS_CONFIG` environment variable (absolute path)
2. `~/.dwcp-redshift-mcp-server/clusters.json` (user home) — **recommended**
3. `./config/clusters.json` (current working directory)

### Cluster Config Fields

| Field | Required | Description |
|---|---|---|
| `host` | yes | Redshift cluster endpoint |
| `cluster_identifier` | yes | Cluster identifier |
| `partner_sp_id` | yes | Ping Federate SP ID for the Redshift app |
| `database` | no | Overrides `common.database` |
| `db_groups` | no | Redshift db groups to join |
| `ssl_insecure` | no | Skip SSL cert verification (default: false) |

### Environment Variables

| Variable | Description |
|---|---|
| `REDSHIFT_USER` | Ping / AD username (EID or XID) |
| `REDSHIFT_PASSWORD` | Ping / AD password |
| `DWCP_CLUSTERS_CONFIG` | Optional: custom path to clusters.json |

---

## Development Setup

For contributors or local development:

### Prerequisites

| Tool | Version | Install |
|---|---|---|
| Python | ≥ 3.10 | `uv python install 3.10` |
| uv | latest | `curl -LsSf https://astral.sh/uv/install.sh \| sh` |

### Install & Run Locally

```bash
# Clone the repo
git clone <your-gitlab-repo-url>
cd dwcp-redshift-mcp-server

# Install dependencies
uv sync

# Run the server directly
uv run dwcp-redshift-mcp-server
```

### Local MCP Config (for development)

```json
{
  "mcpServers": {
    "dwcp-redshift-mcp-server": {
      "command": "uv",
      "args": ["run", "--project", "/path/to/dwcp-redshift-mcp-server", "dwcp-redshift-mcp-server"],
      "env": {
        "REDSHIFT_USER": "your-eid",
        "REDSHIFT_PASSWORD": "your-password"
      }
    }
  }
}
```

---

## Publishing to PyPI

To make this server available via `uvx dwcp-redshift-mcp-server@latest`:

### Option 1: Public PyPI

```bash
# Build the package
uv build

# Upload to PyPI (requires PyPI account)
uv publish
```

### Option 2: Private PyPI / GitLab Package Registry

```bash
# Build
uv build

# Upload to GitLab Package Registry
uv publish --publish-url https://gitlab.example.com/api/v4/projects/<project-id>/packages/pypi
```

Then users configure their pip/uv to use your private registry.

---

## Usage in Kiro Chat

### Generate SQL from DDL

```
Generate optimised SQL for this DDL:

CREATE TABLE aina_db.lby_ksk_fcst (
  fcst_stn_cde       CHAR(3)   ENCODE lzo,
  fcst_actv_dt       DATE      ENCODE az64
)
DISTSTYLE KEY DISTKEY (fcst_actv_dt)
SORTKEY (fcst_stn_cde);
```

### Tune existing SQL

```
Tune this SQL for Redshift:

SELECT * FROM aina_db.lby_ksk_fcst
```

### Analyse DDL

```
Analyse this DDL and tell me how to optimise the schema for Redshift
```

---

## Redshift Best Practices Enforced

### SQL Tuning (anti-patterns detected)

| Anti-pattern | Why it matters |
|---|---|
| `SELECT *` | Kills columnar storage advantage |
| Leading-wildcard LIKE | Prevents zone-map / block pruning |
| Missing LIMIT | Can push millions of rows to client |
| Implicit cross-join | Exponential row explosion across slices |
| OR in WHERE | Disables zone-map pruning on sort key |
| NOT IN (SELECT …) | Null-unsafe, forces nested-loop strategy |
| Scalar subquery in SELECT | Executes once per row |
| ORDER BY on non-sort-key | Forces runtime sort |
| DISTINCT without reason | Usually signals bad join cardinality |

### DDL Analysis

- DISTKEY selection guidance
- SORTKEY recommendations (compound vs interleaved)
- Per-column ENCODE suggestions (AZ64, ZSTD, RAW)
- VARCHAR over-sizing warnings

---

## Security Notes

- `REDSHIFT_USER` and `REDSHIFT_PASSWORD` are stored in `mcp.json` — do not commit this file
- All `execute_query_ping` calls enforce `SET transaction_read_only = ON`
- The `generate_sql_from_ddl`, `tune_sql`, and `analyse_ddl` tools operate entirely offline
