Metadata-Version: 2.4
Name: ch-analytics-mcp
Version: 0.2.0
Summary: MCP server for ClickHouse analytics — schema discovery, data exploration, storage & engines, performance, data quality, multi-env support
Project-URL: Homepage, https://github.com/fabdendev/ch-analytics-mcp
Project-URL: Repository, https://github.com/fabdendev/ch-analytics-mcp
Project-URL: Issues, https://github.com/fabdendev/ch-analytics-mcp/issues
License: MIT
License-File: LICENSE
Keywords: analytics,clickhouse,data-quality,mcp,monitoring
Requires-Python: >=3.12
Requires-Dist: clickhouse-connect>=0.8
Requires-Dist: fastmcp>=2.0
Provides-Extra: dev
Requires-Dist: pytest>=9.0; extra == 'dev'
Requires-Dist: ruff>=0.14; extra == 'dev'
Description-Content-Type: text/markdown

# ch-analytics-mcp

An [MCP](https://modelcontextprotocol.io/) server for **ClickHouse analytics** — a general-purpose "DBA-lite" toolkit that gives any MCP client (Claude Code, Cursor, etc.) instant visibility into schema structure, data quality, storage engines, performance, multi-environment comparison, and data modeling.

## What it does

Exposes **30 read-only tools** organised into 6 categories:

### Schema Discovery
- **`database_summary`** — high-level overview: database/table/view counts, engine distribution, compression stats
- **`scan_databases`** — row counts for every table (instant from metadata, no COUNT(*))
- **`describe_table`** — column details: name, type, nullable, default, compression codec, comment
- **`table_sizes`** — disk usage (compressed + uncompressed) per table with compression ratio
- **`find_tables`** — search tables by name pattern (ILIKE)
- **`find_columns`** — find tables that have a column matching a pattern
- **`list_empty_tables`** — quickly find tables with 0 rows
- **`list_environments`** — list configured environments
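
ClickHouse already tracks approximate row totals per table in `system.tables`, which is why `scan_databases` can answer instantly without a `COUNT(*)`. A minimal sketch of such a metadata query (the server's actual query may differ):

```python
# Hypothetical sketch: read row totals from system.tables metadata
# instead of scanning data. The real tool's SQL may differ.
def row_count_query(databases: list[str]) -> str:
    db_list = ", ".join(f"'{d}'" for d in databases)
    return (
        "SELECT database, name, total_rows "
        "FROM system.tables "
        f"WHERE database IN ({db_list}) "
        "ORDER BY database, name"
    )

print(row_count_query(["core", "trading"]))
```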

### Data Exploration
- **`recent_rows`** — peek at the most recent rows (auto-detects timestamp or sorting key)
- **`column_value_counts`** — distinct values and frequencies for a column
- **`column_stats`** — min, max, avg, null count, distinct count for a column

### Storage & Engines
- **`table_engines`** — list tables by engine (MergeTree, Distributed, etc.) with sorting/partition/primary keys
- **`partition_info`** — partition details: part count, rows, compressed/uncompressed size, date range
- **`compare_envs`** — compare row counts across DEV / STG / PROD

### Performance
- **`data_skipping_indices`** — list data skipping indices (minmax, set, bloom_filter, etc.)
- **`slow_query_candidates`** — find recent slow queries from system.query_log
- **`parts_analysis`** — analyse part counts and compression (merge pressure detection)

### Data Quality
- **`table_health`** — row count + last inserted_at/updated_at for a table
- **`null_report`** — null percentage for every column in a table
- **`duplicate_check`** — find duplicate rows based on a set of columns

### Data Modeling & Discovery
- **`infer_relationships`** — scan `*_id` columns and infer foreign-key relationships by matching table names
- **`column_overlap`** — find column names appearing in multiple tables (join keys, shared dimensions)
- **`data_profile`** — one-shot full profile per column: type, cardinality, null%, uniqueness, min/max, samples
- **`sample_values`** — distinct values with frequency, data type, avg length, and uniqueness flag
- **`cardinality_check`** — determine relationship type (1:1, 1:N, N:1, N:M) between two tables via a join column
- **`key_candidates`** — identify potential primary/natural key columns based on uniqueness and null rate
- **`naming_conventions`** — analyse column naming patterns (suffixes like `_id`, `_at`, `_type`, etc.)
- **`type_census`** — distribution of base data types across all visible tables
- **`temporal_columns`** — find all Date/DateTime columns with their time ranges
- **`table_activity`** — show table activity based on part modification times (active vs stale)
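
To illustrate the idea behind `cardinality_check`: once each side's distinct-vs-total counts are known, the relationship type follows from whether the join column is unique on each side. A sketch of that classification (not the server's actual code):

```python
# Illustrative sketch: classify a join relationship from uniqueness
# of the join column on each side. A side is "unique" when every
# value appears at most once (distinct count == row count).
def is_unique(total_rows: int, distinct_values: int) -> bool:
    return total_rows == distinct_values

def relationship_type(left_unique: bool, right_unique: bool) -> str:
    if left_unique and right_unique:
        return "1:1"
    if left_unique:
        return "1:N"  # each left value matches many right rows
    if right_unique:
        return "N:1"
    return "N:M"

# e.g. users.id (unique) joined to orders.user_id (repeated)
print(relationship_type(is_unique(100, 100), is_unique(5000, 100)))  # 1:N
```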

## Quick start

### Prerequisites

- Python 3.12+
- [uv](https://docs.astral.sh/uv/) (recommended) or pip
- One or more ClickHouse instances

### Install

**Option A — run directly with uvx (no clone needed):**

```bash
uvx ch-analytics-mcp
```

**Option B — clone and run:**

```bash
git clone https://github.com/fabdendev/ch-analytics-mcp.git
cd ch-analytics-mcp
uv sync
```

### Configure

Set environment variables for each ClickHouse environment (at least one is required):

| Variable | Description | Required |
|----------|-------------|----------|
| `CH_LOCAL_URL` | ClickHouse URL for LOCAL | At least one URL |
| `CH_DEV_URL` | ClickHouse URL for DEV | At least one URL |
| `CH_STG_URL` | ClickHouse URL for STG | At least one URL |
| `CH_PROD_URL` | ClickHouse URL for PROD | At least one URL |
| `CH_INCLUDE_DATABASES` | Comma-separated allowlist of databases to scan | Optional |
| `CH_IGNORE_DATABASES` | Comma-separated databases to skip (added to internal exclusions) | Optional |
| `CH_READ_ONLY` | Reserved for future write tools (not yet used) | Optional |

```bash
export CH_DEV_URL="clickhouse://user:pass@host:8123/default"
export CH_STG_URL="clickhouse://user:pass@host:8123/default"   # optional
export CH_PROD_URL="clickhouse://user:pass@host:8123/default"  # optional

# Database filtering (optional — pick one, not both)
export CH_INCLUDE_DATABASES="core,trading,pipeline"  # only scan these
export CH_IGNORE_DATABASES="tmp,scratch"              # skip these
```

URL format: `clickhouse://user:pass@host:port/database` (HTTP interface, port 8123).
For HTTPS: `https://user:pass@host:port/database` (port 8443).
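
The URL follows standard URL syntax, so its parts decompose as you would expect (a sketch using Python's standard parser; the server's own parsing may differ):

```python
# Sketch: how a connection URL decomposes under standard URL parsing.
# clickhouse:// implies the HTTP interface; https:// implies TLS.
from urllib.parse import urlparse

url = urlparse("clickhouse://user:pass@host:8123/default")
print(url.scheme)            # clickhouse
print(url.hostname)          # host
print(url.port)              # 8123
print(url.path.lstrip("/"))  # default  <- the database name
```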

If both `CH_INCLUDE_DATABASES` and `CH_IGNORE_DATABASES` are set, the include list takes precedence.
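
That precedence rule amounts to: when an allowlist is present it wins outright, otherwise the ignore list applies. A sketch of the assumed logic (not the server's actual code):

```python
# Illustrative precedence logic (assumed): the allowlist, when set,
# takes priority over the ignore list.
def visible(db: str, include: set[str], ignore: set[str]) -> bool:
    if include:                  # CH_INCLUDE_DATABASES takes precedence
        return db in include
    return db not in ignore      # otherwise CH_IGNORE_DATABASES applies

print(visible("tmp", {"core"}, {"tmp"}))  # False: not in the allowlist
print(visible("core", set(), {"tmp"}))    # True: not ignored
```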

### Add to Claude Code

Add to your Claude Code MCP settings (`~/.claude/settings.json` or `.mcp.json`):

**If using uvx:**

```json
{
  "mcpServers": {
    "ch-analytics": {
      "command": "uvx",
      "args": ["ch-analytics-mcp"],
      "env": {
        "CH_DEV_URL": "clickhouse://user:pass@host:8123/default",
        "CH_STG_URL": "clickhouse://user:pass@host:8123/default"
      }
    }
  }
}
```

**If installed from clone:**

```json
{
  "mcpServers": {
    "ch-analytics": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/ch-analytics-mcp", "ch-analytics-mcp"],
      "env": {
        "CH_DEV_URL": "clickhouse://user:pass@host:8123/default"
      }
    }
  }
}
```

## Security

All tools are **read-only**. No data is ever modified. Additional safeguards:

- **Identifier validation** — all user-provided database/table/column names are validated against `^[a-zA-Z_][a-zA-Z0-9_]*$` and backtick-quoted
- **Row limits** — row-level queries capped at 100 rows, aggregation queries at 200
- **Query timeout** — potentially expensive queries (null_report, column_stats, duplicate_check, column_value_counts, data_profile, sample_values, cardinality_check, key_candidates, temporal_columns) use a 30s `max_execution_time`
- **Direction validation** — order_dir restricted to ASC/DESC only
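
Applying the identifier rule quoted above, a minimal validator looks like this (the server's implementation may add further checks):

```python
# Validate user-supplied identifiers against the documented pattern,
# then backtick-quote them. Anything else is rejected, which blocks
# SQL injection through database/table/column names.
import re

IDENT = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")

def quote_identifier(name: str) -> str:
    if not IDENT.match(name):
        raise ValueError(f"invalid identifier: {name!r}")
    return f"`{name}`"

print(quote_identifier("events_2024"))  # `events_2024`
# quote_identifier("events; DROP TABLE x") raises ValueError
```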

## Multi-environment support

Configure up to 4 environments (LOCAL, DEV, STG, PROD). The first configured environment becomes the default. Use `compare_envs` to quickly spot row count differences across environments.

## Development

```bash
uv sync --extra dev
uv run ruff check ch_analytics_mcp/    # lint
uv run pytest                          # tests
uv run python -m ch_analytics_mcp      # start server locally
```

## License

MIT
