# PyClickHouseMigrator

PyClickHouseMigrator is a SQL-first ClickHouse migration runner for Python teams.

It manages migration files, applies pending migrations, stores migration state in ClickHouse, validates checksums, supports rollback SQL, provides dry-run output, supports baseline adoption for existing databases, uses an advisory migration lock, and can create its own service tables in cluster-aware mode.

It is intentionally not an ORM, not a schema diff engine, not a ClickHouse DDL advisor, and not a deployment platform.

## Package and runtime

- PyPI package: `py-clickhouse-migrator`
- CLI entry point: `migrator`
- Python: 3.11+
- Runtime dependencies: `click`, `clickhouse-driver`
- License: MIT
- Default migrations directory: `./db/migrations`
- Docker default migrations directory: `/migrations`
- Repository: https://github.com/Maksim-Burtsev/PyClickHouseMigrator

## Product scope

PyClickHouseMigrator is a migration runner. Its responsibilities are:

1. discover local `.sql` migration files;
2. parse `-- migrator:up`, `-- migrator:down`, and `-- @stmt` blocks;
3. apply pending `up` statements in filename order;
4. store applied migration metadata in `db_migrations`;
5. rollback applied migrations in reverse order using stored `down` SQL;
6. validate checksums for applied migrations;
7. show migration status and integrity warnings;
8. baseline existing databases;
9. protect common concurrent execution cases with an advisory lock;
10. support service table creation in ClickHouse cluster mode.

It does not:

- generate migrations from schema diffs;
- inspect ClickHouse and infer desired schema;
- generate rollback SQL;
- rewrite SQL;
- inject `ON CLUSTER` into user migrations;
- decide whether DDL is operationally safe;
- create the target database;
- sandbox migration SQL;
- provide a web UI, approvals, RBAC, or rollout orchestration.

Migration files are trusted input.

## Migration file format

Migrations are `.sql` files.

Basic example:

```sql
-- migrator:up
-- @stmt
CREATE TABLE IF NOT EXISTS users
(
    id UInt64,
    name String,
    created_at DateTime DEFAULT now()
)
ENGINE = MergeTree
ORDER BY id

-- migrator:down
-- @stmt
DROP TABLE IF EXISTS users
```

A valid migration must contain exactly one `-- migrator:up` marker and exactly one `-- migrator:down` marker. The `up` marker must appear before the `down` marker. SQL content must be inside `-- @stmt` blocks. The `up` section must contain at least one non-empty statement block. The `down` section may be empty.

Each `-- @stmt` block is executed as one ClickHouse query. The migrator does not split SQL by semicolons.

Multiple statements are expressed by multiple blocks:

```sql
-- migrator:up
-- @stmt
CREATE TABLE IF NOT EXISTS events
(
    id UInt64,
    event_name String
)
ENGINE = MergeTree
ORDER BY id

-- @stmt
ALTER TABLE events ADD COLUMN IF NOT EXISTS created_at DateTime DEFAULT now()

-- migrator:down
-- @stmt
ALTER TABLE events DROP COLUMN IF EXISTS created_at

-- @stmt
DROP TABLE IF EXISTS events
```

Files are executed in lexicographic filename order. `migrator new` generates timestamped filenames such as:

```text
20260421140000_create_users_table.sql
```

Only `.sql` files are discovered.

## CLI commands

### Global options

| Option | Env var | Default | Meaning |
|---|---|---:|---|
| `--url` | `CLICKHOUSE_MIGRATE_URL` | — | ClickHouse connection URL; required for DB commands. |
| `--path` | `CLICKHOUSE_MIGRATE_DIR` | `./db/migrations` | Migration files directory. |
| `--cluster` | `CLICKHOUSE_MIGRATE_CLUSTER` | — | Cluster name for migrator service table DDL. |
| `--connect-retries` | `CLICKHOUSE_MIGRATE_CONNECT_RETRIES` | `0` | Startup connection retry attempts. |
| `--connect-retries-interval` | `CLICKHOUSE_MIGRATE_CONNECT_RETRIES_INTERVAL` | `1` | Seconds between startup retries. |
| `--send-receive-timeout` | `CLICKHOUSE_MIGRATE_SEND_RECEIVE_TIMEOUT` | `600` | ClickHouse client send/receive timeout. |
| `-v`, `--verbose` | — | off | DEBUG logging. |
| `-q`, `--quiet` | — | off | Suppress INFO/WARNING logs; command output such as dry-run SQL is still printed. |

Connection URL example:

```text
clickhouse://user:password@host:9000/database
```

Secure URL example:

```text
clickhouse://user:password@host:9440/database?secure=True
```

### `init`

Creates the migrations directory. Works offline.

```sh
migrator init
migrator --path ./migrations init
```

### `new`

Creates a timestamped `.sql` migration file. Works offline.

```sh
migrator new create_users_table
```

Name suffix rules: letters, digits, underscores. The suffix is optional but recommended.

Generated template:

```sql
-- migrator:up
-- @stmt


-- migrator:down
-- @stmt
```

### `up`

Applies pending migrations in filename order.

```sh
migrator up
migrator up 3
migrator up --dry-run
migrator up --no-validate
migrator up --allow-dirty
migrator up --no-lock
```

Options:

- optional positional `N`: number of pending migrations to apply;
- `--lock / --no-lock`, default `--lock`;
- `--lock-ttl`, default `600` seconds;
- `--lock-retry`, default `3` attempts;
- `--dry-run`: print SQL without executing;
- `--validate / --no-validate`, default `--validate`;
- `--allow-dirty`: skip checksum mismatch failure for this run.

`up` checks applied migration checksums before applying pending migrations. Dry-run does not write migration state.

### `rollback`

Rolls back applied migrations in reverse order.

```sh
migrator rollback
migrator rollback 3
migrator rollback --dry-run
migrator rollback --no-validate
```

Options:

- optional positional `N`, default `1`;
- `--lock / --no-lock`, default `--lock`;
- `--lock-ttl`, default `600` seconds;
- `--lock-retry`, default `3` attempts;
- `--dry-run`;
- `--validate / --no-validate`, default `--validate`.

Rollback uses the stored `rollback` SQL from `db_migrations`. It selects rows where `kind = 'migration'`, so baseline rows are not rolled back.

### `show`

Displays migration status.

```sh
migrator show
migrator show --all
```

Shows applied migrations, pending migrations, total counts, HEAD marker, baseline marker, and integrity warnings for modified or missing applied files.

### `baseline`

Adopts an existing DB by marking current `.sql` migration files as already applied without executing SQL.

```sh
migrator baseline
```

Baseline creates `db_migrations` if needed, but the table must have no rows. It writes rows with `kind = 'baseline'`, empty `up` and `rollback` columns, and empty checksums. Baseline rows are excluded from rollback and checksum validation.

Baseline does not validate SQL and does not compare files with the existing ClickHouse schema.

### `repair`

Accepts intentional edit(s) to applied migration file(s) by updating stored checksums.

```sh
migrator repair
```

Use only after confirming that the database state is still consistent with the edited file(s). It does not execute migration SQL. Missing files are reported and skipped.

### `lock-info`

Shows active lock holder and timestamps.

```sh
migrator lock-info
```

### `force-unlock`

Manually releases the lock by inserting an unlock row.

```sh
migrator force-unlock
```

Use after confirming the previous migration runner is no longer active or when the lock should be cleared manually.

## State tables

### `db_migrations`

The migration ledger is created automatically in the target database.

Single-node engine:

```sql
MergeTree()
```

Cluster-mode engine:

```sql
ReplicatedMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}')
```

Columns:

- `name String` — migration filename;
- `kind Enum8('migration' = 1, 'baseline' = 2)` — normal migration or baseline row;
- `up String` — stored applied SQL section;
- `rollback String` — stored rollback SQL section;
- `dt DateTime64 DEFAULT now()` — applied/recorded timestamp;
- `checksum String DEFAULT ''` — checksum for normal migrations.

Normal applied migrations store the `up`, `rollback`, and checksum values. Baseline rows store empty SQL and empty checksum.

### `_migrations_lock`

The advisory lock table is created automatically by `MigrationLock`.

Single-node engine:

```sql
ReplacingMergeTree(locked_at)
```

Cluster-mode engine:

```sql
ReplicatedReplacingMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}', locked_at)
```

Columns:

- `lock_id String DEFAULT 'migration'`;
- `locked_by String`;
- `locked_at DateTime64(3) DEFAULT now64(3)`;
- `expires_at DateTime64(3)`;
- `is_locked UInt8 DEFAULT 1`.

Lock ownership is verified after insert by reading the latest active row with `FINAL`. Expired locks are ignored.

## Checksum behavior

Checksums are computed from the `up` and `down` statement blocks extracted by the migrator.

Normalization strips trailing whitespace, removes blank lines, joins lines with `\n`, serializes statements with null-byte separators, and computes SHA-256.

Checksum validation runs on `migrator up` and in `migrator show`. Baseline rows are excluded.

If mismatches exist:

- `up` fails by default;
- `up --allow-dirty` logs a warning and continues;
- `show` displays modified/missing markers;
- `repair` updates stored checksums for existing files.

## Preflight validation

`up` and `rollback` validate statements with `EXPLAIN AST` by default.

Validation applies to dry-run and execution flows. It can be disabled with:

```sh
migrator up --no-validate
migrator rollback --no-validate
```

Validation is best-effort. It is not a production-safety analyzer and does not guarantee execution success.

## Cluster mode

When `--cluster` or `CLICKHOUSE_MIGRATE_CLUSTER` is set:

- `db_migrations` is created with `ON CLUSTER <cluster>` and a replicated engine;
- `_migrations_lock` is created with `ON CLUSTER <cluster>` and a replicated replacing engine;
- service table operations use `insert_quorum = auto` and `select_sequential_consistency = 1`;
- user migration SQL is executed exactly as written.

The migrator does not inject `ON CLUSTER` into user migrations.

If DDL should run on the whole ClickHouse cluster, the migration file must include `ON CLUSTER` explicitly.

Cluster names must match:

```text
[a-zA-Z_][a-zA-Z0-9_]*
```

## Docker

Docker image:

```text
maksimburtsev/py-clickhouse-migrator:latest
```

Run migrations:

```sh
docker run --rm \
  -v "$PWD/db/migrations:/migrations" \
  -e CLICKHOUSE_MIGRATE_URL=clickhouse://default@clickhouse:9000/mydb \
  maksimburtsev/py-clickhouse-migrator:latest \
  up
```

The image sets `CLICKHOUSE_MIGRATE_DIR=/migrations`.

## CI/CD

Recommended CI/CD command:

```sh
migrator up
```

Exit code contract:

- `0`: success;
- `1`: handled migration error.

Run migrations from one deployment job per environment. The advisory lock reduces accidental concurrent runs but should not replace a clean deployment topology.

## Python API

Basic usage:

```python
from py_clickhouse_migrator import Migrator

migrator = Migrator(
    database_url="clickhouse://default@localhost:9000/mydb",
    migrations_dir="./db/migrations",
)

migrator.up()
```

Common methods:

- `up(n=None, dry_run=False, allow_dirty=False, validate=True)`;
- `rollback(number=1, dry_run=False, validate=True)`;
- `show_migrations(show_all=False)`;
- `baseline()`;
- `validate_checksums()`;
- `repair()`;
- `get_unapplied_migration_names()`;
- `get_migrations_for_apply()`;
- `get_migrations_for_rollback()`.

Public exports include `Migrator`, `MigrationLock`, lock errors, migration errors, checksum helpers, and migration file helpers.

## Known limitations

- No DDL transactions. Multi-statement migrations can partially apply.
- One `-- @stmt` block equals one ClickHouse query.
- SQL is trusted input and executed as written.
- No schema diff generation.
- No rollback generation.
- Baseline does not validate existing schema.
- Target database must exist.
- Advisory lock is best-effort.
- Cluster mode does not rewrite user SQL.
- Preflight validation is best-effort.
- No deployment orchestration.

## Source map

- `py_clickhouse_migrator/cli.py` — Click-based CLI commands.
- `py_clickhouse_migrator/migrator.py` — core migration logic, state table, baseline, checksum validation, status output.
- `py_clickhouse_migrator/migration_parser.py` — SQL migration parser for `-- migrator:up`, `-- migrator:down`, and `-- @stmt` blocks.
- `py_clickhouse_migrator/checksum.py` — checksum normalization and SHA-256 computation.
- `py_clickhouse_migrator/lock.py` — advisory lock implementation.
- `py_clickhouse_migrator/errors.py` — custom exception classes.
- `README.md` — main documentation.
- `docs/*` — detailed guides.
- `CHANGELOG.txt` — release history.
