Metadata-Version: 2.4
Name: schema-importer
Version: 0.4.0
Summary: Generic schema and database import tooling
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: python-dotenv
Requires-Dist: mcp

﻿# Schema Import Wizard

Refactored import utility with object-oriented architecture, plugin-style database adapters, and unit tests.

Current status:
- Production-ready flow for PostgreSQL.
- Extension points for MySQL and MongoDB adapters are in place.
- SQL transformation pipeline is modular and testable.

## Architecture

The importer follows a layered OOP design:
- ImportWizard: orchestration (staging, transformations, wrapper SQL, execution).
- DatabaseAdapter: abstract base class for database-specific execution.
- PostgresAdapter: concrete implementation using psql.
- AdapterFactory: creates adapters from TARGET_DB or --target-db.
- SqlTransformer: include-path rewrite, CSV path patching, and COPY warning checks.

## Features

- Cross-platform path handling (Windows/Linux compatible with POSIX output paths).
- Deterministic SQL execution order (SQL_ORDER).
- Isolated staging folder per run (.schema_import_staging/<timestamp>/).
- CSV path normalization for SQL COPY and \copy statements.
- Stronger validation:
  - Missing SQL files fail fast.
  - Missing or ambiguous CSV references fail fast.
  - Non-zero command failures propagate as process exit codes.
- Console plus rotating file logging.
- Dry-run support.

## Configuration

Use .env (see .env.example):

```env
PGHOST=localhost
PGPORT=5432
PGDATABASE=example_db
PGUSER=postgres
PGPASSWORD=your_password

SCHEMA_IMPORT_ROOT=./inputs
SQL_ORDER=schema_setup.sql,insert_postgre_queries.sql,pg_vector.sql

LOG_LEVEL=DEBUG
TARGET_DB=postgres

# Optional: explicit path to psql executable (useful on Windows)
# PSQL_BIN=C:/Program Files/PostgreSQL/16/bin/psql.exe

# Optional: continue import when pgvector binaries are not installed on server
# ALLOW_MISSING_PGVECTOR=true
```

INI profile support:
- All CLI surfaces now support `--config-file` and `--config-section`.
- Resolution order is: explicit CLI args > INI profile values > environment > built-in defaults.
- Import can read engine aliases from INI (for example: `postgresql`, `mongo`, `sql server`, `sqlite3`).
- Cleanup and check connection are PostgreSQL-only commands; non-PostgreSQL profile engines are rejected with a clear error.
- Sample multi-database INI file: `sample_db_profiles.ini`.

## One Command Bootstrap (Cross-Platform)

Use one script on all platforms:

```bash
python bootstrap.py
```

What it does:
1. Creates .venv if missing.
2. Upgrades pip in .venv.
3. Installs dependencies from requirements.txt inside .venv.
4. Runs unit tests using .venv interpreter.

Useful options:

```bash
python bootstrap.py --skip-tests
python bootstrap.py --recreate-venv
python bootstrap.py --skip-install
python bootstrap.py --skip-install --coverage
python bootstrap.py --skip-install --coverage --cov-fail-under 100
```

Coverage mode enforces a hard gate.
- Default gate is `100%`.
- The command fails if coverage is below the configured threshold.
- Coverage is scoped to `schema_import.py` for gate evaluation.

Legacy helper scripts now delegate to `bootstrap.py` to keep one execution path:
- Windows: `001_env.bat`, `003_setup.bat`, `005_run_test.bat`, `005_run_code_cov.bat`
- Unix: `001_env.sh`, `003_setup.sh`, `005_run_test.sh`, `005_run_code_cov.sh`

## Usage

Run importer dry-run:

```bash
python schema_import.py --dry-run
```

Execute import:

```bash
python schema_import.py
```

Select adapter explicitly:

```bash
python schema_import.py --target-db postgres
```

Run import using an INI profile:

```bash
python schema_import.py --config-file import_schema.ini --config-section STAGE_POSTGRES
```

Cleanup local schemas and database (dev box):

Use the cleanup CLI entrypoint that ships with the repository.

The cleanup command is destructive and always requires explicit confirmation.
When prompted, type `YES` to proceed.
For safety, protected databases (`postgres`, `template0`, `template1`) are blocked and cannot be dropped.

Cleanup dry-run:

Dry-run mode lets you preview the actions first.

```bash
python cleanup_schema.py --dry-run
```

Override target database or schema list:

Use this when you need to target a different database or schema set.

```bash
python cleanup_schema.py --target-db myapp_db --schemas myapp,myapp_vector --dry-run
```

Run cleanup using an INI profile:

```bash
python cleanup_schema.py --config-file import_schema.ini --config-section STAGE_POSTGRES --dry-run
```

Wrapper scripts:
- Windows: `006_cleanup_schema.bat`
- Unix: `006_cleanup_schema.sh`

Check connectivity and authorization:

Wrapper scripts:
- Windows: `009_check_connection.bat`
- Unix: `009_check_connection.sh`

Check connection for import authorization:

```bash
python check_connection.py --mode import --target-db myapp_db --schemas myapp,myapp_vector
```

Check connection using an INI profile:

```bash
python check_connection.py --config-file import_schema.ini --config-section STAGE_POSTGRES --mode import
```

Run full INI workflow test using `.env` values (connection + schema details):

Windows:

```powershell
.\010_test_ini_workflow.bat
```

Unix:

```bash
./010_test_ini_workflow.sh
```

Note:
- The sample INI sets `cleanup_target_db=myapp_db` to avoid destructive operations against protected system databases such as `postgres` during cleanup dry-runs.

Optional profile override:

```powershell
.\010_test_ini_workflow.bat POSTGRES_FROM_ENV
```

```bash
./010_test_ini_workflow.sh POSTGRES_FROM_ENV
```

## MCP Server

This project includes an MCP server implementation with exactly these tools:
- `import_schema`
- `cleanup_schema`
- `check_connection`

Start MCP server directly:

```bash
python mcp_server.py
```

Start using installed entrypoint:

```bash
schema-mcp-server
```

Wrapper scripts:
- Windows: `011_run_mcp_server.bat`
- Unix: `011_run_mcp_server.sh`

### NLP Examples (Chat/Copilot)

Use `.env` source examples:

- import_schema:
  - "Use .env and import schema in dry run mode."
  - "Use .env and run import schema now (not dry run)."
- cleanup_schema:
  - "Use .env and cleanup myapp_db with schemas myapp,myapp_vector in dry run."
  - "Use .env and cleanup myapp_db for real; confirm cleanup yes."
- check_connection:
  - "Use .env and check connection for import mode."
  - "Use .env and check connection for cleanup mode with schemas myapp,myapp_vector."

Use `.ini` source examples:

- import_schema:
  - "Import schema using sample_db_profiles.ini section [POSTGRES_FROM_ENV] in dry run."
  - "Use sample_db_profiles.ini profile POSTGRES_FROM_ENV and run import schema."
- cleanup_schema:
  - "Cleanup schema using sample_db_profiles.ini section [POSTGRES_FROM_ENV] in dry run."
  - "Use sample_db_profiles.ini profile POSTGRES_FROM_ENV and cleanup myapp_db."
- check_connection:
  - "Check connection using sample_db_profiles.ini section [POSTGRES_FROM_ENV] for import mode."
  - "Use sample_db_profiles.ini profile POSTGRES_FROM_ENV and check cleanup authorization."

Notes:
- If `.env` or selected INI section has required fields, no follow-up questions are needed.
- Precedence is `explicit values in request > selected INI profile > .env > defaults`.
- For destructive cleanup (`dryRun=false`), explicit confirmation is required.

Check connection for cleanup authorization:

```bash
python check_connection.py --mode cleanup --target-db myapp_db --schemas myapp,myapp_vector
```

Entry-point command (after package install):

```bash
schema-check-connection --mode import --target-db myapp_db --schemas myapp,myapp_vector
```

Wrapper examples:

```powershell
.\009_check_connection.bat --mode import --target-db myapp_db --schemas myapp,myapp_vector
```

```bash
./009_check_connection.sh --mode cleanup --target-db myapp_db --schemas myapp,myapp_vector
```

## PostgreSQL Client Resolution

Real PostgreSQL execution requires `psql`.

- The importer first checks `PSQL_BIN` (if set).
- Otherwise it resolves `psql` from your system `PATH`.
- If not found, it fails with a clear message.
- If target `PGDATABASE` does not exist yet, importer retries once against `postgres` for bootstrap scripts.

pgvector optional fallback:
- If `ALLOW_MISSING_PGVECTOR=true` and server reports `extension "vector" is not available`,
  importer retries once without `pg_vector.sql`.

Dry-run behavior:
- `--dry-run` does not require `psql` to be installed.
- Non-dry runs require a valid `psql` executable.

Examples:

```powershell
$env:PSQL_BIN = 'C:\Program Files\PostgreSQL\16\bin\psql.exe'
.\004_run.bat
```

```bash
export PSQL_BIN="/usr/lib/postgresql/16/bin/psql"
./004_run.sh
```

## Test Suite

Run tests manually:

```bash
python -m unittest discover -s tests -p "test_*.py"
```

## Local Security Reports (HTML)

Generate local audit/security reports and convert them to HTML using sec_report_kit.

Wrapper scripts:
- Windows: `007_run_security_reports.bat`
- Unix: `007_run_security_reports.sh`

Outputs are written under `test_reports/security_reports/`:
- `pip_audit.json`
- `pip_audit.html`
- `trivy_fs.json` (when Trivy is installed)
- `trivy_fs.html` (when Trivy is installed)

Examples:

```powershell
.\007_run_security_reports.bat
```

```bash
./007_run_security_reports.sh
```

Optional flags:

```bash
python security_reports.py --skip-trivy
python security_reports.py --output-dir test_reports/security_reports --target schema_importer
```

## Extending to Other Databases

To support MySQL and MongoDB next:
1. Implement execute(...) in MySqlAdapter and MongoDbAdapter.
2. Add database-specific SQL transformer or converter stage where syntax differs.
3. Add adapter-focused unit tests and integration tests.

For now, PostgreSQL is the supported and validated runtime path.
