Metadata-Version: 2.4
Name: snowflake-data-migration-orchestrator
Version: 0.3.0
Summary: Data migration orchestrator for Snowflake
Project-URL: Bug Tracker, https://github.com/snowflake-eng/migrations-data-validation/issues
Project-URL: Source code, https://github.com/snowflake-eng/migrations-data-validation/
Project-URL: homepage, https://www.snowflake.com/
Author-email: "Snowflake, Inc." <snowflake-python-libraries-dl@snowflake.com>
License: Snowflake Conversion Software Terms
Keywords: Snowflake,cloud,data,database,migration,orchestrator
Classifier: Development Status :: 5 - Production/Stable
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: Implementation :: CPython
Requires-Python: >=3.11
Requires-Dist: jinja2>=3.1.0
Requires-Dist: pandas>=2.3.3
Requires-Dist: pydantic>=2.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: snowflake-connector-python>=4.0.0
Requires-Dist: snowflake-data-validation
Requires-Dist: urllib3>=2.6.0
Provides-Extra: development
Requires-Dist: parameterized>=0.9.0; extra == 'development'
Requires-Dist: pytest-cov>=4.0.0; extra == 'development'
Requires-Dist: pytest-mock>=3.10.0; extra == 'development'
Requires-Dist: pytest>=7.0.0; extra == 'development'
Requires-Dist: ruff>=0.1.0; extra == 'development'
Description-Content-Type: text/markdown

# Snowflake Data Migration Orchestrator

[![Python](https://img.shields.io/badge/python-3.11+-blue)](https://www.python.org/downloads/)

The Cloud Data Migration feature of SnowConvert provides a fault-tolerant, scalable solution for moving data from external sources into **Snowflake**. This tool is specifically designed for cases where a user is moving data from a system they plan to decommission. For replication purposes, other solutions are available that might better fit your use case.

## Architecture

- 1 **Orchestrator** is connected to the **Snowflake Account**.
  - It requires privileges to create/operate the **SNOWCONVERT_AI** database, in which metadata is stored.
- 1 or more **Workers** connect to the **Source System** and to the **Snowflake Account**.
  - **Workers** read data from the **Source System** and upload it to a **Snowflake Stage**.
  - **Workers** pick up tasks created by the **Orchestrator** and process them in parallel.
- Files uploaded to the Snowflake Stage are copied into the **Target Tables** using a [COPY INTO](https://docs.snowflake.com/en/sql-reference/sql/copy-into-table) statement.
  - The [COPY INTO](https://docs.snowflake.com/en/sql-reference/sql/copy-into-table) statement is submitted and monitored by the **Orchestrator**.

### Where to Deploy Orchestrator and Worker(s)?

The **Orchestrator** and **Worker(s)** can be deployed in multiple ways:

1. Both on [Snowpark Container Services](https://docs.snowflake.com/en/developer-guide/snowpark-container-services/overview) (in the **Snowflake Account**).
2. Both on the **Customer's Environment** (custom hardware, virtual machines, containers, etc.).
3. **Orchestrator** on [Snowpark Container Services](https://docs.snowflake.com/en/developer-guide/snowpark-container-services/overview) and **Worker(s)** on the **Customer's Environment** (or the other way around).

Requirements for the environment:

- The **Orchestrator** and **Worker(s)** are **Python packages**, so Python must be installed.
- The **Worker(s)** will typically require an **ODBC** driver to connect to the **Source System**.
- The **Orchestrator** needs to be able to connect to the **Snowflake Account**. The connection used must have privileges to create the **SNOWCONVERT_AI** database and create schemas/objects on that database.

## Setup

### Additional Configuration on Snowflake Account

When starting the Orchestrator, it will automatically try to set up resources in your Snowflake Account in the **SNOWCONVERT_AI** database (if it does not exist yet, it will be created). This is a one-time step and transparent to the user. Some considerations:

- The **Orchestrator** should connect with a role that has privileges to create the **SNOWCONVERT_AI** database and its objects.
- Whenever the **Orchestrator** starts, it should use a role that allows it to interact with the **SNOWCONVERT_AI** database and its resources. The easiest way to guarantee this is to always run it with the same role that was used for creating **SNOWCONVERT_AI** in the first place.

## Usage

In general, for migrating data using this solution, you will need to:

1. Start the **Orchestrator**.
2. Start the **Worker(s)**.
3. Create a **Data Migration Workflow**.
4. Monitor the **Data Migration Workflow** (asynchronously) until completion.

A **Data Migration Workflow** is essentially an action/goal for the system to complete, such as migrating a specific set of tables with a given configuration. You can submit multiple workflows simultaneously and monitor them. The Orchestrator breaks Data Migration Workflows into smaller tasks. Normally, this also involves splitting a table into partitions before extracting its data and loading it to **Snowflake**.

### Starting the Orchestrator

After installation, start the Orchestrator by running:

```shell
python -m data_migration_orchestrator
```

Before running, make sure that the **SNOWFLAKE_CONNECTION_NAME** environment variable is set to a value that matches one of the connection names in your Snowflake [config.toml](https://docs.snowflake.com/en/developer-guide/snowflake-cli/connecting/configure-connections) or [connections.toml](https://docs.snowflake.com/en/developer-guide/snowflake-cli/connecting/configure-connections). That is the name of the connection used to connect to the **Target Snowflake Account**.

**Connection** session defaults use `SNOWFLAKE_DATABASE` and `SNOWFLAKE_SCHEMA` where your environment provides them (for example in SPCS). Those are separate from **metadata object locations**: by default, workflow and task-queue objects live in `SNOWCONVERT_AI.DATA_MIGRATION` and data-validation objects in `SNOWCONVERT_AI.DATA_VALIDATION`. To deploy metadata under different database or schema names, set:

- `CUSTOM_SNOWFLAKE_DATABASE_FOR_METADATA` (default `SNOWCONVERT_AI`)
- `CUSTOM_SNOWFLAKE_SCHEMA_FOR_DATA_MIGRATION_METADATA` (default `DATA_MIGRATION`)
- `CUSTOM_SNOWFLAKE_SCHEMA_FOR_DATA_VALIDATION_METADATA` (default `DATA_VALIDATION`)

Workers that call task-queue stored procedures must use the same `CUSTOM_*` values as the orchestrator.

The **Orchestrator** will run until you stop it. **Data Migration Workflows** need an active **Orchestrator** to be completed. However, the **Orchestrator** can be safely stopped at any point and resumed later (ongoing **Data Migration Workflows** will be resumed at that point).

### Starting the Worker(s)

After installation (`pip install snowflake-data-exchange-agent`), start a Worker by running:

```shell
python -m data_exchange_agent -c <configuration-file-path>
```

The path to the configuration file can be omitted. In that case, the worker will look for a file called **configuration.toml** in your current directory. See the [Worker Configuration](#worker-configuration) section below for the full specification.

**Workers** will run until you stop them. **Data Migration Workflows** need at least one active **Worker** to be completed. However, the **Workers** can be safely stopped at any point and resumed later (ongoing **Data Migration Workflows** will be resumed at that point).

### Creating a Data Migration Workflow

After installation, create workflows by running:

```shell
python -m data_migration_orchestrator create-data-migration-workflow <workflow-config-file-path> --name <workflow-name> --connection-name <connection-name> --source-platform <source-platform>
```

- The **Workflow Configuration** specification can be found in the [Workflow Configuration Reference](#workflow-configuration-reference) section.
- The workflow name must be composed of alphanumerical characters and cannot start with a digit.
- You must specify the name of the **Snowflake** connection you want to use, as it appears in your **config.toml** or **connections.toml** file.
- Supported source platforms are `sqlserver` and `redshift`.

### Monitoring a Data Migration Workflow

Each **Workflow** will go through different stages through its lifecycle:

1. **Pending**: No tasks have been created for this workflow yet.
2. **Executing**: Tasks have been created for this workflow and there are still tasks that haven't reached a terminal state (COMPLETED or FAILED).
3. **Completed:** All tasks have reached a terminal state (COMPLETED or FAILED).

In the **data migration metadata schema** (by default **SNOWCONVERT_AI.DATA_MIGRATION**) there are tables/views that can be queried to understand the status of one or more Workflows:

| View/Table | Description |
|------------|-------------|
| **WORKFLOW** | One row per workflow. Includes start/end time, status, and configuration. |
| **TABLE_PROGRESS_WITH_EXAMPLE_ERROR** | One row per table being migrated. Shows how many partitions are in each stage (extraction, loading, completed, or failed), along with related errors. Filterable by `WORKFLOW_ID`. |
| **DATA_MIGRATION_ERROR** | For each failed partition, contains the first known error. Filterable by `WORKFLOW_ID`. |

In the same schema, there is a [Streamlit](https://docs.snowflake.com/en/developer-guide/streamlit/about-streamlit) dashboard called **DATA_MIGRATION_DASHBOARD** that can be used to monitor the workflows. This dashboard presents data from those tables/views.

## Advanced Features

### Redshift UNLOAD

For **Redshift**, it is recommended to use the **UNLOAD** extraction strategy. The main idea behind this is:

- Large query results are written directly to an **S3 Bucket** instead of being downloaded to the machine in which the **Worker** is running.
- On **Snowflake** side, an [External Stage](https://docs.snowflake.com/en/user-guide/data-load-s3-create-stage) is set up to reference the corresponding **S3 Bucket**, so that **COPY INTO** statements can be done directly from that stage.

See the [Extraction Strategy](#extractionstrategy) section for configuration details.

### Incremental Synchronization

It is possible to migrate some tables and then re-migrate them in the future, moving only the data that has changed. See the [Synchronization Strategies](#synchronizationstrategy) section for the available strategies and their configuration.

## Considerations and Recommendations

### Connecting to Snowflake with a PAT

It is recommended to use [Programmatic Access Tokens](https://docs.snowflake.com/en/user-guide/programmatic-access-tokens) for connections used by the Orchestrator and **Workers**. This ensures there won't be a need to constantly authenticate through the browser or with an Authenticator app. You will need to establish a [Network Policy](https://docs.snowflake.com/en/user-guide/network-policies) or temporarily bypass the requirement for a Network Policy (this can be done from **Snowsight**).

### Running Orchestrator and/or Workers on SPCS

If you want to leverage **Snowflake** compute for these tasks, you can:

1. Prepare **Docker** images that use the Python modules and have the appropriate configuration.
2. Push those Docker images to an [Image Repository](https://docs.snowflake.com/en/developer-guide/snowpark-container-services/working-with-registry-repository) in Snowflake.
3. Execute the **Orchestrator** and/or **Worker(s)** images using [Snowpark Container Services](https://docs.snowflake.com/en/developer-guide/snowpark-container-services/overview).

Some considerations:

- It is recommended to execute them as **Services**, not **Jobs**.
- It is possible to run only one component (Orchestrator or Workers) in SPCS and the other on another platform.
- It is a good practice to monitor the SPCS service and suspend it when it is not being used.
- Depending on the network configuration of the **Source System**, you might need to configure an [External Access Integration](https://docs.snowflake.com/en/sql-reference/sql/create-external-access-integration) so that these services can connect to your Source System.

### Initial Testing

It is recommended to deploy the DDL for the tables you want to migrate before starting data migration. This ensures the target types match the behavior you want in those tables and their related views/procedures. Converting the DDL from your source dialect into **Snowflake SQL** can be done through the Code Conversion capabilities of [SnowConvert AI](https://docs.snowflake.com/en/migrations/snowconvert-docs/overview) and/or [Cortex Code](https://docs.snowflake.com/en/user-guide/cortex-code/cortex-code). If you don't deploy the DDL before starting data migration, the types will be inferred and might not be as accurate as desired.

Additionally, it is a good practice to move a few rows from each table as a test before starting the full migration. This helps detect configuration or connectivity issues early.

### Managing Workers

The time it takes to complete a workflow depends heavily on many variables. One of the variables that affects the most is the number of workers (and the number of threads per worker), since that determines how many extraction tasks can be executed in parallel. Consider:

1. It is not necessary to run two workers on the same machine. If you want more parallelism on one machine, increase the thread count instead.
2. Network bandwidth greatly affects the speed of workers and is effectively shared between threads of a worker.
3. Even with many workers/threads processing tasks in parallel, your source system might not have enough resources to handle the load.
4. You might want to keep a low worker count to avoid overloading your source system.
5. You might want to stop some (or all) of your workers at times when your source system is already overloaded by unrelated operations, to avoid disrupting those operations.

## Workflow Configuration Reference

The workflow configuration file is a JSON object. Its structure is described below using named models -- each model's properties reference other models by name.

### WorkflowConfiguration (Top Level)

| Property | Type | Required | Description |
|----------|------|----------|-------------|
| `schemaVersion` | String | No | Version of the configuration schema (e.g. "1.0.0"). Accepts formats "major", "major.minor", or "major.minor.patch". Defaults to "1.0.0" if omitted. |
| `tables` | [TableConfiguration](#tableconfiguration)[] | Yes | An array of table-specific configurations defining which tables to migrate and how. |
| `defaultTableConfiguration` | [TableConfiguration](#tableconfiguration) | No | Shared settings inherited by all tables. Table-specific values override these defaults (see [merging rules](#default-table-configuration-merging-rules) below). |
| `affinity` | String | No | Affinity group string. Ensures that only orchestrator and worker instances with a matching affinity process this workflow. |

### TableConfiguration

Defines the settings for migrating a single table.

| Property | Type | Required | Description |
|----------|------|----------|-------------|
| `source` | [SourceTargetIdentifier](#sourcetargetidentifier) | Yes | Identifies the source table. |
| `target` | [SourceTargetIdentifier](#sourcetargetidentifier) | Yes | Identifies the target table in Snowflake. |
| `columnNamesToPartitionBy` | String[] | Yes | Columns used to partition data during extraction. |
| `extraction` | [ExtractionStrategy](#extractionstrategy) | No | Configures how data is extracted from the source database. |
| `synchronization` | [SynchronizationStrategy](#synchronizationstrategy) | No | Configures incremental synchronization behavior. |
| `columnTypeMappings` | [ColumnTypeMapping](#columntypemapping)[] | No | Type conversions applied during migration. |
| `columnNameMappings` | [ColumnNameMapping](#columnnamemapping)[] | No | Column renaming mappings. |
| `primaryKeyColumns` | String[] | No | Primary key columns for the source table. Required when using `trackModifications` in the watermark synchronization strategy. |
| `partitionSize` | [PartitionSize](#partitionsize) | No | Configures the target size of each partition. Defaults to `"auto"`. |
| `whereClauseCriteria` | String | No | SQL-like filter to select a subset of rows (e.g., `"is_deleted = 0"`). |

#### Default Table Configuration Merging Rules

When `defaultTableConfiguration` is provided, its values are merged into each table entry using these rules:

- **Nested objects** (`source`, `target`, `synchronization`, `extraction`): Deep merge -- fields within are merged individually.
- **Collections** (`columnTypeMappings`, `columnNameMappings`, etc.): Table value replaces default entirely.
- **Scalars** (`whereClauseCriteria`): Table value overrides default.

### SourceTargetIdentifier

Used by `source` and `target` in [TableConfiguration](#tableconfiguration) to identify a database object.

| Property | Type | Required | Description |
|----------|------|----------|-------------|
| `databaseName` | String | Yes | Name of the database. |
| `schemaName` | String | Yes | Name of the schema. |
| `tableName` | String | Yes | Name of the table. |

The `target` object accepts two additional optional fields: `tableType` (`"native"` or `"iceberg"`) and `icebergConfig` (required when using Iceberg). If `tableType` is omitted or null, it defaults to **`"native"`** (standard Snowflake table). See the Redshift UNLOAD example for Iceberg settings.

### ColumnTypeMapping

| Property | Type | Required | Description |
|----------|------|----------|-------------|
| `sourceType` | String | Yes | Type name in the source system. |
| `targetType` | String | Yes | Target type in Snowflake. |

### ColumnNameMapping

| Property | Type | Required | Description |
|----------|------|----------|-------------|
| `sourceName` | String | Yes | Column name in the source system. |
| `targetName` | String | Yes | Target column name in Snowflake. |

### ExtractionStrategy

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `strategy` | `"regular"` or `"unload"` | Yes | Extraction method. `"regular"` is the default; `"unload"` is available for Redshift sources. |
| `externalStage` | String | UNLOAD only | Fully-qualified Snowflake external stage name (e.g., `"MY_DB.MY_SCHEMA.S3_STAGE"`). |

**regular** (default) -- Data is queried and downloaded through the Worker:

```json
"extraction": { "strategy": "regular" }
```

**unload** (Redshift only) -- Data is written to S3 via Redshift UNLOAD and loaded from an external stage:

```json
"extraction": { "strategy": "unload", "externalStage": "MY_DB.MY_SCHEMA.S3_EXTERNAL_STAGE" }
```

### PartitionSize

Controls how large each partition should be during extraction. Accepts a string or an object.

| Form | Description |
|------|-------------|
| `"auto"` (default) | The system picks optimal partition sizes based on the source platform, extraction strategy, and table size. |
| `{ "targetSizeMb": N }` | Each partition targets approximately `N` megabytes of data. |
| `{ "maxRowsPerPartition": N }` | Each partition contains at most `N` rows, regardless of data size. |

Only one of `targetSizeMb` or `maxRowsPerPartition` may be specified when using the object form.

**auto** (default) -- The system selects partition sizes tuned for the platform and extraction strategy:

```json
"partitionSize": "auto"
```

Auto mode uses larger partitions for Redshift UNLOAD (where S3 handles large files well) and smaller partitions for ODBC-based extraction (SQL Server, PostgreSQL, Redshift REGULAR) where data flows through the Worker's memory. For very large tables (100+ GB), the maximum number of partitions is increased to allow more parallelism.

**Fixed size in MB** -- Specify a target size per partition:

```json
"partitionSize": { "targetSizeMb": 2048 }
```

**Fixed row count** -- Specify a maximum number of rows per partition:

```json
"partitionSize": { "maxRowsPerPartition": 500000 }
```

### SynchronizationStrategy

Controls whether subsequent workflow runs perform a full re-extraction or only sync changed data.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `strategy` | `"none"`, `"checksum"`, or `"watermark"` | Yes | The synchronization method. |
| `watermarkColumn` | String | Watermark only | Column name to track (must be monotonically increasing). |
| `trackModifications` | Boolean | No | If `true`, uses the primary key to deduplicate modified rows. Requires `primaryKeyColumns` in [TableConfiguration](#tableconfiguration). |

**none** (default) -- Full extraction on every run. No synchronization metadata is stored.

```json
"synchronization": { "strategy": "none" }
```

- **Use when:** Data is small, changes are unpredictable, or guaranteed consistency is needed.

**checksum** -- Computes a hash of all column values per partition. Only changed partitions are cleared and re-extracted.

```json
"synchronization": { "strategy": "checksum" }
```

- **Use when:** You need to detect any change but lack a reliable monotonic column (e.g., dimension tables).
- **Trade-offs:** Requires a checksum computation on the source for every partition on every run.

**watermark** -- Tracks a monotonic column (timestamp, ID, version) to sync only rows newer than the last observed maximum.

```json
"synchronization": { "strategy": "watermark", "watermarkColumn": "UPDATED_AT" }
```

- **Use when:** Your table has a reliable monotonic column that increases on insert/update (e.g., fact tables, event logs).
- **Limitation:** Watermark alone cannot currently track deletions. Support for this will be added in the future.

### Quoting Identifiers

Names that need quoting (or brackets) must be manually quoted as they would normally be in JSON. For example: `"tableName": "\"MyCaseSensitiveTable\""`.

## Workflow Configuration Examples

### Basic Migration (SQL Server)

Migrates two tables with shared source/target schemas, type mappings, column renaming, watermark sync, and row filtering:

```json
{
  "defaultTableConfiguration": {
    "source": {
      "schemaName": "data_migration_cloud_test",
      "databaseName": "SampleStoreDB"
    },
    "target": {
      "schemaName": "data_migration_cloud_test",
      "databaseName": "samplestoredb"
    }
  },
  "tables": [
    {
      "source": { "tableName": "store_employee" },
      "target": { "tableName": "target_employee" },
      "columnNamesToPartitionBy": ["ID"]
    },
    {
      "source": { "tableName": "Sales_Simple" },
      "target": { "tableName": "Sales_Simple" },
      "columnNamesToPartitionBy": ["ID"],
      "columnTypeMappings": [
        { "sourceType": "MONEY", "targetType": "DECIMAL(19,4)" }
      ],
      "columnNameMappings": [
        { "sourceName": "id", "targetName": "old_id" },
        { "sourceName": "name", "targetName": "full_name" }
      ],
      "synchronization": {
        "strategy": "watermark",
        "watermarkColumn": "UPDATED_AT"
      },
      "partitionSize": { "targetSizeMb": 2048 },
      "whereClauseCriteria": "is_deleted = 0"
    }
  ]
}
```

### Redshift UNLOAD

Uses the UNLOAD extraction strategy with an external stage for S3-based data transfer:

```json
{
  "defaultTableConfiguration": {
    "source": {
      "schemaName": "ecommerce_raw",
      "databaseName": "snowconvert_demo"
    },
    "target": {
      "schemaName": "ecommerce_raw",
      "databaseName": "TARGET_DB"
    },
    "extraction": {
      "strategy": "unload",
      "externalStage": "MY_DB.MY_SCHEMA.S3_EXTERNAL_STAGE"
    },
    "partitionSize": "auto"
  },
  "tables": [
    {
      "source": { "tableName": "customers" },
      "target": { "tableName": "customers" },
      "columnNamesToPartitionBy": ["customer_id"]
    },
    {
      "source": { "tableName": "orders" },
      "target": { "tableName": "orders" },
      "columnNamesToPartitionBy": ["order_id"],
      "columnTypeMappings": [
        { "sourceType": "NUMERIC(10,2)", "targetType": "DECIMAL(10,2)" }
      ]
    }
  ]
}
```

### Redshift UNLOAD with Iceberg Tables

Combines Redshift UNLOAD with Iceberg table targets, including Snowflake-managed and Glue catalog configurations:

```json
{
  "defaultTableConfiguration": {
    "source": {
      "schemaName": "public",
      "databaseName": "analytics_db"
    },
    "target": {
      "schemaName": "public",
      "databaseName": "TARGET_DB",
      "tableType": "iceberg",
      "icebergConfig": {
        "catalog": "SNOWFLAKE",
        "externalVolume": "my_iceberg_ext_vol",
        "baseLocationPrefix": "migrations/redshift",
        "sourceDataStage": "@TARGET_DB.PUBLIC.ICEBERG_SOURCE_STAGE"
      }
    },
    "extraction": {
      "strategy": "unload",
      "externalStage": "TARGET_DB.PUBLIC.S3_EXTERNAL_STAGE"
    },
    "partitionSize": "auto"
  },
  "tables": [
    {
      "source": { "tableName": "customers" },
      "target": { "tableName": "customers" },
      "columnNamesToPartitionBy": ["customer_id"]
    },
    {
      "source": { "tableName": "events" },
      "target": {
        "tableName": "events",
        "tableType": "iceberg",
        "icebergConfig": {
          "catalog": "my_glue_catalog_integration",
          "externalVolume": "my_iceberg_ext_vol",
          "catalogTableName": "glue_db.events"
        }
      },
      "columnNamesToPartitionBy": ["event_id"]
    },
    {
      "source": { "tableName": "orders" },
      "target": {
        "tableName": "orders",
        "tableType": "iceberg",
        "icebergConfig": {
          "catalog": "my_glue_catalog_integration",
          "externalVolume": "my_iceberg_ext_vol",
          "catalogTableName": "glue_db.orders",
          "migrationStrategy": "convert_to_managed"
        }
      },
      "columnNamesToPartitionBy": ["order_id"]
    }
  ]
}
```

### Incremental Sync with Watermark

Uses watermark-based synchronization with modification tracking for incremental data migration:

```json
{
  "defaultTableConfiguration": {
    "source": { "databaseName": "SRC", "schemaName": "dbo" },
    "target": { "databaseName": "TGT", "schemaName": "public" },
    "synchronization": {
      "strategy": "watermark",
      "watermarkColumn": "updated_at"
    }
  },
  "tables": [
    {
      "source": { "tableName": "orders" },
      "target": { "tableName": "orders" },
      "columnNamesToPartitionBy": ["order_id"],
      "primaryKeyColumns": ["order_id"],
      "synchronization": {
        "trackModifications": true
      }
    }
  ]
}
```

## Worker Configuration

This section documents the configuration for the Worker (`snowflake-data-exchange-agent` package). The Worker configuration file uses [TOML](https://toml.io/) format.

| Section | Property | Type | Description |
|---------|----------|------|-------------|
| **Top Level** | `selected_task_source` | String | Currently should always be set to `"snowflake_stored_procedure"`. |
| `[application]` | `max_parallel_tasks` | Integer | Maximum number of tasks the worker will process in parallel (using threads). |
| `[application]` | `task_fetch_interval` | Integer | Interval (in seconds) between attempts to fetch new tasks from the Orchestrator. |
| `[connections.source.*]` | | Object | Configuration for source system connections. The Worker typically requires an ODBC driver. See examples below. |
| `[connections.target.snowflake_connection_name]` | `connection_name` | String | The name of the connection entry in the `~/.snowflake/config.toml` file to use. |

**Example: SQL Server (Standard Authentication)**

```toml
[connections.source.sqlserver]
username = "username"
password = "password"
database = "database_name"
host = "127.0.0.1"
port = 1433
```

**Example: Amazon Redshift (IAM Authentication)**

```toml
[connections.source.redshift]
username = "demo-user"
database = "demo_db"
auth_method = "iam-provisioned-cluster"
cluster_id = "my-aws-cluster"
region = "us-west-2"
access_key_id = "your-access-key-id"
secret_access_key = "your-secret-access-key"
```

**Example: Amazon Redshift (Standard Authentication)**

```toml
[connections.source.redshift]
username = "myuser"
password = "mypassword"
database = "mydatabase"
host = "my-cluster.abcdef123456.us-west-2.redshift.amazonaws.com"
port = 5439
auth_method = "standard"
```

**Example: Teradata (ODBC)**

```toml
[connections.source.teradata]
driver_name = "Teradata Database ODBC Driver 17.20"
host = "your-teradata-host.example.com"
port = 1025
database = "tpcds"
username = "your_username"
password = "your_password"
```

> **Note:** Only one source connection is needed. The Snowflake target connection should point to a valid entry in your `~/.snowflake/config.toml`.
