Metadata-Version: 2.4
Name: h2ogpte-migration
Version: 1.2.0
Summary: Automated tool for migrating h2oGPTe collections
Author-email: "H2O.ai" <support@h2o.ai>
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/h2oai/h2ogpte-embedding-migration
Project-URL: Repository, https://github.com/h2oai/h2ogpte-embedding-migration
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: h2ogpte==1.6.57
Requires-Dist: python-dotenv
Dynamic: license-file

# h2oGPTe Migration Tool

Automated tool for migrating h2oGPTe collections with tracking, verification, and resume capabilities.

## Overview

This tool helps administrators and collection owners migrate collections to new embedding models while:
- **Preserving collection settings** - permissions, lifecycle settings, scheduled connectors, document metadata
- **Tracking migration state** in a SQLite database with step-level granularity
- **Supporting resume** - interrupted migrations can be resumed without duplicating work
- **Verifying job completion** with document counts, embedding model, lifecycle settings, and document statuses
- **Verifying RAG behavior** - optionally testing each migrated collection with a chat query to confirm it works
- **Optionally migrating chat sessions** from old to new collections
- **Providing manual move operations** - moving connectors and chats between any collections
- **Supporting both admin bulk and self-service migrations**

---

## Installation

```bash
pip install h2ogpte-migration
```

This installs the `h2ogpte-migrate` command.

---

## How It Works

### Per-Collection Migration Flow

For each collection, the tool performs these steps:

```
1. Create new collection (target embedding model)        --> DB: collection_created=1
2. Copy permissions (public, user, group)                --> DB: permissions_copied=1
3. Import documents + settings (single server job)       --> DB: import_submitted=1
   - Documents (with preserved ingest modes)
   - Lifecycle settings (expiry, inactivity, size limit)
   - Document metadata
4. Migrate scheduled connectors (after import succeeds)  --> DB: connectors_migrated=1
5. [Optional] Migrate chat sessions (after connectors)   --> DB: chats_migrated=1
```

The import job (step 3) handles documents, lifecycle settings, and metadata in a single server-side operation. Connectors and chats are migrated separately after the import succeeds to ensure they are only moved to a working collection.
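Each `DB: …=1` marker above corresponds to a column in the tracking database described under "Database Tracking". A minimal sketch of how such step-level tracking could work (illustrative only, not the tool's actual code; schema abridged to the step flags):

```python
import sqlite3

# Illustrative step-level tracking: each completed step flips a flag,
# so a re-run can see exactly where a migration stopped.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE collection_migrations (
        old_collection_id TEXT PRIMARY KEY,
        collection_created BOOLEAN DEFAULT 0,
        permissions_copied BOOLEAN DEFAULT 0,
        import_submitted BOOLEAN DEFAULT 0,
        connectors_migrated BOOLEAN DEFAULT 0,
        chats_migrated BOOLEAN DEFAULT 0
    )
""")

def mark_step(conn, collection_id, step):
    """Record that a migration step finished, so a re-run can skip it."""
    conn.execute(
        f"UPDATE collection_migrations SET {step} = 1 WHERE old_collection_id = ?",
        (collection_id,),
    )
    conn.commit()

conn.execute("INSERT INTO collection_migrations (old_collection_id) VALUES ('col-123')")
mark_step(conn, "col-123", "collection_created")
mark_step(conn, "col-123", "permissions_copied")
row = conn.execute(
    "SELECT collection_created, permissions_copied, import_submitted "
    "FROM collection_migrations WHERE old_collection_id = 'col-123'"
).fetchone()
print(row)  # (1, 1, 0)
```

A crash after `permissions_copied` leaves `import_submitted=0`, which is how resume knows to reuse the collection and re-submit the import rather than start over.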

### Execution Modes

**Parallel (default):** Submit all import jobs without waiting. Jobs run in the background on the server. Use `--verify` later to check completion — verify also triggers connector migration for completed jobs, and optionally chat migration with `--verify --migrate-chats`. Note: passing `--migrate-chats` without `--wait-for-completion` will NOT migrate chats during the run — it is deferred to the next `--verify --migrate-chats` invocation.

**Sequential (`--wait-for-completion`):** Wait for each import job to complete before moving to the next collection. After each successful import, connectors are migrated immediately. If `--migrate-chats` is specified, chat sessions are migrated after connectors. Optionally use `--max-concurrent-jobs N` to process multiple collections concurrently while still waiting for each to complete.

Both execution modes produce identical end results — they differ only in when connectors/chats are migrated (inline for sequential, on next `--verify` for parallel).

### Behavior Summary

| Action | `--wait-for-completion` | Parallel (default) |
|--------|------------------------|-------------------|
| Create new collection | Automatic | Automatic |
| Copy permissions | Automatic | Automatic |
| Import documents + settings | Waits for completion | Submits and exits |
| Validate migration (doc counts, model, statuses) | Automatic after import | On `--verify` |
| RAG verification | Needs `--verify-query` in the migration command | On `--verify --verify-query` |
| Migrate connectors | Automatic after import | On `--verify` |
| Migrate chats | Needs `--migrate-chats` in the migration command | On `--verify --migrate-chats` |

**Why connectors are automatic but chats are opt-in:**
- Connectors handle scheduled document ingestion — if they aren't moved, the new collection won't receive future data updates
- Chats are optional — admins may want to verify the migration before moving users' chat history, giving flexibility on timing

### What happens to the old collection

The old collection is **not deleted** by this tool. After migration:
- Documents are shared (referenced by both old and new collections via `copy_document=False`)
- Scheduled connectors have been **moved** to the new collection (old collection has none)
- Chat sessions have been **moved** to the new collection (if `--migrate-chats` was used)
- The old collection can be manually deleted once you've confirmed the migration is complete
- With `copy_document=False`, deleting the old collection is safe — documents survive because the new collection still references them

### Why connectors and chats are separate from the import job

Scheduled connectors are **moved** (not copied) from the old collection to the new one. This is a destructive operation — the old collection loses its connectors. If the import job failed (e.g., embedding model error), moving connectors to a broken collection would leave the old collection without connectors and the new collection unusable. By running connector migration only after a confirmed successful import, we ensure connectors are only moved to a working collection.

Chat sessions follow the same principle — they should only be moved after both the import and connector migration succeed.

### Resume Capability

If the tool is interrupted, re-running the same command will:
- **Skip** collections that are fully migrated (unless `--force-remigrate` is used). If connectors/chats were previously moved to another collection, the tool logs the exact `--move-connectors`/`--move-chats` commands needed to recover them
- **Reuse** collections that were created but not yet imported (avoids orphaned collections)
- **Direct to `--verify`** for collections with submitted but incomplete imports
- **Run pending post-import steps** for collections where import completed but connectors/chats haven't been migrated
- **Retry** failed collections when `--retry-failed` is specified (creates a new collection)

---

## What Gets Migrated

| Item | How (API calls / configs) | When |
|------|-----|------|
| Documents | Server-side import job (`import_collection_into_collection`) | During import |
| Document metadata | `preserve_metadata=True` | During import |
| Document ingest modes | `preserve_document_status=True` (agent_only stays agent_only) | During import |
| Lifecycle settings | `copy_lifecycle_settings=True` (expiry, inactivity, size limit) | During import |
| Public permissions | `list_collection_public_permissions` + `make_collection_public` | Before import |
| User/group permissions | `list_collection_permissions` + `share_collection` | Before import |
| Scheduled connectors | `migrate_scheduled_connectors_to_collection` (moved from old to new) | After import succeeds |
| Chat sessions | `migrate_chat_sessions_to_collection` (opt-in via `--migrate-chats`) | After connectors succeed |

---

## Authentication Modes

### Admin Mode (`--admin-key`)
- Migrate collections for any user
- Use `--users`, `--all-users`, or `--collections` to set scope
- Automatically creates temporary API keys for collection owners
- Best for bulk migrations across the organization

### Self-Service Mode (`--user-key`)
- Migrate only collections you own
- Optionally use `--collections` to specify which collections
- No additional temporary API keys needed (your user key is used directly)
- For collection owners managing their own migrations

---

## Model Mappings

When using the `--use-model-mappings` flag, the tool uses the following predefined source→target model mappings. Collections whose current embedding model matches a source model below will be migrated to the corresponding target model. Collections using any other model are skipped.

| Source Model (deprecated) | Target Model (compliant) |
|--------------------------|-------------------------|
| BAAI/bge-m3 | h2oai/embeddinggemma-300m-qat-q8_0-unquantized |
| BAAI/bge-large-en-v1.5 | mixedbread-ai/mxbai-embed-large-v1 |

To migrate to a model not listed here, use `--target-model` instead.
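The mapping behaves like a simple lookup keyed by the collection's current embedding model, with unmapped models skipped. Illustratively (the pairs come from the table above; the function name is ours):

```python
# Predefined source -> target model mappings from the table above.
MODEL_MAPPINGS = {
    "BAAI/bge-m3": "h2oai/embeddinggemma-300m-qat-q8_0-unquantized",
    "BAAI/bge-large-en-v1.5": "mixedbread-ai/mxbai-embed-large-v1",
}

def target_for(current_model):
    """Return the target model for a collection, or None to skip it."""
    return MODEL_MAPPINGS.get(current_model)

print(target_for("BAAI/bge-m3"))       # h2oai/embeddinggemma-300m-qat-q8_0-unquantized
print(target_for("some/other-model"))  # None -> collection is skipped
```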

---

## Flag Reference

| Flag | Description |
|------|-------------|
| **Required** | |
| `--url <url>` | h2oGPTe instance URL |
| **Authentication** (choose one) | |
| `--admin-key [key]` | Admin API key. Enables migration for any user via `--users`, `--all-users`, or `--collections`/`--collections-file`. Automatically creates and cleans up temporary API keys for collection owners. Pass without a value to read from `H2OGPTE_ADMIN_KEY` env var |
| `--user-key [key]` | User API key. Migrates only collections you own. No additional temporary API keys created. Cannot use `--users` or `--all-users`. Pass without a value to read from `H2OGPTE_USER_KEY` env var |
| **Migration Scope** (choose one) | |
| `--users <names>` | Comma-separated usernames to migrate (admin only). Example: `"john.doe, jane.smith"` |
| `--all-users` | Migrate all users in the organization (admin only). Use with caution on large organizations — this will consider all collections in the system |
| `--collections <ids>` | Comma-separated collection IDs. Works with both admin and user keys. Auto-detects owners in admin mode. Also used to filter `--verify` scope |
| `--collections-file <path>` | Path to a file containing collection IDs (one per line, `#` comments ignored). Works like `--collections` but reads from a file |
| **Migration Mode** (choose one) | |
| `--use-model-mappings` | Use predefined source→target model mappings (see Model Mappings section). Collections whose model isn't in the mapping are skipped. Cannot be combined with `--source-model` or `--target-model` |
| `--target-model <model>` | Target embedding model. Required when not using `--use-model-mappings` |
| `--source-model <model>` | Only migrate collections using this specific embedding model (optional). Without this, all collections are migrated to `--target-model` regardless of what embedding model they are currently using |
| **Execution** | |
| `--wait-for-completion` | Wait for each collection to fully complete migration before moving to the next (import, validation, connectors, and optionally chats). Without this flag, jobs are submitted in parallel and `--verify` must be run separately. Use `--max-concurrent-jobs N` to process multiple collections concurrently |
| `--max-concurrent-jobs <N>` | Number of collections to process concurrently with `--wait-for-completion` (default: 1). Each worker runs the full migration cycle (create, import, validate, connectors, chats) independently. Has no effect without `--wait-for-completion` |
| `--verify` | Check status of previously submitted import jobs. Validates completed imports (document counts, embedding model, lifecycle settings, document statuses). Migrates connectors for successfully completed imports. Often combined with `--migrate-chats` and `--verify-query`. Cannot be combined with migration flags (`--use-model-mappings`, `--target-model`, `--source-model`) |
| `--migrate-chats` | Migrate chat sessions from old to new collection after successful import and connector migration. With `--wait-for-completion`: migrates chats inline. Without it: deferred to `--verify --migrate-chats`. Chats are only moved after connectors succeed |
| `--dry-run` | Preview what would be migrated without making any changes. Shows target models, permissions, lifecycle settings, document counts, and import settings. No collections created, no database writes |
| **Retry/Resume** | |
| `--retry-failed` | Retry collections whose import jobs failed: creates a new collection and re-submits the import. The previous failed collection remains and needs manual cleanup. Note: this flag is not needed when a collection's import job succeeded but connector/chat migration failed; those are automatically retried on the next `--verify --migrate-chats` run. Cannot be combined with `--force-remigrate` |
| `--force-remigrate` | Re-migrate collections regardless of their migration status. Creates new collections even if previously migrated successfully. Overwrites database records. Use with caution — verify the state of the old collection before re-migrating. If a previous successful migration already moved connectors/chats, use `--move-connectors`/`--move-chats` to restore the original collection's state beforehand, or use the recovery commands logged during re-migration. Cannot be combined with `--retry-failed` |
| **Manual Move** (recovery actions, not part of regular migration workflows) | |
| `--move-connectors` | Move scheduled connectors from `--from` collection to `--to` collection. The h2oGPTe API enforces ownership — the user must own both collections. With `--admin-key`, the tool looks up the source collection owner and impersonates them. Cannot be combined with migration or verify flags |
| `--move-chats` | Move chat sessions from `--from` collection to `--to` collection. Can be combined with `--move-connectors` to move both in a single command. Same ownership rules apply |
| `--from <id>` | Source collection ID for `--move-connectors`/`--move-chats` |
| `--to <id>` | Target collection ID for `--move-connectors`/`--move-chats`. Must be different from `--from` |
| **Verification** | |
| `--verify-query <query>` | RAG verification query. For each completed migration, creates a temporary chat session on the new collection, sends the query, checks that the response includes document references, logs a response preview, and deletes the test chat session. Informational only — does not block connector/chat migration. Requires `--verify` or `--wait-for-completion`. The same query is sent to every collection being verified — often combined with `--collections` to target specific collections |
| **Options** | |
| `--copy-document` | Copy documents instead of referencing. Default (False) references documents — both old and new collections point to the same document record (with different embeddings). Faster (skips creating new document records, storage uploads, and cataloging) and saves storage. Use `--copy-document` for full storage isolation between collections |
| `--skip-reparse` | Re-embed existing chunks without re-parsing documents. Reads text chunks from the source collection, re-embeds them with the target embedding model, and stores them in the new collection — skipping file fetch, PDF conversion, OCR, and chunking. Significantly faster for embedding model migrations where only the embedding model changes. Requires `copy_document=False` (default). Cannot be used with `--copy-document` |
| `--ocr-model <model>` | OCR model to use during document re-parsing (default: auto). Use this to override the source collection's OCR model, e.g., when migrating away from a CN model. Examples: `auto`, `off`, `tesseract`. Not applicable when `--skip-reparse` is used |
| `--db-path <path>` | Path to SQLite database for tracking migration state (default: migration_tracking.db). A database file is created automatically in the directory where the tool is run. This flag is optional — only needed if you want to store the database in a custom location. Must use the same path for `--verify` as the original migration, otherwise it creates a new empty database and finds no pending jobs |
| `--cert <path>` | Path to CA certificate file for SSL verification. Omit this flag if no certificate is required |
| `--api-key-expiry <duration>` | Expiry duration for temporary API keys created in admin mode (default: 30 days). Example: `"7 days"`, `"30 days"` |
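For reference, the `--collections-file` format (one ID per line, `#` comments ignored) could be parsed with a helper along these lines. This is a sketch, not the tool's actual parser; it skips whole-line comments and blank lines, and whether inline comments after an ID are also supported is not specified here:

```python
def read_collection_ids(path):
    """Read collection IDs from a file: one per line, blank lines and
    lines starting with '#' are skipped."""
    ids = []
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            ids.append(line)
    return ids
```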

---

## Usage Examples

### Quick Start

### 1. Sequentially migrate specific collections (one at a time)

Migrate, validate, move chats, and verify RAG in a single command:

```bash
h2ogpte-migrate --url https://h2ogpte.example.com --user-key sk-xxx --collections "col-123, col-456" --use-model-mappings --wait-for-completion --migrate-chats --verify-query "What is our refund policy?"
```

**What happens (for each collection, one at a time — fully completes before moving to the next):**
- Creates a new collection with the target model from the predefined model mapping
- Copies permissions (public, user, group) and lifecycle settings (expiry, inactivity, size limit) to the new collection
- Imports documents and waits for the import job to complete
- Validates the import (document counts, embedding model, lifecycle settings, document statuses)
- Creates a temporary test chat session, sends the RAG verification query, checks for document references, logs the response preview, and deletes the test chat session
- Migrates scheduled connectors automatically
- Migrates chat sessions (because `--migrate-chats` is passed)

**Note:** The same `--verify-query` is sent to every collection. For collection-specific queries, run separate commands per collection (e.g., in multiple terminals).

**Tip:** Add `--max-concurrent-jobs N` to process multiple collections concurrently instead of one at a time. See Example 2 below.

---

### 2. Concurrent migration with controlled parallelism

Migrate many collections concurrently with `--max-concurrent-jobs`:

```bash
h2ogpte-migrate --url https://h2ogpte.example.com --admin-key sk-xxx --collections "col-1, col-2, ..., col-100" --use-model-mappings --wait-for-completion --max-concurrent-jobs 10 --migrate-chats
```

**What happens:**
- Up to 10 collections are processed concurrently
- Each worker independently: creates a new collection, copies permissions, imports documents, waits for completion, validates, migrates connectors and chats
- When a worker finishes one collection, it picks up the next from the queue
- At most 10 import jobs are active on the server at any time
- If any collection fails, the rest continue unaffected — failed collections can be retried later with `--retry-failed`

**Note:** Without `--max-concurrent-jobs` (or with `--max-concurrent-jobs 1`), `--wait-for-completion` processes one collection at a time. Use higher values to speed up large-scale migrations while controlling server load.

**Tip:** With concurrent workers, log lines from different collections are interleaved. Each line is prefixed with `[Collection Name]`, so you can filter the log file for a specific collection:
```bash
grep "\[Collection Alpha\]" migration_20260316_225645.log
```

---

### 3. Migrate specific collections in parallel (multiple at the same time)

Submit all jobs at once, then verify separately:

```bash
# Step 1: Submit migration jobs (runs in background)
h2ogpte-migrate --url https://h2ogpte.example.com --admin-key sk-xxx --collections "col-123, col-456, col-789" --use-model-mappings

# Step 2: Verify completion, migrate connectors + chats, and run RAG check
h2ogpte-migrate --url https://h2ogpte.example.com --admin-key sk-xxx --collections "col-123, col-456, col-789" --verify --migrate-chats --verify-query "What is our refund policy?"
```

**What happens in Step 1:**
- For each collection: looks up the owner, creates a temporary API key for them
- Creates new collections with the target model from the predefined model mapping
- Copies permissions, lifecycle settings, and submits import jobs for each collection
- Exits immediately once all collections have had jobs created — jobs continue running in the background on the server

**What happens in Step 2:**
- Checks the status of each import job
- For successfully completed imports: validates document counts, embedding model, lifecycle settings, and document statuses
- Runs the RAG verification query on each completed collection (because `--verify-query` was included)
- Migrates scheduled connectors automatically for successfully completed imports
- Migrates chat sessions (because `--migrate-chats` is passed) — chats are only moved after the import succeeds, so it's safe to include in the verify step

**Note:** The `--collections` flag in Step 2 limits verification to those specific collections, and the same `--verify-query` is sent to each of them. Without `--collections`, `--verify` checks all pending jobs in the database (admin mode). With `--user-key` and no `--collections`, only jobs belonging to your account are checked.

**Tip for admins:** `--users "john.doe, jane.smith"` can be used to scope to specific users instead of collection IDs. `--all-users` is also available to migrate every collection across all users, but use with caution on large organizations as it submits import jobs for all collections at once.

---

## More Examples

### 4. Dry run (preview changes)
```bash
h2ogpte-migrate --url https://h2ogpte.example.com --admin-key sk-xxx --users john.doe --use-model-mappings --dry-run
```

**What happens:**
- Shows which collections would be migrated and the target embedding models
- Displays permissions that would be copied (public, user, group)
- Shows lifecycle settings that would be copied (expiry, inactivity interval, size limit)
- Shows document counts and import settings
- **No actual changes** — no collections created, no database writes

---

### 5. Specific model migration with OCR model override
```bash
# Step 1: Submit migration jobs for a specific source model, using tesseract for OCR
h2ogpte-migrate --url https://h2ogpte.example.com --admin-key sk-xxx --users john.doe --source-model "BAAI/bge-large-en-v1.5" --target-model "mixedbread-ai/mxbai-embed-large-v1" --ocr-model "tesseract"

# Step 2: Verify completion, migrate connectors + chats
h2ogpte-migrate --url https://h2ogpte.example.com --admin-key sk-xxx --users john.doe --verify --migrate-chats
```

**What happens in Step 1:**
- Creates a temporary API key for the user
- Scans all collections owned by the user
- Only processes collections using `BAAI/bge-large-en-v1.5` (ignores all others)
- For each matching collection: creates a new collection with `mixedbread-ai/mxbai-embed-large-v1`, copies permissions and lifecycle settings, and submits an import job
- The `--ocr-model "tesseract"` flag overrides the OCR model used during document re-parsing (default: `auto`). Use this when migrating away from a CN OCR model or to preserve a specific OCR model like Tesseract
- Useful for phased migrations — migrate one model at a time instead of using predefined mappings

**What happens in Step 2:**
- Checks the status of each import job for the user
- For successfully completed imports: validates document counts, embedding model, lifecycle settings, and document statuses
- Migrates scheduled connectors automatically for successfully completed imports
- Migrates chat sessions for successfully completed imports (because `--migrate-chats` is passed)
- No RAG verification is done (`--verify-query` was not included — add it to Step 2 if needed, but keep in mind the same query would apply to all collections verified)

---

### 6. Verify and migrate chats (check job status, migrate connectors + chats)
```bash
h2ogpte-migrate --url https://h2ogpte.example.com --admin-key sk-xxx --verify --migrate-chats
```

**What happens:**
- Does NOT run any new migrations
- Queries the database for all pending/submitted/running jobs, plus completed jobs with pending post-import steps (i.e., connectors and chats that weren't migrated inline because `--wait-for-completion` was omitted)
- Checks the status of each import job
- For successfully completed imports: validates document counts, embedding model, lifecycle settings, and document statuses
- Migrates scheduled connectors automatically for successfully completed imports
- Migrates chat sessions for successfully completed imports (because `--migrate-chats` is passed)
- Reports summary (completed/failed/running/canceled counts)

With `--admin-key` and no `--users` or `--collections`, all pending jobs in the database are checked. With `--user-key`, only jobs belonging to your account are checked.

**Filter by user or collection:**
```bash
h2ogpte-migrate --url https://h2ogpte.example.com --admin-key sk-xxx --verify --users john.doe
h2ogpte-migrate --url https://h2ogpte.example.com --admin-key sk-xxx --verify --collections "col-123, col-456"
h2ogpte-migrate --url https://h2ogpte.example.com --user-key sk-xxx --verify --collections "col-123, col-456"
```

---

### 7. Retry failed migrations
```bash
# Admin: retry specific collections
h2ogpte-migrate --url https://h2ogpte.example.com --admin-key sk-xxx --collections "col-123, col-456" --use-model-mappings --retry-failed --wait-for-completion --migrate-chats

# Self-service: retry all your failed collections (no --collections scope needed)
h2ogpte-migrate --url https://h2ogpte.example.com --user-key sk-xxx --use-model-mappings --retry-failed --wait-for-completion --migrate-chats
```

**What happens:**
- Skips collections that are completed, submitted, or running
- For collections with a failed import job status: creates a **new** collection and re-submits the import
  - Logs a warning with the previous failed collection ID for reference (the failed collection needs manual cleanup)
- With `--wait-for-completion`: waits for each retried import to complete, validates, migrates connectors and chats inline
- Without `--wait-for-completion`: submits jobs in the background — run `--verify --migrate-chats` later to check completion and migrate connectors + chats

---

### 8. Force re-migration
```bash
# Step 1: Force re-migrate specific collections
h2ogpte-migrate --url https://h2ogpte.example.com --admin-key sk-xxx --collections "col-123, col-456" --use-model-mappings --force-remigrate

# Step 2: Verify completion, migrate connectors + chats
h2ogpte-migrate --url https://h2ogpte.example.com --admin-key sk-xxx --collections "col-123, col-456" --verify --migrate-chats
```

**What happens in Step 1:**
- Collections are picked up regardless of their migration status
- Creates new collections for ALL specified collections, even if they were previously migrated successfully
- Overwrites previous local migration database records for those collections
- Previously migrated collections remain in the user's account (they need manual cleanup)
- **Caution:** If a previous successful migration already moved connectors/chats to another collection, they should either be manually moved back to the original collection before running this (via `--move-connectors`/`--move-chats`), or the tool will log the exact move commands to recover them after the re-migration completes

**What happens in Step 2:**
- Checks the status of each import job
- For successfully completed imports: validates document counts, embedding model, lifecycle settings, and document statuses
- Migrates scheduled connectors from the original collection, if they still exist there
- Migrates chat sessions from the original collection, if they still exist there (because `--migrate-chats` is passed)
- **Important:** If a previous migration already moved connectors/chats out of the original collection, they won't be found here — use `--move-connectors`/`--move-chats` to recover them (see commands logged in Step 1)

---

### 9. Manually move connectors and/or chats between collections (for recovery purposes)
```bash
h2ogpte-migrate --url https://h2ogpte.example.com --user-key sk-xxx --move-connectors --from "col-abc" --to "col-def"
h2ogpte-migrate --url https://h2ogpte.example.com --user-key sk-xxx --move-chats --from "col-abc" --to "col-def"
h2ogpte-migrate --url https://h2ogpte.example.com --admin-key sk-xxx --move-connectors --move-chats --from "col-abc" --to "col-def"
```

**What happens:**
- Moves scheduled connectors and/or chat sessions from the source collection to the target collection
- The source collection will no longer have the moved items after this operation
- Useful for recovering connectors/chats after `--force-remigrate`, or reorganizing collections
- Works with both `--admin-key` and `--user-key` (server enforces ownership)
- In admin mode, automatically creates a temporary API key for the source collection's owner

---

## Database Tracking

### Schema
```sql
CREATE TABLE collection_migrations (
    old_collection_id TEXT PRIMARY KEY,
    old_collection_name TEXT,
    new_collection_id TEXT,
    new_collection_name TEXT,
    old_model TEXT,
    new_model TEXT,
    job_id TEXT,
    job_status TEXT,
    user_id TEXT,
    username TEXT,
    created_at TIMESTAMP,
    completed_at TIMESTAMP,
    error TEXT,
    -- Step tracking
    collection_created BOOLEAN DEFAULT 0,
    permissions_copied BOOLEAN DEFAULT 0,
    import_submitted BOOLEAN DEFAULT 0,
    import_completed BOOLEAN DEFAULT 0,
    connectors_migrated BOOLEAN DEFAULT 0,
    chats_migrated BOOLEAN DEFAULT 0
);
```
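With this schema, the set of collections that `--verify` picks up for post-import work (import completed, but connectors or chats still pending) is a simple query. For example, against an abridged in-memory copy of the table:

```python
import sqlite3

# Abridged copy of the tracking schema: just the columns the query needs.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE collection_migrations (
        old_collection_id TEXT PRIMARY KEY,
        job_status TEXT,
        import_completed BOOLEAN DEFAULT 0,
        connectors_migrated BOOLEAN DEFAULT 0,
        chats_migrated BOOLEAN DEFAULT 0
    )
""")
rows = [
    ("col-1", "completed", 1, 1, 1),  # fully done
    ("col-2", "completed", 1, 0, 0),  # connectors + chats still pending
    ("col-3", "submitted", 0, 0, 0),  # import still running
]
conn.executemany("INSERT INTO collection_migrations VALUES (?, ?, ?, ?, ?)", rows)

# Completed imports with pending post-import steps.
pending = conn.execute("""
    SELECT old_collection_id FROM collection_migrations
    WHERE import_completed = 1
      AND (connectors_migrated = 0 OR chats_migrated = 0)
""").fetchall()
print(pending)  # [('col-2',)]
```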

### Job Statuses
- **`pending`** - Collection created, import not yet submitted
- **`submitted`** - Import job submitted, running in background
- **`running`** - Import job verified as in-progress
- **`completed`** - Import job completed successfully
- **`failed`** - Import job failed, canceled, or had errors

### Resume Behavior
| DB State | On Re-run |
|----------|-----------|
| `collection_created=1, import_submitted=0` | Reuses existing collection, re-copies permissions, submits import |
| `import_submitted=1, import_completed=0` | Skips, tells user to run `--verify` |
| `import_completed=1, connectors_migrated=0` | Migrates connectors |
| `import_completed=1, connectors_migrated=1, chats_migrated=0` | With `--migrate-chats`: migrates chats |
| `import_completed=1, connectors_migrated=1, chats_migrated=1` | Fully done, skips |
| `job_status='failed'` | With `--retry-failed`: creates new collection |
| Any state | With `--force-remigrate`: ignores DB, creates new collection. Logs `--move-connectors`/`--move-chats` commands if connectors/chats were previously moved |
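The table above reads as a decision function over a row's step flags. A simplified sketch of that logic (not the tool's actual control flow; `state` is a dict of the row's columns):

```python
def resume_action(state, retry_failed=False, force_remigrate=False,
                  migrate_chats=False):
    """Map a tracking-DB row to the action taken on re-run (simplified)."""
    if force_remigrate:
        return "create_new_collection"      # ignores DB state entirely
    if state.get("job_status") == "failed":
        return "retry" if retry_failed else "skip_failed"
    if not state.get("collection_created"):
        return "full_migration"             # nothing done yet
    if not state.get("import_submitted"):
        return "reuse_collection_and_submit_import"
    if not state.get("import_completed"):
        return "run_verify"                 # job still in flight
    if not state.get("connectors_migrated"):
        return "migrate_connectors"
    if migrate_chats and not state.get("chats_migrated"):
        return "migrate_chats"
    return "skip_done"

print(resume_action({"collection_created": 1, "import_submitted": 1}))  # run_verify
```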

---

## Output Files

- **`migration_YYYYMMDD_HHMMSS.log`** - Detailed log with timestamps, job IDs, errors. Created in the directory where the tool is run
- **`migration_tracking.db`** - SQLite database with migration state. Created automatically in the directory where the tool is run. Use `--db-path` for a custom location. If running from a different directory later (e.g., for `--verify`), pass `--db-path` pointing to the original database, otherwise a new empty database is created and no pending jobs are found

---

## Troubleshooting

### SSL Certificate Errors
```bash
--cert ~/path/to/ca-chain.crt    # Provide certificate
# Omit --cert if no certificate is required
```

### Check Migration Status
```bash
sqlite3 migration_tracking.db "SELECT old_collection_name, job_status, import_completed, connectors_migrated, chats_migrated, error FROM collection_migrations;"
```

### Collection Already Migrated
```bash
--force-remigrate    # Re-migrate (creates new collection, old one needs manual cleanup)
```
**Caution:** If a previous successful migration already moved connectors/chats, use `--move-connectors`/`--move-chats` to restore state before re-migrating, or use the recovery commands logged during re-migration.

### Using `--verify` with a custom database path
If your initial migration used `--db-path /custom/path/migration.db`, you must use the same `--db-path` for `--verify`, otherwise it creates a new empty database and finds no pending jobs.

### Failed Import - Retry
```bash
--retry-failed       # Creates new collection for failed imports
```

---

## Best Practices

1. **Always dry-run first** - Use `--dry-run` to preview changes
2. **Test on a single collection or user** - Understand and validate how the migration works before running on a larger scale
3. **Run during off-hours** - Minimize impact on users
4. **Use parallel mode for large batches** - Submit jobs without waiting, verify later
5. **Always run `--verify` after parallel migrations** - This checks completion, validates imports, and migrates connectors. Include `--migrate-chats` to also migrate chat sessions
6. **Use `--verify-query` for RAG validation** - Sends a test query to each migrated collection, checks for document references, and cleans up the test chat session. Informational only — does not block connector/chat migration. The same query applies to all collections being verified, so use `--collections` to target collections with similar content for accurate results
7. **Use `--move-connectors`/`--move-chats` with `--force-remigrate`** - If a collection needs to be re-migrated, a previous successful migration may have already moved connectors/chats to another collection. Use `--move-connectors`/`--move-chats` to recover them to the appropriate collection. The tool logs the exact commands needed during re-migration
8. **Keep database and logs** - Archive for audit trail
9. **Clean up failed collections manually** - After `--retry-failed`, old failed collections remain
