Metadata-Version: 2.4
Name: mkpipe-loader-sqlite
Version: 0.5.0
Summary: SQLite loader for mkpipe.
Author: Metin Karakus
Author-email: metin_karakus@yahoo.com
License: Apache License 2.0
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: mkpipe
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: license
Dynamic: license-file
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# mkpipe-loader-sqlite

SQLite loader plugin for [MkPipe](https://github.com/mkpipe-etl/mkpipe). Writes Spark DataFrames into SQLite database files via JDBC.

## Documentation

For more detailed documentation, please visit the [GitHub repository](https://github.com/mkpipe-etl/mkpipe).

## License

This project is licensed under the Apache License 2.0; see the [LICENSE](LICENSE) file for details.

---

## Connection Configuration

```yaml
connections:
  sqlite_target:
    variant: sqlite
    database: /full/path/to/mydb.db
```
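The `database` value is the path to the SQLite file on disk. As an illustration (an assumption based on the standard sqlite-jdbc driver, not taken from the plugin's source), the loader presumably turns this path into a JDBC URL of the usual form:

```python
# Hypothetical illustration: the standard sqlite-jdbc URL is "jdbc:sqlite:"
# followed by the database file path.
database = "/full/path/to/mydb.db"  # value from the connection config above
jdbc_url = f"jdbc:sqlite:{database}"
# jdbc_url == "jdbc:sqlite:/full/path/to/mydb.db"
```

Use an absolute path so the file resolves the same way regardless of where the pipeline process starts.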

---

## Table Configuration

```yaml
pipelines:
  - name: pg_to_sqlite
    source: pg_source
    destination: sqlite_target
    tables:
      - name: public.users
        target_name: stg_users
        replication_method: full
        batchsize: 5000
```

---

## Write Strategy

Control how data is written to SQLite:

```yaml
      - name: public.users
        target_name: stg_users
        write_strategy: upsert       # append | replace | upsert | merge
        write_key: [id]              # required for upsert/merge
```

| Strategy | SQLite Behavior |
|---|---|
| `append` | Plain `INSERT` via JDBC (default for incremental) |
| `replace` | Drop and recreate table, then insert (default for full) |
| `upsert` | `INSERT ... ON CONFLICT (write_key) DO UPDATE` via temp table |
| `merge` | Same as upsert for SQLite |
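The upsert row above can be illustrated with Python's built-in `sqlite3` module. This is a sketch of the SQL that SQLite ends up executing on a `write_key` conflict, not the plugin's actual code path (which stages rows through a temp table over JDBC); the table and column names are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO stg_users VALUES (1, 'alice')")

# Upsert: insert new rows, and on a write_key (id) conflict update in place.
conn.executemany(
    "INSERT INTO stg_users (id, name) VALUES (?, ?) "
    "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
    [(1, "alicia"), (2, "bob")],
)

rows = conn.execute("SELECT id, name FROM stg_users ORDER BY id").fetchall()
# rows == [(1, 'alicia'), (2, 'bob')]  -> id 1 updated, id 2 inserted
```

`ON CONFLICT ... DO UPDATE` requires SQLite 3.24+, which ships with every Python version this package supports.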

---

## Write Throughput

```yaml
      - name: public.users
        target_name: stg_users
        replication_method: full
        batchsize: 5000
```

### Performance Notes

- SQLite uses file-level locking, so **`write_partitions` has no benefit** here: concurrent writes will serialize or fail. Set `write_partitions: 1` to avoid contention.
- Keep `batchsize` moderate (1,000–10,000), since large transactions in SQLite can cause memory pressure.
- SQLite is best suited for small-to-medium outputs (local development, testing, lightweight pipelines).
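The batch-per-transaction idea behind `batchsize` can be sketched with plain `sqlite3` (illustrative only; the plugin writes through Spark's JDBC writer, and `load_rows` is a hypothetical helper, not part of mkpipe's API):

```python
import os
import sqlite3
import tempfile

def load_rows(db_path, rows, batchsize=5000):
    """Insert rows in batches of `batchsize`, one transaction per batch."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stg_users (id INTEGER PRIMARY KEY, name TEXT)"
    )
    for start in range(0, len(rows), batchsize):
        conn.executemany(
            "INSERT INTO stg_users (id, name) VALUES (?, ?)",
            rows[start:start + batchsize],
        )
        conn.commit()  # committing per batch keeps each transaction small
    count = conn.execute("SELECT COUNT(*) FROM stg_users").fetchone()[0]
    conn.close()
    return count

with tempfile.TemporaryDirectory() as tmp:
    loaded = load_rows(
        os.path.join(tmp, "mydb.db"),
        [(i, f"user{i}") for i in range(12000)],
        batchsize=5000,
    )
# loaded == 12000 (three batches: 5000 + 5000 + 2000)
```

Committing after each batch bounds memory and lock-hold time, which is the same trade-off `batchsize` controls for the JDBC writer.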

---

## All Table Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `name` | string | required | Source table name |
| `target_name` | string | required | SQLite destination table name |
| `replication_method` | `full` / `incremental` | `full` | Replication strategy |
| `batchsize` | int | `10000` | Rows per JDBC batch insert |
| `write_partitions` | int | — | Set to `1` to avoid SQLite lock contention |
| `write_strategy` | string | — | `append`, `replace`, `upsert`, `merge` |
| `write_key` | list | — | Key columns; required for `upsert`/`merge` |
| `dedup_columns` | list | — | Columns used for `mkpipe_id` hash deduplication |
| `tags` | list | `[]` | Tags for selective pipeline execution |
| `pass_on_error` | bool | `false` | Skip table on error instead of failing |
